Hello,
some people already might have noticed that I've got a bit angry that
no filesystem nor the posix api nor the linux syscalls are offering
the userspace a way to request real deletion of one or more files (there's
the 's' bit, see man chattr, but it is ignored by all FS which know it).
Almost all filesystems are working such, that deleting a file just
means it doesn't appear in the list of files anymore but the contents
of the file still might be readable on the storage.
So in the last 30 years many tools were created trying to circumvent that
inability of all filesystems. Up to encrypting the whole storage which
seems to be the current state of art and which many people recently tried
to recommend to me.
Also I'm using that workaround already myself since many years, I still
believe it's a very bad, complictated, cumbersome and very uncomfortable
way to make sure contents of files are not readable anymore. Besides that,
just relying on encryption might hit back badly, because encryption often
suffers from bugs in the implementation, bugs or even backdoors in the
design and Moore. That means it's unsure how long the used encryption
will defeat any tries to read the contents of a deleted file from storage
and the used encryption might be worthless tomorrow. Not to speak about
the problems with the necessary key-handling.
What's the answer? Easy and obvious, just (try to) overwrite the contents
of a file by request from userspace. Filesystems do know where on the
storage they have written the contents to, so why not just let them delete
that stuff themself instead? It's almost unbelievable that this was not
already done in the past 30 years.
So, now, after I've got angry enough, I've tried to do it myself, it seems
to work and wasn't really hard.
Of course, the easy way I think I've found isn't really my achievement.
Instead it relies on all the work people have already done to support the
trim command of SSDs. So thanks to all of them. You've made the following
simple patches possible.
How does it work:
- Implement a new syscall named unlinkat_s() with the same signature as
unlinkat(). With this syscall filesystems should make the old contents
of files unreadable and should fail if they can't. This doesn't really
have to be reliable, because it is often impossible for a filesystem to
make enough assumptions about the underlying storage to promise secure
deletion. But it has to mean that the filesystem tried everything it can
to make sure the contents are unreadabler afterwards, e.g. by overwriting
them, using secure trim or even just using trim. I've no idea if trim
might be enough, if I would have implemented trim, it would clear the
trimmed blocks in flash too, making them unreadable. But I haven't done
such and I haven't tested if that's the case.
The new syscall isn't meant to replace unlinkat() for everyday operations,
therefor operation speed is ignored (see below in regard to a side effect).
- Instruct the filesystem that it should discard or overwrite (all) freed
blocks while the unlinkat_s() is at work.
- Kill the inode while letting the filesystem discard freed blocks or
overwrite them. As said before, this was easy through all the work already
done by others. There even already existed a sb_issue_zeroout() which could
be used instead of sb_issue_discard().
- Sync the filesystem, to make sure the stuff is written to the storage.
This approach has the side effect that while a call of unlinkat_s() is at
work, all freed blocks will be destroyed, even those which aren't beloning
to the unlink operation but are freed by possible other running actions.
But in my humble opinion, that's nothing to care about and it keeps the
implementation of this feature simple. I like KISS and that's imho the
main feature of these patches.
Things to be aware of when reading and starting to critisize my patches:
- I've never had a look before at the kernel sources in fs/*.
- They are the result of around half a dozen hours.
- I'm aware that these patches are imperfect. Perfectionism does cost time
for which I often don't feel the need to spend it unpaid.
- I don't care for comments regarding style.
- They are a proof of concept and are an offer. They are meant for other
users, not maintainers. I wasn't paid for doing them and I don't care much
if they will end up in the kernel. I already have and can use them, I'm
happy with them and I don't really need them in the official kernel as I'm
able to easily rebase them myself (thanks to git).
- Don't be disappointed because the patches are that simple. The idea
counts. ;)
Regards,
Alexander Holler
Signed-off-by: Alexander Holler <[email protected]>
---
arch/x86/syscalls/syscall_32.tbl | 1 +
arch/x86/syscalls/syscall_64.tbl | 1 +
fs/namei.c | 38 ++++++++++++++++++++++++++++++-----
include/asm-generic/audit_dir_write.h | 1 +
include/linux/fs.h | 1 +
include/linux/syscalls.h | 1 +
include/uapi/asm-generic/unistd.h | 4 +++-
tools/perf/builtin-trace.c | 2 ++
8 files changed, 43 insertions(+), 6 deletions(-)
diff --git a/arch/x86/syscalls/syscall_32.tbl b/arch/x86/syscalls/syscall_32.tbl
index 9fe1b5d..7a3d530 100644
--- a/arch/x86/syscalls/syscall_32.tbl
+++ b/arch/x86/syscalls/syscall_32.tbl
@@ -364,3 +364,4 @@
355 i386 getrandom sys_getrandom
356 i386 memfd_create sys_memfd_create
357 i386 bpf sys_bpf
+359 i386 unlinkat_s sys_unlinkat_s
diff --git a/arch/x86/syscalls/syscall_64.tbl b/arch/x86/syscalls/syscall_64.tbl
index 281150b..97eaf01 100644
--- a/arch/x86/syscalls/syscall_64.tbl
+++ b/arch/x86/syscalls/syscall_64.tbl
@@ -328,6 +328,7 @@
319 common memfd_create sys_memfd_create
320 common kexec_file_load sys_kexec_file_load
321 common bpf sys_bpf
+322 common unlinkat_s sys_unlinkat_s
#
# x32-specific system call numbers start at 512 to avoid cache impact
diff --git a/fs/namei.c b/fs/namei.c
index db5fe86..1ad3724 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -3717,7 +3717,7 @@ EXPORT_SYMBOL(vfs_unlink);
* writeout happening, and we don't want to prevent access to the directory
* while waiting on the I/O.
*/
-static long do_unlinkat(int dfd, const char __user *pathname)
+static long do_unlinkat(int dfd, const char __user *pathname, bool secure)
{
int error;
struct filename *name;
@@ -3759,8 +3759,25 @@ exit2:
dput(dentry);
}
mutex_unlock(&nd.path.dentry->d_inode->i_mutex);
- if (inode)
- iput(inode); /* truncate the inode here */
+ if (inode) {
+ // TODO:
+ // if (inode is file and 's' flag is set)
+ // secure = true;
+ if (!secure)
+ iput(inode); /* truncate the inode here */
+ else {
+ struct super_block *sb = inode->i_sb;
+ if (sb->s_op->set_secure_delete)
+ sb->s_op->set_secure_delete(sb, true);
+ // TODO: We should fail if secure isn't supported,
+ // look up how that's possible here.
+ iput(inode); /* truncate the inode here */
+ // TODO: check if sb is still valid after the inode is gone
+ sync_filesystem(sb);
+ if (sb->s_op->set_secure_delete)
+ sb->s_op->set_secure_delete(sb, false);
+ }
+ }
inode = NULL;
if (delegated_inode) {
error = break_deleg_wait(&delegated_inode);
@@ -3796,12 +3813,23 @@ SYSCALL_DEFINE3(unlinkat, int, dfd, const char __user *, pathname, int, flag)
if (flag & AT_REMOVEDIR)
return do_rmdir(dfd, pathname);
- return do_unlinkat(dfd, pathname);
+ return do_unlinkat(dfd, pathname, false);
}
SYSCALL_DEFINE1(unlink, const char __user *, pathname)
{
- return do_unlinkat(AT_FDCWD, pathname);
+ return do_unlinkat(AT_FDCWD, pathname, false);
+}
+
+SYSCALL_DEFINE3(unlinkat_s, int, dfd, const char __user *, pathname, int, flag)
+{
+ if ((flag & ~AT_REMOVEDIR) != 0)
+ return -EINVAL;
+
+ if (flag & AT_REMOVEDIR)
+ return do_rmdir(dfd, pathname);
+
+ return do_unlinkat(dfd, pathname, true);
}
int vfs_symlink(struct inode *dir, struct dentry *dentry, const char *oldname)
diff --git a/include/asm-generic/audit_dir_write.h b/include/asm-generic/audit_dir_write.h
index 7b61db4..5282aba 100644
--- a/include/asm-generic/audit_dir_write.h
+++ b/include/asm-generic/audit_dir_write.h
@@ -29,4 +29,5 @@ __NR_unlinkat,
__NR_renameat,
__NR_linkat,
__NR_symlinkat,
+__NR_unlinkat_s,
#endif
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 9ab779e..039e969 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1594,6 +1594,7 @@ struct super_operations {
int (*bdev_try_to_free_page)(struct super_block*, struct page*, gfp_t);
long (*nr_cached_objects)(struct super_block *, int);
long (*free_cached_objects)(struct super_block *, long, int);
+ void (*set_secure_delete) (struct super_block *, bool);
};
/*
diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
index bda9b81..b88019b 100644
--- a/include/linux/syscalls.h
+++ b/include/linux/syscalls.h
@@ -877,4 +877,5 @@ asmlinkage long sys_seccomp(unsigned int op, unsigned int flags,
asmlinkage long sys_getrandom(char __user *buf, size_t count,
unsigned int flags);
asmlinkage long sys_bpf(int cmd, union bpf_attr *attr, unsigned int size);
+asmlinkage long sys_unlinkat_s(int dfd, const char __user * pathname, int flag);
#endif
diff --git a/include/uapi/asm-generic/unistd.h b/include/uapi/asm-generic/unistd.h
index 22749c1..2ba072e 100644
--- a/include/uapi/asm-generic/unistd.h
+++ b/include/uapi/asm-generic/unistd.h
@@ -707,9 +707,11 @@ __SYSCALL(__NR_getrandom, sys_getrandom)
__SYSCALL(__NR_memfd_create, sys_memfd_create)
#define __NR_bpf 280
__SYSCALL(__NR_bpf, sys_bpf)
+#define __NR_unlinkat_s 281
+__SYSCALL(__NR_unlinkat_s, sys_unlinkat_s)
#undef __NR_syscalls
-#define __NR_syscalls 281
+#define __NR_syscalls 282
/*
* All syscalls below here should go away really,
diff --git a/tools/perf/builtin-trace.c b/tools/perf/builtin-trace.c
index fb12645..1507335 100644
--- a/tools/perf/builtin-trace.c
+++ b/tools/perf/builtin-trace.c
@@ -1110,6 +1110,8 @@ static struct syscall_fmt {
{ .name = "uname", .errmsg = true, .alias = "newuname", },
{ .name = "unlinkat", .errmsg = true,
.arg_scnprintf = { [0] = SCA_FDAT, /* dfd */ }, },
+ { .name = "unlinkat_s", .errmsg = true,
+ .arg_scnprintf = { [0] = SCA_FDAT, /* dfd */ }, },
{ .name = "utimensat", .errmsg = true,
.arg_scnprintf = { [0] = SCA_FDAT, /* dirfd */ }, },
{ .name = "write", .errmsg = true,
--
2.1.0
Signed-off-by: Alexander Holler <[email protected]>
---
fs/fat/fat.h | 1 +
fs/fat/fatent.c | 20 ++++++++++++++++++++
fs/fat/inode.c | 13 +++++++++++++
3 files changed, 34 insertions(+)
diff --git a/fs/fat/fat.h b/fs/fat/fat.h
index e0c4ba3..b9febda 100644
--- a/fs/fat/fat.h
+++ b/fs/fat/fat.h
@@ -81,6 +81,7 @@ struct msdos_sb_info {
unsigned int prev_free; /* previously allocated cluster number */
unsigned int free_clusters; /* -1 if undefined */
unsigned int free_clus_valid; /* is free_clusters valid? */
+ atomic_t secure_delete; /* delete blocks securely? */
struct fat_mount_options options;
struct nls_table *nls_disk; /* Codepage used on disk */
struct nls_table *nls_io; /* Charset used for input and display */
diff --git a/fs/fat/fatent.c b/fs/fat/fatent.c
index 260705c..b3b9c11 100644
--- a/fs/fat/fatent.c
+++ b/fs/fat/fatent.c
@@ -575,6 +575,10 @@ int fat_free_clusters(struct inode *inode, int cluster)
goto error;
}
+ // TODO:
+ // if (atomic_read(&sbi->secure_delete) && secure_trim_available)
+ // use secure trim
+ // else
if (sbi->options.discard) {
/*
* Issue discard for the sectors we no longer
@@ -591,6 +595,22 @@ int fat_free_clusters(struct inode *inode, int cluster)
first_cl = cluster;
}
+ } else if (atomic_read(&sbi->secure_delete)) {
+ /*
+ * Fill blocks with zero for the sectors we no
+ * longer care about, batching contiguous clusters
+ * into one request.
+ */
+ if (cluster != fatent.entry + 1) {
+ int nr_clus = fatent.entry - first_cl + 1;
+
+ sb_issue_zeroout(sb,
+ fat_clus_to_blknr(sbi, first_cl),
+ nr_clus * sbi->sec_per_clus,
+ GFP_NOFS);
+
+ first_cl = cluster;
+ }
}
ops->ent_put(&fatent, FAT_ENT_FREE);
diff --git a/fs/fat/inode.c b/fs/fat/inode.c
index 756aead..9073690 100644
--- a/fs/fat/inode.c
+++ b/fs/fat/inode.c
@@ -816,6 +816,17 @@ int fat_sync_inode(struct inode *inode)
EXPORT_SYMBOL_GPL(fat_sync_inode);
+static void fat_set_secure_delete(struct super_block *sb, bool secure)
+{
+ struct msdos_sb_info *sbi = MSDOS_SB(sb);
+ // TODO: will overflow with a very large number of
+ // concurrent calls of unlinkat_s().
+ if (secure)
+ atomic_inc(&sbi->secure_delete);
+ else
+ atomic_dec(&sbi->secure_delete);
+}
+
static int fat_show_options(struct seq_file *m, struct dentry *root);
static const struct super_operations fat_sops = {
.alloc_inode = fat_alloc_inode,
@@ -827,6 +838,7 @@ static const struct super_operations fat_sops = {
.remount_fs = fat_remount,
.show_options = fat_show_options,
+ .set_secure_delete = fat_set_secure_delete,
};
static int fat_show_options(struct seq_file *m, struct dentry *root)
@@ -1580,6 +1592,7 @@ int fat_fill_super(struct super_block *sb, void *data, int silent, int isvfat,
sbi->root_cluster = 0;
sbi->free_clusters = -1; /* Don't know yet */
sbi->free_clus_valid = 0;
+ atomic_set(&sbi->secure_delete, 0);
sbi->prev_free = FAT_START_ENT;
sb->s_maxbytes = 0xffffffff;
--
2.1.0
Signed-off-by: Alexander Holler <[email protected]>
---
fs/ext4/ext4.h | 2 ++
fs/ext4/mballoc.c | 25 +++++++++++++++++++++++--
fs/ext4/super.c | 12 ++++++++++++
3 files changed, 37 insertions(+), 2 deletions(-)
diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index c55a1fa..e66507c 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -1342,6 +1342,8 @@ struct ext4_sb_info {
struct ratelimit_state s_err_ratelimit_state;
struct ratelimit_state s_warning_ratelimit_state;
struct ratelimit_state s_msg_ratelimit_state;
+
+ atomic_t secure_delete; /* delete blocks securely? */
};
static inline struct ext4_sb_info *EXT4_SB(struct super_block *sb)
diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index dbfe15c..f33416f 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -2756,6 +2756,19 @@ static inline int ext4_issue_discard(struct super_block *sb,
return sb_issue_discard(sb, discard_block, count, GFP_NOFS, 0);
}
+static inline int ext4_issue_zeroout(struct super_block *sb,
+ ext4_group_t block_group, ext4_grpblk_t cluster, int count)
+{
+ ext4_fsblk_t discard_block;
+
+ discard_block = (EXT4_C2B(EXT4_SB(sb), cluster) +
+ ext4_group_first_block_no(sb, block_group));
+ count = EXT4_C2B(EXT4_SB(sb), count);
+ //trace_ext4_discard_blocks(sb,
+ // (unsigned long long) discard_block, count);
+ return sb_issue_zeroout(sb, discard_block, count, GFP_NOFS);
+}
+
/*
* This function is called by the jbd2 layer once the commit has finished,
* so we know we can free the blocks that were released with that commit.
@@ -2764,6 +2777,7 @@ static void ext4_free_data_callback(struct super_block *sb,
struct ext4_journal_cb_entry *jce,
int rc)
{
+ struct ext4_sb_info *sbi = EXT4_SB(sb);
struct ext4_free_data *entry = (struct ext4_free_data *)jce;
struct ext4_buddy e4b;
struct ext4_group_info *db;
@@ -2772,6 +2786,11 @@ static void ext4_free_data_callback(struct super_block *sb,
mb_debug(1, "gonna free %u blocks in group %u (0x%p):",
entry->efd_count, entry->efd_group, entry);
+
+ // TODO:
+ // if (atomic_read(&sbi->secure_delete) && secure_trim_available)
+ // use secure trim
+ // else
if (test_opt(sb, DISCARD)) {
err = ext4_issue_discard(sb, entry->efd_group,
entry->efd_start_cluster,
@@ -2782,8 +2801,10 @@ static void ext4_free_data_callback(struct super_block *sb,
" with %d", entry->efd_group,
entry->efd_start_cluster,
entry->efd_count, err);
- }
-
+ } else if (atomic_read(&sbi->secure_delete))
+ ext4_issue_zeroout(sb, entry->efd_group,
+ entry->efd_start_cluster,
+ entry->efd_count);
err = ext4_mb_load_buddy(sb, entry->efd_group, &e4b);
/* we expect to find existing buddy because it's pinned */
BUG_ON(err != 0);
diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index 2c9e686..f87e3ff 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -1100,6 +1100,17 @@ static const struct quotactl_ops ext4_qctl_sysfile_operations = {
};
#endif
+static void ext4_set_secure_delete(struct super_block *sb, bool secure)
+{
+ struct ext4_sb_info *sbi = EXT4_SB(sb);
+ // TODO: will overflow with a very large number of
+ // concurrent calls of unlinkat_s().
+ if (secure)
+ atomic_inc(&sbi->secure_delete);
+ else
+ atomic_dec(&sbi->secure_delete);
+}
+
static const struct super_operations ext4_sops = {
.alloc_inode = ext4_alloc_inode,
.destroy_inode = ext4_destroy_inode,
@@ -1119,6 +1130,7 @@ static const struct super_operations ext4_sops = {
.quota_write = ext4_quota_write,
#endif
.bdev_try_to_free_page = bdev_try_to_free_page,
+ .set_secure_delete = ext4_set_secure_delete,
};
static const struct export_operations ext4_export_ops = {
--
2.1.0
You have to build the new rm yourself
Signed-off-by: Alexander Holler <[email protected]>
---
...ion-s-to-rm-to-support-unlinkat_s-current.patch | 111 +++++++++++++++++++++
1 file changed, 111 insertions(+)
create mode 100644 0001-WIP-Add-option-s-to-rm-to-support-unlinkat_s-current.patch
diff --git a/0001-WIP-Add-option-s-to-rm-to-support-unlinkat_s-current.patch b/0001-WIP-Add-option-s-to-rm-to-support-unlinkat_s-current.patch
new file mode 100644
index 0000000..268cf56
--- /dev/null
+++ b/0001-WIP-Add-option-s-to-rm-to-support-unlinkat_s-current.patch
@@ -0,0 +1,111 @@
+From b4df97b5199e3fe7563a6e83a36fae031ee4777d Mon Sep 17 00:00:00 2001
+From: Alexander Holler <[email protected]>
+Date: Mon, 2 Feb 2015 16:59:24 +0100
+Subject: [PATCH] WIP: Add option -s to rm to support unlinkat_s() (currently
+ x86_64 only)
+
+Signed-off-by: Alexander Holler <[email protected]>
+---
+ src/mv.c | 1 +
+ src/remove.c | 9 ++++++++-
+ src/remove.h | 3 +++
+ src/rm.c | 9 ++++++++-
+ 4 files changed, 20 insertions(+), 2 deletions(-)
+
+diff --git a/src/mv.c b/src/mv.c
+index 0bcc1bb..03d3417 100644
+--- a/src/mv.c
++++ b/src/mv.c
+@@ -76,6 +76,7 @@ rm_option_init (struct rm_options *x)
+ x->ignore_missing_files = false;
+ x->remove_empty_directories = true;
+ x->recursive = true;
++ x->secure = false;
+ x->one_file_system = false;
+
+ /* Should we prompt for removal, too? No. Prompting for the 'move'
+diff --git a/src/remove.c b/src/remove.c
+index db8f993..a97e72c 100644
+--- a/src/remove.c
++++ b/src/remove.c
+@@ -367,7 +367,14 @@ static enum RM_status
+ excise (FTS *fts, FTSENT *ent, struct rm_options const *x, bool is_dir)
+ {
+ int flag = is_dir ? AT_REMOVEDIR : 0;
+- if (unlinkat (fts->fts_cwd_fd, ent->fts_accpath, flag) == 0)
++ int rc;
++#ifdef __x86_64__
++ if (x->secure)
++ rc = syscall (322, fts->fts_cwd_fd, ent->fts_accpath, flag); // x86_64
++ else
++#endif
++ rc = unlinkat (fts->fts_cwd_fd, ent->fts_accpath, flag);
++ if (rc == 0)
+ {
+ if (x->verbose)
+ {
+diff --git a/src/remove.h b/src/remove.h
+index a450192..530b70b 100644
+--- a/src/remove.h
++++ b/src/remove.h
+@@ -49,6 +49,9 @@ struct rm_options
+ /* If true, recursively remove directories. */
+ bool recursive;
+
++ /* If true, use unlinkat_s(). */
++ bool secure;
++
+ /* If true, remove empty directories. */
+ bool remove_empty_directories;
+
+diff --git a/src/rm.c b/src/rm.c
+index c1a23d5..c061579 100644
+--- a/src/rm.c
++++ b/src/rm.c
+@@ -77,6 +77,7 @@ static struct option const long_opts[] =
+ {"-presume-input-tty", no_argument, NULL, PRESUME_INPUT_TTY_OPTION},
+
+ {"recursive", no_argument, NULL, 'r'},
++ {"secure", no_argument, NULL, 's'},
+ {"dir", no_argument, NULL, 'd'},
+ {"verbose", no_argument, NULL, 'v'},
+ {GETOPT_HELP_OPTION_DECL},
+@@ -155,6 +156,7 @@ Remove (unlink) the FILE(s).\n\
+ --no-preserve-root do not treat '/' specially\n\
+ --preserve-root do not remove '/' (default)\n\
+ -r, -R, --recursive remove directories and their contents recursively\n\
++ -s, --secure securely (use unlinkat_s)\n\
+ -d, --dir remove empty directories\n\
+ -v, --verbose explain what is being done\n\
+ "), stdout);
+@@ -193,6 +195,7 @@ rm_option_init (struct rm_options *x)
+ x->one_file_system = false;
+ x->remove_empty_directories = false;
+ x->recursive = false;
++ x->secure = false;
+ x->root_dev_ino = NULL;
+ x->stdin_tty = isatty (STDIN_FILENO);
+ x->verbose = false;
+@@ -223,7 +226,7 @@ main (int argc, char **argv)
+ /* Try to disable the ability to unlink a directory. */
+ priv_set_remove_linkdir ();
+
+- while ((c = getopt_long (argc, argv, "dfirvIR", long_opts, NULL)) != -1)
++ while ((c = getopt_long (argc, argv, "dfirsvIR", long_opts, NULL)) != -1)
+ {
+ switch (c)
+ {
+@@ -254,6 +257,10 @@ main (int argc, char **argv)
+ x.recursive = true;
+ break;
+
++ case 's':
++ x.secure = true;
++ break;
++
+ case INTERACTIVE_OPTION:
+ {
+ int i;
+--
+2.1.0
+
--
2.1.0
Simple test, needs the new rm with option -s.
Signed-off-by: Alexander Holler <[email protected]>
---
test_unlinkat_s.sh | 27 +++++++++++++++++++++++++++
1 file changed, 27 insertions(+)
create mode 100755 test_unlinkat_s.sh
diff --git a/test_unlinkat_s.sh b/test_unlinkat_s.sh
new file mode 100755
index 0000000..f6ba6ac
--- /dev/null
+++ b/test_unlinkat_s.sh
@@ -0,0 +1,27 @@
+#!/bin/sh -e
+
+MOUNTDIR="$(mktemp -d --tmpdir unlinkat_s_mnt.XXXXXXXXXX)"
+TESTIMG="$(mktemp --tmpdir unlinkat_s_img.XXXXXXXXXX)"
+
+dd if=/dev/zero of=$TESTIMG bs=1M count=10
+mkfs.ext4 $TESTIMG
+grep -v -a abrakadabra $TESTIMG >/dev/null
+mount -o loop $TESTIMG $MOUNTDIR
+echo abrakadabra >$MOUNTDIR/foo.txt
+umount $MOUNTDIR
+grep -a abrakadabra $TESTIMG >/dev/null
+mount -o loop $TESTIMG $MOUNTDIR
+rm -s $MOUNTDIR/foo.txt
+umount $MOUNTDIR
+grep -v -a abrakadabra $TESTIMG >/dev/null
+mount -o loop $TESTIMG $MOUNTDIR
+echo abrakadabra >$MOUNTDIR/foo.txt
+umount $MOUNTDIR
+grep -a abrakadabra $TESTIMG >/dev/null
+mount -o loop $TESTIMG $MOUNTDIR
+rm $MOUNTDIR/foo.txt
+umount $MOUNTDIR
+grep -a abrakadabra $TESTIMG >/dev/null
+rm $TESTIMG
+rmdir $MOUNTDIR
+echo "unlinkat_s() worked and unlink() didn't"
--
2.1.0
On Mon, Feb 02, 2015 at 06:05:09PM +0100, Alexander Holler wrote:
> + if (inode) {
> + // TODO:
> + // if (inode is file and 's' flag is set)
> + // secure = true;
> + if (!secure)
> + iput(inode); /* truncate the inode here */
> + else {
> + struct super_block *sb = inode->i_sb;
> + if (sb->s_op->set_secure_delete)
> + sb->s_op->set_secure_delete(sb, true);
> + // TODO: We should fail if secure isn't supported,
> + // look up how that's possible here.
> + iput(inode); /* truncate the inode here */
> + // TODO: check if sb is still valid after the inode is gone
> + sync_filesystem(sb);
> + if (sb->s_op->set_secure_delete)
> + sb->s_op->set_secure_delete(sb, false);
> + }
Charming. Now, what exactly happens if two such syscalls overlap in time?
Moroever, what makes you equate unlink() with inode removal? What happens
if you race e.g. with stat(2) on the same thing? Or if there's an opened
file over that sucker, for that matter?
Am 03.02.2015 um 07:05 schrieb Al Viro:
> On Mon, Feb 02, 2015 at 06:05:09PM +0100, Alexander Holler wrote:
>> + if (inode) {
>> + // TODO:
>> + // if (inode is file and 's' flag is set)
>> + // secure = true;
>> + if (!secure)
>> + iput(inode); /* truncate the inode here */
>> + else {
>> + struct super_block *sb = inode->i_sb;
>> + if (sb->s_op->set_secure_delete)
>> + sb->s_op->set_secure_delete(sb, true);
>> + // TODO: We should fail if secure isn't supported,
>> + // look up how that's possible here.
>> + iput(inode); /* truncate the inode here */
>> + // TODO: check if sb is still valid after the inode is gone
>> + sync_filesystem(sb);
>> + if (sb->s_op->set_secure_delete)
>> + sb->s_op->set_secure_delete(sb, false);
>> + }
>
> Charming. Now, what exactly happens if two such syscalls overlap in time?
What do you think will happen? I assume you haven't looked at how I've
implemented set_secure_delete(). CHarming.
Am 03.02.2015 um 07:05 schrieb Al Viro:
> On Mon, Feb 02, 2015 at 06:05:09PM +0100, Alexander Holler wrote:
>> + if (inode) {
>> + // TODO:
>> + // if (inode is file and 's' flag is set)
>> + // secure = true;
>> + if (!secure)
>> + iput(inode); /* truncate the inode here */
>> + else {
>> + struct super_block *sb = inode->i_sb;
>> + if (sb->s_op->set_secure_delete)
>> + sb->s_op->set_secure_delete(sb, true);
>> + // TODO: We should fail if secure isn't supported,
>> + // look up how that's possible here.
>> + iput(inode); /* truncate the inode here */
>> + // TODO: check if sb is still valid after the inode is gone
>> + sync_filesystem(sb);
>> + if (sb->s_op->set_secure_delete)
>> + sb->s_op->set_secure_delete(sb, false);
>> + }
>
> Charming. Now, what exactly happens if two such syscalls overlap in time?
> Moroever, what makes you equate unlink() with inode removal? What happens
> if you race e.g. with stat(2) on the same thing? Or if there's an opened
> file over that sucker, for that matter?
Sorry, but I first had to make breakfast after I've got angry about the
usual arrogance of most Linux kernel maintainers.
I've already answered the first question.
Now to the second. That still might be a problem. But that's why this is
a RFC, why there is a WIP (Work In Progress) before the patch, why I've
written I've never looked at those sources before, why I've written they
are imperfect and why I've written I have not spend much time on these
patches. I've posted them to show how I think the problem might be
solved. These patches evolved out of desperation that otherwise users
have to wait another 30 years until they will be offered a way to really
delete files.
If the removal is somehow scheduled then, of course, the secure flag has
to be scheduled too.
Alexander Holler
On Tue, Feb 03, 2015 at 07:58:50AM +0100, Alexander Holler wrote:
> > Charming. Now, what exactly happens if two such syscalls overlap in time?
>
> What do you think will happen? I assume you haven't looked at how I've
> implemented set_secure_delete(). CHarming.
AFAICS, you get random unlink() happening at the same time hit by that
mess, whether they'd asked for it or not. What's more, this counter
of yours is *not* guaranteed to be elevated during the final iput() of the
inode you wanted to get - again, ls -lR racing with that syscall can
elevate the refcount of dentry, making d_delete() in vfs_unlink() just
remove that dentry from hash, while keeping it positive. If dentry
reference grabbed by stat(2) is released after both dput() and iput() in
do_unlinkat(), the final iput() will be done when stat(2) drops its
reference to dentry, triggering immediate dentry_kill() (since dentry
has already been unhashed) and dentry_iput() from it.
IOW, this counter is both too crude (it's fs-wide, for crying out loud)
*and* not guaranteed to cover enough. _IF_ you want that behaviour at
all, it ought to be an in-core inode flag set by that syscall and
checked by truncation logics to decide whether to do normal truncate of
this "overwrite with zeroes" thing.
While we are at it, "overwrite with zeroes" is too weak if the attacker
might get hold of the actual hardware. Google for details - it's far too
long story for l-k posting. Look for data recovery and secure data erasure...
On Tue, 2015-02-03 at 07:58 +0100, Alexander Holler wrote:
> Am 03.02.2015 um 07:05 schrieb Al Viro:
> > On Mon, Feb 02, 2015 at 06:05:09PM +0100, Alexander Holler wrote:
> >> + if (inode) {
> >> + // TODO:
> >> + // if (inode is file and 's' flag is set)
> >> + // secure = true;
> >> + if (!secure)
> >> + iput(inode); /* truncate the inode here */
> >> + else {
> >> + struct super_block *sb = inode->i_sb;
> >> + if (sb->s_op->set_secure_delete)
> >> + sb->s_op->set_secure_delete(sb, true);
> >> + // TODO: We should fail if secure isn't supported,
> >> + // look up how that's possible here.
> >> + iput(inode); /* truncate the inode here */
> >> + // TODO: check if sb is still valid after the inode is gone
> >> + sync_filesystem(sb);
> >> + if (sb->s_op->set_secure_delete)
> >> + sb->s_op->set_secure_delete(sb, false);
> >> + }
> >
> > Charming. Now, what exactly happens if two such syscalls overlap in time?
>
> What do you think will happen? I assume you haven't looked at how I've
> implemented set_secure_delete(). CHarming.
Chill, why don't you.
Am 03.02.2015 um 08:56 schrieb Al Viro:
> While we are at it, "overwrite with zeroes" is too weak if the attacker
> might get hold of the actual hardware. Google for details - it's far too
> long story for l-k posting. Look for data recovery and secure data erasure...
You might read
http://link.springer.com/chapter/10.1007/978-3-540-89862-7_21
Here is an article in german about that:
http://www.heise.de/security/meldung/Sicheres-Loeschen-Einmal-ueberschreiben-genuegt-198816.html
In short, it's enough to overwrite it once with zeros,
On Tue, Feb 03, 2015 at 09:01:36AM +0100, Alexander Holler wrote:
> Am 03.02.2015 um 08:56 schrieb Al Viro:
>
> >While we are at it, "overwrite with zeroes" is too weak if the attacker
> >might get hold of the actual hardware. Google for details - it's far too
> >long story for l-k posting. Look for data recovery and secure data erasure...
>
> You might read
>
> http://link.springer.com/chapter/10.1007/978-3-540-89862-7_21
>
> Here is an article in german about that:
>
> http://www.heise.de/security/meldung/Sicheres-Loeschen-Einmal-ueberschreiben-genuegt-198816.html
>
> In short, it's enough to overwrite it once with zeros,
Regardless of the media used? How does that work on e.g. flash?
Am 03.02.2015 um 09:10 schrieb Al Viro:
> On Tue, Feb 03, 2015 at 09:01:36AM +0100, Alexander Holler wrote:
>> Am 03.02.2015 um 08:56 schrieb Al Viro:
>>
>>> While we are at it, "overwrite with zeroes" is too weak if the attacker
>>> might get hold of the actual hardware. Google for details - it's far too
>>> long story for l-k posting. Look for data recovery and secure data erasure...
>>
>> You might read
>>
>> http://link.springer.com/chapter/10.1007/978-3-540-89862-7_21
>>
>> Here is an article in german about that:
>>
>> http://www.heise.de/security/meldung/Sicheres-Loeschen-Einmal-ueberschreiben-genuegt-198816.html
>>
>> In short, it's enough to overwrite it once with zeros,
>
> Regardless of the media used? How does that work on e.g. flash?
That's why "secure trim" should be used if available. Blame the storage
people for not offering it. But as I've already mentioned, they would
just answer that filesystems don't (didn't) delete files anyway.
Am 03.02.2015 um 08:56 schrieb Al Viro:
> On Tue, Feb 03, 2015 at 07:58:50AM +0100, Alexander Holler wrote:
>
>>> Charming. Now, what exactly happens if two such syscalls overlap in time?
>>
>> What do you think will happen? I assume you haven't looked at how I've
>> implemented set_secure_delete(). CHarming.
>
> AFAICS, you get random unlink() happening at the same time hit by that
> mess, whether they'd asked for it or not. What's more, this counter
> of yours is *not* guaranteed to be elevated during the final iput() of the
> inode you wanted to get - again, ls -lR racing with that syscall can
> elevate the refcount of dentry, making d_delete() in vfs_unlink() just
> remove that dentry from hash, while keeping it positive. If dentry
> reference grabbed by stat(2) is released after both dput() and iput() in
> do_unlinkat(), the final iput() will be done when stat(2) drops its
> reference to dentry, triggering immediate dentry_kill() (since dentry
> has already been unhashed) and dentry_iput() from it.
Thanks for the short explanation. I will see if I can make sense out of
it for me to get an idea how to solve that.
>
> IOW, this counter is both too crude (it's fs-wide, for crying out loud)
> *and* not guaranteed to cover enough. _IF_ you want that behaviour at
Sure it is crude.
But it keeps the patches simple. As I've written, unlinkat_s() isn't
meant for everyday usage, just for the rare case when one really wants
to get rid of some contents. Therefor execution speed or an i/o slowdown
while the "secure deletion" is in work is totally ignored
And that "rare case" doesn't include military security levels, it's just
meant for ordinary people which want make it much, much harder for other
ordinary people (or geeks or kernel maintainers) to read the deleted
content ever again. It's far too easy to use grep or something similiar
to find seemingly deleted stuff at device level again (after it was
deleted by what filesystems are offering nowadays). Especially if one
thinks at stuff like certificates and similiar which can be identified
by common patterns (bit sequences) they use.
Alexander Holler
Am 03.02.2015 um 09:51 schrieb Alexander Holler:
> Am 03.02.2015 um 08:56 schrieb Al Viro:
>> On Tue, Feb 03, 2015 at 07:58:50AM +0100, Alexander Holler wrote:
>>
>>>> Charming. Now, what exactly happens if two such syscalls overlap in
>>>> time?
>>>
>>> What do you think will happen? I assume you haven't looked at how I've
>>> implemented set_secure_delete(). CHarming.
>>
>> AFAICS, you get random unlink() happening at the same time hit by that
>> mess, whether they'd asked for it or not. What's more, this counter
>> of yours is *not* guaranteed to be elevated during the final iput() of
>> the
>> inode you wanted to get - again, ls -lR racing with that syscall can
>> elevate the refcount of dentry, making d_delete() in vfs_unlink() just
>> remove that dentry from hash, while keeping it positive. If dentry
>> reference grabbed by stat(2) is released after both dput() and iput() in
>> do_unlinkat(), the final iput() will be done when stat(2) drops its
>> reference to dentry, triggering immediate dentry_kill() (since dentry
>> has already been unhashed) and dentry_iput() from it.
>
> Thanks for the short explanation. I will see if I can make sense out of
> it for me to get an idea how to solve that.
>
>>
>> IOW, this counter is both too crude (it's fs-wide, for crying out loud)
>> *and* not guaranteed to cover enough. _IF_ you want that behaviour at
>
> Sure it is crude.
>
> But it keeps the patches simple. As I've written, unlinkat_s() isn't
> meant for everyday usage, just for the rare case when one really wants
> to get rid of some contents. Therefor execution speed or an i/o slowdown
> while the "secure deletion" is in work is totally ignored
>
> And that "rare case" doesn't include military security levels, it's just
> meant for ordinary people which want make it much, much harder for other
> ordinary people (or geeks or kernel maintainers) to read the deleted
> content ever again. It's far too easy to use grep or something similiar
> to find seemingly deleted stuff at device level again (after it was
> deleted by what filesystems are offering nowadays). Especially if one
> thinks at stuff like certificates and similiar which can be identified
> by common patterns (bit sequences) they use.
Or to give another more common example: If you delete your contact list,
I likely might find again by just searching for 0x6f726956 at the device
level (assuming you've stored a contact in that list with the same
surname as yours.
And, because I've only mentioned in a different thread, now think at the
problem that nowadays storage is often fixed (soldered) to devices which
don't offer a way to delete the whole storage. You might have luck if
the contact list in question was stored in some encrypted part, but that
presumes that the key for that encrypted part isn't somehow stored on
the same device too. Which unfortunately isn't always the case (maybe
because of usability). And ...
That's why I think filesystems should offer a way to really delete
files. Most people would be happy, even if filesystems won't delete
stuff at military security levels and would disregard all the cases when
they couldn't make sure that stuff is really deleted.
To conclude, most people would be already happy if the most trivial case
would be handled right and not just by marking files as deleted but
leaving the contents intact.
Alexander Holler
Am 03.02.2015 um 10:23 schrieb Alexander Holler:
> Or to give another more common example: If you delete your contact list,
> I likely might find again by just searching for 0x6f726956 at the device
> level (assuming you've stored a contact in that list with the same
> surname as yours.
>
> And, because I've only mentioned in a different thread, now think at the
> problem that nowadays storage is often fixed (soldered) to devices which
> don't offer a way to delete the whole storage. You might have luck if
> the contact list in question was stored in some encrypted part, but that
> presumes that the key for that encrypted part isn't somehow stored on
> the same device too. Which unfortunately isn't always the case (maybe
> because of usability). And ...
>
> That's why I think filesystems should offer a way to really delete
> files. Most people would be happy, even if filesystems won't delete
> stuff at military security levels and would disregard all the cases when
> they couldn't make sure that stuff is really deleted.
>
> To conclude, most people would be already happy if the most trivial case
> would be handled right and not just by marking files as deleted but
> leaving the contents intact.
Maybe I should rename me to Quijote, switch back to pen and paper and
should start to raise carrier pigeons. ;)
E.g. my parents are stull successfully using contact lists on paper.
These are still more readable, easier to handle and smaller than any
available electronic replacement. And they have absolutely no problem to
destroy an old one when they replace it with a new one.
Regards,
Alexander Holler
Am 03.02.2015 um 13:48 schrieb Alexander Holler:
> E.g. my parents are stull successfully using contact lists on paper.
> These are still more readable, easier to handle and smaller than any
> available electronic replacement. And they have absolutely no problem to
> destroy an old one when they replace it with a new one.
Which very seldom happens, no screen which can crack and no battery. ;)
On Mon, 2 Feb 2015, Alexander Holler wrote:
> Date: Mon, 2 Feb 2015 18:05:11 +0100
> From: Alexander Holler <[email protected]>
> To: [email protected]
> Cc: [email protected], Alexander Holler <[email protected]>
> Subject: [PATCH 3/5] WIP: fs: ext4: support unlinkat_s() for secure deletion
> of files
Hi,
I am missing a description, where you'd describe what is this all
about, why and how.
>
> Signed-off-by: Alexander Holler <[email protected]>
> ---
> fs/ext4/ext4.h | 2 ++
> fs/ext4/mballoc.c | 25 +++++++++++++++++++++++--
> fs/ext4/super.c | 12 ++++++++++++
> 3 files changed, 37 insertions(+), 2 deletions(-)
>
> diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
> index c55a1fa..e66507c 100644
> --- a/fs/ext4/ext4.h
> +++ b/fs/ext4/ext4.h
> @@ -1342,6 +1342,8 @@ struct ext4_sb_info {
> struct ratelimit_state s_err_ratelimit_state;
> struct ratelimit_state s_warning_ratelimit_state;
> struct ratelimit_state s_msg_ratelimit_state;
> +
> + atomic_t secure_delete; /* delete blocks securely? */
> };
>
> static inline struct ext4_sb_info *EXT4_SB(struct super_block *sb)
> diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
> index dbfe15c..f33416f 100644
> --- a/fs/ext4/mballoc.c
> +++ b/fs/ext4/mballoc.c
> @@ -2756,6 +2756,19 @@ static inline int ext4_issue_discard(struct super_block *sb,
> return sb_issue_discard(sb, discard_block, count, GFP_NOFS, 0);
> }
>
> +static inline int ext4_issue_zeroout(struct super_block *sb,
> + ext4_group_t block_group, ext4_grpblk_t cluster, int count)
> +{
> + ext4_fsblk_t discard_block;
> +
> + discard_block = (EXT4_C2B(EXT4_SB(sb), cluster) +
> + ext4_group_first_block_no(sb, block_group));
> + count = EXT4_C2B(EXT4_SB(sb), count);
> + //trace_ext4_discard_blocks(sb,
> + // (unsigned long long) discard_block, count);
> + return sb_issue_zeroout(sb, discard_block, count, GFP_NOFS);
> +}
> +
> /*
> * This function is called by the jbd2 layer once the commit has finished,
> * so we know we can free the blocks that were released with that commit.
> @@ -2764,6 +2777,7 @@ static void ext4_free_data_callback(struct super_block *sb,
> struct ext4_journal_cb_entry *jce,
> int rc)
> {
> + struct ext4_sb_info *sbi = EXT4_SB(sb);
> struct ext4_free_data *entry = (struct ext4_free_data *)jce;
> struct ext4_buddy e4b;
> struct ext4_group_info *db;
> @@ -2772,6 +2786,11 @@ static void ext4_free_data_callback(struct super_block *sb,
> mb_debug(1, "gonna free %u blocks in group %u (0x%p):",
> entry->efd_count, entry->efd_group, entry);
>
> +
> + // TODO:
> + // if (atomic_read(&sbi->secure_delete) && secure_trim_available)
> + // use secure trim
> + // else
I am missing very important pieces like, what happens if we require
secure delete, but there is no secure trim available and we're still
on the ssd ?
What if the underlying storage is thinly provisioned ?
What if the underlying storage consist of hardware which does
support secure discard and one that does not ? The request crossing
the borders will fail.
What if the underlying hardware does support secure trim, but the
storage under the fs is in raid configuration, which brings me to
the next question.
Discard/secure discard does have a granularity and alignment, so
what if the extent is smaller than a discard granularity, or it is
not aligned properly ? Such discard requests would be ignored.
> if (test_opt(sb, DISCARD)) {
> err = ext4_issue_discard(sb, entry->efd_group,
> entry->efd_start_cluster,
> @@ -2782,8 +2801,10 @@ static void ext4_free_data_callback(struct super_block *sb,
> " with %d", entry->efd_group,
> entry->efd_start_cluster,
> entry->efd_count, err);
> - }
> -
> + } else if (atomic_read(&sbi->secure_delete))
> + ext4_issue_zeroout(sb, entry->efd_group,
> + entry->efd_start_cluster,
> + entry->efd_count);
Error handling is missing here. Also I am not sure that zeroing out
the blocks is really enough. Yes, I've seen the link you've posted,
but I am not convinced.
Did you consider metadata information for the file ? File name,
timestamps, size, data placement ? Is it something you want to
remove as well, or are you going to ignore it ? It can potentially
contain valuable information for the attacker as well. I am just
trying to understand the scope of this thing.
Moreover with inline data you might have the data in the inode
itself, which also means the it will be in the journal as well.
Also with data=journal the data will be in the journal.
With no journal this would not work at all, you have to make this
for nojournal case as well.
What if you do defragmentation in the file, in that case the file data
could be all over the place.
What if you're device is not a real hardware, but just let's say a
loop device ? Talking about the smart phones I had Samsung phone
with that setup (not sure anyone is doing that anymore).
With all that said, the devil is in the details and since it's
security feature the details and corner cases is what you need
to focus on. We have '-o discard' mount option for years now and
we could have made 'secure delete' by simply calling
sb_issue_discard() with BLKDEV_DISCARD_SECURE flag, but that's not
really enough.
Not mentioning the unreliable hardware. And I am not going to rely
on the hardware which was not designed with security in mind for my
security feature, no one should. It's much better, easies and more
feasible just to use disk encryption - it also comes with advantages
that no one can actually read your existing files as opposed to just
deleted files.
> err = ext4_mb_load_buddy(sb, entry->efd_group, &e4b);
> /* we expect to find existing buddy because it's pinned */
> BUG_ON(err != 0);
> diff --git a/fs/ext4/super.c b/fs/ext4/super.c
> index 2c9e686..f87e3ff 100644
> --- a/fs/ext4/super.c
> +++ b/fs/ext4/super.c
> @@ -1100,6 +1100,17 @@ static const struct quotactl_ops ext4_qctl_sysfile_operations = {
> };
> #endif
>
> +static void ext4_set_secure_delete(struct super_block *sb, bool secure)
> +{
> + struct ext4_sb_info *sbi = EXT4_SB(sb);
> + // TODO: will overflow with a very large number of
> + // concurrent calls of unlinkat_s().
> + if (secure)
> + atomic_inc(&sbi->secure_delete);
> + else
> + atomic_dec(&sbi->secure_delete);
> +}
> +
> static const struct super_operations ext4_sops = {
> .alloc_inode = ext4_alloc_inode,
> .destroy_inode = ext4_destroy_inode,
> @@ -1119,6 +1130,7 @@ static const struct super_operations ext4_sops = {
> .quota_write = ext4_quota_write,
> #endif
> .bdev_try_to_free_page = bdev_try_to_free_page,
> + .set_secure_delete = ext4_set_secure_delete,
> };
>
> static const struct export_operations ext4_export_ops = {
>
Am 03.02.2015 um 14:50 schrieb Luk?? Czerner:
> On Mon, 2 Feb 2015, Alexander Holler wrote:
>
>> Date: Mon, 2 Feb 2015 18:05:11 +0100
>> From: Alexander Holler <[email protected]>
>> To: [email protected]
>> Cc: [email protected], Alexander Holler <[email protected]>
>> Subject: [PATCH 3/5] WIP: fs: ext4: support unlinkat_s() for secure deletion
>> of files
>
> Hi,
>
> I am missing a description, where you'd describe what is this all
> about, why and how.
Maybe you've missed the introduction, patch 0/5.
Sorry, but I'm not sponsored and the time I can spend is limited.
Therefor, please, don't expect a paper in the kind european
bureaucrazies are producing.
I'm already spending a lot of time trying to convince the developers
here, that this a feature most people expect from any filesystem. And
I've written these patches, for which now, even after I've marked them
with all kind of "preliminary" terms, still get blamed.
And, unfortunately, myths like that overwriting a block once on
traditional magnetic platters isn't enough, don't help either.
> I am missing very important pieces like, what happens if we require
> secure delete, but there is no secure trim available and we're still
> on the ssd ?
As written in 0/5, I don't know if trim (without secure) might be enough.
>
> What if the underlying storage is thinly provisioned ?
>
> What if the underlying storage consist of hardware which does
> support secure discard and one that does not ? The request crossing
> the borders will fail.
>
> What if the underlying hardware does support secure trim, but the
> storage under the fs is in raid configuration, which brings me to
> the next question.
That's all about how unlinkat_s will be documented. I would suggest to
let unlinkat_s() fail if it is sure it can't delete stuff, but otherwise
would write in the documentation that it might be useless in many cases
like stacked filesystems, mixed raids and similiar constructs. Maybe the
documentation for shred is something which could be used as an template.
>
> Discard/secure discard does have a granularity and alignment, so
> what if the extent is smaller than a discard granularity, or it is
> not aligned properly ? Such discard requests would be ignored.
You can throw in another dozen complications. That's just another way to
say "never", to kill any further user expectations or requests and to
hide the forest behind trees.
I wonder how you ever solve problems if you never start with solving
even the most trivial case, always getting lost in an uncountbale number
of problems.
> Error handling is missing here. Also I am not sure that zeroing out
> the blocks is really enough. Yes, I've seen the link you've posted,
> but I am not convinced.
Implementing a sb_issue_zeroout_30_times() should be trivial. You could
even make that an mount option, if that would convince you. But besides
that, I've never heared of any case where someone has read anything back
which was overwritten just once. But in contrast, there are countless
case where stuff was read back because the filesystem didn't really
delete it.
>
> Did you consider metadata information for the file ? File name,
> timestamps, size, data placement ? Is it something you want to
> remove as well, or are you going to ignore it ? It can potentially
> contain valuable information for the attacker as well. I am just
> trying to understand the scope of this thing.
I prefer to start with simple steps to cover a least the most trivial
cases which already would make 99% percent of users happy. You can
always find some cases when it doesn't work and you could always make
unlinkat_s() more complicated.
I'm aware of all the other stuff you are mentioning below, but I'll now
stop arguing further. Sorry, I've already expected all these response,
but at least, I've tried it in the hope someone else might still see the
forest behind all those trees.
Maybe I should request removal of shred from Fedora/RH instead.
According to you it's one of the most misleading and useless tools. So
why still confuse people with it and still ship it?
Have a nice day, week or year ...
Regards,
Alexander Holler
>
> Moreover with inline data you might have the data in the inode
> itself, which also means the it will be in the journal as well.
>
> Also with data=journal the data will be in the journal.
>
> With no journal this would not work at all, you have to make this
> for nojournal case as well.
>
> What if you do defragmentation in the file, in that case the file data
> could be all over the place.
>
> What if you're device is not a real hardware, but just let's say a
> loop device ? Talking about the smart phones I had Samsung phone
> with that setup (not sure anyone is doing that anymore).
>
>
> With all that said, the devil is in the details and since it's
> security feature the details and corner cases is what you need
> to focus on. We have '-o discard' mount option for years now and
> we could have made 'secure delete' by simply calling
> sb_issue_discard() with BLKDEV_DISCARD_SECURE flag, but that's not
> really enough.
>
> Not mentioning the unreliable hardware. And I am not going to rely
> on the hardware which was not designed with security in mind for my
> security feature, no one should. It's much better, easies and more
> feasible just to use disk encryption - it also comes with advantages
> that no one can actually read your existing files as opposed to just
> deleted files.
>
>> err = ext4_mb_load_buddy(sb, entry->efd_group, &e4b);
>> /* we expect to find existing buddy because it's pinned */
>> BUG_ON(err != 0);
>> diff --git a/fs/ext4/super.c b/fs/ext4/super.c
>> index 2c9e686..f87e3ff 100644
>> --- a/fs/ext4/super.c
>> +++ b/fs/ext4/super.c
>> @@ -1100,6 +1100,17 @@ static const struct quotactl_ops ext4_qctl_sysfile_operations = {
>> };
>> #endif
>>
>> +static void ext4_set_secure_delete(struct super_block *sb, bool secure)
>> +{
>> + struct ext4_sb_info *sbi = EXT4_SB(sb);
>> + // TODO: will overflow with a very large number of
>> + // concurrent calls of unlinkat_s().
>> + if (secure)
>> + atomic_inc(&sbi->secure_delete);
>> + else
>> + atomic_dec(&sbi->secure_delete);
>> +}
>> +
>> static const struct super_operations ext4_sops = {
>> .alloc_inode = ext4_alloc_inode,
>> .destroy_inode = ext4_destroy_inode,
>> @@ -1119,6 +1130,7 @@ static const struct super_operations ext4_sops = {
>> .quota_write = ext4_quota_write,
>> #endif
>> .bdev_try_to_free_page = bdev_try_to_free_page,
>> + .set_secure_delete = ext4_set_secure_delete,
>> };
>>
>> static const struct export_operations ext4_export_ops = {
>>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
>
Am 03.02.2015 um 15:50 schrieb Alexander Holler:
> Maybe I should request removal of shred from Fedora/RH instead.
> According to you it's one of the most misleading and useless tools. So
> why still confuse people with it and still ship it?
At least it should be documented that it doesn't delete the files in
question from backups, like you should never try to dry your cat in a
microwave.
>
> Have a nice day, week or year ...
>
> Regards,
>
> Alexander Holler
> What's the answer? Easy and obvious, just (try to) overwrite the contents
> of a file by request from userspace. Filesystems do know where on the
> storage they have written the contents to, so why not just let them delete
> that stuff themself instead? It's almost unbelievable that this was not
> already done in the past 30 years.
Easy, obvious and wrong 8)
The last PC hard disks that were defined to do what you told them where
ST-506 MFM and RLL devices. IDE disks are basically 'disk emulators',
SSDs vastly more so.
An IDE disk can do what it likes with your I/O so long as your requests
and returns are what the standard expects. So for example if you zero a
sector its perfectly entitled to set a bit in a master index of zeroed
sectors. You can't tell the difference and externally it looks like
an ST506 disc with extensions. Even simple devices may well move blocks
around to deal with bad blocks, or high usage spots to avoid having to
keep rewriting the tracks either side.
An SSD internally has minimal relationship to a disc. If you have the
tools to write a file, write over it, discard it and then dump the flash
chips you'll probably find it's still there.
If you plug a Raspberry Pi into a modern large hard disk, the chances are
the smarter end of the cable is the disk.
Alan
Am 03.02.2015 um 16:13 schrieb Alexander Holler:
> Am 03.02.2015 um 15:50 schrieb Alexander Holler:
>
>> Maybe I should request removal of shred from Fedora/RH instead.
>> According to you it's one of the most misleading and useless tools. So
>> why still confuse people with it and still ship it?
>
> At least it should be documented that it doesn't delete the files in
> question from backups, like you should never try to dry your cat in a
> microwave.
Which, by the way, should be documented in the manpages for rm, unlink()
and unlinkat() too. Someone might otherwise assume that e.g. a rm might
delete files in a snapshot too,
On Tue, 3 Feb 2015, Alexander Holler wrote:
> Date: Tue, 03 Feb 2015 15:50:53 +0100
> From: Alexander Holler <[email protected]>
> To: Luk?? Czerner <[email protected]>
> Cc: [email protected], [email protected]
> Subject: Re: [PATCH 3/5] WIP: fs: ext4: support unlinkat_s() for secure
> deletion of files
>
> Am 03.02.2015 um 14:50 schrieb Luk?? Czerner:
> > On Mon, 2 Feb 2015, Alexander Holler wrote:
> >
> > > Date: Mon, 2 Feb 2015 18:05:11 +0100
> > > From: Alexander Holler <[email protected]>
> > > To: [email protected]
> > > Cc: [email protected], Alexander Holler <[email protected]>
> > > Subject: [PATCH 3/5] WIP: fs: ext4: support unlinkat_s() for secure
> > > deletion
> > > of files
> >
> > Hi,
> >
> > I am missing a description, where you'd describe what is this all
> > about, why and how.
>
> Maybe you've missed the introduction, patch 0/5.
>
> Sorry, but I'm not sponsored and the time I can spend is limited. Therefor,
> please, don't expect a paper in the kind european bureaucrazies are producing.
Please read Documentation/SubmittingPatches. What I am asking is a
description of this patch.
>
> I'm already spending a lot of time trying to convince the developers here,
> that this a feature most people expect from any filesystem. And I've written
> these patches, for which now, even after I've marked them with all kind of
> "preliminary" terms, still get blamed.
So, you'd be much happier if we just ignored your patches ? I am not
sure you understand how this works. You spend time creating and
posting patches and at least two people (including me) spent time
reading and commenting on it, isn't that what you need ?
You have the attention to move this forward, so please take
advantage of this.
>
> And, unfortunately, myths like that overwriting a block once on traditional
> magnetic platters isn't enough, don't help either.
Not sure about the myths, I've read the paper here
http://digital-forensics.sans.org/blog/2009/01/15/overwriting-hard-drive-data/
and it sounds good, but I have no idea about the methodology, the
details of it. Specifications of the HDD used. For example he
mention writing 512B, but on the newer (4k physical sector) drives at that
time it might have meant rewriting on 4k block twice just to write 1k. This
could also explain the difference between the 'old' and 'new' drives
(whatever that means).
So what I am saying is that I am not entirely convinced that what is
concluded in that paper is true.
>
>
> > I am missing very important pieces like, what happens if we require
> > secure delete, but there is no secure trim available and we're still
> > on the ssd ?
>
> As written in 0/5, I don't know if trim (without secure) might be enough.
Well, then you should find out since you're the one writing the
patches. My point is that this case has to be taken care of in the
code.
>
> >
> > What if the underlying storage is thinly provisioned ?
> >
> > What if the underlying storage consist of hardware which does
> > support secure discard and one that does not ? The request crossing
> > the borders will fail.
> >
> > What if the underlying hardware does support secure trim, but the
> > storage under the fs is in raid configuration, which brings me to
> > the next question.
>
> That's all about how unlinkat_s will be documented. I would suggest to let
> unlinkat_s() fail if it is sure it can't delete stuff, but otherwise would
> write in the documentation that it might be useless in many cases like stacked
> filesystems, mixed raids and similiar constructs. Maybe the documentation for
> shred is something which could be used as an template.
Well, that's problematic. Documentation is not really enough, you
can't just expect every phone user to read unlinkat_s documentation.
Once you try to delete the file securely and it fails, you have a
problem. Because you might no longer have the references to the
blocks used by that file. That's why I think that error handling is
important here. You have to give a user the way out so he can use
other means of getting rid of the file.
>
> >
> > Discard/secure discard does have a granularity and alignment, so
> > what if the extent is smaller than a discard granularity, or it is
> > not aligned properly ? Such discard requests would be ignored.
>
> You can throw in another dozen complications. That's just another way to say
> "never", to kill any further user expectations or requests and to hide the
> forest behind trees.
That's not what your code does.
>
> I wonder how you ever solve problems if you never start with solving even the
> most trivial case, always getting lost in an uncountbale number of problems.
How ? By listening to feedback and discussing, or implementing
missing parts. What this patch does in it's current version is
implementing something that we already know, while ignoring all the
important questions.
>
> > Error handling is missing here. Also I am not sure that zeroing out
> > the blocks is really enough. Yes, I've seen the link you've posted,
> > but I am not convinced.
>
> Implementing a sb_issue_zeroout_30_times() should be trivial. You could even
> make that an mount option, if that would convince you. But besides that, I've
> never heared of any case where someone has read anything back which was
> overwritten just once. But in contrast, there are countless case where stuff
> was read back because the filesystem didn't really delete it.
Right, but reading deleted data is expected. We never pretend that
this was not the case. However when you want to make it disappear,
you have to be sure it work.
>
> >
> > Did you consider metadata information for the file ? File name,
> > timestamps, size, data placement ? Is it something you want to
> > remove as well, or are you going to ignore it ? It can potentially
> > contain valuable information for the attacker as well. I am just
> > trying to understand the scope of this thing.
>
> I prefer to start with simple steps to cover a least the most trivial cases
> which already would make 99% percent of users happy. You can always find some
> cases when it doesn't work and you could always make unlinkat_s() more
> complicated.
So, you are or are not considering dealing with metadata information ?
It's not really clear from your answer.
>
> I'm aware of all the other stuff you are mentioning below, but I'll now stop
> arguing further. Sorry, I've already expected all these response, but at
> least, I've tried it in the hope someone else might still see the forest
> behind all those trees.
You might be aware, but I could not tell that from the code and the
lack of description of those cases and why you've ignored it.
Best regards,
-Lukas
>
> Maybe I should request removal of shred from Fedora/RH instead. According to
> you it's one of the most misleading and useless tools. So why still confuse
> people with it and still ship it?
>
> Have a nice day, week or year ...
>
> Regards,
>
> Alexander Holler
>
> >
> > Moreover with inline data you might have the data in the inode
> > itself, which also means the it will be in the journal as well.
> >
> > Also with data=journal the data will be in the journal.
> >
> > With no journal this would not work at all, you have to make this
> > for nojournal case as well.
> >
> > What if you do defragmentation in the file, in that case the file data
> > could be all over the place.
> >
> > What if you're device is not a real hardware, but just let's say a
> > loop device ? Talking about the smart phones I had Samsung phone
> > with that setup (not sure anyone is doing that anymore).
> >
> >
> > With all that said, the devil is in the details and since it's
> > security feature the details and corner cases is what you need
> > to focus on. We have '-o discard' mount option for years now and
> > we could have made 'secure delete' by simply calling
> > sb_issue_discard() with BLKDEV_DISCARD_SECURE flag, but that's not
> > really enough.
> >
> > Not mentioning the unreliable hardware. And I am not going to rely
> > on the hardware which was not designed with security in mind for my
> > security feature, no one should. It's much better, easies and more
> > feasible just to use disk encryption - it also comes with advantages
> > that no one can actually read your existing files as opposed to just
> > deleted files.
> >
> > > err = ext4_mb_load_buddy(sb, entry->efd_group, &e4b);
> > > /* we expect to find existing buddy because it's pinned */
> > > BUG_ON(err != 0);
> > > diff --git a/fs/ext4/super.c b/fs/ext4/super.c
> > > index 2c9e686..f87e3ff 100644
> > > --- a/fs/ext4/super.c
> > > +++ b/fs/ext4/super.c
> > > @@ -1100,6 +1100,17 @@ static const struct quotactl_ops
> > > ext4_qctl_sysfile_operations = {
> > > };
> > > #endif
> > >
> > > +static void ext4_set_secure_delete(struct super_block *sb, bool secure)
> > > +{
> > > + struct ext4_sb_info *sbi = EXT4_SB(sb);
> > > + // TODO: will overflow with a very large number of
> > > + // concurrent calls of unlinkat_s().
> > > + if (secure)
> > > + atomic_inc(&sbi->secure_delete);
> > > + else
> > > + atomic_dec(&sbi->secure_delete);
> > > +}
> > > +
> > > static const struct super_operations ext4_sops = {
> > > .alloc_inode = ext4_alloc_inode,
> > > .destroy_inode = ext4_destroy_inode,
> > > @@ -1119,6 +1130,7 @@ static const struct super_operations ext4_sops = {
> > > .quota_write = ext4_quota_write,
> > > #endif
> > > .bdev_try_to_free_page = bdev_try_to_free_page,
> > > + .set_secure_delete = ext4_set_secure_delete,
> > > };
> > >
> > > static const struct export_operations ext4_export_ops = {
> > >
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> > the body of a message to [email protected]
> > More majordomo info at http://vger.kernel.org/majordomo-info.html
> > Please read the FAQ at http://www.tux.org/lkml/
> >
>
>
>
Am 03.02.2015 um 16:15 schrieb One Thousand Gnomes:
>> What's the answer? Easy and obvious, just (try to) overwrite the contents
>> of a file by request from userspace. Filesystems do know where on the
>> storage they have written the contents to, so why not just let them delete
>> that stuff themself instead? It's almost unbelievable that this was not
>> already done in the past 30 years.
>
> Easy, obvious and wrong 8)
>
> The last PC hard disks that were defined to do what you told them where
> ST-506 MFM and RLL devices. IDE disks are basically 'disk emulators',
> SSDs vastly more so.
>
> An IDE disk can do what it likes with your I/O so long as your requests
> and returns are what the standard expects. So for example if you zero a
> sector its perfectly entitled to set a bit in a master index of zeroed
> sectors. You can't tell the difference and externally it looks like
> an ST506 disc with extensions. Even simple devices may well move blocks
> around to deal with bad blocks, or high usage spots to avoid having to
> keep rewriting the tracks either side.
>
> An SSD internally has minimal relationship to a disc. If you have the
> tools to write a file, write over it, discard it and then dump the flash
> chips you'll probably find it's still there.
>
> If you plug a Raspberry Pi into a modern large hard disk, the chances are
> the smarter end of the cable is the disk.
Thanks for the explanations, also I was aware of all that. But I still
hope some human sense on this list is still left and repeat it again:
This feature isn't about military security.
This feature isn't about military security.
This feature isn't about military security.
This feature isn't about military security.
It's meant to make it impossible for most (ordinary) people to recover
deleted contents. Including most black hats. Of course there might be
some governments, companies or similiar organizations with are able to
spend an unbelievable amount of resources to recover such stuff. Maybe
even some mentalist might be able to recover it. What do I know? But I
and most other people don't care for these use cases. They would be
happy if at least the most trivial cases to recover deleted contents
could be avoided.
Regards,
Alexander Holler
Am 03.02.2015 um 16:41 schrieb Lukáš Czerner:
I'll give up.
Am 03.02.2015 um 16:41 schrieb Lukáš Czerner:
> On Tue, 3 Feb 2015, Alexander Holler wrote:
>> I'm already spending a lot of time trying to convince the developers here,
>> that this a feature most people expect from any filesystem. And I've written
>> these patches, for which now, even after I've marked them with all kind of
>> "preliminary" terms, still get blamed.
>
> So, you'd be much happier if we just ignored your patches ? I am not
> sure you understand how this works. You spend time creating and
> posting patches and at least two people (including me) spent time
> reading and commenting on it, isn't that what you need ?
Also I've already given up, I feel the need to still answer at least that:
After thinking a bit about the problem I had the idea that a FS-wide
switch like the one I've implemented might be a very easy way to
implement something which would be enough for my needs (and maybe the
needs of someone else). Then I've looked at the FAT sources because I
know a bit about FAT and it seemed to me like one of the more easier
places to test my aproach.
And it was. After around 3 hours I had something working which wiped
files from FAT by overwriting them. Then I've spend 1-2 hours to add the
syscall and add -s to rm and some days later and another 3 hours I had
the same for working for ext4. That already almost covers all my needs,
maybe I'll spend another few hours to look if it's that easy with BTRFS too.
So, because I've already had filed bugs #92261 and #92271 in the kernels
bugzilla, I've posted that stuff. Nothing more nothing less.
> You have the attention to move this forward, so please take
> advantage of this.
Sorry, but I'm unable to spend all the necessary time to make this
perfect in regard to whatever maintainers requesting. I'll posted these
patches more as some thought-provoking impulse (proof of concept), and I
failed obviously badly in marking them as such.
I'm sorry if I've wasted your time.
Regards,
Alexander Holler
On Tue, Feb 03, 2015 at 01:48:56PM +0100, Alexander Holler wrote:
>
> E.g. my parents are stull successfully using contact lists on paper. These
> are still more readable, easier to handle and smaller than any available
> electronic replacement. And they have absolutely no problem to destroy an
> old one when they replace it with a new one.
Sort of using crypto, which I think is still the best response to this
particular use case --- the trick is making it easy to use, but that's
a desktop/distro integration problem --- the most secure way to really
make sure information on paper and on an SSD is secure is actually the
same --- you use a shredder.
And this isn't just for "military grade security". There are some
commercial cloud providers which do precisely this, because they know
that their customer's security and their reputation is not easily
valued, and certainly the cost of some piddling flash chips, after
they have been depreciated, is defintiely cheaper than a security
exposure. Look at the estimates of how much money Target lost with
their little security oopsie. Again, this is commercial, not military
security.
Cheers,
- Ted
Am 03.02.2015 um 18:48 schrieb Theodore Ts'o:
> On Tue, Feb 03, 2015 at 01:48:56PM +0100, Alexander Holler wrote:
>>
>> E.g. my parents are stull successfully using contact lists on paper. These
>> are still more readable, easier to handle and smaller than any available
>> electronic replacement. And they have absolutely no problem to destroy an
>> old one when they replace it with a new one.
>
> Sort of using crypto, which I think is still the best response to this
> particular use case --- the trick is making it easy to use, but that's
> a desktop/distro integration problem --- the most secure way to really
> make sure information on paper and on an SSD is secure is actually the
> same --- you use a shredder.
>
> And this isn't just for "military grade security". There are some
> commercial cloud providers which do precisely this, because they know
> that their customer's security and their reputation is not easily
> valued, and certainly the cost of some piddling flash chips, after
> they have been depreciated, is defintiely cheaper than a security
> exposure. Look at the estimates of how much money Target lost with
> their little security oopsie. Again, this is commercial, not military
> security.
Yeah, as I've already admitted in the bug, I never should have use the
word secure, because everyone nowadays seems to end up in panic when
reading that word.
So, if I would be able to use sed on my mails, I would replace
unlinkat_s() with unlinkat_w() (for wipe) or would say that _s does
stand for 'shred' in the means of shred(1).
Anyways, as I hopefully still have some years left to live (also I'm
writing SW since 30a, I'm not that old as some of you guys and still
have to work at least two decades), I might be able to see where we will
end up with all that fiasco we've already managed to drive into.
Regards,
Alexander Holler
BTW.,
if someone still likes my trivial approach and thinks the patches might
be usable because they contain just a few silly changes which might be
easily rebased onto future kernel versions, I suggest to get rid of the
new syscall and just use something like
#define AT_WIPE 0x2000
in include/uapi/linux/fcntl.h
for using the old unlinkat() with that new flag. That's much easier to
handle than a new syscall which likely never will end up in the kernel
and makes my silly patches even more simple.
Regards,
Alexander Holler
On Tue, Feb 03, 2015 at 07:01:50PM +0100, Alexander Holler wrote:
> Yeah, as I've already admitted in the bug, I never should have use
> the word secure, because everyone nowadays seems to end up in panic
> when reading that word.
>
> So, if I would be able to use sed on my mails, I would replace
> unlinkat_s() with unlinkat_w() (for wipe) or would say that _s does
> stand for 'shred' in the means of shred(1).
TBH, I suspect that the saner API would be something like EXT2_IOC_[SG[ETFLAGS,
allowing to set and query that along with other flags (append-only, etc.).
Forget about unlink; first of all, whatever API you use should only _mark_
the inode as "zero freed blocks" (or trim, for that matter). You can't
force freeing of an inode, so either you make sure that subsequent freeing
of inode, whenever it happens, will do that work, or your API is hopelessly
racy. Moreover, when link has been removed it's too late to report that
fs has no way to e.g. trim those blocks, so you really want to have it done
_before_ the actual link removal. And if the file contents is that sensitive,
you'd better extend the same protection to all operations that free its
blocks, including truncate(), fallocate() hole-punching, whatever. What's
more, if you divorce that from link removal, you probably don't want it as
in-core-only flag - have it stored in inode, if fs supports that.
Alternatively, you might want to represent it as xattr - as much as I hate
those, it might turn out to be the best fit in this case, if we end up
with several variants for freed blocks disposal. Not sure...
But whichever way we represent that state, IMO
a) operation should be similar to chmod/chattr/setfattr - modifying
inode metadata.
b) it should affect _all_ operations freeing blocks of that file
from that point on
c) it should be able to fail, telling you that you can't do that for
this backing store.
Al Viro wrote:
> On Tue, Feb 03, 2015 at 07:01:50PM +0100, Alexander Holler wrote:
>
>> Yeah, as I've already admitted in the bug, I never should have use
>> the word secure, because everyone nowadays seems to end up in panic
>> when reading that word.
>>
>> So, if I would be able to use sed on my mails, I would replace
>> unlinkat_s() with unlinkat_w() (for wipe) or would say that _s does
>> stand for 'shred' in the means of shred(1).
>
> TBH, I suspect that the saner API would be something like
> EXT2_IOC_[SG[ETFLAGS, allowing to set and query that along with other
> flags (append-only, etc.).
>
> Forget about unlink; first of all, whatever API you use should only _mark_
> the inode as "zero freed blocks" (or trim, for that matter). You can't
> force freeing of an inode, so either you make sure that subsequent freeing
> of inode, whenever it happens, will do that work, or your API is
> hopelessly
> racy. Moreover, when link has been removed it's too late to report that
> fs has no way to e.g. trim those blocks, so you really want to have it
> done
> _before_ the actual link removal. And if the file contents is that
> sensitive, you'd better extend the same protection to all operations that
> free its
> blocks, including truncate(), fallocate() hole-punching, whatever. What's
> more, if you divorce that from link removal, you probably don't want it as
> in-core-only flag - have it stored in inode, if fs supports that.
>
> Alternatively, you might want to represent it as xattr - as much as I hate
> those, it might turn out to be the best fit in this case, if we end up
> with several variants for freed blocks disposal. Not sure...
>
> But whichever way we represent that state, IMO
> a) operation should be similar to chmod/chattr/setfattr - modifying
> inode metadata.
> b) it should affect _all_ operations freeing blocks of that file
> from that point on
> c) it should be able to fail, telling you that you can't do that for
> this backing store.
Well, chattr already has +s which means exactly this. It's just not
respected by... anything. The 0/5 mentioned it, albeit briefly.
On Feb 3, 2015, at 4:33 PM, Al Viro <[email protected]> wrote:
> On Tue, Feb 03, 2015 at 07:01:50PM +0100, Alexander Holler wrote:
>> Yeah, as I've already admitted in the bug, I never should have use
>> the word secure, because everyone nowadays seems to end up in panic
>> when reading that word.
>>
>> So, if I would be able to use sed on my mails, I would replace
>> unlinkat_s() with unlinkat_w() (for wipe) or would say that _s does
>> stand for 'shred' in the means of shred(1).
>
> TBH, I suspect that the saner API would be something like
> EXT2_IOC_[SG]ETFLAGS, allowing to set and query that along with other
> flags (append-only, etc.).
>
> Forget about unlink; first of all, whatever API you use should only
> _mark_ the inode as "zero freed blocks" (or trim, for that matter).
This already exists for a long time. "chattr +s file [file...]" marks
inodes for "secure deletion" (EXT2_SECRM_FL), but this wasn't implemented.
Cheers, Andreas
> You can't force freeing of an inode, so either you make sure that
> subsequent freeing of inode, whenever it happens, will do that work,
> or your API is hopelessly racy. Moreover, when link has been removed
> it's too late to report that fs has no way to e.g. trim those blocks,
> so you really want to have it done _before_ the actual link removal.
> And if the file contents is that sensitive,
> you'd better extend the same protection to all operations that free its
> blocks, including truncate(), fallocate() hole-punching, whatever. What's
> more, if you divorce that from link removal, you probably don't want it as
> in-core-only flag - have it stored in inode, if fs supports that.
>
> Alternatively, you might want to represent it as xattr - as much as I hate
> those, it might turn out to be the best fit in this case, if we end up
> with several variants for freed blocks disposal. Not sure...
>
> But whichever way we represent that state, IMO
> a) operation should be similar to chmod/chattr/setfattr -
> modifying inode metadata.
> b) it should affect _all_ operations freeing blocks of that
> file from that point on
> c) it should be able to fail, telling you that you can't do
> that for this backing store.
Cheers, Andreas
[CC += linux-api@]
Hello Alexander,
The kernel source file Documentation/SubmitChecklist notes that all
Linux kernel patches that change userspace interfaces should be CCed
to [email protected]. See also
https://www.kernel.org/doc/man-pages/linux-api-ml.html. Please CC
linux-api@ on future iterations of this patch.
Thanks,
Michael
On Mon, Feb 2, 2015 at 6:05 PM, Alexander Holler <[email protected]> wrote:
>
> Hello,
>
> some people already might have noticed that I've got a bit angry that
> no filesystem nor the posix api nor the linux syscalls are offering
> the userspace a way to request real deletion of one or more files (there's
> the 's' bit, see man chattr, but it is ignored by all FS which know it).
>
> Almost all filesystems are working such, that deleting a file just
> means it doesn't appear in the list of files anymore but the contents
> of the file still might be readable on the storage.
>
> So in the last 30 years many tools were created trying to circumvent that
> inability of all filesystems. Up to encrypting the whole storage which
> seems to be the current state of art and which many people recently tried
> to recommend to me.
>
> Also I'm using that workaround already myself since many years, I still
> believe it's a very bad, complictated, cumbersome and very uncomfortable
> way to make sure contents of files are not readable anymore. Besides that,
> just relying on encryption might hit back badly, because encryption often
> suffers from bugs in the implementation, bugs or even backdoors in the
> design and Moore. That means it's unsure how long the used encryption
> will defeat any tries to read the contents of a deleted file from storage
> and the used encryption might be worthless tomorrow. Not to speak about
> the problems with the necessary key-handling.
>
> What's the answer? Easy and obvious, just (try to) overwrite the contents
> of a file by request from userspace. Filesystems do know where on the
> storage they have written the contents to, so why not just let them delete
> that stuff themself instead? It's almost unbelievable that this was not
> already done in the past 30 years.
>
> So, now, after I've got angry enough, I've tried to do it myself, it seems
> to work and wasn't really hard.
>
> Of course, the easy way I think I've found isn't really my achievement.
> Instead it relies on all the work people have already done to support the
> trim command of SSDs. So thanks to all of them. You've made the following
> simple patches possible.
>
> How does it work:
>
> - Implement a new syscall named unlinkat_s() with the same signature as
> unlinkat(). With this syscall filesystems should make the old contents
> of files unreadable and should fail if they can't. This doesn't really
> have to be reliable, because it is often impossible for a filesystem to
> make enough assumptions about the underlying storage to promise secure
> deletion. But it has to mean that the filesystem tried everything it can
> to make sure the contents are unreadabler afterwards, e.g. by overwriting
> them, using secure trim or even just using trim. I've no idea if trim
> might be enough, if I would have implemented trim, it would clear the
> trimmed blocks in flash too, making them unreadable. But I haven't done
> such and I haven't tested if that's the case.
> The new syscall isn't meant to replace unlinkat() for everyday operations,
> therefor operation speed is ignored (see below in regard to a side effect).
>
> - Instruct the filesystem that it should discard or overwrite (all) freed
> blocks while the unlinkat_s() is at work.
>
> - Kill the inode while letting the filesystem discard freed blocks or
> overwrite them. As said before, this was easy through all the work already
> done by others. There even already existed a sb_issue_zeroout() which could
> be used instead of sb_issue_discard().
>
> - Sync the filesystem, to make sure the stuff is written to the storage.
>
>
> This approach has the side effect that while a call of unlinkat_s() is at
> work, all freed blocks will be destroyed, even those which aren't beloning
> to the unlink operation but are freed by possible other running actions.
> But in my humble opinion, that's nothing to care about and it keeps the
> implementation of this feature simple. I like KISS and that's imho the
> main feature of these patches.
>
>
> Things to be aware of when reading and starting to critisize my patches:
>
> - I've never had a look before at the kernel sources in fs/*.
> - They are the result of around half a dozen hours.
> - I'm aware that these patches are imperfect. Perfectionism does cost time
> for which I often don't feel the need to spend it unpaid.
> - I don't care for comments regarding style.
> - They are a proof of concept and are an offer. They are meant for other
> users, not maintainers. I wasn't paid for doing them and I don't care much
> if they will end up in the kernel. I already have and can use them, I'm
> happy with them and I don't really need them in the official kernel as I'm
> able to easily rebase them myself (thanks to git).
> - Don't be disappointed because the patches are that simple. The idea
> counts. ;)
>
>
> Regards,
>
> Alexander Holler
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
--
Michael Kerrisk Linux man-pages maintainer;
http://www.kernel.org/doc/man-pages/
Author of "The Linux Programming Interface", http://blog.man7.org/
[CC += linux-api@]
On Mon, Feb 2, 2015 at 6:05 PM, Alexander Holler <[email protected]> wrote:
> Signed-off-by: Alexander Holler <[email protected]>
> ---
> arch/x86/syscalls/syscall_32.tbl | 1 +
> arch/x86/syscalls/syscall_64.tbl | 1 +
> fs/namei.c | 38 ++++++++++++++++++++++++++++++-----
> include/asm-generic/audit_dir_write.h | 1 +
> include/linux/fs.h | 1 +
> include/linux/syscalls.h | 1 +
> include/uapi/asm-generic/unistd.h | 4 +++-
> tools/perf/builtin-trace.c | 2 ++
> 8 files changed, 43 insertions(+), 6 deletions(-)
>
> diff --git a/arch/x86/syscalls/syscall_32.tbl b/arch/x86/syscalls/syscall_32.tbl
> index 9fe1b5d..7a3d530 100644
> --- a/arch/x86/syscalls/syscall_32.tbl
> +++ b/arch/x86/syscalls/syscall_32.tbl
> @@ -364,3 +364,4 @@
> 355 i386 getrandom sys_getrandom
> 356 i386 memfd_create sys_memfd_create
> 357 i386 bpf sys_bpf
> +359 i386 unlinkat_s sys_unlinkat_s
> diff --git a/arch/x86/syscalls/syscall_64.tbl b/arch/x86/syscalls/syscall_64.tbl
> index 281150b..97eaf01 100644
> --- a/arch/x86/syscalls/syscall_64.tbl
> +++ b/arch/x86/syscalls/syscall_64.tbl
> @@ -328,6 +328,7 @@
> 319 common memfd_create sys_memfd_create
> 320 common kexec_file_load sys_kexec_file_load
> 321 common bpf sys_bpf
> +322 common unlinkat_s sys_unlinkat_s
>
> #
> # x32-specific system call numbers start at 512 to avoid cache impact
> diff --git a/fs/namei.c b/fs/namei.c
> index db5fe86..1ad3724 100644
> --- a/fs/namei.c
> +++ b/fs/namei.c
> @@ -3717,7 +3717,7 @@ EXPORT_SYMBOL(vfs_unlink);
> * writeout happening, and we don't want to prevent access to the directory
> * while waiting on the I/O.
> */
> -static long do_unlinkat(int dfd, const char __user *pathname)
> +static long do_unlinkat(int dfd, const char __user *pathname, bool secure)
> {
> int error;
> struct filename *name;
> @@ -3759,8 +3759,25 @@ exit2:
> dput(dentry);
> }
> mutex_unlock(&nd.path.dentry->d_inode->i_mutex);
> - if (inode)
> - iput(inode); /* truncate the inode here */
> + if (inode) {
> + // TODO:
> + // if (inode is file and 's' flag is set)
> + // secure = true;
> + if (!secure)
> + iput(inode); /* truncate the inode here */
> + else {
> + struct super_block *sb = inode->i_sb;
> + if (sb->s_op->set_secure_delete)
> + sb->s_op->set_secure_delete(sb, true);
> + // TODO: We should fail if secure isn't supported,
> + // look up how that's possible here.
> + iput(inode); /* truncate the inode here */
> + // TODO: check if sb is still valid after the inode is gone
> + sync_filesystem(sb);
> + if (sb->s_op->set_secure_delete)
> + sb->s_op->set_secure_delete(sb, false);
> + }
> + }
> inode = NULL;
> if (delegated_inode) {
> error = break_deleg_wait(&delegated_inode);
> @@ -3796,12 +3813,23 @@ SYSCALL_DEFINE3(unlinkat, int, dfd, const char __user *, pathname, int, flag)
> if (flag & AT_REMOVEDIR)
> return do_rmdir(dfd, pathname);
>
> - return do_unlinkat(dfd, pathname);
> + return do_unlinkat(dfd, pathname, false);
> }
>
> SYSCALL_DEFINE1(unlink, const char __user *, pathname)
> {
> - return do_unlinkat(AT_FDCWD, pathname);
> + return do_unlinkat(AT_FDCWD, pathname, false);
> +}
> +
> +SYSCALL_DEFINE3(unlinkat_s, int, dfd, const char __user *, pathname, int, flag)
> +{
> + if ((flag & ~AT_REMOVEDIR) != 0)
> + return -EINVAL;
> +
> + if (flag & AT_REMOVEDIR)
> + return do_rmdir(dfd, pathname);
> +
> + return do_unlinkat(dfd, pathname, true);
> }
>
> int vfs_symlink(struct inode *dir, struct dentry *dentry, const char *oldname)
> diff --git a/include/asm-generic/audit_dir_write.h b/include/asm-generic/audit_dir_write.h
> index 7b61db4..5282aba 100644
> --- a/include/asm-generic/audit_dir_write.h
> +++ b/include/asm-generic/audit_dir_write.h
> @@ -29,4 +29,5 @@ __NR_unlinkat,
> __NR_renameat,
> __NR_linkat,
> __NR_symlinkat,
> +__NR_unlinkat_s,
> #endif
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index 9ab779e..039e969 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -1594,6 +1594,7 @@ struct super_operations {
> int (*bdev_try_to_free_page)(struct super_block*, struct page*, gfp_t);
> long (*nr_cached_objects)(struct super_block *, int);
> long (*free_cached_objects)(struct super_block *, long, int);
> + void (*set_secure_delete) (struct super_block *, bool);
> };
>
> /*
> diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
> index bda9b81..b88019b 100644
> --- a/include/linux/syscalls.h
> +++ b/include/linux/syscalls.h
> @@ -877,4 +877,5 @@ asmlinkage long sys_seccomp(unsigned int op, unsigned int flags,
> asmlinkage long sys_getrandom(char __user *buf, size_t count,
> unsigned int flags);
> asmlinkage long sys_bpf(int cmd, union bpf_attr *attr, unsigned int size);
> +asmlinkage long sys_unlinkat_s(int dfd, const char __user * pathname, int flag);
> #endif
> diff --git a/include/uapi/asm-generic/unistd.h b/include/uapi/asm-generic/unistd.h
> index 22749c1..2ba072e 100644
> --- a/include/uapi/asm-generic/unistd.h
> +++ b/include/uapi/asm-generic/unistd.h
> @@ -707,9 +707,11 @@ __SYSCALL(__NR_getrandom, sys_getrandom)
> __SYSCALL(__NR_memfd_create, sys_memfd_create)
> #define __NR_bpf 280
> __SYSCALL(__NR_bpf, sys_bpf)
> +#define __NR_unlinkat_s 281
> +__SYSCALL(__NR_unlinkat_s, sys_unlinkat_s)
>
> #undef __NR_syscalls
> -#define __NR_syscalls 281
> +#define __NR_syscalls 282
>
> /*
> * All syscalls below here should go away really,
> diff --git a/tools/perf/builtin-trace.c b/tools/perf/builtin-trace.c
> index fb12645..1507335 100644
> --- a/tools/perf/builtin-trace.c
> +++ b/tools/perf/builtin-trace.c
> @@ -1110,6 +1110,8 @@ static struct syscall_fmt {
> { .name = "uname", .errmsg = true, .alias = "newuname", },
> { .name = "unlinkat", .errmsg = true,
> .arg_scnprintf = { [0] = SCA_FDAT, /* dfd */ }, },
> + { .name = "unlinkat_s", .errmsg = true,
> + .arg_scnprintf = { [0] = SCA_FDAT, /* dfd */ }, },
> { .name = "utimensat", .errmsg = true,
> .arg_scnprintf = { [0] = SCA_FDAT, /* dirfd */ }, },
> { .name = "write", .errmsg = true,
> --
> 2.1.0
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
--
Michael Kerrisk Linux man-pages maintainer;
http://www.kernel.org/doc/man-pages/
Author of "The Linux Programming Interface", http://blog.man7.org/
Am 04.02.2015 um 00:33 schrieb Al Viro:
> On Tue, Feb 03, 2015 at 07:01:50PM +0100, Alexander Holler wrote:
>
>> Yeah, as I've already admitted in the bug, I never should have use
>> the word secure, because everyone nowadays seems to end up in panic
>> when reading that word.
>>
>> So, if I would be able to use sed on my mails, I would replace
>> unlinkat_s() with unlinkat_w() (for wipe) or would say that _s does
>> stand for 'shred' in the means of shred(1).
>
> TBH, I suspect that the saner API would be something like EXT2_IOC_[SG[ETFLAGS,
> allowing to set and query that along with other flags (append-only, etc.).
>
> Forget about unlink; first of all, whatever API you use should only _mark_
> the inode as "zero freed blocks" (or trim, for that matter). You can't
> force freeing of an inode, so either you make sure that subsequent freeing
> of inode, whenever it happens, will do that work, or your API is hopelessly
> racy. Moreover, when link has been removed it's too late to report that
> fs has no way to e.g. trim those blocks, so you really want to have it done
> _before_ the actual link removal. And if the file contents is that sensitive,
> you'd better extend the same protection to all operations that free its
> blocks, including truncate(), fallocate() hole-punching, whatever. What's
> more, if you divorce that from link removal, you probably don't want it as
> in-core-only flag - have it stored in inode, if fs supports that.
>
> Alternatively, you might want to represent it as xattr - as much as I hate
> those, it might turn out to be the best fit in this case, if we end up
> with several variants for freed blocks disposal. Not sure...
>
> But whichever way we represent that state, IMO
> a) operation should be similar to chmod/chattr/setfattr - modifying
> inode metadata.
> b) it should affect _all_ operations freeing blocks of that file
> from that point on
> c) it should be able to fail, telling you that you can't do that for
> this backing store.
My intention to use unlinkat() or unlinkat_s() was the following:
- It can be supported by most filesystems (see my fat patch)
- It doesn't really make any promises it can't, like deleting leftovers
of an already modified file. That's where a much more complicated
solution like the 's' attribute would appropriate. It just should try to
wipe the current contents of a file.
The second reason was also the reason why I've crafted the subject of
the RFC very carefully: "Offer a way for userspace to request real
deletion of files".
I did that to avoid the nitpickers. It doesn't say how the request is or
has to be handled. I was aware of all the problems which arise if one
tries to fullfill what the 's' flag promises. The final result of trying
to get a 100 percent solution is just what we have now: nothing at all.
It wasn't the first time I've posted a patch to LKML, I know that
maintainers like to request high towers from ordinary people and
therefor very often nice dog houses were refused. There might be a
legitimate reason to request a high tower from a big company, but that's
something totally different.
And I refuse to try to understand why maintainers request high towers. ;)
And because hope never dies, I was again silly enough to post a simple
patch. ;)
Regards,
Alexander Hpller
On Wed, 4 Feb 2015, Alexander Holler wrote:
> Date: Wed, 04 Feb 2015 11:19:01 +0100
> From: Alexander Holler <[email protected]>
> To: Al Viro <[email protected]>
> Cc: Theodore Ts'o <[email protected]>, [email protected],
> [email protected]
> Subject: Re: [PATCH 1/5] WIP: Add syscall unlinkat_s (currently x86* only)
>
> Am 04.02.2015 um 00:33 schrieb Al Viro:
> > On Tue, Feb 03, 2015 at 07:01:50PM +0100, Alexander Holler wrote:
> >
> > > Yeah, as I've already admitted in the bug, I never should have use
> > > the word secure, because everyone nowadays seems to end up in panic
> > > when reading that word.
> > >
> > > So, if I would be able to use sed on my mails, I would replace
> > > unlinkat_s() with unlinkat_w() (for wipe) or would say that _s does
> > > stand for 'shred' in the means of shred(1).
> >
> > TBH, I suspect that the saner API would be something like
> > EXT2_IOC_[SG[ETFLAGS,
> > allowing to set and query that along with other flags (append-only, etc.).
> >
> > Forget about unlink; first of all, whatever API you use should only _mark_
> > the inode as "zero freed blocks" (or trim, for that matter). You can't
> > force freeing of an inode, so either you make sure that subsequent freeing
> > of inode, whenever it happens, will do that work, or your API is hopelessly
> > racy. Moreover, when link has been removed it's too late to report that
> > fs has no way to e.g. trim those blocks, so you really want to have it done
> > _before_ the actual link removal. And if the file contents is that
> > sensitive,
> > you'd better extend the same protection to all operations that free its
> > blocks, including truncate(), fallocate() hole-punching, whatever. What's
> > more, if you divorce that from link removal, you probably don't want it as
> > in-core-only flag - have it stored in inode, if fs supports that.
> >
> > Alternatively, you might want to represent it as xattr - as much as I hate
> > those, it might turn out to be the best fit in this case, if we end up
> > with several variants for freed blocks disposal. Not sure...
> >
> > But whichever way we represent that state, IMO
> > a) operation should be similar to chmod/chattr/setfattr - modifying
> > inode metadata.
> > b) it should affect _all_ operations freeing blocks of that file
> > from that point on
> > c) it should be able to fail, telling you that you can't do that for
> > this backing store.
>
> My intention to use unlinkat() or unlinkat_s() was the following:
>
> - It can be supported by most filesystems (see my fat patch)
>
> - It doesn't really make any promises it can't, like deleting leftovers of an
> already modified file. That's where a much more complicated solution like the
> 's' attribute would appropriate. It just should try to wipe the current
> contents of a file.
>
> The second reason was also the reason why I've crafted the subject of the RFC
> very carefully: "Offer a way for userspace to request real deletion of files".
>
> I did that to avoid the nitpickers. It doesn't say how the request is or has
> to be handled. I was aware of all the problems which arise if one tries to
> fullfill what the 's' flag promises. The final result of trying to get a 100
> percent solution is just what we have now: nothing at all.
>
> It wasn't the first time I've posted a patch to LKML, I know that maintainers
> like to request high towers from ordinary people and therefor very often nice
> dog houses were refused. There might be a legitimate reason to request a high
> tower from a big company, but that's something totally different.
>
> And I refuse to try to understand why maintainers request high towers. ;)
>
> And because hope never dies, I was again silly enough to post a simple patch.
> ;)
It really comes down to what your intentions and expectations are.
If you just wanted to have a discussion about this feature, you have
it and that's fine.
If your intention was to propose the actual solution, the actual
feature that would be merged to kernel, then your expectation should
have been exactly this, discussion, improvement propositions, cases
to take care of and so on.
For example the interface Al proposed is much better than the
solution in the patch. And he is right about the other block freeing
operations.
No one told you not to proceed with the implementation AFAIK. Some
are skeptical and that's fine, but you have all means to proof them
wrong.
The fact is that the current patches are useless for anything other
than proof-of-concept. Now you know more that needs to be done or
thought about, but if you're not willing to do the work, then please
stop complaining about "high towers". I am not a maintainer and I
thinks that the feedback you've got is entirely reasonable. Take it
as you will.
One more thing, can I ask you what were your expectations when
posting those patches ?
Regards,
-Lukas
>
> Regards,
>
> Alexander Hpller
> --
> To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>
Am 04.02.2015 um 13:07 schrieb Luk?? Czerner:
> The fact is that the current patches are useless for anything other
> than proof-of-concept. Now you know more that needs to be done or
That's wrong. The patches already work. If you delete a file which isn't
in use by something else, the current contents will be wiped on
traditional harddrives. I assume that already fulfills more than 50% of
use cases of ordinary people.
> thought about, but if you're not willing to do the work, then please
> stop complaining about "high towers". I am not a maintainer and I
> thinks that the feedback you've got is entirely reasonable. Take it
> as you will.
>
> One more thing, can I ask you what were your expectations when
> posting those patches ?
I've posted them for other users which are happy with what I've
explained above. Besides requesting an API which makes such a simple
solution, in contrast to the the 's' bit, possible.
Alexander Holler
Am 04.02.2015 um 13:22 schrieb Alexander Holler:
> Am 04.02.2015 um 13:07 schrieb Luk?? Czerner:
>
>> The fact is that the current patches are useless for anything other
>> than proof-of-concept. Now you know more that needs to be done or
>
> That's wrong. The patches already work. If you delete a file which isn't
> in use by something else, the current contents will be wiped on
> traditional harddrives. I assume that already fulfills more than 50% of
> use cases of ordinary people.
>
>> thought about, but if you're not willing to do the work, then please
>> stop complaining about "high towers". I am not a maintainer and I
>> thinks that the feedback you've got is entirely reasonable. Take it
>> as you will.
>>
>> One more thing, can I ask you what were your expectations when
>> posting those patches ?
>
> I've posted them for other users which are happy with what I've
> explained above. Besides requesting an API which makes such a simple
> solution, in contrast to the the 's' bit, possible.
To be more precise: How do you add something like EXT2_IOC_[SG[ETFLAGS
to vfat or one of the dozens other filesystems which don't know about
linux-specific flags? I don't see a way to do that, so there's only
unlinkat() left.
Alexander Holler
Am 04.02.2015 um 13:42 schrieb Alexander Holler:
> Am 04.02.2015 um 13:22 schrieb Alexander Holler:
>> Am 04.02.2015 um 13:07 schrieb Luk?? Czerner:
>>
>>> The fact is that the current patches are useless for anything other
>>> than proof-of-concept. Now you know more that needs to be done or
>>
>> That's wrong. The patches already work. If you delete a file which isn't
>> in use by something else, the current contents will be wiped on
>> traditional harddrives. I assume that already fulfills more than 50% of
>> use cases of ordinary people.
>>
>>> thought about, but if you're not willing to do the work, then please
>>> stop complaining about "high towers". I am not a maintainer and I
>>> thinks that the feedback you've got is entirely reasonable. Take it
>>> as you will.
>>>
>>> One more thing, can I ask you what were your expectations when
>>> posting those patches ?
>>
>> I've posted them for other users which are happy with what I've
>> explained above. Besides requesting an API which makes such a simple
>> solution, in contrast to the the 's' bit, possible.
>
> To be more precise: How do you add something like EXT2_IOC_[SG[ETFLAGS
> to vfat or one of the dozens other filesystems which don't know about
> linux-specific flags? I don't see a way to do that, so there's only
> unlinkat() left.
Or to be give an actual use case, mount a (v)fat formatted usb-stick,
-hdd or mmc, delete a file with the patches I offered, unmount it, try
to find the contents of the deleted file at device-level (e.g. by
grepping the partition).
Alexander Holler
Alexander,
On Wed, Feb 4, 2015 at 1:22 PM, Alexander Holler <[email protected]> wrote:
> Am 04.02.2015 um 13:07 schrieb Lukáš Czerner:
>
>> The fact is that the current patches are useless for anything other
>> than proof-of-concept. Now you know more that needs to be done or
>
>
> That's wrong. The patches already work. If you delete a file which isn't in
> use by something else, the current contents will be wiped on traditional
> harddrives. I assume that already fulfills more than 50% of use cases of
> ordinary people.
You are getting various feedback from people, that you seem to be ignoring.
Al Viro, in his curmedgeonly way, points out that the problems are
much deeper than you realize. He does not say so explicitly, but I
imagine his point is that he does not want to see the kernel cluttered
with "partial" solutions that will simply increase the maintenance
burden in the long term, and leave bugs to be fixed further down the
line. You seem not to be listening.
Lukáš points out to you that getting a feature like this into the
kernel is complex process. You seem unwilling to hear that, and still
just want your partial solution.
I tell you that discussions of APIs should CC linux-api, which I am
now CCing into this thread, again, because, again, you're not
listening to feedback.
Nobody is asking for "high towers"; they just have their eyes on the
big picture. And the people here are just "ordinary people" with a
*lot* of experience dealing with kernel code (I exclude myself) . They
see many complexities that you don't. Getting intersting features into
the kernel requires a lot of work, and careful listening.
Thanks,
Michael
--
Michael Kerrisk Linux man-pages maintainer;
http://www.kernel.org/doc/man-pages/
Author of "The Linux Programming Interface", http://blog.man7.org/
Am 04.02.2015 um 13:50 schrieb Alexander Holler:
> Am 04.02.2015 um 13:42 schrieb Alexander Holler:
>> Am 04.02.2015 um 13:22 schrieb Alexander Holler:
>>> Am 04.02.2015 um 13:07 schrieb Luk?? Czerner:
>>>
>>>> The fact is that the current patches are useless for anything other
>>>> than proof-of-concept. Now you know more that needs to be done or
>>>
>>> That's wrong. The patches already work. If you delete a file which isn't
>>> in use by something else, the current contents will be wiped on
>>> traditional harddrives. I assume that already fulfills more than 50% of
>>> use cases of ordinary people.
>>>
>>>> thought about, but if you're not willing to do the work, then please
>>>> stop complaining about "high towers". I am not a maintainer and I
>>>> thinks that the feedback you've got is entirely reasonable. Take it
>>>> as you will.
>>>>
>>>> One more thing, can I ask you what were your expectations when
>>>> posting those patches ?
>>>
>>> I've posted them for other users which are happy with what I've
>>> explained above. Besides requesting an API which makes such a simple
>>> solution, in contrast to the the 's' bit, possible.
>>
>> To be more precise: How do you add something like EXT2_IOC_[SG[ETFLAGS
>> to vfat or one of the dozens other filesystems which don't know about
>> linux-specific flags? I don't see a way to do that, so there's only
>> unlinkat() left.
>
> Or to be give an actual use case, mount a (v)fat formatted usb-stick,
> -hdd or mmc, delete a file with the patches I offered, unmount it, try
> to find the contents of the deleted file at device-level (e.g. by
> grepping the partition).
Maybe I should mention that I've tried it with bug reports instead
patches before. Beeing aware that I might be unable to write perfect
patches with the resources I'm able or willing to spend.
I just needed some days until the one for ext4 was closed, leaving no
hope that it might become fixed without trying it myself.
Alexander Holler
Am 04.02.2015 um 14:06 schrieb Michael Kerrisk:
> Alexander,
>
> On Wed, Feb 4, 2015 at 1:22 PM, Alexander Holler <[email protected]> wrote:
>> Am 04.02.2015 um 13:07 schrieb Lukáš Czerner:
>>
>>> The fact is that the current patches are useless for anything other
>>> than proof-of-concept. Now you know more that needs to be done or
>>
>>
>> That's wrong. The patches already work. If you delete a file which isn't in
>> use by something else, the current contents will be wiped on traditional
>> harddrives. I assume that already fulfills more than 50% of use cases of
>> ordinary people.
>
> You are getting various feedback from people, that you seem to be ignoring.
I'm happy for all the feedback. But it doesn't help me. I'm not going to
spend the necessary time unpaid.
.
> Al Viro, in his curmedgeonly way, points out that the problems are
> much deeper than you realize. He does not say so explicitly, but I
> imagine his point is that he does not want to see the kernel cluttered
> with "partial" solutions that will simply increase the maintenance
> burden in the long term, and leave bugs to be fixed further down the
> line. You seem not to be listening.
It doesn't help me nor anyone else. As Eric Sandeen made me aware
through in bug, look at http://lwn.net/Articles/462437/ what already
happened.
> Lukáš points out to you that getting a feature like this into the
> kernel is complex process. You seem unwilling to hear that, and still
> just want your partial solution.
Wrong. I don't want my partial solution to be part of the official
kernel. I don't care. I offered it for other users because I'm aware
that has become almost impossible for normal people to get something
into the kernel without spending an unbelievable amount of time most
people can't afford to spend.
> I tell you that discussions of APIs should CC linux-api, which I am
> now CCing into this thread, again, because, again, you're not
> listening to feedback.
Please don't confuse "not listening" with "unable to fulfill Linux
kernel maintainer requests".
Alexander Holler
Am 04.02.2015 um 14:21 schrieb Alexander Holler:
>> I tell you that discussions of APIs should CC linux-api, which I am
>> now CCing into this thread, again, because, again, you're not
>> listening to feedback.
>
> Please don't confuse "not listening" with "unable to fulfill Linux
> kernel maintainer requests".
I really wonder what do you expect from people not getting paid to spend
time for fulfilling maintainer request?
I've written bugs and even offered some patches (regardless how usefull
there are in your eyes, it's more than most other people can do).
And all what it brought me is that I receive flames like your one.
Do you really think that's the right way to stimulate people in helping
to make Linux better?
Alexander Holler
Am 04.02.2015 um 14:29 schrieb Alexander Holler:
> Am 04.02.2015 um 14:21 schrieb Alexander Holler:
>
>>> I tell you that discussions of APIs should CC linux-api, which I am
>>> now CCing into this thread, again, because, again, you're not
>>> listening to feedback.
>>
>> Please don't confuse "not listening" with "unable to fulfill Linux
>> kernel maintainer requests".
>
> I really wonder what do you expect from people not getting paid to spend
> time for fulfilling maintainer request?
>
> I've written bugs and even offered some patches (regardless how usefull
> there are in your eyes, it's more than most other people can do).
>
> And all what it brought me is that I receive flames like your one.
>
> Do you really think that's the right way to stimulate people in helping
> to make Linux better?
I'm really sorry that I can't spend several unpaid months with reading
and understanding ever changing linux kernel sources in order to become
a Linux filesystem expert and send some fully working perfect patches
which do fix the problem in question.
And I can't spend the necessary time to play remote keyboard for kernel
maintainers which might be willing to explain me what has to be done
according to their view. I've already offered what I was willing to do,
for the price of having to defend myself over and over. And
unfortunately that wasn't the first time I've ended up with having to
defend myself.
My conclusion is that I'm a real fool having posted multiple times
patches to this list. It just doesn't make any sense and most of the
time the only reward are flames.
Alexander Holler
On Wed, 4 Feb 2015, Alexander Holler wrote:
> Date: Wed, 04 Feb 2015 14:21:12 +0100
> From: Alexander Holler <[email protected]>
> To: Michael Kerrisk <[email protected]>
> Cc: Lukáš Czerner <[email protected]>, Al Viro <[email protected]>,
> Theodore Ts'o <[email protected]>,
> Linux-Fsdevel <[email protected]>,
> Linux Kernel <[email protected]>,
> Linux API <[email protected]>
> Subject: Re: [PATCH 1/5] WIP: Add syscall unlinkat_s (currently x86* only)
>
> Am 04.02.2015 um 14:06 schrieb Michael Kerrisk:
> > Alexander,
> >
> > On Wed, Feb 4, 2015 at 1:22 PM, Alexander Holler <[email protected]>
> > wrote:
> > > Am 04.02.2015 um 13:07 schrieb Lukáš Czerner:
> > >
> > > > The fact is that the current patches are useless for anything other
> > > > than proof-of-concept. Now you know more that needs to be done or
> > >
> > >
> > > That's wrong. The patches already work. If you delete a file which isn't
> > > in
> > > use by something else, the current contents will be wiped on traditional
> > > harddrives. I assume that already fulfills more than 50% of use cases of
> > > ordinary people.
> >
> > You are getting various feedback from people, that you seem to be ignoring.
>
> I'm happy for all the feedback. But it doesn't help me. I'm not going to spend
> the necessary time unpaid.
Right, you'd much rather have someone else to spend the time on your
request unpaid. That's understandable, but unreasonable. You want
it, implement it, or pay someone else to do it for you.
> .
> > Al Viro, in his curmedgeonly way, points out that the problems are
> > much deeper than you realize. He does not say so explicitly, but I
> > imagine his point is that he does not want to see the kernel cluttered
> > with "partial" solutions that will simply increase the maintenance
> > burden in the long term, and leave bugs to be fixed further down the
> > line. You seem not to be listening.
>
> It doesn't help me nor anyone else. As Eric Sandeen made me aware through in
> bug, look at http://lwn.net/Articles/462437/ what already happened.
That's what people have been trying to tell you. It's not an easy
task and there are plenty of cases to think about. As you can see
IBM tasked their developer to do it, but they did not succeed. And
here you come with your simplistic patches crying about "high
towers. But you're the one apparently interested in this feature
and you've been warned that's it's not a simple task.
But if you really want it I really do encourage you to try. I'd be
happy to have a working and reliable secure delete feature but it's
not my priority at all.
-Lukas
>
> > Lukáš points out to you that getting a feature like this into the
> > kernel is complex process. You seem unwilling to hear that, and still
> > just want your partial solution.
>
> Wrong. I don't want my partial solution to be part of the official kernel. I
> don't care. I offered it for other users because I'm aware that has become
> almost impossible for normal people to get something into the kernel without
> spending an unbelievable amount of time most people can't afford to spend.
>
> > I tell you that discussions of APIs should CC linux-api, which I am
> > now CCing into this thread, again, because, again, you're not
> > listening to feedback.
>
> Please don't confuse "not listening" with "unable to fulfill Linux kernel
> maintainer requests".
>
> Alexander Holler
>
>
On 2015-02-04 09:19, Alexander Holler wrote:
> Am 04.02.2015 um 14:29 schrieb Alexander Holler:
> I'm really sorry that I can't spend several unpaid months with reading
> and understanding ever changing linux kernel sources in order to become
> a Linux filesystem expert and send some fully working perfect patches
> which do fix the problem in question.
You aren't expected to do so. Code review is an integral part of the
development process here, and only truly trivial patches (stuff like
fixing typos in kernel messages and documentation) get merged without
it. If you pay attention to the list itself, even the veteran kernel
developers almost never manage to produce a patch that is deemed
absolutely perfect, and end up revising things multiple times before
they get merged.
> And I can't spend the necessary time to play remote keyboard for kernel
> maintainers which might be willing to explain me what has to be done
> according to their view. I've already offered what I was willing to do,
> for the price of having to defend myself over and over. And
> unfortunately that wasn't the first time I've ended up with having to
> defend myself.
You seem to fail to understand that open source development runs
primarily on volunteer work (yes there are people paid to work on open
source software, but that is a generally exceptional case). A large
majority of the people who are kernel maintainers are donating their
free time to the project.
> My conclusion is that I'm a real fool having posted multiple times
> patches to this list. It just doesn't make any sense and most of the
> time the only reward are flames.
If you aren't serious about trying to get something into the mainline
kernel, you should be tagging _all_ of the e-mails in that patch-set
with [RFC] in the subject line.
In none of the responses that I've seen has anyone been anything but
polite (albeit in some cases moderately annoyed). If you really
consider such attempts at constructive criticism to be flames, then a
development mailing list isn't the place you should be posting patches.
Am 04.02.2015 um 15:52 schrieb Lukáš Czerner:
> On Wed, 4 Feb 2015, Alexander Holler wrote:
>> I'm happy for all the feedback. But it doesn't help me. I'm not going to spend
>> the necessary time unpaid.
>
> Right, you'd much rather have someone else to spend the time on your
> request unpaid. That's understandable, but unreasonable. You want
> it, implement it, or pay someone else to do it for you.
Maybe you should attach a big fat red warning to the kernels bugzilla
that filing a bug means either to fix it yourself or pay somone to do that.
I've never demanded that someone else fixes it.
I've just explained a problem.
Unbelievable how someone could do such without paying someone else to
fix it or by fixing it themself ...
On Wed, 4 Feb 2015, Alexander Holler wrote:
> Date: Wed, 04 Feb 2015 17:12:52 +0100
> From: Alexander Holler <[email protected]>
> To: Lukáš Czerner <[email protected]>
> Cc: Michael Kerrisk <[email protected]>,
> Al Viro <[email protected]>, Theodore Ts'o <[email protected]>,
> Linux-Fsdevel <[email protected]>,
> Linux Kernel <[email protected]>,
> Linux API <[email protected]>
> Subject: Re: [PATCH 1/5] WIP: Add syscall unlinkat_s (currently x86* only)
>
> Am 04.02.2015 um 15:52 schrieb Lukáš Czerner:
> > On Wed, 4 Feb 2015, Alexander Holler wrote:
>
> >> I'm happy for all the feedback. But it doesn't help me. I'm not going to spend
> >> the necessary time unpaid.
> >
> > Right, you'd much rather have someone else to spend the time on your
> > request unpaid. That's understandable, but unreasonable. You want
> > it, implement it, or pay someone else to do it for you.
>
> Maybe you should attach a big fat red warning to the kernels bugzilla
> that filing a bug means either to fix it yourself or pay somone to do that.
>
> I've never demanded that someone else fixes it.
>
> I've just explained a problem.
>
> Unbelievable how someone could do such without paying someone else to
> fix it or by fixing it themself ...
It's not a bug, you're requesting a feature.
Am 04.02.2015 um 17:25 schrieb Lukáš Czerner:
> On Wed, 4 Feb 2015, Alexander Holler wrote:
>> Am 04.02.2015 um 15:52 schrieb Lukáš Czerner:
>>> On Wed, 4 Feb 2015, Alexander Holler wrote:
>>
>>>> I'm happy for all the feedback. But it doesn't help me. I'm not going to spend
>>>> the necessary time unpaid.
>>>
>>> Right, you'd much rather have someone else to spend the time on your
>>> request unpaid. That's understandable, but unreasonable. You want
>>> it, implement it, or pay someone else to do it for you.
>>
>> Maybe you should attach a big fat red warning to the kernels bugzilla
>> that filing a bug means either to fix it yourself or pay somone to do that.
>>
>> I've never demanded that someone else fixes it.
>>
>> I've just explained a problem.
>>
>> Unbelievable how someone could do such without paying someone else to
>> fix it or by fixing it themself ...
>
> It's not a bug, you're requesting a feature.
>
Ok, I'm guilty.
May I ask if there's somewhere a feature request tracker which doesn't
cruzify someone because he suggest a (maybe wrong) solution and tries to
show that this might work with some prelimary, broken, silly, quick and
dirty patches?
Am 04.02.2015 um 17:45 schrieb Alexander Holler:
> Am 04.02.2015 um 17:25 schrieb Lukáš Czerner:
>> On Wed, 4 Feb 2015, Alexander Holler wrote:
>
>>> Am 04.02.2015 um 15:52 schrieb Lukáš Czerner:
>>>> On Wed, 4 Feb 2015, Alexander Holler wrote:
>>>
>>>>> I'm happy for all the feedback. But it doesn't help me. I'm not
>>>>> going to spend
>>>>> the necessary time unpaid.
>>>>
>>>> Right, you'd much rather have someone else to spend the time on your
>>>> request unpaid. That's understandable, but unreasonable. You want
>>>> it, implement it, or pay someone else to do it for you.
>>>
>>> Maybe you should attach a big fat red warning to the kernels bugzilla
>>> that filing a bug means either to fix it yourself or pay somone to do
>>> that.
>>>
>>> I've never demanded that someone else fixes it.
>>>
>>> I've just explained a problem.
>>>
>>> Unbelievable how someone could do such without paying someone else to
>>> fix it or by fixing it themself ...
>>
>> It's not a bug, you're requesting a feature.
>>
>
> Ok, I'm guilty.
>
> May I ask if there's somewhere a feature request tracker which doesn't
> cruzify someone because he suggest a (maybe wrong) solution and tries to
> show that this might work with some prelimary, broken, silly, quick and
> dirty patches?
I guess the answer is FreeBSD or similiar. ;)
On Wed, Feb 04, 2015 at 03:52:02PM +0100, Lukáš Czerner wrote:
> > I'm happy for all the feedback. But it doesn't help me. I'm not going to spend
> > the necessary time unpaid.
>
> Right, you'd much rather have someone else to spend the time on your
> request unpaid. That's understandable, but unreasonable. You want
> it, implement it, or pay someone else to do it for you.
>
> > It doesn't help me nor anyone else. As Eric Sandeen made me aware through in
> > bug, look at http://lwn.net/Articles/462437/ what already happened.
>
> That's what people have been trying to tell you. It's not an easy
> task and there are plenty of cases to think about. As you can see
> IBM tasked their developer to do it, but they did not succeed. And
> here you come with your simplistic patches crying about "high
> towers. But you're the one apparently interested in this feature
> and you've been warned that's it's not a simple task.
And indeed, people who do have salaries paid by companies who care
about this general problem in actual products have been working on
addressing it using encryption, such that when the user is removed
from the device, the key is blasted. More importantly, when the user
is not logged in, the key isn't even *available* on the device. So it
solves more problems than the one that you are concerned about, and in
general maintainers prefer solutions that solve multiple problems,
because that minimizes the number of one-time hacks and partial/toy
solutions which turn into long-term maintainance headaches. (After
all, if you insist on having a partial/toy solution merged, that turns
into an unfunded mandate which the maintainers effectively have to
support for free, forever.)
You've rejected encryption as a proposed solution as not meeting your
requirements (which if I understand your objections, can be summarized
as "encryption is too hard"). This is fine, but if you want someone
*else* to implement your partial toy solution which is less secure,
then you will either need to pay someone to do it or do it yourself.
> > Wrong. I don't want my partial solution to be part of the official kernel. I
> > don't care. I offered it for other users because I'm aware that has become
> > almost impossible for normal people to get something into the kernel without
> > spending an unbelievable amount of time most people can't afford to spend.
So you expect other users to just apply your patches and use an
unofficial system call number that might get reassigned to some other
user later on?
If that's all you want, then ok, you're done. The patches have been
posted to LKML, and you can give people URL's if they want to try
applying the patches on their own.
Cheers,
- Ted
Am 04.02.2015 um 20:33 schrieb Theodore Ts'o:
> And indeed, people who do have salaries paid by companies who care
> about this general problem in actual products have been working on
> addressing it using encryption, such that when the user is removed
> from the device, the key is blasted. More importantly, when the user
> is not logged in, the key isn't even *available* on the device. So it
> solves more problems than the one that you are concerned about, and in
> general maintainers prefer solutions that solve multiple problems,
> because that minimizes the number of one-time hacks and partial/toy
> solutions which turn into long-term maintainance headaches. (After
> all, if you insist on having a partial/toy solution merged, that turns
> into an unfunded mandate which the maintainers effectively have to
> support for free, forever.)
It's just another layer above and an rather ugly workaround which ends
up in having to manage keys and doesn't solve the real problem. Besides
that it's much more complicated especially in kind of kernel sources to
manage.
> You've rejected encryption as a proposed solution as not meeting your
> requirements (which if I understand your objections, can be summarized
> as "encryption is too hard"). This is fine, but if you want someone
> *else* to implement your partial toy solution which is less secure,
> then you will either need to pay someone to do it or do it yourself.
I haven't rejected it. I'm using that myself since around 10 years,
because of the impossibility to really delete files when using Linux.
>>> Wrong. I don't want my partial solution to be part of the official kernel. I
>>> don't care. I offered it for other users because I'm aware that has become
>>> almost impossible for normal people to get something into the kernel without
>>> spending an unbelievable amount of time most people can't afford to spend.
>
> So you expect other users to just apply your patches and use an
> unofficial system call number that might get reassigned to some other
> user later on?
People do such all the time because the mainline kernel is otherwise
unusable on many boards.
Besides that, I don't expect that anyone uses my patches.
As said multiple times before, they are an offer and were primarily
meant to show a possible simple solution for many use cases. They
already work with inside some, of course maybe uncomfortable, limits and
don't do any worse. just better.
Alexander Holler
Hello.
I've set up a repository at github which contains the 3 pathches to add
limited support to the Linux kernel for wiping files on ext4 and (v)fat
with 3 small patches and a total of "9 files changed, 101 insertions(+),
8 deletions(-)" here:
https://github.com/aholler/wipe_lnx
Feel free to send me any comments, patches or even flames in privat
(off-list)! because I don't want to become involved in annoying
discussions here anymore.
Alexander Holler
Alexander Holler <holler <at> ahsoftware.de> writes:
>
> Hello.
>
> I've set up a repository at github which contains the 3 pathches to add
> limited support to the Linux kernel for wiping files on ext4 and (v)fat
> with 3 small patches and a total of "9 files changed, 101 insertions(+),
> 8 deletions(-)" here:
>
> https://github.com/aholler/wipe_lnx
>
> Feel free to send me any comments, patches or even flames in privat
> (off-list)! because I don't want to become involved in annoying
> discussions here anymore.
>
> Alexander Holler
>
This is certainly a case of "The Emperor's New Clothes". Lets say I use vim to
edit my file containing my deep dark secrets. Lets strace it and see what
happens when I edit it and save a new copy:
rename("secure_document.txt", "secure_document.txt~") = 0
open("secure_document.txt", O_WRONLY|O_CREAT|O_TRUNC, 0664) = 3
write(3, "secrete:s\n", 10) = 10
fsync(3) = 0
close(3) = 0
chmod("secure_document.txt", 0100664) = 0
setxattr("secure_document.txt", "system.posix_acl_access",
"\x02\x00\x00\x00\x01\x00\x06\x00\xff\xff\xff\xff\x04\x00\x06\x00\xff\xff\xff\x
ff \x00\x04\x00\xff\xff\xff\xff", 28, 0) = 0
unlink("secure_document.txt~") = 0
You'll find that just about every program that deals with files properly does
something like this. If it didn't, there'd be a good chance of losing all your
work if the computer or program crashed while saving your file. This is layer
one of the problem.
Layer 2 is filesystems, as others have noted, filesystems have all sorts of
paths for blocks no longer being associated with inodes. Log structured file
systems doubly so.
And layer 3, media, which we have no control over and may be storing duplicate
copies of the data for any number of reasons. But as you've pointed out, is
likely to require significant funds to get at.
As pointed out, the best you could do is some sort of flag on the inode that
instructed the filesystem to wipe blocks before separating them from the inode.
Programs would need to be modified though as you can see in the vim case, any
copying of file mode bits are only done after data has been written to disk.
Luckily there is an easy solution out there that solves all these problems.