2016-04-14 14:03:52

by Nicolas Boichat

[permalink] [raw]
Subject: INFO: inconsistent lock state in ecryptfs_calculate_md5

Hi,

We've run into the following lockdep warning while using a system with
ecryptfs, under heavy memory pressure.

If I understand the issue correctly,
fs/ecryptfs/crypto.c:ecryptfs_calculate_md5:
- Locks &crypt_stat->cs_hash_tfm_mutex
-> Calls crypto_alloc_hash
-> Calls crypto_alloc_base
-> __crypto_alloc_tfm
-> kzalloc(tfm_size, GFP_KERNEL)
-> Which, under memory pressure, may attempt to flush
pages, which would reenter ecryptfs_calculate_md5, try to lock the
mutex again, and cause a deadlock.

This is a 3.18-based kernel, and this upstream commit:
3095e8e366b471f3bcdbf21c9c72a45718ff8756 "eCryptfs: Use skcipher and
shash" gets rid of the call to crypto_alloc_hash, and instead calls
crypto_alloc_shash, but it looks like there is a similar issue there
(another kmalloc with GFP_KERNEL).

I wonder if we should release the mutex during the alloc call, or if
there is another better way to fix this (this should be a fairly
common filesystem issue, I imagine?).

Suggestions and patches welcome.

Best,

Nicolas

[ 1291.295366] =================================
[ 1291.299679] [ INFO: inconsistent lock state ]
[ 1291.303997] 3.18.0 #234 Not tainted
[ 1291.307449] ---------------------------------
[ 1291.311764] inconsistent {RECLAIM_FS-ON-W} -> {IN-RECLAIM_FS-W} usage.
[ 1291.318234] kswapd0/62 [HC0[0]:SC0[0]:HE1:SE1] takes:
[ 1291.323238] (&crypt_stat->cs_hash_tfm_mutex){+.+.?.}, at:
[<ffffffc00041bf90>] ecryptfs_calculate_md5+0x58/0x170
[ 1291.333437] {RECLAIM_FS-ON-W} state was registered at:
[ 1291.338526] [<ffffffc00026da9c>] mark_lock+0x2dc/0x6a8
[ 1291.343796] [<ffffffc00026df04>] mark_held_locks+0x9c/0xc4
[ 1291.349407] [<ffffffc000270de4>] lockdep_trace_alloc+0xc4/0xd8
[ 1291.355366] [<ffffffc000338bcc>] __kmalloc+0x90/0x278
[ 1291.360551] [<ffffffc000442fc0>] __crypto_alloc_tfm+0x98/0x184
[ 1291.366507] [<ffffffc000443108>] crypto_alloc_base+0x5c/0xa8
[ 1291.372289] [<ffffffc00041bfbc>] ecryptfs_calculate_md5+0x84/0x170
[ 1291.378590] [<ffffffc00041d368>] ecryptfs_compute_root_iv+0xbc/0x13c
[ 1291.385063] [<ffffffc00041d558>] ecryptfs_new_file_context+0x170/0x1f8
[ 1291.391709] [<ffffffc000419dac>] ecryptfs_initialize_file+0x74/0x108
[ 1291.398181] [<ffffffc000419f58>] ecryptfs_create+0x118/0x17c
[ 1291.403964] [<ffffffc00035082c>] vfs_create+0x98/0xe4
[ 1291.409144] [<ffffffc000350dd4>] do_last.isra.30+0x55c/0xb04
[ 1291.414928] [<ffffffc000351938>] path_openat+0x5bc/0x60c
[ 1291.420368] [<ffffffc000352954>] do_filp_open+0x4c/0xb8
[ 1291.425722] [<ffffffc0003416ec>] do_sys_open+0x180/0x2e8
[ 1291.431161] [<ffffffc000392738>] compat_SyS_open+0x34/0x48
[ 1291.436773] [<ffffffc000205470>] el0_svc_naked+0x20/0x28
[ 1291.442214] irq event stamp: 4500643
[ 1291.445753] hardirqs last enabled at (4500643):
[<ffffffc0002fa398>] get_page_from_freelist+0x400/0x5f0
[ 1291.455158] hardirqs last disabled at (4500642):
[<ffffffc0002fa104>] get_page_from_freelist+0x16c/0x5f0
[ 1291.464560] softirqs last enabled at (4498536):
[<ffffffc000228560>] __do_softirq+0x38c/0x428
[ 1291.473103] softirqs last disabled at (4498531):
[<ffffffc000228900>] irq_exit+0x7c/0xd8
[ 1291.481127]
[ 1291.481127] other info that might help us debug this:
[ 1291.487596] Possible unsafe locking scenario:
[ 1291.487596]
[ 1291.493462] CPU0
[ 1291.495880] ----
[ 1291.498298] lock(&crypt_stat->cs_hash_tfm_mutex);
[ 1291.503137] <Interrupt>
[ 1291.505727] lock(&crypt_stat->cs_hash_tfm_mutex);
[ 1291.510737]
[ 1291.510737] *** DEADLOCK ***
[ 1291.510737]
[ 1291.516603] no locks held by kswapd0/62.
[ 1291.520486]
[ 1291.520486] stack backtrace:
[ 1291.524803] CPU: 2 PID: 62 Comm: kswapd0 Not tainted 3.18.0 #234
[ 1291.535932] Call trace:
[ 1291.538353] [<ffffffc000209ed8>] dump_backtrace+0x0/0x140
[ 1291.543703] [<ffffffc00020a034>] show_stack+0x1c/0x28
[ 1291.548709] [<ffffffc0008a45c8>] dump_stack+0x80/0xc4
[ 1291.553713] [<ffffffc0008a36c4>] print_usage_bug.part.35+0x270/0x28c
[ 1291.560010] [<ffffffc00026dc0c>] mark_lock+0x44c/0x6a8
[ 1291.565100] [<ffffffc00026ecf0>] __lock_acquire+0x920/0x19a8
[ 1291.570707] [<ffffffc0002705f4>] lock_acquire+0x138/0x188
[ 1291.576057] [<ffffffc0008a7df0>] mutex_lock_nested+0x8c/0x3fc
[ 1291.581751] [<ffffffc00041bf8c>] ecryptfs_calculate_md5+0x54/0x170
[ 1291.587876] [<ffffffc00041c890>] ecryptfs_derive_iv+0xec/0x184
[ 1291.593656] [<ffffffc00041c99c>] crypt_extent+0x74/0x1b0
[ 1291.598919] [<ffffffc00041cebc>] ecryptfs_encrypt_page+0xdc/0x1c8
[ 1291.604958] [<ffffffc00041b174>] ecryptfs_writepage+0x1c/0x7c
[ 1291.610653] [<ffffffc0003053cc>] shrink_page_list+0x578/0x978
[ 1291.616346] [<ffffffc000305ddc>] shrink_inactive_list+0x264/0x4e8
[ 1291.622384] [<ffffffc00030679c>] shrink_lruvec+0x3ec/0x5a4
[ 1291.627820] [<ffffffc0003069ac>] shrink_zone+0x58/0x118
[ 1291.632997] [<ffffffc000307310>] balance_pgdat+0x300/0x550
[ 1291.638432] [<ffffffc0003079d4>] kswapd+0x474/0x51c
[ 1291.643264] [<ffffffc0002437c0>] kthread+0xf0/0xfc


2016-04-16 07:01:24

by Herbert Xu

[permalink] [raw]
Subject: Re: INFO: inconsistent lock state in ecryptfs_calculate_md5

On Thu, Apr 14, 2016 at 10:03:48PM +0800, Nicolas Boichat wrote:
>
> We've run into the following lockdep warning while using a system with
> ecryptfs, under heavy memory pressure.
>
> If I understand the issue correctly,
> fs/ecryptfs/crypto.c:ecryptfs_calculate_md5:
> - Locks &crypt_stat->cs_hash_tfm_mutex
> -> Calls crypto_alloc_hash
> -> Calls crypto_alloc_base
> -> __crypto_alloc_tfm
> -> kzalloc(tfm_size, GFP_KERNEL)
> -> Which, under memory pressure, may attempt to flush
> pages, which would reenter ecryptfs_calculate_md5, try to lock the
> mutex again, and cause a deadlock.

Well the following patch should fix this bug but it's really
just one piece of a much bigger problem. For example, the function
ecryptfs_encrypt_page also does a GFP_USER allocation. So to
completely fix this filesystem someone should audit all the data
paths to ensure that no FS allocations occur.

---8<---
Subject: eCryptfs: Do not allocate hash tfm in NORECLAIM context

You cannot allocate crypto tfm objects in NORECLAIM or NOFS contexts.
The ecryptfs code currently does exactly that for the MD5 tfm.

This patch fixes it by preallocating the MD5 tfm in a safe context.

The MD5 tfm is also reentrant so this patch removes the superfluous
cs_hash_tfm_mutex.

Reported-by: Nicolas Boichat <[email protected]>
Signed-off-by: Herbert Xu <[email protected]>

diff --git a/fs/ecryptfs/crypto.c b/fs/ecryptfs/crypto.c
index 64026e5..5600496 100644
--- a/fs/ecryptfs/crypto.c
+++ b/fs/ecryptfs/crypto.c
@@ -105,19 +105,7 @@ static int ecryptfs_calculate_md5(char *dst,
struct crypto_shash *tfm;
int rc = 0;

- mutex_lock(&crypt_stat->cs_hash_tfm_mutex);
tfm = crypt_stat->hash_tfm;
- if (!tfm) {
- tfm = crypto_alloc_shash(ECRYPTFS_DEFAULT_HASH, 0, 0);
- if (IS_ERR(tfm)) {
- rc = PTR_ERR(tfm);
- ecryptfs_printk(KERN_ERR, "Error attempting to "
- "allocate crypto context; rc = [%d]\n",
- rc);
- goto out;
- }
- crypt_stat->hash_tfm = tfm;
- }
rc = ecryptfs_hash_digest(tfm, src, len, dst);
if (rc) {
printk(KERN_ERR
@@ -126,7 +114,6 @@ static int ecryptfs_calculate_md5(char *dst,
goto out;
}
out:
- mutex_unlock(&crypt_stat->cs_hash_tfm_mutex);
return rc;
}

@@ -207,16 +194,29 @@ out:
*
* Initialize the crypt_stat structure.
*/
-void
-ecryptfs_init_crypt_stat(struct ecryptfs_crypt_stat *crypt_stat)
+int ecryptfs_init_crypt_stat(struct ecryptfs_crypt_stat *crypt_stat)
{
+ struct crypto_shash *tfm;
+ int rc;
+
+ tfm = crypto_alloc_shash(ECRYPTFS_DEFAULT_HASH, 0, 0);
+ if (IS_ERR(tfm)) {
+ rc = PTR_ERR(tfm);
+ ecryptfs_printk(KERN_ERR, "Error attempting to "
+ "allocate crypto context; rc = [%d]\n",
+ rc);
+ return rc;
+ }
+
memset((void *)crypt_stat, 0, sizeof(struct ecryptfs_crypt_stat));
INIT_LIST_HEAD(&crypt_stat->keysig_list);
mutex_init(&crypt_stat->keysig_list_mutex);
mutex_init(&crypt_stat->cs_mutex);
mutex_init(&crypt_stat->cs_tfm_mutex);
- mutex_init(&crypt_stat->cs_hash_tfm_mutex);
+ crypt_stat->hash_tfm = tfm;
crypt_stat->flags |= ECRYPTFS_STRUCT_INITIALIZED;
+
+ return 0;
}

/**
diff --git a/fs/ecryptfs/ecryptfs_kernel.h b/fs/ecryptfs/ecryptfs_kernel.h
index d123fba..c7761a9 100644
--- a/fs/ecryptfs/ecryptfs_kernel.h
+++ b/fs/ecryptfs/ecryptfs_kernel.h
@@ -242,7 +242,6 @@ struct ecryptfs_crypt_stat {
struct list_head keysig_list;
struct mutex keysig_list_mutex;
struct mutex cs_tfm_mutex;
- struct mutex cs_hash_tfm_mutex;
struct mutex cs_mutex;
};

@@ -577,7 +576,7 @@ int virt_to_scatterlist(const void *addr, int size, struct scatterlist *sg,
int sg_size);
int ecryptfs_compute_root_iv(struct ecryptfs_crypt_stat *crypt_stat);
void ecryptfs_rotate_iv(unsigned char *iv);
-void ecryptfs_init_crypt_stat(struct ecryptfs_crypt_stat *crypt_stat);
+int ecryptfs_init_crypt_stat(struct ecryptfs_crypt_stat *crypt_stat);
void ecryptfs_destroy_crypt_stat(struct ecryptfs_crypt_stat *crypt_stat);
void ecryptfs_destroy_mount_crypt_stat(
struct ecryptfs_mount_crypt_stat *mount_crypt_stat);
diff --git a/fs/ecryptfs/inode.c b/fs/ecryptfs/inode.c
index 121114e..0caec70 100644
--- a/fs/ecryptfs/inode.c
+++ b/fs/ecryptfs/inode.c
@@ -898,8 +898,11 @@ static int ecryptfs_setattr(struct dentry *dentry, struct iattr *ia)
struct ecryptfs_crypt_stat *crypt_stat;

crypt_stat = &ecryptfs_inode_to_private(d_inode(dentry))->crypt_stat;
- if (!(crypt_stat->flags & ECRYPTFS_STRUCT_INITIALIZED))
- ecryptfs_init_crypt_stat(crypt_stat);
+ if (!(crypt_stat->flags & ECRYPTFS_STRUCT_INITIALIZED)) {
+ rc = ecryptfs_init_crypt_stat(crypt_stat);
+ if (rc)
+ return rc;
+ }
inode = d_inode(dentry);
lower_inode = ecryptfs_inode_to_lower(inode);
lower_dentry = ecryptfs_dentry_to_lower(dentry);
diff --git a/fs/ecryptfs/super.c b/fs/ecryptfs/super.c
index 77a486d..85411ce 100644
--- a/fs/ecryptfs/super.c
+++ b/fs/ecryptfs/super.c
@@ -55,7 +55,10 @@ static struct inode *ecryptfs_alloc_inode(struct super_block *sb)
inode_info = kmem_cache_alloc(ecryptfs_inode_info_cache, GFP_KERNEL);
if (unlikely(!inode_info))
goto out;
- ecryptfs_init_crypt_stat(&inode_info->crypt_stat);
+ if (ecryptfs_init_crypt_stat(&inode_info->crypt_stat)) {
+ kmem_cache_free(ecryptfs_inode_info_cache, inode_info);
+ goto out;
+ }
mutex_init(&inode_info->lower_file_mutex);
atomic_set(&inode_info->lower_file_count, 0);
inode_info->lower_file = NULL;
--
Email: Herbert Xu <[email protected]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt