Organization: Red Hat UK Ltd. Registered Address: Red Hat UK Ltd, Amberley Place, 107-111 Peascod Street, Windsor, Berkshire, SI4 1TE, United Kingdom. Registered in England and Wales under Company Registration No. 3798903
From: David Howells
In-Reply-To: <24367.1530707373@warthog.procyon.org.uk>
References: <24367.1530707373@warthog.procyon.org.uk> <24655.1530695695@warthog.procyon.org.uk> <877emb2740.fsf@notabene.neil.brown.name> <20180222073330.36259-1-carmark.dlut@gmail.com>
Cc: dhowells@redhat.com, NeilBrown, Andrew Morton, Anthony DeRobertis, linux-cachefs@redhat.com, linux-kernel@vger.kernel.org, Lei Xue, Vegard Nossum, Daniel Axtens, KiranKumar Modukuri
Subject: Re: [PATCH] cachefiles: Fix assertion "6 == 5 is false" at fs/fscache/operation.c:494
Date: Wed, 04 Jul 2018 14:30:27 +0100
Message-ID: <28501.1530711027@warthog.procyon.org.uk>
X-Mailing-List: linux-kernel@vger.kernel.org

David Howells wrote:

> So something like the attached. Possibly the changes to operation.c should be
> split into a separate patch.

Bah. Helps if I try committing all of my changes first. Revised patch
attached.

David.
---
commit a0d29054c1c7bf575d73cced446dec6fcba30e0d
Author: kiran modukuri
Date:   Tue Jul 18 16:25:49 2017 -0700

    cachefiles: Fix assertion "6 == 5 is false" at fs/fscache/operation.c:494

    There is a potential race in fscache operation enqueuing for reading and
    copying multiple pages from cachefiles to netfs. Under heavy load, it
    happens very often.
    If this race occurs, an oops similar to the following is seen:

     kernel BUG at fs/fscache/operation.c:69!
     invalid opcode: 0000 [#1] SMP
    ...
     #0 [ffff883fff0838d8] machine_kexec at ffffffff81051beb
     #1 [ffff883fff083938] crash_kexec at ffffffff810f2542
     #2 [ffff883fff083a08] oops_end at ffffffff8163e1a8
     #3 [ffff883fff083a30] die at ffffffff8101859b
     #4 [ffff883fff083a60] do_trap at ffffffff8163d860
     #5 [ffff883fff083ab0] do_invalid_op at ffffffff81015204
     #6 [ffff883fff083b60] invalid_op at ffffffff8164701e
        [exception RIP: fscache_enqueue_operation+246]
        RIP: ffffffffa0b793c6  RSP: ffff883fff083c18  RFLAGS: 00010046
        RAX: 0000000000000019  RBX: ffff8832ed1a9ec0  RCX: 0000000000000006
        RDX: 0000000000000000  RSI: 0000000000000046  RDI: 0000000000000046
        RBP: ffff883fff083c20   R8: 0000000000000086   R9: 000000000000178f
        R10: ffffffff816aeb00  R11: ffff883fff08392e  R12: ffff8802f0525620
        R13: ffff88407ffc01d8  R14: 0000000000000000  R15: 0000000000000003
        ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0000
     #7 [ffff883fff083c10] fscache_enqueue_operation at ffffffffa0b793c6
     #8 [ffff883fff083c28] cachefiles_read_waiter at ffffffffa0b15a48
     #9 [ffff883fff083c48] __wake_up_common at ffffffff810af028

    Reported-by: Lei Xue
    Reported-by: Vegard Nossum
    Reported-by: Anthony DeRobertis
    Reported-by: NeilBrown
    Reported-by: Daniel Axtens
    Reported-by: KiranKumar Modukuri
    Signed-off-by: David Howells

diff --git a/fs/cachefiles/rdwr.c b/fs/cachefiles/rdwr.c
index 5082c8a49686..40f7595aad10 100644
--- a/fs/cachefiles/rdwr.c
+++ b/fs/cachefiles/rdwr.c
@@ -27,6 +27,7 @@ static int cachefiles_read_waiter(wait_queue_entry_t *wait, unsigned mode,
 	struct cachefiles_one_read *monitor =
 		container_of(wait, struct cachefiles_one_read, monitor);
 	struct cachefiles_object *object;
+	struct fscache_retrieval *op = monitor->op;
 	struct wait_bit_key *key = _key;
 	struct page *page = wait->private;
 
@@ -51,16 +52,22 @@ static int cachefiles_read_waiter(wait_queue_entry_t *wait, unsigned mode,
 	list_del(&wait->entry);
 
 	/* move onto the action list and queue for FS-Cache thread pool */
-	ASSERT(monitor->op);
+	ASSERT(op);
 
-	object = container_of(monitor->op->op.object,
-			      struct cachefiles_object, fscache);
+	/* We need to temporarily bump the usage count as we don't own a ref
+	 * here otherwise cachefiles_read_copier() may free the op between the
+	 * monitor being enqueued on the op->to_do list and the op getting
+	 * enqueued on the work queue.
+	 */
+	fscache_get_retrieval(op);
 
+	object = container_of(op->op.object, struct cachefiles_object, fscache);
 	spin_lock(&object->work_lock);
-	list_add_tail(&monitor->op_link, &monitor->op->to_do);
+	list_add_tail(&monitor->op_link, &op->to_do);
 	spin_unlock(&object->work_lock);
 
-	fscache_enqueue_retrieval(monitor->op);
+	fscache_enqueue_retrieval(op);
+	fscache_put_retrieval(op);
 	return 0;
 }
 
diff --git a/fs/fscache/operation.c b/fs/fscache/operation.c
index e30c5975ea58..8d265790374c 100644
--- a/fs/fscache/operation.c
+++ b/fs/fscache/operation.c
@@ -70,7 +70,8 @@ void fscache_enqueue_operation(struct fscache_operation *op)
 	ASSERT(op->processor != NULL);
 	ASSERT(fscache_object_is_available(op->object));
 	ASSERTCMP(atomic_read(&op->usage), >, 0);
-	ASSERTCMP(op->state, ==, FSCACHE_OP_ST_IN_PROGRESS);
+	ASSERTIFCMP(op->state != FSCACHE_OP_ST_IN_PROGRESS,
+		    op->state, ==, FSCACHE_OP_ST_CANCELLED);
 
 	fscache_stat(&fscache_n_op_enqueue);
 	switch (op->flags & FSCACHE_OP_TYPE) {
@@ -499,7 +500,8 @@ void fscache_put_operation(struct fscache_operation *op)
 	struct fscache_cache *cache;
 
 	_enter("{OBJ%x OP%x,%d}",
-	       op->object->debug_id, op->debug_id, atomic_read(&op->usage));
+	       op->object ? op->object->debug_id : 0,
+	       op->debug_id, atomic_read(&op->usage));
 
 	ASSERTCMP(atomic_read(&op->usage), >, 0);
 