Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1754074AbdHXVXn (ORCPT ); Thu, 24 Aug 2017 17:23:43 -0400 Received: from mail-wm0-f46.google.com ([74.125.82.46]:38005 "EHLO mail-wm0-f46.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753888AbdHXVXj (ORCPT ); Thu, 24 Aug 2017 17:23:39 -0400 From: Philipp Reisner To: Jens Axboe , linux-kernel@vger.kernel.org Cc: drbd-dev@lists.linbit.com Subject: [PATCH 08/17] drbd: fix potential get_ldev/put_ldev refcount imbalance during attach Date: Thu, 24 Aug 2017 23:23:05 +0200 Message-Id: <1503609794-13233-9-git-send-email-philipp.reisner@linbit.com> X-Mailer: git-send-email 2.7.4 In-Reply-To: <1503609794-13233-1-git-send-email-philipp.reisner@linbit.com> References: <1503609794-13233-1-git-send-email-philipp.reisner@linbit.com> Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 2472 Lines: 66 From: Lars Ellenberg Race: drbd_adm_attach() | async drbd_md_endio() | device->ldev is still NULL. | | drbd_md_read( | .endio = drbd_md_endio; | submit; | .... | wait for done == 1; | done = 1; ); | wake_up(); .. lot of other stuff, | .. includeing taking and | ...giving up locks, | .. doing further IO, | .. stuff that takes "some time" | | while in this context, | this is the next statement. | which means this context was scheduled .. only then, finally, | away for "some time". device->ldev = nbc; | | if (device->ldev) | put_ldev() Unlikely, but possible. I was able to provoke it "reliably" by adding an mdelay(500); after the wake_up(). Fixed by moving the if (!NULL) put_ldev() before done = 1; Impact of the bug was that the resulting refcount imbalance could lead to premature destruction of the object, potentially causing a NULL pointer dereference during a subsequent detach. Signed-off-by: Philipp Reisner Signed-off-by: Lars Ellenberg diff --git a/drivers/block/drbd/drbd_worker.c b/drivers/block/drbd/drbd_worker.c index e48012d..f0717a9 100644 --- a/drivers/block/drbd/drbd_worker.c +++ b/drivers/block/drbd/drbd_worker.c @@ -65,6 +65,11 @@ void drbd_md_endio(struct bio *bio) device = bio->bi_private; device->md_io.error = blk_status_to_errno(bio->bi_status); + /* special case: drbd_md_read() during drbd_adm_attach() */ + if (device->ldev) + put_ldev(device); + bio_put(bio); + /* We grabbed an extra reference in _drbd_md_sync_page_io() to be able * to timeout on the lower level device, and eventually detach from it. * If this io completion runs after that timeout expired, this @@ -79,9 +84,6 @@ void drbd_md_endio(struct bio *bio) drbd_md_put_buffer(device); device->md_io.done = 1; wake_up(&device->misc_wait); - bio_put(bio); - if (device->ldev) /* special case: drbd_md_read() during drbd_adm_attach() */ - put_ldev(device); } /* reads on behalf of the partner, -- 2.7.4