Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1754035AbdHXVXj (ORCPT ); Thu, 24 Aug 2017 17:23:39 -0400 Received: from mail-wr0-f177.google.com ([209.85.128.177]:35976 "EHLO mail-wr0-f177.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753888AbdHXVXe (ORCPT ); Thu, 24 Aug 2017 17:23:34 -0400 From: Philipp Reisner To: Jens Axboe , linux-kernel@vger.kernel.org Cc: drbd-dev@lists.linbit.com Subject: [PATCH 04/17] drbd: Send P_NEG_ACK upon write error in protocol != C Date: Thu, 24 Aug 2017 23:23:01 +0200 Message-Id: <1503609794-13233-5-git-send-email-philipp.reisner@linbit.com> X-Mailer: git-send-email 2.7.4 In-Reply-To: <1503609794-13233-1-git-send-email-philipp.reisner@linbit.com> References: <1503609794-13233-1-git-send-email-philipp.reisner@linbit.com> Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 1915 Lines: 48 From: Lars Ellenberg In protocol != C, we forgot to send the P_NEG_ACK for failing writes. Once we no longer submit to local disk, because we already "detached", due to the typical "on-io-error detach;" config setting, we already send the neg acks right away. Only those requests that have been submitted, and have been error-completed by the local disk, would forget to send the neg-ack, and only in asynchronous replication (protocol != C). Unless this happened during resync, where we already always send acks, regardless of protocol. The primary side needs the P_NEG_ACK in order to mark the affected block(s) for resync in its out-of-sync bitmap. If the blocks in question are not re-written again, we may miss to resync them later, causing data inconsistencies. This patch will always send the neg-acks, and also at least try to persist the out-of-sync status on the local node already. Signed-off-by: Philipp Reisner Signed-off-by: Lars Ellenberg diff --git a/drivers/block/drbd/drbd_worker.c b/drivers/block/drbd/drbd_worker.c index 2745db2..72cb0bd 100644 --- a/drivers/block/drbd/drbd_worker.c +++ b/drivers/block/drbd/drbd_worker.c @@ -128,6 +128,14 @@ void drbd_endio_write_sec_final(struct drbd_peer_request *peer_req) __releases(l block_id = peer_req->block_id; peer_req->flags &= ~EE_CALL_AL_COMPLETE_IO; + if (peer_req->flags & EE_WAS_ERROR) { + /* In protocol != C, we usually do not send write acks. + * In case of a write error, send the neg ack anyways. */ + if (!__test_and_set_bit(__EE_SEND_WRITE_ACK, &peer_req->flags)) + inc_unacked(device); + drbd_set_out_of_sync(device, peer_req->i.sector, peer_req->i.size); + } + spin_lock_irqsave(&device->resource->req_lock, flags); device->writ_cnt += peer_req->i.size >> 9; list_move_tail(&peer_req->w.list, &device->done_ee); -- 2.7.4