From: Gang He
To: teigland@redhat.com
Cc: Gang He, cluster-devel@redhat.com, linux-kernel@vger.kernel.org
Subject: [PATCH] dlm: make sctp_connect_to_sock() return in specified time
Date: Thu, 26 Apr 2018 15:14:36 +0800
Message-Id: <1524726876-4770-1-git-send-email-ghe@suse.com>
X-Mailing-List: linux-kernel@vger.kernel.org

When the user sets up a two-ring cluster, the DLM kernel module
automatically selects the SCTP protocol for communication between
nodes. If one ring is broken, the DLM kernel module hangs for about
5 minutes before switching to the other ring. This can affect the
dependent upper-layer applications, e.g. ocfs2, gfs2, clvm and
clustered-MD.

Unfortunately, if the user sets up a two-ring cluster, we cannot
explicitly select TCP as the DLM communication protocol, since the DLM
kernel module only supports SCTP for multi-ring clusters.
Based on my investigation, the time is spent in the
sock->ops->connect() function before it returns the ETIMEDOUT (-110)
error. Since the O_NONBLOCK argument to connect() does not work here,
we should make sock->ops->connect() return within a specified time by
setting the socket's SO_SNDTIMEO attribute.

Signed-off-by: Gang He
---
 fs/dlm/lowcomms.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/fs/dlm/lowcomms.c b/fs/dlm/lowcomms.c
index 5243989..b786acc 100644
--- a/fs/dlm/lowcomms.c
+++ b/fs/dlm/lowcomms.c
@@ -1037,6 +1037,7 @@ static void sctp_connect_to_sock(struct connection *con)
 	int result;
 	int addr_len;
 	struct socket *sock;
+	struct timeval tv = { .tv_sec = 5, .tv_usec = 0 };
 
 	if (con->nodeid == 0) {
 		log_print("attempt to connect sock 0 foiled");
@@ -1083,8 +1084,19 @@ static void sctp_connect_to_sock(struct connection *con)
 	kernel_setsockopt(sock, SOL_TCP, TCP_NODELAY, (char *)&one,
 			  sizeof(one));
 
+	/*
+	 * Make sock->ops->connect() function return in specified time,
+	 * since O_NONBLOCK argument in connect() function does not work here,
+	 * then, we should restore the default value of this attribute.
+	 */
+	kernel_setsockopt(sock, SOL_SOCKET, SO_SNDTIMEO, (char *)&tv,
+			  sizeof(tv));
 	result = sock->ops->connect(sock, (struct sockaddr *)&daddr, addr_len,
 				   O_NONBLOCK);
+	memset(&tv, 0, sizeof(tv));
+	kernel_setsockopt(sock, SOL_SOCKET, SO_SNDTIMEO, (char *)&tv,
+			  sizeof(tv));
+
 	if (result == -EINPROGRESS)
 		result = 0;
 	if (result == 0)
-- 
1.8.5.6