2021-03-29 16:24:40

by Olga Kornievskaia

Subject: how should nconnect work with failed connections

Hi folks,

I'm trying to understand what kinds of errors the client should be able
to recover from, and how, when it is using the nconnect feature.

My first question: what kinds of failures seem reasonable? Say the
client has 3 connections to the server (nconnect=3). Is it possible
that one of those connections breaks? For instance, I simulate this by
blocking any requests on that connection and sending an RST back to
the client. Besides an RST, are there any other interesting errors
that should be considered, ICMP port unreachable perhaps? In that case
the client keeps trying to send the same packet it sent before; in the
RST case, the client tries to send a SYN.

Suppose the client had sent a request on that particular connection,
didn't get a response, and now the connection is being RSTed. Should
the client 'give up' and send the same request on a different
connection? For v3 and v4 I can see that it's a problematic idea because
the reply cache is based on the port.
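
To illustrate the reply cache concern, here is a toy sketch in Python.
It is not the server's actual implementation: the ReplyCache class, the
key layout (client address, source port, XID), and the addresses and
port numbers are all assumptions made up for this example. It only shows
why a retry that arrives from a different source port would miss the
cache and get executed a second time.

```python
# Toy duplicate reply cache. The key layout below is an assumption for
# illustration; a real server's cache key may include more fields.
class ReplyCache:
    def __init__(self):
        self._cache = {}
        self.executions = 0

    def handle(self, client_ip, client_port, xid, op):
        key = (client_ip, client_port, xid)
        if key in self._cache:
            # Retransmission on the same connection: replay the cached reply.
            return self._cache[key]
        # Cache miss: the operation is executed (again, if this is a retry
        # that arrived from a different source port).
        self.executions += 1
        reply = op()
        self._cache[key] = reply
        return reply


counter = {"n": 0}

def non_idempotent_op():
    # Stand-in for an operation that must not be executed twice.
    counter["n"] += 1
    return counter["n"]


drc = ReplyCache()
first = drc.handle("10.0.0.1", 801, 0x1234, non_idempotent_op)
same_conn = drc.handle("10.0.0.1", 801, 0x1234, non_idempotent_op)
other_conn = drc.handle("10.0.0.1", 802, 0x1234, non_idempotent_op)

print(first, same_conn, other_conn, drc.executions)
```

In this sketch the retransmission on the original port gets the cached
reply, while the same XID sent from the other port runs the operation
again.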

The current client behaviour is that the client does not retry. Am I
correct in assuming that this shouldn't be changed? I'm assuming that
the application that issued the operation will hang (as in the single
TCP connection case). While that transport is occupied, the other
transports are still available to send requests to the server
unobstructed.

My next question: should there be any connection badness detection? As
it stands, this transport is not marked in any way and can be selected
by another RPC (if the previous request was ctrl-C-ed, releasing the
transport back into the transport pool).

Thank you.