The RxRPC ACK packet supports selective ACK of up to 255 DATA packets. It
contains a variable length array with one octet allocated for each DATA
packet to be ACK'd. Each octet is either 0 or 1 depending on whether it is
a negative or positive ACK. 7 bits in each octet are effectively unused
and, further, there are three reserved octets following the ACK array that
are all set to 0.
To extend the ACK window up to 2048 ACKs, it is proposed[1]:
(1) that the ACKs for DATA packets first+0...first+254 in the Rx window
are in bit 0 of the octets in the array, ie. acks[0...254], pretty
much as now; and
(2) that if the ACK count is >=256, the first reserved byte after the ACK
table is annexed to the ACK table as acks[255] and contains the ACK
for packet first+255 in bit 0; and
(3) that if the ACK count is >256, horizontal striping be employed such
that the ACK for packet first+256 in the window is then in bit 1 of
acks[0], first+257 is in bit 1 of acks[1], up to first+511 being in
bit 1 of the borrowed reserved byte (ie. acks[255]).
first+512 is then in bit 2 of acks[0], going all the way up to
first+2048 being in bit 7 of acks[255].
If extended SACK is employed in an ACK packet, it should have EXTENDED-SACK
(0x08) set in the RxRPC packet header.
Alter rxrpc_input_ack() to sanity check the ACK count.
Alter rxrpc_input_ack() to limit the number of bytes it extracts from the
packet for the ack array to 256.
Alter rxrpc_input_soft_acks() to handle an extended SACK table.
Signed-off-by: David Howells <[email protected]>
Link: https://gerrit.openafs.org/#/c/14693/3 [1]
---
net/rxrpc/input.c | 21 ++++++++++++++-------
net/rxrpc/protocol.h | 7 +++++--
2 files changed, 19 insertions(+), 9 deletions(-)
diff --git a/net/rxrpc/input.c b/net/rxrpc/input.c
index dc201363f2c4..0a7f7462b617 100644
--- a/net/rxrpc/input.c
+++ b/net/rxrpc/input.c
@@ -767,15 +767,17 @@ static void rxrpc_input_soft_acks(struct rxrpc_call *call, u8 *acks,
rxrpc_seq_t seq, int nr_acks,
struct rxrpc_ack_summary *summary)
{
- int ix;
- u8 annotation, anno_type;
+ int ix, i;
+ u8 annotation, anno_type, ack;
- for (; nr_acks > 0; nr_acks--, seq++) {
+ for (i = 0; i < nr_acks; i++, seq++) {
ix = seq & RXRPC_RXTX_BUFF_MASK;
annotation = call->rxtx_annotations[ix];
anno_type = annotation & RXRPC_TX_ANNO_MASK;
annotation &= ~RXRPC_TX_ANNO_MASK;
- switch (*acks++) {
+ ack = acks[i % RXRPC_EXTENDED_SACK_SIZE];
+ ack >>= i / RXRPC_EXTENDED_SACK_SIZE;
+ switch (ack) {
case RXRPC_ACK_TYPE_ACK:
summary->nr_acks++;
if (anno_type == RXRPC_TX_ANNO_ACK)
@@ -846,7 +848,7 @@ static void rxrpc_input_ack(struct rxrpc_call *call, struct sk_buff *skb)
union {
struct rxrpc_ackpacket ack;
struct rxrpc_ackinfo info;
- u8 acks[RXRPC_MAXACKS];
+ u8 acks[RXRPC_EXTENDED_SACK_SIZE];
} buf;
rxrpc_serial_t ack_serial, acked_serial;
rxrpc_seq_t first_soft_ack, hard_ack, prev_pkt;
@@ -874,6 +876,10 @@ static void rxrpc_input_ack(struct rxrpc_call *call, struct sk_buff *skb)
first_soft_ack, prev_pkt,
summary.ack_reason, nr_acks);
+ if ((nr_acks > RXRPC_MAXACKS && !(sp->hdr.flags & RXRPC_EXTENDED_SACK)) ||
+ (nr_acks > RXRPC_MAXACKS_EXTENDED))
+ return rxrpc_proto_abort("AKC", call, 0);
+
switch (buf.ack.reason) {
case RXRPC_ACK_PING_RESPONSE:
rxrpc_input_ping_response(call, skb->tstamp, acked_serial,
@@ -912,7 +918,7 @@ static void rxrpc_input_ack(struct rxrpc_call *call, struct sk_buff *skb)
}
buf.info.rxMTU = 0;
- ioffset = offset + nr_acks + 3;
+ ioffset = offset + min(nr_acks, RXRPC_MAXACKS) + 3;
if (skb->len >= ioffset + sizeof(buf.info) &&
skb_copy_bits(skb, ioffset, &buf.info, sizeof(buf.info)) < 0)
return rxrpc_proto_abort("XAI", call, 0);
@@ -969,7 +975,8 @@ static void rxrpc_input_ack(struct rxrpc_call *call, struct sk_buff *skb)
}
if (nr_acks > 0) {
- if (skb_copy_bits(skb, offset, buf.acks, nr_acks) < 0) {
+ if (skb_copy_bits(skb, offset, buf.acks,
+ min_t(unsigned int, nr_acks, sizeof(buf.acks))) < 0) {
rxrpc_proto_abort("XSA", call, 0);
goto out;
}
diff --git a/net/rxrpc/protocol.h b/net/rxrpc/protocol.h
index 49bb972539aa..287986012cd9 100644
--- a/net/rxrpc/protocol.h
+++ b/net/rxrpc/protocol.h
@@ -51,7 +51,8 @@ struct rxrpc_wire_header {
#define RXRPC_CLIENT_INITIATED 0x01 /* signifies a packet generated by a client */
#define RXRPC_REQUEST_ACK 0x02 /* request an unconditional ACK of this packet */
#define RXRPC_LAST_PACKET 0x04 /* the last packet from this side for this call */
-#define RXRPC_MORE_PACKETS 0x08 /* more packets to come */
+#define RXRPC_MORE_PACKETS 0x08 /* [DATA] More packets to come */
+#define RXRPC_EXTENDED_SACK 0x08 /* [ACK] Extended SACK table */
#define RXRPC_JUMBO_PACKET 0x20 /* [DATA] this is a jumbo packet */
#define RXRPC_SLOW_START_OK 0x20 /* [ACK] slow start supported */
@@ -124,7 +125,9 @@ struct rxrpc_ackpacket {
#define RXRPC_ACK__INVALID 10 /* Representation of invalid ACK reason */
uint8_t nAcks; /* number of ACKs */
-#define RXRPC_MAXACKS 255
+#define RXRPC_MAXACKS 255 /* Normal maximum number of ACKs */
+#define RXRPC_EXTENDED_SACK_SIZE 256 /* Size of the extended SACK table */
+#define RXRPC_MAXACKS_EXTENDED 2048 /* Maximum number of ACKs in extended SACK table */
uint8_t acks[0]; /* list of ACK/NAKs */
#define RXRPC_ACK_TYPE_NACK 0
David,
Thanks for working on this.
For the benefit of other readers. The RxRPC protocol is not
standardized by the IETF
or any other standards organization. The best description of the
protocol is present in
the OpenAFS source repository (doc/txt/rx-spec.txt). Proposed
updates to this
specification are available in OpenAFS Gerrit
doc: rx-spec Update for accuracy with current Rx implementations
https://gerrit.openafs.org/#/c/14692
doc: rx-spec Document the Extended SACK Table protocol extension
https://gerrit.openafs.org/#/c/14693
On 8/6/2021 6:08 AM, David Howells ([email protected]) wrote:
> The RxRPC ACK packet supports selective ACK of up to 255 DATA packets. It
> contains a variable length array with one octet allocated for each DATA
> packet to be ACK'd. Each octet is either 0 or 1 depending on whether it is
> a negative or positive ACK. 7 bits in each octet are effectively unused
> and, further, there are three reserved octets following the ACK array that
> are all set to 0.
The proposed EXTENDED_SACK ACK packet format assigns usage for all three
of the
unused octets. The first is used to extend the SACK table from 255
octets to 256 octets.
This octet will be zero unless it is actively being used to represent a
positive
acknowledgement of a DATA packet.
The second unused octet is designated as the <Trailer Count> field.
The three
unused octets are the result of a math error in the mid-90s when IBM
extended
the ACK packet format with a series of 32-bit fields. Over time
additional fields
were added but there has never been a clear method for the ACK packet
receiver
to determine how many fields were sent. The EXTENDED_SACK proposal
addresses
this oversight with the <Trailer Count> field which should be set to the
number of
32-bit trailer fields. At present this value is 4, not 0. This field
will permit
additional trailers to be defined in the future.
The third unused octet is designated as the <Extra SACK Count> field.
This field
designates how many additional variable length SACK tables are present
following
the designated number of trailer fields. Each additional SACK table
can represent
up to 2048 additional DATA packets.
>
> To extend the ACK window up to 2048 ACKs, it is proposed[1]:
>
> (1) that the ACKs for DATA packets first+0...first+254 in the Rx window
> are in bit 0 of the octets in the array, ie. acks[0...254], pretty
> much as now; and
>
> (2) that if the ACK count is >=256, the first reserved byte after the ACK
> table is annexed to the ACK table as acks[255] and contains the ACK
> for packet first+255 in bit 0; and
>
> (3) that if the ACK count is >256, horizontal striping be employed such
> that the ACK for packet first+256 in the window is then in bit 1 of
> acks[0], first+257 is in bit 1 of acks[1], up to first+511 being in
> bit 1 of the borrowed reserved byte (ie. acks[255]).
>
> first+512 is then in bit 2 of acks[0], going all the way up to
> first+2048 being in bit 7 of acks[255].
I think this should say "first + 2047 in bit 7 of acks[255]."
> If extended SACK is employed in an ACK packet, it should have EXTENDED-SACK
> (0x08) set in the RxRPC packet header.
The EXTENDED_SACK format proposal also addresses a historical deficiency
in the
specification and usage of the <previousPacket> field.
https://gerrit.openafs.org/#/c/14692
adds version specific details to rx-spec.txt but in summary, the
<previousPacket> field
is unreliable at present and cannot be used in any meaningful way.
Various Rx
implementations (including OpenAFS) attempt to use it to filter
out-of-sequence
ACK packets but this is unreliable because the value sent by many Rx
peers can move
backwards, or represent a sequence number that was out of range, or even
not be a
sequence number at all. When the EXTENDED_SACK flag is set the
<previousPacket>
field is defined to be the largest DATA packet sequence number accepted
by the
sender of the ACK packet. As such the <previousPacket> field can
reliably be
used for two purposes:
1. <previousPacket> can be used to detect out-of-sequence ACK packets
2. <previousPacket> - <firstPacket> + 1 represents the maximum number
of SACK table bits that contain valid acknowledgements.
Regardless of
how many SACK tables are present in the ACK packet.
> Alter rxrpc_input_ack() to sanity check the ACK count.
>
> Alter rxrpc_input_ack() to limit the number of bytes it extracts from the
> packet for the ack array to 256.
>
> Alter rxrpc_input_soft_acks() to handle an extended SACK table.
EXTENDED_SACK is still a proposal awaiting review.
What has been agreed upon at this point in time is that the acks[] array
be interpreted
as bit fields. I suggest that the first patch to rxrpc implement the
equivalent change
to OpenAFS
rx: compare RX_ACK_TYPE_ACK as a bit-field
https://gerrit.openafs.org/#/c/14465/4
I also suggest that rxrpc be updated if necessary to always set
<previousPacket> as
described by the EXTENDED_SACK proposal regardless of whether or not the
EXTENDED_SACK proposal is implemented. Existing OpenAFS Rx peers expect
that behavior when receiving ACK packets even though they do not the field
according to those expectations.
Thanks again for taking the time to review and implement the proposal.
Jeffrey Altman