tipc: guarantee peer bearer id exchange after reboot
When a link endpoint is going down locally, e.g., because its interface
is being stopped, it will spontaneously send out a RESET message to
its peer, informing it about this fact. This saves the peer from
detecting the failure via probing, and hence gives both speedier and
less resource consuming failure detection on the peer side.
According to the link FSM, a receiver of a RESET message, ignoring the
reason for it, must now consider the sender ready to come back up, and
starts periodically sending out ACTIVATE messages to the peer in order
to re-establish the link. Also, according to the FSM, the receiver of
an ACTIVATE message can now go directly to state ESTABLISHED and start
sending regular traffic packets. This is a well-proven and robust FSM.
However, in the case of a reboot, there is a small possibilty that link
endpoint on the rebooted node may have been re-created with a new bearer
identity between the moment it sent its (pre-boot) RESET and the moment
it receives the ACTIVATE from the peer. The new bearer identity cannot
be known by the peer according to this scenario, since traffic headers
don't convey such information. This is a problem, because both endpoints
need to know the correct value of the peer's bearer id at any moment in
time in order to be able to produce correct link events for their users.
The only way to guarantee this is to enforce a full setup message
exchange (RESET + ACTIVATE) even after the reboot, since those messages
carry the bearer idientity in their header.
In this commit we do this by introducing and setting a "stopping" bit in
the header of the spontaneously generated RESET messages, informing the
peer that the sender will not be immediately ready to re-establish the
link. A receiver seeing this bit must act as if this were a locally
detected connectivity failure, and hence has to go through a full two-
way setup message exchange before any link can be re-established.
Although never reported, this problem seems to have always been around.
This protocol addition is fully backwards compatible.
Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff --git a/net/tipc/link.c b/net/tipc/link.c
index 7d2bb3e..8b98faf 100644
--- a/net/tipc/link.c
+++ b/net/tipc/link.c
@@ -1140,11 +1140,17 @@
void tipc_link_build_reset_msg(struct tipc_link *l, struct sk_buff_head *xmitq)
{
int mtyp = RESET_MSG;
+ struct sk_buff *skb;
if (l->state == LINK_ESTABLISHING)
mtyp = ACTIVATE_MSG;
tipc_link_build_proto_msg(l, mtyp, 0, 0, 0, 0, xmitq);
+
+ /* Inform peer that this endpoint is going down if applicable */
+ skb = skb_peek_tail(xmitq);
+ if (skb && (l->state == LINK_RESET))
+ msg_set_peer_stopping(buf_msg(skb), 1);
}
/* tipc_link_build_nack_msg: prepare link nack message for transmission
@@ -1411,7 +1417,9 @@
l->priority = peers_prio;
/* ACTIVATE_MSG serves as PEER_RESET if link is already down */
- if ((mtyp == RESET_MSG) || !link_is_up(l))
+ if (msg_peer_stopping(hdr))
+ rc = tipc_link_fsm_evt(l, LINK_FAILURE_EVT);
+ else if ((mtyp == RESET_MSG) || !link_is_up(l))
rc = tipc_link_fsm_evt(l, LINK_PEER_RESET_EVT);
/* ACTIVATE_MSG takes up link if it was already locally reset */