Fix private DNS not working due to getConnectCounter() stuck
This call is not necessary to be protected by DnsTlsDispatcher::sLock.
The lock aims at protecting the creation/deletion of DnsTlsTransport.
Since the call is used only for the metrics, this change doesn't impact
on the functionality of private DNS.
Bug description:
When a thread gets stuck in TLS handshake, it might lead to another
thread blocking in the call getConnectCounter(). This results in
other DNS requests get stuck in the call getOrderedServerList()
because they are awaiting a lock hold by the second thread which
is also awaiting another lock hold by the first thread.
An example scenario is: In DNS strict mode, a private DNS server
used to be available in mobile data but becomes unresponsive.
Then, a new DNS request triggers the DnsResolver to do handshaking.
Before the handshake finishes, wifi can't work.
How to reproduce the bug:
1) Set DNS strict mode, turn on mobile data
2) Wait 20s for TLS connection disconnected. It can be check by:
- ps -AT $(pidof netd) | grep "TlsListen"
3) Drop DoT traffic by the command:
- iptables -A OUTPUT -p tcp --dport 853 -o rmnet_data1 -j DROP
4) Turn on wifi. Wifi shows no internet
Bug: 160027328
Test: not reproducible by the above steps
Test: cd packages/modules/DnsResolver && atest
Change-Id: I050ce8f13c19f706d58bac44c0e5a269481cb0c0
diff --git a/DnsTlsDispatcher.cpp b/DnsTlsDispatcher.cpp
index 1fa4001..df8fce8 100644
--- a/DnsTlsDispatcher.cpp
+++ b/DnsTlsDispatcher.cpp
@@ -146,8 +146,6 @@
DnsTlsTransport::Response DnsTlsDispatcher::query(const DnsTlsServer& server, unsigned mark,
const Slice query, const Slice ans, int* resplen,
bool* connectTriggered) {
- int connectCounter;
-
// TODO: This can cause the resolver to create multiple connections to the same DoT server
// merely due to different mark, such as the bit explicitlySelected unset.
// See if we can save them and just create one connection for one DoT server.
@@ -163,13 +161,19 @@
xport = it->second.get();
}
++xport->useCount;
- connectCounter = xport->transport.getConnectCounter();
}
+ // Don't call this function and hold sLock at the same time because of the following reason:
+ // TLS handshake requires a lock which is also needed by this function, if the handshake gets
+ // stuck, this function also gets blocked.
+ const int connectCounter = xport->transport.getConnectCounter();
+
LOG(DEBUG) << "Sending query of length " << query.size();
auto res = xport->transport.query(query);
LOG(DEBUG) << "Awaiting response";
const auto& result = res.get();
+ *connectTriggered = (xport->transport.getConnectCounter() > connectCounter);
+
DnsTlsTransport::Response code = result.code;
if (code == DnsTlsTransport::Response::success) {
if (result.response.size() > ans.size()) {
@@ -187,7 +191,6 @@
auto now = std::chrono::steady_clock::now();
{
std::lock_guard guard(sLock);
- *connectTriggered = (xport->transport.getConnectCounter() > connectCounter);
--xport->useCount;
xport->lastUsed = now;
cleanup(now);