[CUDA] Driver changes to support CUDA compilation on MacOS.
Summary:
Compiling CUDA device code requires us to know the host toolchain,
because CUDA device-side compiles pull in host headers, among other things.
When we only supported Linux compilation, this worked because
CudaToolChain, which is responsible for device-side CUDA compilation,
inherited from the Linux toolchain. But in order to support MacOS,
CudaToolChain needs to take a reference to the host toolchain.
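Concretely, the constructor gains a host-toolchain parameter. An abridged,
illustrative sketch of the new shape (the actual call site is in the diff
below):

  class CudaToolChain : public ToolChain {
  public:
    CudaToolChain(const Driver &D, const llvm::Triple &Triple,
                  const ToolChain &HostTC, const llvm::opt::ArgList &Args);
    // Stored for the lifetime of the toolchain; device-side compiles
    // consult it for things like host header search paths.
    const ToolChain &HostTC;
  };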
Because a CUDA toolchain now requires a host TC, we will no longer
create a CUDA toolchain from Driver::getToolChain -- you have to go
through CreateOffloadingDeviceToolChains. I am *pretty* sure this is
correct, and that previously any attempt to create a CUDA toolchain
through getToolChain() would eventually have resulted in us throwing
"error: unsupported use of NVPTX for host compilation".
In any case, hacking getToolChain to create a CUDA+host toolchain would
be wrong, because a Driver can be reused for multiple compilations,
potentially with different host TCs, and getToolChain will cache the
result, causing us to potentially use a stale host TC.
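To make the hazard concrete, here is a minimal, self-contained sketch of
the composite-key caching scheme, using stand-in types rather than the
actual Driver code:

  #include <map>
  #include <memory>
  #include <string>

  struct ToolChain { std::string DeviceTriple, HostTriple; };

  std::map<std::string, std::unique_ptr<ToolChain>> ToolChains;

  ToolChain &getCudaToolChain(const std::string &DeviceTriple,
                              const std::string &HostTriple) {
    // Key on both triples: keying on the device triple alone would hand a
    // later compilation (with a different host TC) the stale cached entry.
    auto &TC = ToolChains[DeviceTriple + "/" + HostTriple];
    if (!TC)
      TC.reset(new ToolChain{DeviceTriple, HostTriple});
    return *TC;
  }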
So that's the main change in this patch.
In addition, we have to pull CudaInstallationDetector out of Generic_GCC
and make it a top-level class. It's now used by both the Generic_GCC and
MachO toolchains.
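Roughly, the detector now lives at namespace scope and is embedded as a
member by both host toolchains. An abridged, illustrative sketch (member
list is not exhaustive):

  class CudaInstallationDetector {
  public:
    CudaInstallationDetector(const Driver &D, const llvm::Triple &HostTriple,
                             const llvm::opt::ArgList &Args);
    bool isValid() const;              // found a usable CUDA installation?
    StringRef getInstallPath() const;  // root of the detected installation
  };

  // Embedded as a member by each host toolchain, e.g.:
  //   CudaInstallationDetector CudaInstallation;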
Reviewers: tra
Subscribers: rryan, hfinkel, sfantao
Differential Revision: https://reviews.llvm.org/D26774
llvm-svn: 287285
diff --git a/clang/lib/Driver/Driver.cpp b/clang/lib/Driver/Driver.cpp
index ffecccc..e6e508d 100644
--- a/clang/lib/Driver/Driver.cpp
+++ b/clang/lib/Driver/Driver.cpp
@@ -473,14 +473,18 @@
if (llvm::any_of(Inputs, [](std::pair<types::ID, const llvm::opt::Arg *> &I) {
return types::isCuda(I.first);
})) {
- const ToolChain &TC = getToolChain(
- C.getInputArgs(),
- llvm::Triple(C.getSingleOffloadToolChain<Action::OFK_Host>()
- ->getTriple()
- .isArch64Bit()
- ? "nvptx64-nvidia-cuda"
- : "nvptx-nvidia-cuda"));
- C.addOffloadDeviceToolChain(&TC, Action::OFK_Cuda);
+ const ToolChain *HostTC = C.getSingleOffloadToolChain<Action::OFK_Host>();
+ const llvm::Triple &HostTriple = HostTC->getTriple();
+ llvm::Triple CudaTriple(HostTriple.isArch64Bit() ? "nvptx64-nvidia-cuda"
+ : "nvptx-nvidia-cuda");
+ // Use the CUDA and host triples as the key into the ToolChains map, because
+ // the device toolchain we create depends on both.
+ ToolChain *&CudaTC = ToolChains[CudaTriple.str() + "/" + HostTriple.str()];
+ if (!CudaTC) {
+ CudaTC = new toolchains::CudaToolChain(*this, CudaTriple, *HostTC,
+ C.getInputArgs());
+ }
+ C.addOffloadDeviceToolChain(CudaTC, Action::OFK_Cuda);
}
//
@@ -3717,9 +3721,6 @@
break;
}
break;
- case llvm::Triple::CUDA:
- TC = new toolchains::CudaToolChain(*this, Target, Args);
- break;
case llvm::Triple::PS4:
TC = new toolchains::PS4CPU(*this, Target, Args);
break;
@@ -3761,6 +3762,12 @@
}
}
}
+
+ // Intentionally omitted from the switch above: llvm::Triple::CUDA. CUDA
+ // compiles always need two toolchains, the CUDA toolchain and the host
+ // toolchain. So the only valid way to create a CUDA toolchain is via
+ // CreateOffloadingDeviceToolChains.
+
return *TC;
}