[CUDA] Make touching a kernel from a __host__ __device__ function a deferred error.

Previously, this was an immediate, don't pass go, don't collect $200
error.  But this precludes us from writing code like

  __host__ __device__ void launch_kernel() {
    kernel<<<...>>>();
  }

Such code isn't wrong, following our notions of right and wrong in CUDA,
unless it's codegen'ed.

llvm-svn: 283963
diff --git a/clang/lib/Sema/SemaCUDA.cpp b/clang/lib/Sema/SemaCUDA.cpp
index 9e101d1..2a66124 100644
--- a/clang/lib/Sema/SemaCUDA.cpp
+++ b/clang/lib/Sema/SemaCUDA.cpp
@@ -120,8 +120,7 @@
   // (a) Can't call global from some contexts until we support CUDA's
   // dynamic parallelism.
   if (CalleeTarget == CFT_Global &&
-      (CallerTarget == CFT_Global || CallerTarget == CFT_Device ||
-       (CallerTarget == CFT_HostDevice && getLangOpts().CUDAIsDevice)))
+      (CallerTarget == CFT_Global || CallerTarget == CFT_Device))
     return CFP_Never;
 
   // (b) Calling HostDevice is OK for everyone.