Update how sample(matrix) calls are invoked in SkSL

This removes the kMixed type of SkSL::SampleMatrix. All analysis of FP
sampling due to parent-child relationships is tracked in flags on
GrFragmentProcessor now.

The sample strategy is tracked as follows:
- An FP marks itself as using the local coordinate builtin directly (automatically done for .fp code based on reference to sk_TransformedCoords2D[0]).
- This state propagates up through each parent toward the root, marking those FPs as using coordinates indirectly. Propagation stops at a parent FP that explicitly samples the child, because that parent becomes the source of the child's coordinates.
   - If that parent references its local coordinates directly, that kicks off its own upwards propagation.
- Being sampled explicitly propagates down to all children, and effectively disables vertex-shader evaluation of transforms.
   - A variable matrix automatically marks this flag as well, since it's essentially a shortcut to (matrix expression) * coords.
- The matrix type also propagates down, but right now that's only for whether or not there's perspective.
   - This doesn't affect FS coord evaluation since each FP applies its action independently.
   - But for VS-promoted transforms, the child's varying may inherit perspective (or other more general matrix types) from the parent and switch from a float2 to a float3.
- A SampleMatrix no longer tracks a base or owner; instead, GrFragmentProcessor exposes its parent FP. An FP's sample matrix is always owned by its immediate parent.
   - This means that you can have a hierarchy from root to leaf like: [uniform, none, none, uses local coords], and that leaf will have a SampleMatrix of kNone type. However, because of parent tracking, the coordinate generation can walk up to the root and detect the proper transform expression it needs to produce, and automatically de-duplicate across children.
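
The propagation rules above can be sketched with a toy model. This is only an illustration under simplifying assumptions: ToyFP, pushExplicitDown and the boolean fields are hypothetical stand-ins, not the real GrFragmentProcessor API, and the matrix and perspective flags are left out.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Toy stand-in for GrFragmentProcessor; every name here is illustrative only.
struct ToyFP {
    bool usesCoordsDirectly = false;    // references the coordinate builtin itself
    bool usesCoordsIndirectly = false;  // some descendant needs our coordinates
    bool sampledExplicitly = false;     // an ancestor supplies explicit coords
    const ToyFP* parent = nullptr;
    std::vector<std::unique_ptr<ToyFP>> children;

    // Downward propagation: once set on a node, every descendant has it too.
    void pushExplicitDown() {
        if (!sampledExplicitly) {
            sampledExplicitly = true;
            for (auto& c : children) {
                c->pushExplicitDown();
            }
        }
    }

    // Attaches the child and returns a non-owning pointer for inspection.
    ToyFP* registerChild(std::unique_ptr<ToyFP> child, bool explicitlySampled) {
        // The child must not already be attached or have a sampling strategy.
        assert(child && !child->parent && !child->sampledExplicitly);
        if (explicitlySampled) {
            child->pushExplicitDown();
        }
        // Upward propagation halts at an explicitly sampled child, because
        // this FP is then the source of the child's coordinates.
        if (!child->sampledExplicitly &&
            (child->usesCoordsDirectly || child->usesCoordsIndirectly)) {
            usesCoordsIndirectly = true;
        }
        child->parent = this;
        children.push_back(std::move(child));
        return children.back().get();
    }
};
```

In this model, a leaf that uses coordinates directly marks its parent as an indirect user, but a grandparent that samples that parent explicitly is not marked, while the explicit flag reaches both the parent and the leaf.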

Currently, all FPs that are explicitly sampled have a signature of (color, float2 coord). FPs that don't use local coords, or whose coords are promoted to a varying, have a signature of (color).
   - When the coords are promoted to a varying, the shader builder either updates args.fSampleCoord to point to the varying directly, or adds a float2 local to the function body that includes the perspective divide.
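
A rough sketch of that decision follows. ToySampling, childSignature and coordPrologue are hypothetical helpers invented for this note, and the emitted statement is an assumed shape, not the text the shader builder actually generates.

```cpp
#include <cassert>
#include <string>

// Hypothetical summary of how one FP is sampled; not a Skia type.
struct ToySampling {
    bool sampledExplicitly = false;      // parent passes a float2 at the call site
    bool varyingHasPerspective = false;  // VS-promoted coords arrive as a float3
};

// Parameter list of the generated child function, as described above.
std::string childSignature(const ToySampling& s) {
    return s.sampledExplicitly ? "(color, float2 coord)" : "(color)";
}

// Statement placed at the top of the function body when a float2 local is
// needed; empty when the coords can alias the parameter or varying directly.
std::string coordPrologue(const ToySampling& s, const std::string& varying) {
    assert(!varying.empty());
    if (s.sampledExplicitly || !s.varyingHasPerspective) {
        return "";
    }
    // Assumed form of the perspective divide on a float3 varying.
    return "float2 coord = " + varying + ".xy / " + varying + ".z;";
}
```

Only the perspective case needs the extra local; in the other cases the coordinate name can refer to the function parameter or the float2 varying as-is.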

GrFragmentProcessor automatically pretends it has an identity coord transform if the FP is marked as referencing the local coord builtin. This allows these FPs to still be processed as part of GrGLSLGeometryProcessor::collectTransforms, but removes the need for FP implementations to declare an identity GrCoordTransform.
   - To test this theory, GrTextureEffect and GrSkSLFP no longer have coord transforms explicitly.
   - Later CLs can trivially remove them from a lot of the other effects.
   - The coord generation should not change because it detects in both cases that the coord transform matrices were identity.

GrGLSLGeometryProcessor's collectTransforms and emitTransformCode have been completely overhauled to recurse up an FP's parent pointers and collect the expressions that affect the result. They de-duplicate expressions between siblings, and are able to produce a single varying for the base local coord (either when there are no intervening transforms, or when the root FP needs an explicit coordinate to start off with).
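
The parent walk and de-duplication can be illustrated with a small model. ToyNode, netTransformExpr and VaryingTable are hypothetical names, matrices are represented as plain expression strings, and string equality is used as the de-duplication key, which is an assumption rather than what the geometry processor necessarily does.

```cpp
#include <cassert>
#include <map>
#include <string>

// Toy FP: only a parent pointer and the expression its parent samples it with.
struct ToyNode {
    const ToyNode* parent = nullptr;
    std::string sampleMatrix;  // empty means a kNone-style sample matrix
};

// Walks from the FP up to the root, collecting every matrix on the way.
// A child's coords are its matrix applied to its parent's coords, so the
// innermost matrix ends up leftmost.
std::string netTransformExpr(const ToyNode* fp) {
    assert(fp);
    std::string expr;
    for (const ToyNode* n = fp; n; n = n->parent) {
        if (!n->sampleMatrix.empty()) {
            expr += n->sampleMatrix + " * ";
        }
    }
    return expr + "localCoords";
}

// Hands out one varying index per distinct net expression.
class VaryingTable {
public:
    int varyingFor(const ToyNode* fp) {
        std::string key = netTransformExpr(fp);
        auto it = fIndices.find(key);
        if (it != fIndices.end()) {
            return it->second;  // a sibling already requested this expression
        }
        int index = static_cast<int>(fIndices.size());
        fIndices.emplace(std::move(key), index);
        return index;
    }
    int count() const { return static_cast<int>(fIndices.size()); }

private:
    std::map<std::string, int> fIndices;
};
```

With this model, two leaves under the same uniform-sampled ancestor resolve to the same expression and share one varying, even though their own sample matrices are empty.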


This also adds the fp_sample_chaining GM from Brian, with a few more configurations to fill out the cells.

Bug: skia:10396
Change-Id: I86acc0c34c9f29d6371b34370bee9a18c2acf1c1
Reviewed-on: https://skia-review.googlesource.com/c/skia/+/297868
Commit-Queue: Michael Ludwig <michaelludwig@google.com>
Reviewed-by: Brian Salomon <bsalomon@google.com>
Reviewed-by: Brian Osman <brianosman@google.com>
diff --git a/gm/fp_sample_chaining.cpp b/gm/fp_sample_chaining.cpp
new file mode 100644
index 0000000..aeff605
--- /dev/null
+++ b/gm/fp_sample_chaining.cpp
@@ -0,0 +1,278 @@
+/*
+ * Copyright 2019 Google LLC.
+ *
+ * Use of this source code is governed by a BSD-style license that can be
+ * found in the LICENSE file.
+ */
+
+#include "gm/gm.h"
+#include "include/core/SkFont.h"
+#include "src/gpu/GrBitmapTextureMaker.h"
+#include "src/gpu/GrContextPriv.h"
+#include "src/gpu/GrRenderTargetContextPriv.h"
+#include "src/gpu/glsl/GrGLSLFragmentShaderBuilder.h"
+#include "src/gpu/ops/GrFillRectOp.h"
+#include "tools/ToolUtils.h"
+
+// Samples child with a constant (literal) matrix
+// Scales along X
+class ConstantMatrixEffect : public GrFragmentProcessor {
+public:
+    static constexpr GrProcessor::ClassID CLASS_ID = (GrProcessor::ClassID) 3;
+
+    ConstantMatrixEffect(std::unique_ptr<GrFragmentProcessor> child)
+            : GrFragmentProcessor(CLASS_ID, kNone_OptimizationFlags) {
+        this->registerChild(std::move(child),
+                            SkSL::SampleMatrix::MakeConstUniform(
+                                "float3x3(float3(0.5, 0.0, 0.0), "
+                                        "float3(0.0, 1.0, 0.0), "
+                                        "float3(0.0, 0.0, 1.0))"));
+    }
+
+    const char* name() const override { return "ConstantMatrixEffect"; }
+    void onGetGLSLProcessorKey(const GrShaderCaps&, GrProcessorKeyBuilder*) const override {}
+    bool onIsEqual(const GrFragmentProcessor& that) const override { return this == &that; }
+    std::unique_ptr<GrFragmentProcessor> clone() const override { return nullptr; }
+
+    GrGLSLFragmentProcessor* onCreateGLSLInstance() const override {
+        class Impl : public GrGLSLFragmentProcessor {
+            void emitCode(EmitArgs& args) override {
+                SkString sample = this->invokeChildWithMatrix(0, args);
+                args.fFragBuilder->codeAppendf("%s = %s;\n", args.fOutputColor, sample.c_str());
+            }
+        };
+        return new Impl;
+    }
+};
+
+// Samples child with a uniform matrix (functionally identical to GrMatrixEffect)
+// Scales along Y
+class UniformMatrixEffect : public GrFragmentProcessor {
+public:
+    static constexpr GrProcessor::ClassID CLASS_ID = (GrProcessor::ClassID) 4;
+
+    UniformMatrixEffect(std::unique_ptr<GrFragmentProcessor> child)
+            : GrFragmentProcessor(CLASS_ID, kNone_OptimizationFlags) {
+        this->registerChild(std::move(child), SkSL::SampleMatrix::MakeConstUniform("matrix"));
+    }
+
+    const char* name() const override { return "UniformMatrixEffect"; }
+    void onGetGLSLProcessorKey(const GrShaderCaps&, GrProcessorKeyBuilder*) const override {}
+    bool onIsEqual(const GrFragmentProcessor& that) const override { return this == &that; }
+    std::unique_ptr<GrFragmentProcessor> clone() const override { return nullptr; }
+
+    GrGLSLFragmentProcessor* onCreateGLSLInstance() const override {
+        class Impl : public GrGLSLFragmentProcessor {
+            void emitCode(EmitArgs& args) override {
+                fMatrixVar = args.fUniformHandler->addUniform(&args.fFp, kFragment_GrShaderFlag,
+                                                              kFloat3x3_GrSLType, "matrix");
+                SkString sample = this->invokeChildWithMatrix(0, args);
+                args.fFragBuilder->codeAppendf("%s = %s;\n", args.fOutputColor, sample.c_str());
+            }
+            void onSetData(const GrGLSLProgramDataManager& pdman,
+                           const GrFragmentProcessor& proc) override {
+                pdman.setSkMatrix(fMatrixVar, SkMatrix::Scale(1, 0.5f));
+            }
+            UniformHandle fMatrixVar;
+        };
+        return new Impl;
+    }
+};
+
+// Samples child with a variable matrix
+// Translates along X
+// Typically, kVariable would be due to multiple sample(matrix) invocations, but this artificially
+// uses kVariable with a single (constant) matrix.
+class VariableMatrixEffect : public GrFragmentProcessor {
+public:
+    static constexpr GrProcessor::ClassID CLASS_ID = (GrProcessor::ClassID) 5;
+
+    VariableMatrixEffect(std::unique_ptr<GrFragmentProcessor> child)
+            : GrFragmentProcessor(CLASS_ID, kNone_OptimizationFlags) {
+        this->registerChild(std::move(child), SkSL::SampleMatrix::MakeVariable());
+    }
+
+    const char* name() const override { return "VariableMatrixEffect"; }
+    void onGetGLSLProcessorKey(const GrShaderCaps&, GrProcessorKeyBuilder*) const override {}
+    bool onIsEqual(const GrFragmentProcessor& that) const override { return this == &that; }
+    std::unique_ptr<GrFragmentProcessor> clone() const override { return nullptr; }
+
+    GrGLSLFragmentProcessor* onCreateGLSLInstance() const override {
+        class Impl : public GrGLSLFragmentProcessor {
+            void emitCode(EmitArgs& args) override {
+                SkString sample = this->invokeChildWithMatrix(
+                        0, args, "float3x3(1, 0, 0, 0, 1, 0, 8, 0, 1)");
+                args.fFragBuilder->codeAppendf("%s = %s;\n", args.fOutputColor, sample.c_str());
+            }
+        };
+        return new Impl;
+    }
+};
+
+// Samples child with explicit coords
+// Translates along Y
+class ExplicitCoordEffect : public GrFragmentProcessor {
+public:
+    static constexpr GrProcessor::ClassID CLASS_ID = (GrProcessor::ClassID) 6;
+
+    ExplicitCoordEffect(std::unique_ptr<GrFragmentProcessor> child)
+            : GrFragmentProcessor(CLASS_ID, kNone_OptimizationFlags) {
+        this->registerExplicitlySampledChild(std::move(child));
+        this->setUsesSampleCoordsDirectly();
+    }
+
+    const char* name() const override { return "ExplicitCoordEffect"; }
+    void onGetGLSLProcessorKey(const GrShaderCaps&, GrProcessorKeyBuilder*) const override {}
+    bool onIsEqual(const GrFragmentProcessor& that) const override { return this == &that; }
+    std::unique_ptr<GrFragmentProcessor> clone() const override { return nullptr; }
+
+    GrGLSLFragmentProcessor* onCreateGLSLInstance() const override {
+        class Impl : public GrGLSLFragmentProcessor {
+            void emitCode(EmitArgs& args) override {
+                args.fFragBuilder->codeAppendf("float2 coord = %s + float2(0, 8);",
+                                               args.fSampleCoord);
+                SkString sample = this->invokeChild(0, args, "coord");
+                args.fFragBuilder->codeAppendf("%s = %s;\n", args.fOutputColor, sample.c_str());
+            }
+        };
+        return new Impl;
+    }
+};
+
+// Generates test pattern
+class TestPatternEffect : public GrFragmentProcessor {
+public:
+    static constexpr GrProcessor::ClassID CLASS_ID = (GrProcessor::ClassID) 7;
+
+    TestPatternEffect() : GrFragmentProcessor(CLASS_ID, kNone_OptimizationFlags) {
+        this->addCoordTransform(&fCoordTransform);
+    }
+
+    const char* name() const override { return "TestPatternEffect"; }
+    void onGetGLSLProcessorKey(const GrShaderCaps&, GrProcessorKeyBuilder*) const override {}
+    bool onIsEqual(const GrFragmentProcessor& that) const override { return this == &that; }
+    std::unique_ptr<GrFragmentProcessor> clone() const override { return nullptr; }
+
+    GrGLSLFragmentProcessor* onCreateGLSLInstance() const override {
+        class Impl : public GrGLSLFragmentProcessor {
+            void emitCode(EmitArgs& args) override {
+                auto fb = args.fFragBuilder;
+                fb->codeAppendf("float2 coord = %s / 64.0;", args.fSampleCoord);
+                fb->codeAppendf("coord = floor(coord * 4) / 3;");
+                fb->codeAppendf("%s = half4(half2(coord.rg), 0, 1);\n", args.fOutputColor);
+            }
+        };
+        return new Impl;
+    }
+    // Placeholder identity coord transform to allow access to local coords
+    GrCoordTransform fCoordTransform = {};
+};
+
+SkBitmap make_test_bitmap() {
+    SkBitmap bitmap;
+    bitmap.allocN32Pixels(64, 64);
+    SkCanvas canvas(bitmap);
+
+    SkFont font(ToolUtils::create_portable_typeface());
+    const char* alpha = "ABCDEFGHIJKLMNOP";
+
+    for (int i = 0; i < 16; ++i) {
+        int tx = i % 4,
+            ty = i / 4;
+        int x = tx * 16,
+            y = ty * 16;
+        SkPaint paint;
+        paint.setColor4f({ tx / 3.0f, ty / 3.0f, 0.0f, 1.0f });
+        canvas.drawRect(SkRect::MakeXYWH(x, y, 16, 16), paint);
+        paint.setColor4f({ (3-tx) / 3.0f, (3-ty)/3.0f, 1.0f, 1.0f });
+        canvas.drawSimpleText(alpha + i, 1, SkTextEncoding::kUTF8, x + 3, y + 13, font, paint);
+    }
+
+    return bitmap;
+}
+
+enum EffectType {
+    kConstant,
+    kUniform,
+    kVariable,
+    kExplicit,
+};
+
+static std::unique_ptr<GrFragmentProcessor> wrap(std::unique_ptr<GrFragmentProcessor> fp,
+                                                 EffectType effectType) {
+    switch (effectType) {
+        case kConstant:
+            return std::unique_ptr<GrFragmentProcessor>(new ConstantMatrixEffect(std::move(fp)));
+        case kUniform:
+            return std::unique_ptr<GrFragmentProcessor>(new UniformMatrixEffect(std::move(fp)));
+        case kVariable:
+            return std::unique_ptr<GrFragmentProcessor>(new VariableMatrixEffect(std::move(fp)));
+        case kExplicit:
+            return std::unique_ptr<GrFragmentProcessor>(new ExplicitCoordEffect(std::move(fp)));
+    }
+    SkUNREACHABLE;
+}
+
+DEF_SIMPLE_GPU_GM(fp_sample_chaining, ctx, rtCtx, canvas, 380, 306) {
+    SkBitmap bmp = make_test_bitmap();
+
+    GrBitmapTextureMaker maker(ctx, bmp, GrImageTexGenPolicy::kDraw);
+    int x = 10, y = 10;
+
+    auto nextCol = [&] { x += (64 + 10); };
+    auto nextRow = [&] { x = 10; y += (64 + 10); };
+
+    auto draw = [&](std::initializer_list<EffectType> effects) {
+        // Enable TestPatternEffect to get a fully procedural inner effect. It's not quite as nice
+        // visually (no text labels in each box), but it avoids the extra GrMatrixEffect.
+        // Switching it on actually triggers *more* shader compilation failures.
+#if 0
+        auto fp = std::unique_ptr<GrFragmentProcessor>(new TestPatternEffect());
+#else
+        auto view = maker.view(GrMipMapped::kNo);
+        auto fp = GrTextureEffect::Make(std::move(view), maker.alphaType());
+#endif
+        for (EffectType effectType : effects) {
+            fp = wrap(std::move(fp), effectType);
+        }
+        GrPaint paint;
+        paint.addColorFragmentProcessor(std::move(fp));
+        rtCtx->drawRect(nullptr, std::move(paint), GrAA::kNo, SkMatrix::Translate(x, y),
+                        SkRect::MakeIWH(64, 64));
+        nextCol();
+    };
+
+    // Reminder, in every case, the chain is more complicated than it seems, because the
+    // GrTextureEffect is wrapped in a GrMatrixEffect, which is subject to the same bugs that
+    // we're testing (particularly the bug about owner/base in UniformMatrixEffect).
+
+    // First row: no transform, then each one independently applied
+    draw({});             // Identity (4 rows and columns)
+    draw({ kConstant });  // Scale X axis by 2x (2 visible columns)
+    draw({ kUniform  });  // Scale Y axis by 2x (2 visible rows)
+    draw({ kVariable });  // Translate left by 8px
+    draw({ kExplicit });  // Translate up by 8px
+    nextRow();
+
+    // Second row: transform duplicated
+    draw({ kConstant, kUniform  });  // Scale XY by 2x (2 rows and columns)
+    draw({ kConstant, kConstant });  // Scale X axis by 4x (1 visible column)
+    draw({ kUniform,  kUniform  });  // Scale Y axis by 4x (1 visible row)
+    draw({ kVariable, kVariable });  // Translate left by 16px
+    draw({ kExplicit, kExplicit });  // Translate up by 16px
+    nextRow();
+
+    // Remember, these are applied inside out:
+    draw({ kConstant, kExplicit }); // Scale X by 2x and translate up by 8px
+    draw({ kConstant, kVariable }); // Scale X by 2x and translate left by 8px
+    draw({ kUniform,  kVariable }); // Scale Y by 2x and translate left by 8px
+    draw({ kUniform,  kExplicit }); // Scale Y by 2x and translate up by 8px
+    draw({ kVariable, kExplicit }); // Translate left and up by 8px
+    nextRow();
+
+    draw({ kExplicit, kExplicit, kConstant }); // Scale X by 2x and translate up by 16px
+    draw({ kVariable, kConstant }); // Scale X by 2x and translate left by 16px
+    draw({ kVariable, kVariable, kUniform }); // Scale Y by 2x and translate left by 16px
+    draw({ kExplicit, kUniform }); // Scale Y by 2x and translate up by 16px
+    draw({ kExplicit, kUniform, kVariable, kConstant }); // Scale XY by 2x and translate xy 16px
+}
diff --git a/gm/sample_matrix_constant.cpp b/gm/sample_matrix_constant.cpp
index 45e26e6..4ea7ed9 100644
--- a/gm/sample_matrix_constant.cpp
+++ b/gm/sample_matrix_constant.cpp
@@ -46,7 +46,7 @@
 class GLSLSampleMatrixConstantEffect : public GrGLSLFragmentProcessor {
     void emitCode(EmitArgs& args) override {
         GrGLSLFPFragmentBuilder* fragBuilder = args.fFragBuilder;
-        SkString sample = this->invokeChild(0, args);
+        SkString sample = this->invokeChildWithMatrix(0, args);
         fragBuilder->codeAppendf("%s = %s;\n", args.fOutputColor, sample.c_str());
     }
 };
diff --git a/gm/sample_matrix_variable.cpp b/gm/sample_matrix_variable.cpp
index 185f654..1eb77dc 100644
--- a/gm/sample_matrix_variable.cpp
+++ b/gm/sample_matrix_variable.cpp
@@ -100,6 +100,6 @@
         GrColorInfo colorInfo;
         GrFPArgs args(ctx, matrixProvider, kHigh_SkFilterQuality, &colorInfo);
         std::unique_ptr<GrFragmentProcessor> gradientFP = as_SB(shader)->asFragmentProcessor(args);
-        draw(std::move(gradientFP), -0.5f, 1.0f, 256, 0);
+        draw(std::move(gradientFP), -128, 256, 256, 0);
     }
 }
diff --git a/gn/gm.gni b/gn/gm.gni
index 4ae72fb..f358ebe 100644
--- a/gn/gm.gni
+++ b/gn/gm.gni
@@ -175,6 +175,7 @@
   "$_gm/fontregen.cpp",
   "$_gm/fontscaler.cpp",
   "$_gm/fontscalerdistortable.cpp",
+  "$_gm/fp_sample_chaining.cpp",
   "$_gm/fpcoordinateoverride.cpp",
   "$_gm/fwidth_squircle.cpp",
   "$_gm/gammatext.cpp",
diff --git a/include/effects/SkRuntimeEffect.h b/include/effects/SkRuntimeEffect.h
index d570de6..fe14398 100644
--- a/include/effects/SkRuntimeEffect.h
+++ b/include/effects/SkRuntimeEffect.h
@@ -129,13 +129,15 @@
     // Returns index of the named child, or -1 if not found
     int findChild(const char* name) const;
 
+    bool usesSampleCoords() const { return fMainFunctionHasSampleCoords; }
+
     static void RegisterFlattenables();
     ~SkRuntimeEffect();
 
 private:
     SkRuntimeEffect(SkString sksl, std::unique_ptr<SkSL::Program> baseProgram,
                     std::vector<Variable>&& inAndUniformVars, std::vector<SkString>&& children,
-                    std::vector<Varying>&& varyings, size_t uniformSize);
+                    std::vector<Varying>&& varyings, size_t uniformSize, bool mainHasLocalCoords);
 
     using SpecializeResult = std::tuple<std::unique_ptr<SkSL::Program>, SkString>;
     SpecializeResult specialize(SkSL::Program& baseProgram, const void* inputs,
@@ -172,6 +174,7 @@
     std::vector<Varying>  fVaryings;
 
     size_t fUniformSize;
+    bool   fMainFunctionHasSampleCoords;
 };
 
 /**
diff --git a/src/core/SkRuntimeEffect.cpp b/src/core/SkRuntimeEffect.cpp
index fb8d40e..af19a1a 100644
--- a/src/core/SkRuntimeEffect.cpp
+++ b/src/core/SkRuntimeEffect.cpp
@@ -87,6 +87,22 @@
     }
     SkASSERT(!compiler->errorCount());
 
+    // FIXME can the SkSL::Program just provide this for us?
+    bool mainHasSampleCoords = false;
+    for (const auto& e : *program) {
+        if (e.fKind == SkSL::ProgramElement::kFunction_Kind) {
+            const SkSL::FunctionDefinition& func = (const SkSL::FunctionDefinition&) e;
+            if (func.fDeclaration.fName == "main") {
+                SkASSERT(func.fDeclaration.fParameters.size() <= 2);
+                if (!func.fDeclaration.fParameters.empty() &&
+                    func.fDeclaration.fParameters.front()->fType.fName == "float2") {
+                    mainHasSampleCoords = true;
+                    break;
+                }
+            }
+        }
+    }
+
     size_t offset = 0, uniformSize = 0;
     std::vector<Variable> inAndUniformVars;
     std::vector<SkString> children;
@@ -259,7 +275,8 @@
                                                       std::move(inAndUniformVars),
                                                       std::move(children),
                                                       std::move(varyings),
-                                                      uniformSize));
+                                                      uniformSize,
+                                                      mainHasSampleCoords));
     return std::make_pair(std::move(effect), SkString());
 }
 
@@ -287,14 +304,16 @@
                                  std::vector<Variable>&& inAndUniformVars,
                                  std::vector<SkString>&& children,
                                  std::vector<Varying>&& varyings,
-                                 size_t uniformSize)
+                                 size_t uniformSize,
+                                 bool mainHasSampleCoords)
         : fHash(SkGoodHash()(sksl))
         , fSkSL(std::move(sksl))
         , fBaseProgram(std::move(baseProgram))
         , fInAndUniformVars(std::move(inAndUniformVars))
         , fChildren(std::move(children))
         , fVaryings(std::move(varyings))
-        , fUniformSize(uniformSize) {
+        , fUniformSize(uniformSize)
+        , fMainFunctionHasSampleCoords(mainHasSampleCoords) {
     SkASSERT(fBaseProgram);
     SkASSERT(SkIsAlign4(fUniformSize));
     SkASSERT(fUniformSize <= this->inputSize());
diff --git a/src/effects/imagefilters/SkDisplacementMapEffect.cpp b/src/effects/imagefilters/SkDisplacementMapEffect.cpp
index fe5fe87..ca28144 100644
--- a/src/effects/imagefilters/SkDisplacementMapEffect.cpp
+++ b/src/effects/imagefilters/SkDisplacementMapEffect.cpp
@@ -590,8 +590,6 @@
     // Unpremultiply the displacement
     fragBuilder->codeAppendf("%s.rgb = (%s.a < %s) ? half3(0.0) : saturate(%s.rgb / %s.a);",
                              dColor, dColor, nearZero, dColor, dColor);
-    SkString coords2D = fragBuilder->ensureCoords2D(args.fTransformedCoords[0].fVaryingPoint,
-                                                    args.fFp.sampleMatrix());
     auto chanChar = [](SkColorChannel c) {
         switch(c) {
             case SkColorChannel::kR: return 'r';
@@ -602,7 +600,7 @@
         }
     };
     fragBuilder->codeAppendf("float2 %s = %s + %s*(%s.%c%c - half2(0.5));",
-                             cCoords, coords2D.c_str(), scaleUni, dColor,
+                             cCoords, args.fSampleCoord, scaleUni, dColor,
                              chanChar(displacementMap.xChannelSelector()),
                              chanChar(displacementMap.yChannelSelector()));
 
diff --git a/src/effects/imagefilters/SkLightingImageFilter.cpp b/src/effects/imagefilters/SkLightingImageFilter.cpp
index 5181f11..122c02a 100644
--- a/src/effects/imagefilters/SkLightingImageFilter.cpp
+++ b/src/effects/imagefilters/SkLightingImageFilter.cpp
@@ -1767,8 +1767,6 @@
         GrShaderVar("scale", kHalf_GrSLType),
     };
     SkString sobelFuncName;
-    SkString coords2D = fragBuilder->ensureCoords2D(args.fTransformedCoords[0].fVaryingPoint,
-                                                    args.fFp.sampleMatrix());
 
     fragBuilder->emitFunction(kHalf_GrSLType,
                               "sobel",
@@ -1804,7 +1802,7 @@
                               normalBody.c_str(),
                               &normalName);
 
-    fragBuilder->codeAppendf("\t\tfloat2 coord = %s;\n", coords2D.c_str());
+    fragBuilder->codeAppendf("\t\tfloat2 coord = %s;\n", args.fSampleCoord);
     fragBuilder->codeAppend("\t\thalf m[9];\n");
 
     const char* surfScale = uniformHandler->getUniformCStr(fSurfaceScaleUni);
diff --git a/src/effects/imagefilters/SkMorphologyImageFilter.cpp b/src/effects/imagefilters/SkMorphologyImageFilter.cpp
index 35f35d7..1131fb0 100644
--- a/src/effects/imagefilters/SkMorphologyImageFilter.cpp
+++ b/src/effects/imagefilters/SkMorphologyImageFilter.cpp
@@ -270,8 +270,6 @@
             const char* range = uniformHandler->getUniformCStr(fRangeUni);
 
             GrGLSLFPFragmentBuilder* fragBuilder = args.fFragBuilder;
-            SkString coords2D = fragBuilder->ensureCoords2D(
-                    args.fTransformedCoords[0].fVaryingPoint, args.fFp.sampleMatrix());
 
             const char* func = me.fType == MorphType::kErode ? "min" : "max";
 
@@ -283,7 +281,7 @@
             int width = 2 * me.fRadius + 1;
 
             // float2 coord = coord2D;
-            fragBuilder->codeAppendf("float2 coord = %s;", coords2D.c_str());
+            fragBuilder->codeAppendf("float2 coord = %s;", args.fSampleCoord);
             // coord.x -= radius;
             fragBuilder->codeAppendf("coord.%c -= %d;", dir, me.fRadius);
             if (me.fUseRange) {
diff --git a/src/gpu/GrFragmentProcessor.cpp b/src/gpu/GrFragmentProcessor.cpp
index 9c668d3..22e511e 100644
--- a/src/gpu/GrFragmentProcessor.cpp
+++ b/src/gpu/GrFragmentProcessor.cpp
@@ -68,53 +68,67 @@
     return this->onTextureSampler(i);
 }
 
+int GrFragmentProcessor::numCoordTransforms() const {
+    if (SkToBool(fFlags & kUsesSampleCoordsDirectly_Flag) && fCoordTransforms.empty() &&
+        !this->isSampledWithExplicitCoords()) {
+        // coordTransform(0) will return an implicitly defined coord transform so that varyings are
+        // added for this FP in order to support const/uniform sample matrix lifting.
+        return 1;
+    } else {
+        return fCoordTransforms.count();
+    }
+}
+
+const GrCoordTransform& GrFragmentProcessor::coordTransform(int i) const {
+    SkASSERT(i >= 0 && i < this->numCoordTransforms());
+    if (SkToBool(fFlags & kUsesSampleCoordsDirectly_Flag) && fCoordTransforms.empty() &&
+        !this->isSampledWithExplicitCoords()) {
+        SkASSERT(i == 0);
+
+        // as things stand, matrices only work when there's a coord transform, so we need to add
+        // an identity transform to keep the downstream code happy
+        static const GrCoordTransform kImplicitIdentity;
+        return kImplicitIdentity;
+    } else {
+        return *fCoordTransforms[i];
+    }
+}
+
 void GrFragmentProcessor::addCoordTransform(GrCoordTransform* transform) {
     fCoordTransforms.push_back(transform);
     fFlags |= kHasCoordTransforms_Flag;
 }
 
 void GrFragmentProcessor::setSampleMatrix(SkSL::SampleMatrix newMatrix) {
-    if (newMatrix == fMatrix) {
-        return;
+    SkASSERT(!newMatrix.isNoOp());
+    SkASSERT(fMatrix.isNoOp());
+
+    fMatrix = newMatrix;
+    // When an FP is sampled using variable matrix expressions, it is effectively being sampled
+    // explicitly, except that the call site will automatically evaluate the matrix expression to
+    // produce the float2 passed into this FP.
+    if (fMatrix.isVariable()) {
+        this->addAndPushFlagToChildren(kSampledWithExplicitCoords_Flag);
     }
-    SkASSERT(newMatrix.fKind != SkSL::SampleMatrix::Kind::kNone);
-    SkASSERT(fMatrix.fKind != SkSL::SampleMatrix::Kind::kVariable);
-    if (this->numCoordTransforms() == 0 &&
-        (newMatrix.fKind == SkSL::SampleMatrix::Kind::kConstantOrUniform ||
-         newMatrix.fKind == SkSL::SampleMatrix::Kind::kMixed)) {
-        // as things stand, matrices only work when there's a coord transform, so we need to add
-        // an identity transform to keep the downstream code happy
-        static GrCoordTransform identity;
-        this->addCoordTransform(&identity);
-    }
-    if (fMatrix.fKind == SkSL::SampleMatrix::Kind::kConstantOrUniform) {
-        if (newMatrix.fKind == SkSL::SampleMatrix::Kind::kConstantOrUniform) {
-            // need to base this transform on the one that happened in our parent
-            // If we're already based on something, then we have to assume that parent is now
-            // based on yet another transform, so don't update our base pointer (or we'll skip
-            // the intermediate transform).
-            if (!fMatrix.fBase) {
-                fMatrix.fBase = newMatrix.fOwner;
-            }
-        } else {
-            SkASSERT(newMatrix.fKind == SkSL::SampleMatrix::Kind::kVariable);
-            fMatrix.fKind = SkSL::SampleMatrix::Kind::kMixed;
-            fMatrix.fBase = nullptr;
-        }
-    } else {
-        SkASSERT(fMatrix.fKind == SkSL::SampleMatrix::Kind::kNone);
-        fMatrix = newMatrix;
-    }
-    for (auto& child : fChildProcessors) {
-        child->setSampleMatrix(newMatrix);
+    // Push perspective matrix type to children
+    if (fMatrix.fHasPerspective) {
+        this->addAndPushFlagToChildren(kNetTransformHasPerspective_Flag);
     }
 }
 
-void GrFragmentProcessor::setSampledWithExplicitCoords() {
-    fFlags |= kSampledWithExplicitCoords;
-    for (auto& child : fChildProcessors) {
-        child->setSampledWithExplicitCoords();
+void GrFragmentProcessor::addAndPushFlagToChildren(PrivateFlags flag) {
+    // This propagates down, so if we've already marked it, all our children should have it too
+    if (!(fFlags & flag)) {
+        fFlags |= flag;
+        for (auto& child : fChildProcessors) {
+            child->addAndPushFlagToChildren(flag);
+        }
     }
+#ifdef SK_DEBUG
+    for (auto& child : fChildProcessors) {
+        SkASSERT(child->fFlags & flag);
+    }
+#endif
 }
 
 #ifdef SK_DEBUG
@@ -138,32 +152,52 @@
 int GrFragmentProcessor::registerChild(std::unique_ptr<GrFragmentProcessor> child,
                                        SkSL::SampleMatrix sampleMatrix,
                                        bool explicitlySampled) {
+    // The child should not have been attached to another FP already and not had any sampling
+    // strategy set on it.
+    SkASSERT(child && !child->fParent && child->sampleMatrix().isNoOp() &&
+             !child->isSampledWithExplicitCoords() && !child->hasPerspectiveTransform());
+
     // Configure child's sampling state first
     if (explicitlySampled) {
-        child->setSampledWithExplicitCoords();
+        child->addAndPushFlagToChildren(kSampledWithExplicitCoords_Flag);
     }
     if (sampleMatrix.fKind != SkSL::SampleMatrix::Kind::kNone) {
-        // FIXME(michaelludwig) - Temporary hack. Owner tracking will be moved off of SampleMatrix
-        // and into FP. Currently, coord transform compilation fails on sample_matrix GMs if the
-        // child isn't the owner. But the matrix effect (and expected behavior) require the owner
-        // to be 'this' FP.
-        if (this->classID() == kGrMatrixEffect_ClassID) {
-            sampleMatrix.fOwner = this;
-        } else {
-            sampleMatrix.fOwner = child.get();
-        }
         child->setSampleMatrix(sampleMatrix);
     }
 
     if (child->fFlags & kHasCoordTransforms_Flag) {
         fFlags |= kHasCoordTransforms_Flag;
     }
+
+    if (child->sampleMatrix().fKind == SkSL::SampleMatrix::Kind::kVariable) {
+        // Since the child is sampled with a variable matrix expression, auto-generated code in
+        // invokeChildWithMatrix() for this FP will refer to the local coordinates.
+        this->setUsesSampleCoordsDirectly();
+    }
+
+    // If the child is not sampled explicitly and not already accessing sample coords directly
+    // (through reference or variable matrix expansion), then mark that this FP tree relies on
+    // coordinates at a lower level. If the child is sampled with explicit coordinates and
+    // there isn't any other direct reference to the sample coords, we halt the upwards propagation
+    // because it means this FP is determining coordinates on its own.
+    if (!child->isSampledWithExplicitCoords()) {
+        if ((child->fFlags & kUsesSampleCoordsDirectly_Flag ||
+             child->fFlags & kUsesSampleCoordsIndirectly_Flag)) {
+            fFlags |= kUsesSampleCoordsIndirectly_Flag;
+        }
+    }
+
     fRequestedFeatures |= child->fRequestedFeatures;
 
     int index = fChildProcessors.count();
+    // Record that the child is attached to us; this FP is the source of any uniform data needed
+    // to evaluate the child sample matrix.
+    child->fParent = this;
     fChildProcessors.push_back(std::move(child));
-    SkASSERT(fMatrix.fKind == SkSL::SampleMatrix::Kind::kNone ||
-             fMatrix.fKind == SkSL::SampleMatrix::Kind::kConstantOrUniform);
+
+    // Sanity check: our sample strategy comes from a parent we shouldn't have yet.
+    SkASSERT(!this->isSampledWithExplicitCoords() && !this->hasPerspectiveTransform() &&
+             fMatrix.isNoOp() && !fParent);
     return index;
 }
 
diff --git a/src/gpu/GrFragmentProcessor.h b/src/gpu/GrFragmentProcessor.h
index a0f540e..9c84276 100644
--- a/src/gpu/GrFragmentProcessor.h
+++ b/src/gpu/GrFragmentProcessor.h
@@ -106,6 +106,9 @@
      */
     virtual std::unique_ptr<GrFragmentProcessor> clone() const = 0;
 
+    // The FP this was registered with as a child function. This will be null if this is a root.
+    const GrFragmentProcessor* parent() const { return fParent; }
+
     GrGLSLFragmentProcessor* createGLSLInstance() const;
 
     void getGLSLProcessorKey(const GrShaderCaps& caps, GrProcessorKeyBuilder* b) const {
@@ -118,16 +121,8 @@
     int numTextureSamplers() const { return fTextureSamplerCnt; }
     const TextureSampler& textureSampler(int i) const;
 
-    int numCoordTransforms() const { return fCoordTransforms.count(); }
-
-    /** Returns the coordinate transformation at index. index must be valid according to
-        numCoordTransforms(). */
-    const GrCoordTransform& coordTransform(int index) const { return *fCoordTransforms[index]; }
-    GrCoordTransform& coordTransform(int index) { return *fCoordTransforms[index]; }
-
-    const SkTArray<GrCoordTransform*, true>& coordTransforms() const {
-        return fCoordTransforms;
-    }
+    int numCoordTransforms() const;
+    const GrCoordTransform& coordTransform(int index) const;
 
     int numChildProcessors() const { return fChildProcessors.count(); }
 
@@ -136,19 +131,52 @@
 
     SkDEBUGCODE(bool isInstantiated() const;)
 
-    /** Do any of the coord transforms for this processor require local coords? */
-    bool usesLocalCoords() const {
-        // If the processor is sampled with explicit coords then we do not need to apply the
-        // coord transforms in the vertex shader to the local coords.
-        return SkToBool(fFlags & kHasCoordTransforms_Flag) &&
-               !SkToBool(fFlags & kSampledWithExplicitCoords);
+    /**
+     * Do any of the FPs in this tree require local coordinates to be produced by the primitive
+     * processor? This can return true even if this FP does not refer to sample coordinates
+     * itself, as long as a descendant FP uses them. FPs that are sampled explicitly do not
+     * require primitive-generated local coordinates.
+     *
+     * If the root of an FP tree does not provide explicit coordinates, the geometry processor
+     * provides the original local coordinates to start. This may be implicit as part of vertex
+     * shader-lifted varyings, or by providing the base local coordinate to the fragment shader.
+     */
+    bool sampleCoordsDependOnLocalCoords() const {
+        return (SkToBool(fFlags & kHasCoordTransforms_Flag) ||
+                SkToBool(fFlags & kUsesSampleCoordsDirectly_Flag) ||
+                SkToBool(fFlags & kUsesSampleCoordsIndirectly_Flag)) &&
+               !SkToBool(fFlags & kSampledWithExplicitCoords_Flag);
     }
 
+    /**
+     * True if this FP refers directly to the sample coordinate parameter of its function
+     * (e.g. uses EmitArgs::fSampleCoord in emitCode()). This also returns true if the
+     * coordinate reference comes from autogenerated code invoking 'sample(matrix)' expressions.
+     *
+     * Unlike sampleCoordsDependOnLocalCoords(), this can return true whether or not the FP is
+     * explicitly sampled, and does not change based on how the FP is composed. This property is
+     * specific to the FP's function and not the entire program.
+     */
+    bool referencesSampleCoords() const {
+        // HasCoordTransforms propagates up the FP tree, but we want the presence of an actual
+        // coord transform object (that's not one of the implicit workarounds).
+        return SkToBool(fFlags & kUsesSampleCoordsDirectly_Flag) || fCoordTransforms.count() > 0;
+    }
+
+    // True if this FP's parent invokes it with 'sample(float2)' or a variable 'sample(matrix)'.
     bool isSampledWithExplicitCoords() const {
-        return SkToBool(fFlags & kSampledWithExplicitCoords);
+        return SkToBool(fFlags & kSampledWithExplicitCoords_Flag);
     }
 
-    SkSL::SampleMatrix sampleMatrix() const {
+    // True if the transform chain from root to this FP introduces perspective into the local
+    // coordinate expression.
+    bool hasPerspectiveTransform() const {
+        return SkToBool(fFlags & kNetTransformHasPerspective_Flag);
+    }
+
+    // The SampleMatrix describing how this FP is invoked by its parent using 'sample(matrix)'.
+    // This only reflects the immediate sampling from parent to this FP.
+    const SkSL::SampleMatrix& sampleMatrix() const {
         return fMatrix;
     }
 
@@ -297,25 +325,11 @@
     using FPTextureSamplerRange = FPItemRange<const GrFragmentProcessor, TextureSamplerIter>;
     using ProcessorSetTextureSamplerRange = FPItemRange<const GrProcessorSet, TextureSamplerIter>;
 
-    // Not used directly.
-    using NonConstCoordTransformIter =
-            FPItemIter<GrCoordTransform, &GrFragmentProcessor::numCoordTransforms,
-                       &GrFragmentProcessor::coordTransform>;
-    // Iterator over non-const GrCoordTransforms owned by FP and its descendants.
-    using FPCoordTransformRange = FPItemRange<GrFragmentProcessor, NonConstCoordTransformIter>;
-
     // Sentinel type for range-for using Iter.
     class EndIter {};
     // Sentinel type for range-for using FPItemIter.
     class FPItemEndIter {};
 
-    // FIXME This should be private, but SkGr needs to mark the dither effect as sampled explicitly
-    // even though it's not added to another FP. Once varying generation doesn't add a redundant
-    // varying for it, this can be fully private.
-    void temporary_SetExplicitlySampled() {
-        this->setSampledWithExplicitCoords();
-    }
-
 protected:
     enum OptimizationFlags : uint32_t {
         kNone_OptimizationFlags,
@@ -445,6 +459,12 @@
         fTextureSamplerCnt = cnt;
     }
 
+    // FP implementations must call this function if their matching GrGLSLFragmentProcessor's
+    // emitCode() function uses the EmitArgs::fSampleCoord variable in generated SkSL.
+    void setUsesSampleCoordsDirectly() {
+        fFlags |= kUsesSampleCoordsDirectly_Flag;
+    }
+
     /**
      * Helper for implementing onTextureSampler(). E.g.:
      * return IthTexureSampler(i, fMyFirstSampler, fMySecondSampler, fMyThirdSampler);
@@ -484,15 +504,23 @@
 
     bool hasSameTransforms(const GrFragmentProcessor&) const;
 
-    void setSampledWithExplicitCoords();
-
     void setSampleMatrix(SkSL::SampleMatrix matrix);
 
     enum PrivateFlags {
         kFirstPrivateFlag = kAll_OptimizationFlags + 1,
+
+        // Propagates up the FP tree to the root
         kHasCoordTransforms_Flag = kFirstPrivateFlag,
-        kSampledWithExplicitCoords = kFirstPrivateFlag << 1,
+        kUsesSampleCoordsIndirectly_Flag = kFirstPrivateFlag << 1,
+
+        // Does not propagate at all
+        kUsesSampleCoordsDirectly_Flag = kFirstPrivateFlag << 2,
+
+        // Propagates down the FP tree to all its leaves
+        kSampledWithExplicitCoords_Flag = kFirstPrivateFlag << 3,
+        kNetTransformHasPerspective_Flag = kFirstPrivateFlag << 4,
     };
+    void addAndPushFlagToChildren(PrivateFlags flag);
 
     uint32_t fFlags = 0;
 
@@ -501,7 +529,7 @@
     SkSTArray<4, GrCoordTransform*, true> fCoordTransforms;
 
     SkSTArray<1, std::unique_ptr<GrFragmentProcessor>, true> fChildProcessors;
-
+    const GrFragmentProcessor* fParent = nullptr;
     SkSL::SampleMatrix fMatrix;
 
     typedef GrProcessor INHERITED;
diff --git a/src/gpu/GrPathProcessor.cpp b/src/gpu/GrPathProcessor.cpp
index ec62eb1..54c330c 100644
--- a/src/gpu/GrPathProcessor.cpp
+++ b/src/gpu/GrPathProcessor.cpp
@@ -92,8 +92,8 @@
             } else {
                 SkString strVaryingName;
                 strVaryingName.printf("TransformedCoord_%d", i);
-                GrSLType varyingType = coordTransform.matrix().hasPerspective() ? kHalf3_GrSLType
-                                                                                : kHalf2_GrSLType;
+                GrSLType varyingType = coordTransform.matrix().hasPerspective() ? kFloat3_GrSLType
+                                                                                : kFloat2_GrSLType;
                 GrGLSLVarying v(varyingType);
 #ifdef SK_GL
                 GrGLVaryingHandler* glVaryingHandler = (GrGLVaryingHandler*)varyingHandler;
@@ -143,9 +143,9 @@
                 SkMatrix m = GetTransformMatrix(transform, pathProc.localMatrix());
                 if (!SkMatrixPriv::CheapEqual(fVaryingTransform[v].fCurrentValue, m)) {
                     fVaryingTransform[v].fCurrentValue = m;
-                    SkASSERT(fVaryingTransform[v].fType == kHalf2_GrSLType ||
-                             fVaryingTransform[v].fType == kHalf3_GrSLType);
-                    int components = fVaryingTransform[v].fType == kHalf2_GrSLType ? 2 : 3;
+                    SkASSERT(fVaryingTransform[v].fType == kFloat2_GrSLType ||
+                             fVaryingTransform[v].fType == kFloat3_GrSLType);
+                    int components = fVaryingTransform[v].fType == kFloat2_GrSLType ? 2 : 3;
                     pd.setPathFragmentInputTransform(fVaryingTransform[v].fHandle, components, m);
                 }
                 ++v;
diff --git a/src/gpu/GrPrimitiveProcessor.cpp b/src/gpu/GrPrimitiveProcessor.cpp
index b2c75c0..2be07b1 100644
--- a/src/gpu/GrPrimitiveProcessor.cpp
+++ b/src/gpu/GrPrimitiveProcessor.cpp
@@ -11,15 +11,27 @@
 #include "src/gpu/GrFragmentProcessor.h"
 
 /**
- * We specialize the vertex or fragment coord transform code for these matrix types.
- * Some specializations are only applied when the coord transform is applied in the fragment
- * shader.
+ * We specialize the vertex or fragment coord transform code for these matrix types, and where
+ * the transform code is applied.
  */
-enum MatrixType {
-    kNone_MatrixType            = 0,  // Used only in FS for explicitly sampled FPs
-    kScaleTranslate_MatrixType  = 1,  // Used only in FS for explicitly sampled FPs
-    kNoPersp_MatrixType         = 2,
-    kGeneral_MatrixType         = 3,
+enum SampleFlag {
+    kExplicitlySampled_Flag          = 0b0000001, // GrFP::isSampledWithExplicitCoords()
+
+    kLegacyCoordTransform_Flag       = 0b0000010, // !GrFP::coordTransform(i)::isNoOp()
+
+    kNone_SampleMatrix_Flag          = 0b0000100, // GrFP::sampleMatrix()::isNoOp()
+    kConstUniform_SampleMatrix_Flag  = 0b0001000, // GrFP::sampleMatrix()::isConstUniform()
+    kVariable_SampleMatrix_Flag      = 0b0001100, // GrFP::sampleMatrix()::isVariable()
+
+    // Legacy coord transforms specialize on identity, S+T, no-perspective, and general matrix types.
+    // FIXME these (and kLegacyCoordTransform) can be removed once all FPs no longer use them
+    kLCT_ScaleTranslate_Matrix_Flag  = 0b0010000, // GrFP::coordTransform(i)::isScaleTranslate()
+    kLCT_NoPersp_Matrix_Flag         = 0b0100000, // !GrFP::coordTransform(i)::hasPerspective()
+    kLCT_General_Matrix_Flag         = 0b0110000, // any other matrix type
+
+    // Currently, sample(matrix) only specializes on no-perspective or general.
+    // FIXME add new flags as more matrix types are supported.
+    kPersp_Matrix_Flag               = 0b1000000, // GrFP::sampleMatrix()::fHasPerspective
 };
 
 GrPrimitiveProcessor::GrPrimitiveProcessor(ClassID classID) : GrProcessor(classID) {}
@@ -31,28 +43,47 @@
 
 uint32_t GrPrimitiveProcessor::computeCoordTransformsKey(const GrFragmentProcessor& fp) const {
     // This is highly coupled with the code in GrGLSLGeometryProcessor::emitTransforms().
-    SkASSERT(fp.numCoordTransforms() * 2 <= 32);
-    uint32_t totalKey = 0;
-    for (int t = 0; t < fp.numCoordTransforms(); ++t) {
-        uint32_t key = 0;
-        const GrCoordTransform& coordTransform = fp.coordTransform(t);
-        if (fp.isSampledWithExplicitCoords() && coordTransform.isNoOp()) {
-            key = kNone_MatrixType;
-        } else if (fp.isSampledWithExplicitCoords() && coordTransform.matrix().isScaleTranslate()) {
-            key = kScaleTranslate_MatrixType;
-        } else if (!coordTransform.matrix().hasPerspective()) {
-            key = kNoPersp_MatrixType;
-        } else {
-            // Note that we can also have homogeneous varyings as a result of a GP local matrix or
-            // homogeneous local coords generated by GP. We're relying on the GP to include any
-            // variability in those in its key.
-            key = kGeneral_MatrixType;
-        }
-        key <<= 2*t;
-        SkASSERT(0 == (totalKey & key)); // keys for each transform ought not to overlap
-        totalKey |= key;
+    // At this point, all effects either don't use legacy coord transforms, or only use 1.
+    SkASSERT(fp.numCoordTransforms() <= 1);
+
+    uint32_t key = 0;
+    if (fp.isSampledWithExplicitCoords()) {
+        key |= kExplicitlySampled_Flag;
     }
-    return totalKey;
+    if (fp.numCoordTransforms() > 0) {
+        const GrCoordTransform& coordTransform = fp.coordTransform(0);
+        if (!coordTransform.isNoOp()) {
+            // A true identity matrix shouldn't result in a coord transform; proxy normalization
+            // and flipping will eventually present as a scale+translate matrix.
+            SkASSERT(!coordTransform.matrix().isIdentity() || coordTransform.normalize() ||
+                     coordTransform.reverseY());
+            key |= kLegacyCoordTransform_Flag;
+            if (coordTransform.matrix().isScaleTranslate()) {
+                key |= kLCT_ScaleTranslate_Matrix_Flag;
+            } else if (!coordTransform.matrix().hasPerspective()) {
+                key |= kLCT_NoPersp_Matrix_Flag;
+            } else {
+                key |= kLCT_General_Matrix_Flag;
+            }
+        }
+    }
+
+    switch(fp.sampleMatrix().fKind) {
+        case SkSL::SampleMatrix::Kind::kNone:
+            key |= kNone_SampleMatrix_Flag;
+            break;
+        case SkSL::SampleMatrix::Kind::kConstantOrUniform:
+            key |= kConstUniform_SampleMatrix_Flag;
+            break;
+        case SkSL::SampleMatrix::Kind::kVariable:
+            key |= kVariable_SampleMatrix_Flag;
+            break;
+    }
+    if (fp.sampleMatrix().fHasPerspective) {
+        key |= kPersp_Matrix_Flag;
+    }
+
+    return key;
 }
 
 ///////////////////////////////////////////////////////////////////////////////////////////////////
diff --git a/src/gpu/GrProcessorAnalysis.cpp b/src/gpu/GrProcessorAnalysis.cpp
index 049bc8d..0395236 100644
--- a/src/gpu/GrProcessorAnalysis.cpp
+++ b/src/gpu/GrProcessorAnalysis.cpp
@@ -40,7 +40,7 @@
             if (fCompatibleWithCoverageAsAlpha && !fp->compatibleWithCoverageAsAlpha()) {
                 fCompatibleWithCoverageAsAlpha = false;
             }
-            if (fp->usesLocalCoords()) {
+            if (fp->sampleCoordsDependOnLocalCoords()) {
                 fUsesLocalCoords = true;
             }
         }
diff --git a/src/gpu/GrProcessorSet.cpp b/src/gpu/GrProcessorSet.cpp
index b8a0aef..cc10424 100644
--- a/src/gpu/GrProcessorSet.cpp
+++ b/src/gpu/GrProcessorSet.cpp
@@ -183,14 +183,14 @@
         if (!fps[i]->compatibleWithCoverageAsAlpha()) {
             analysis.fCompatibleWithCoverageAsAlpha = false;
         }
-        coverageUsesLocalCoords |= fps[i]->usesLocalCoords();
+        coverageUsesLocalCoords |= fps[i]->sampleCoordsDependOnLocalCoords();
     }
     if (clip) {
         hasCoverageFP = hasCoverageFP || clip->numClipCoverageFragmentProcessors();
         for (int i = 0; i < clip->numClipCoverageFragmentProcessors(); ++i) {
             const GrFragmentProcessor* clipFP = clip->clipCoverageFragmentProcessor(i);
             analysis.fCompatibleWithCoverageAsAlpha &= clipFP->compatibleWithCoverageAsAlpha();
-            coverageUsesLocalCoords |= clipFP->usesLocalCoords();
+            coverageUsesLocalCoords |= clipFP->sampleCoordsDependOnLocalCoords();
         }
     }
     int colorFPsToEliminate = colorAnalysis.initialProcessorsToEliminate(overrideInputColor);
diff --git a/src/gpu/GrSPIRVUniformHandler.h b/src/gpu/GrSPIRVUniformHandler.h
index 0ba4ff9..fe4137c 100644
--- a/src/gpu/GrSPIRVUniformHandler.h
+++ b/src/gpu/GrSPIRVUniformHandler.h
@@ -42,6 +42,9 @@
     UniformInfo& uniform(int idx) override {
         return fUniforms.item(idx);
     }
+    const UniformInfo& uniform(int idx) const override {
+        return fUniforms.item(idx);
+    }
 
 private:
     explicit GrSPIRVUniformHandler(GrGLSLProgramBuilder* program);
diff --git a/src/gpu/SkGr.cpp b/src/gpu/SkGr.cpp
index 9b77a28..5554d5a 100644
--- a/src/gpu/SkGr.cpp
+++ b/src/gpu/SkGr.cpp
@@ -382,9 +382,6 @@
             auto ditherFP = GrSkSLFP::Make(context, effect, "Dither",
                                            SkData::MakeWithCopy(&ditherRange, sizeof(ditherRange)));
             if (ditherFP) {
-                // The dither shader doesn't actually use input coordinates, but if we don't set
-                // this flag, the generated shader includes an extra local coord varying.
-                ditherFP->temporary_SetExplicitlySampled();
                 grPaint->addColorFragmentProcessor(std::move(ditherFP));
             }
         }
diff --git a/src/gpu/ccpr/GrCCDrawPathsOp.cpp b/src/gpu/ccpr/GrCCDrawPathsOp.cpp
index 3b27f15..539a6f2 100644
--- a/src/gpu/ccpr/GrCCDrawPathsOp.cpp
+++ b/src/gpu/ccpr/GrCCDrawPathsOp.cpp
@@ -18,7 +18,7 @@
 
 static bool has_coord_transforms(const GrPaint& paint) {
     for (const auto& fp : GrFragmentProcessor::PaintCRange(paint)) {
-        if (!fp.coordTransforms().empty()) {
+        if (fp.numCoordTransforms() > 0) {
             return true;
         }
     }
diff --git a/src/gpu/effects/GrBicubicEffect.cpp b/src/gpu/effects/GrBicubicEffect.cpp
index d575668..204d3d4 100644
--- a/src/gpu/effects/GrBicubicEffect.cpp
+++ b/src/gpu/effects/GrBicubicEffect.cpp
@@ -26,8 +26,6 @@
     const GrBicubicEffect& bicubicEffect = args.fFp.cast<GrBicubicEffect>();
 
     GrGLSLFPFragmentBuilder* fragBuilder = args.fFragBuilder;
-    SkString coords2D = fragBuilder->ensureCoords2D(args.fTransformedCoords[0].fVaryingPoint,
-                                                    bicubicEffect.sampleMatrix());
 
     /*
      * Filter weights come from Don Mitchell & Arun Netravali's 'Reconstruction Filters in Computer
@@ -58,7 +56,7 @@
     // The use of "texel" above is somewhat abstract as we're sampling a child processor. It is
     // assumed the child processor represents something akin to a nearest neighbor sampled texture.
     if (bicubicEffect.fDirection == GrBicubicEffect::Direction::kXY) {
-        fragBuilder->codeAppendf("float2 coord = %s - float2(0.5);", coords2D.c_str());
+        fragBuilder->codeAppendf("float2 coord = %s - float2(0.5);", args.fSampleCoord);
         fragBuilder->codeAppend("half2 f = half2(fract(coord));");
         fragBuilder->codeAppend("coord += 0.5 - f;");
         fragBuilder->codeAppend(
@@ -83,7 +81,7 @@
                 "half4 bicubicColor = wy.x * s0 + wy.y * s1 + wy.z * s2 + wy.w * s3;");
     } else {
         const char* d = bicubicEffect.fDirection == Direction::kX ? "x" : "y";
-        fragBuilder->codeAppendf("float coord = %s.%s - 0.5;", coords2D.c_str(), d);
+        fragBuilder->codeAppendf("float coord = %s.%s - 0.5;", args.fSampleCoord, d);
         fragBuilder->codeAppend("half f = half(fract(coord));");
         fragBuilder->codeAppend("coord += 0.5 - f;");
         fragBuilder->codeAppend("half f2 = f * f;");
@@ -92,9 +90,9 @@
         for (int i = 0; i < 4; ++i) {
             SkString coord;
             if (bicubicEffect.fDirection == Direction::kX) {
-                coord.printf("float2(coord + %d, %s.y)", i - 1, coords2D.c_str());
+                coord.printf("float2(coord + %d, %s.y)", i - 1, args.fSampleCoord);
             } else {
-                coord.printf("float2(%s.x, coord + %d)", coords2D.c_str(), i - 1);
+                coord.printf("float2(%s.x, coord + %d)", args.fSampleCoord, i - 1);
             }
             auto childStr = this->invokeChild(0, args, SkSL::String(coord.c_str(), coord.size()));
             fragBuilder->codeAppendf("c[%d] = %s;", i, childStr.c_str());
diff --git a/src/gpu/effects/GrGaussianConvolutionFragmentProcessor.cpp b/src/gpu/effects/GrGaussianConvolutionFragmentProcessor.cpp
index 3c8db34..d6fb23f 100644
--- a/src/gpu/effects/GrGaussianConvolutionFragmentProcessor.cpp
+++ b/src/gpu/effects/GrGaussianConvolutionFragmentProcessor.cpp
@@ -57,12 +57,10 @@
                                                  "Kernel", arrayCount, &kernel);
 
     GrGLSLFPFragmentBuilder* fragBuilder = args.fFragBuilder;
-    auto coords2D = fragBuilder->ensureCoords2D(args.fTransformedCoords[0].fVaryingPoint,
-                                                ce.sampleMatrix());
 
     fragBuilder->codeAppendf("%s = half4(0, 0, 0, 0);", args.fOutputColor);
 
-    fragBuilder->codeAppendf("float2 coord = %s - %d.0 * %s;", coords2D.c_str(), ce.fRadius, inc);
+    fragBuilder->codeAppendf("float2 coord = %s - %d.0 * %s;", args.fSampleCoord, ce.fRadius, inc);
     fragBuilder->codeAppend("float2 coordSampled = half2(0, 0);");
 
     // Manually unroll loop because some drivers don't; yields 20-30% speedup.
diff --git a/src/gpu/effects/GrMatrixConvolutionEffect.cpp b/src/gpu/effects/GrMatrixConvolutionEffect.cpp
index 04036d0..98b88a1 100644
--- a/src/gpu/effects/GrMatrixConvolutionEffect.cpp
+++ b/src/gpu/effects/GrMatrixConvolutionEffect.cpp
@@ -223,10 +223,8 @@
     const char* bias = uniformHandler->getUniformCStr(fBiasUni);
 
     GrGLSLFPFragmentBuilder* fragBuilder = args.fFragBuilder;
-    SkString coords2D = fragBuilder->ensureCoords2D(args.fTransformedCoords[0].fVaryingPoint,
-                                                    mce.sampleMatrix());
     fragBuilder->codeAppend("half4 sum = half4(0, 0, 0, 0);");
-    fragBuilder->codeAppendf("float2 coord = %s - %s;", coords2D.c_str(), kernelOffset);
+    fragBuilder->codeAppendf("float2 coord = %s - %s;", args.fSampleCoord, kernelOffset);
 
     if (mce.kernelIsSampled()) {
         this->emitKernelBlock(args, {});
@@ -244,7 +242,7 @@
         fragBuilder->codeAppendf("%s.rgb = clamp(%s.rgb, 0.0, %s.a);",
                                  args.fOutputColor, args.fOutputColor, args.fOutputColor);
     } else {
-        auto sample = this->invokeChild(0, args, coords2D.c_str());
+        auto sample = this->invokeChild(0, args);
         fragBuilder->codeAppendf("half4 c = %s;", sample.c_str());
         fragBuilder->codeAppendf("%s.a = c.a;", args.fOutputColor);
         fragBuilder->codeAppendf("%s.rgb = saturate(sum.rgb * %s + %s);", args.fOutputColor, gain, bias);
diff --git a/src/gpu/effects/GrMatrixEffect.cpp b/src/gpu/effects/GrMatrixEffect.cpp
index aa5cd3c..e147ff4 100644
--- a/src/gpu/effects/GrMatrixEffect.cpp
+++ b/src/gpu/effects/GrMatrixEffect.cpp
@@ -21,7 +21,7 @@
     void emitCode(EmitArgs& args) override {
         fMatrixVar = args.fUniformHandler->addUniform(&args.fFp, kFragment_GrShaderFlag,
                                                       kFloat3x3_GrSLType, "matrix");
-        SkString child = this->invokeChild(0, args.fInputColor, args);
+        SkString child = this->invokeChildWithMatrix(0, args.fInputColor, args);
         args.fFragBuilder->codeAppendf("%s = %s;\n", args.fOutputColor, child.c_str());
     }
 
diff --git a/src/gpu/effects/GrMatrixEffect.h b/src/gpu/effects/GrMatrixEffect.h
index 69fd1be..edbe837 100644
--- a/src/gpu/effects/GrMatrixEffect.h
+++ b/src/gpu/effects/GrMatrixEffect.h
@@ -38,7 +38,9 @@
             : INHERITED(kGrMatrixEffect_ClassID, kNone_OptimizationFlags)
             , fMatrix(matrix) {
         SkASSERT(child);
-        this->registerChild(std::move(child), SkSL::SampleMatrix::MakeConstUniform("matrix"));
+        this->registerChild(std::move(child),
+                            SkSL::SampleMatrix::MakeConstUniform(
+                                    "matrix", matrix.hasPerspective()));
     }
 
     GrGLSLFragmentProcessor* onCreateGLSLInstance() const override;
diff --git a/src/gpu/effects/GrSkSLFP.cpp b/src/gpu/effects/GrSkSLFP.cpp
index b2ed34f..aaaaa98 100644
--- a/src/gpu/effects/GrSkSLFP.cpp
+++ b/src/gpu/effects/GrSkSLFP.cpp
@@ -52,6 +52,7 @@
                                                                        fUniformHandles[arg.fIndex]);
                                 break;
                             case SkSL::Compiler::FormatArg::Kind::kChildProcessor: {
+                                // FIXME - Must use invokeChildWithMatrix depending on arg type.
                                 SkSL::String coords = this->expandFormatArgs(arg.fCoords, args,
                                                                              fmtArg, coordsName);
                                 result += this->invokeChild(arg.fIndex, args, coords).c_str();
@@ -88,14 +89,13 @@
             }
         }
         GrGLSLFPFragmentBuilder* fragBuilder = args.fFragBuilder;
-        SkASSERT(args.fTransformedCoords.count() == 1);
-        SkString coords = fragBuilder->ensureCoords2D(args.fTransformedCoords[0].fVaryingPoint,
-                                                      fp.sampleMatrix());
+        SkString coords(args.fSampleCoord);
         std::vector<SkString> childNames;
         // We need to ensure that we call invokeChild on each child FP at least once.
         // Any child FP that isn't sampled won't trigger a call otherwise, leading to asserts later.
         for (int i = 0; i < this->numChildProcessors(); ++i) {
-            (void)this->invokeChild(i, args, SkSL::String("_coords"));
+            // FIXME this could have side effects; need a better way to register child functions
+            (void)this->invokeChild(i, args);
         }
         for (const auto& f : fArgs.fFunctions) {
             fFunctionNames.emplace_back();
@@ -180,7 +180,9 @@
         , fEffect(std::move(effect))
         , fName(name)
         , fInputs(std::move(inputs)) {
-    this->addCoordTransform(&fCoordTransform);
+    if (fEffect->usesSampleCoords()) {
+        this->setUsesSampleCoordsDirectly();
+    }
 }
 
 GrSkSLFP::GrSkSLFP(const GrSkSLFP& other)
@@ -190,7 +192,9 @@
         , fEffect(other.fEffect)
         , fName(other.fName)
         , fInputs(other.fInputs) {
-    this->addCoordTransform(&fCoordTransform);
+    if (fEffect->usesSampleCoords()) {
+        this->setUsesSampleCoordsDirectly();
+    }
 }
 
 const char* GrSkSLFP::name() const {
diff --git a/src/gpu/effects/GrSkSLFP.h b/src/gpu/effects/GrSkSLFP.h
index 8dc9e30..e3feadd 100644
--- a/src/gpu/effects/GrSkSLFP.h
+++ b/src/gpu/effects/GrSkSLFP.h
@@ -98,8 +98,6 @@
     const char*            fName;
     sk_sp<SkData>          fInputs;
 
-    GrCoordTransform fCoordTransform;
-
     GR_DECLARE_FRAGMENT_PROCESSOR_TEST
 
     typedef GrFragmentProcessor INHERITED;
diff --git a/src/gpu/effects/GrTextureEffect.cpp b/src/gpu/effects/GrTextureEffect.cpp
index 5113cbf..4159250 100644
--- a/src/gpu/effects/GrTextureEffect.cpp
+++ b/src/gpu/effects/GrTextureEffect.cpp
@@ -295,27 +295,8 @@
     public:
         void emitCode(EmitArgs& args) override {
             auto& te = args.fFp.cast<GrTextureEffect>();
-            SkString coords;
-            if (args.fFp.isSampledWithExplicitCoords()) {
-                coords = "_coords";
-            } else {
-                coords = args.fTransformedCoords[0].fVaryingPoint.c_str();
-            }
             auto* fb = args.fFragBuilder;
-            if (te.sampleMatrix().fKind == SkSL::SampleMatrix::Kind::kMixed) {
-                // FIXME this is very similar to the extra logic in
-                // GrGLSLFragmentShaderBuilder::ensureCoords2D
-                args.fUniformHandler->writeUniformMappings(te.sampleMatrix().fOwner, fb);
-                SkString coords2D;
-                coords2D.printf("%s_teSample", coords.c_str());
 
-                fb->codeAppendf("float3 %s_3d = %s * _matrix * %s.xy1;\n",
-                                coords2D.c_str(), te.sampleMatrix().fExpression.c_str(),
-                                coords.c_str());
-                fb->codeAppendf("float2 %s = %s_3d.xy / %s_3d.z;\n",
-                                coords2D.c_str(), coords2D.c_str(), coords2D.c_str());
-                coords = coords2D;
-            }
             if (te.fShaderModes[0] == ShaderMode::kNone &&
                 te.fShaderModes[1] == ShaderMode::kNone) {
                 fb->codeAppendf("%s = ", args.fOutputColor);
@@ -325,11 +306,11 @@
                                                                 kFloat4_GrSLType, "norm", &norm);
                     fb->appendTextureLookupAndBlend(args.fInputColor, SkBlendMode::kModulate,
                                                     args.fTexSamplers[0],
-                                                    SkStringPrintf("%s * %s.zw", coords.c_str(),
+                                                    SkStringPrintf("%s * %s.zw", args.fSampleCoord,
                                                                    norm).c_str());
                 } else {
                     fb->appendTextureLookupAndBlend(args.fInputColor, SkBlendMode::kModulate,
-                                                    args.fTexSamplers[0], coords.c_str());
+                                                    args.fTexSamplers[0], args.fSampleCoord);
                 }
                 fb->codeAppendf(";");
             } else {
@@ -357,10 +338,7 @@
                 //    filtering do a hard less than/greater than test with the subset rect.
 
                 // Convert possible projective texture coordinates into non-homogeneous half2.
-                fb->codeAppendf(
-                        "float2 inCoord = %s;",
-                        fb->ensureCoords2D(args.fTransformedCoords[0].fVaryingPoint,
-                                           te.sampleMatrix()).c_str());
+                fb->codeAppendf("float2 inCoord = %s;", args.fSampleCoord);
 
                 const auto& m = te.fShaderModes;
                 GrTextureType textureType = te.fSampler.proxy()->backendFormat().textureType();
@@ -767,7 +745,6 @@
                                  const Sampling& sampling, bool lazyProxyNormalization)
         : GrFragmentProcessor(kGrTextureEffect_ClassID,
                               ModulateForSamplerOptFlags(alphaType, sampling.hasBorderAlpha()))
-        , fCoordTransform(SkMatrix::I())
         , fSampler(std::move(view), sampling.fHWSampler)
         , fSubset(sampling.fShaderSubset)
         , fClamp(sampling.fShaderClamp)
@@ -778,13 +755,12 @@
     SkASSERT(fShaderModes[0] != ShaderMode::kNone || (fSubset.fLeft == 0 && fSubset.fRight == 0));
     SkASSERT(fShaderModes[1] != ShaderMode::kNone || (fSubset.fTop == 0 && fSubset.fBottom == 0));
     this->setTextureSamplerCnt(1);
-    this->addCoordTransform(&fCoordTransform);
+    this->setUsesSampleCoordsDirectly();
     std::copy_n(sampling.fBorder, 4, fBorder);
 }
 
 GrTextureEffect::GrTextureEffect(const GrTextureEffect& src)
         : INHERITED(kGrTextureEffect_ClassID, src.optimizationFlags())
-        , fCoordTransform(src.fCoordTransform)
         , fSampler(src.fSampler)
         , fSubset(src.fSubset)
         , fClamp(src.fClamp)
@@ -792,7 +768,7 @@
         , fLazyProxyNormalization(src.fLazyProxyNormalization) {
     std::copy_n(src.fBorder, 4, fBorder);
     this->setTextureSamplerCnt(1);
-    this->addCoordTransform(&fCoordTransform);
+    this->setUsesSampleCoordsDirectly();
 }
 
 std::unique_ptr<GrFragmentProcessor> GrTextureEffect::clone() const {
diff --git a/src/gpu/effects/GrTextureEffect.h b/src/gpu/effects/GrTextureEffect.h
index 47acda8..8ce5dcf 100644
--- a/src/gpu/effects/GrTextureEffect.h
+++ b/src/gpu/effects/GrTextureEffect.h
@@ -90,7 +90,6 @@
     static ShaderMode GetShaderMode(GrSamplerState::WrapMode, GrSamplerState::Filter);
     static bool ShaderModeIsClampToBorder(ShaderMode);
 
-    GrCoordTransform fCoordTransform;
     TextureSampler fSampler;
     float fBorder[4];
     SkRect fSubset;
diff --git a/src/gpu/effects/generated/GrCircleBlurFragmentProcessor.cpp b/src/gpu/effects/generated/GrCircleBlurFragmentProcessor.cpp
index 67449dd..ac0a515 100644
--- a/src/gpu/effects/generated/GrCircleBlurFragmentProcessor.cpp
+++ b/src/gpu/effects/generated/GrCircleBlurFragmentProcessor.cpp
@@ -315,8 +315,8 @@
                 R"SkSL(
 half4 inputColor = %s;)SkSL",
                 _sample13945.c_str());
-        SkString _sample14005;
         SkString _coords14005("float2(half2(dist, 0.5))");
+        SkString _sample14005;
         _sample14005 = this->invokeChild(_outer.blurProfile_index, args, _coords14005.c_str());
         fragBuilder->codeAppendf(
                 R"SkSL(
diff --git a/src/gpu/effects/generated/GrDeviceSpaceEffect.cpp b/src/gpu/effects/generated/GrDeviceSpaceEffect.cpp
index 4fec9b1..810469b 100644
--- a/src/gpu/effects/generated/GrDeviceSpaceEffect.cpp
+++ b/src/gpu/effects/generated/GrDeviceSpaceEffect.cpp
@@ -31,8 +31,8 @@
                 R"SkSL(float3 p = %s * float3(sk_FragCoord.xy, 1);)SkSL",
                 args.fUniformHandler->getUniformCStr(matrixVar));
         SkString _input276(args.fInputColor);
-        SkString _sample276;
         SkString _coords276("p.xy / p.z");
+        SkString _sample276;
         _sample276 =
                 this->invokeChild(_outer.fp_index, _input276.c_str(), args, _coords276.c_str());
         fragBuilder->codeAppendf(
diff --git a/src/gpu/effects/generated/GrMagnifierEffect.cpp b/src/gpu/effects/generated/GrMagnifierEffect.cpp
index acc344d..24e3107 100644
--- a/src/gpu/effects/generated/GrMagnifierEffect.cpp
+++ b/src/gpu/effects/generated/GrMagnifierEffect.cpp
@@ -47,8 +47,6 @@
                                                         kFloat_GrSLType, "yInvInset");
         offsetVar = args.fUniformHandler->addUniform(&_outer, kFragment_GrShaderFlag,
                                                      kHalf2_GrSLType, "offset");
-        SkString sk_TransformedCoords2D_0 = fragBuilder->ensureCoords2D(
-                args.fTransformedCoords[0].fVaryingPoint, _outer.sampleMatrix());
         fragBuilder->codeAppendf(
                 R"SkSL(float2 coord = %s;
 float2 zoom_coord = float2(%s) + coord * float2(%s, %s);
@@ -65,15 +63,15 @@
     float2 delta_squared = delta * delta;
     weight = min(min(delta_squared.x, delta_squared.y), 1.0);
 })SkSL",
-                sk_TransformedCoords2D_0.c_str(), args.fUniformHandler->getUniformCStr(offsetVar),
+                args.fSampleCoord, args.fUniformHandler->getUniformCStr(offsetVar),
                 args.fUniformHandler->getUniformCStr(xInvZoomVar),
                 args.fUniformHandler->getUniformCStr(yInvZoomVar),
                 args.fUniformHandler->getUniformCStr(boundsUniformVar),
                 args.fUniformHandler->getUniformCStr(boundsUniformVar),
                 args.fUniformHandler->getUniformCStr(xInvInsetVar),
                 args.fUniformHandler->getUniformCStr(yInvInsetVar));
-        SkString _sample1112;
         SkString _coords1112("mix(coord, zoom_coord, weight)");
+        SkString _sample1112;
         _sample1112 = this->invokeChild(_outer.src_index, args, _coords1112.c_str());
         fragBuilder->codeAppendf(
                 R"SkSL(
@@ -147,6 +145,7 @@
         , yInvInset(src.yInvInset) {
     { src_index = this->cloneAndRegisterChildProcessor(src.childProcessor(src.src_index)); }
     this->addCoordTransform(&fCoordTransform0);
+    this->setUsesSampleCoordsDirectly();
 }
 std::unique_ptr<GrFragmentProcessor> GrMagnifierEffect::clone() const {
     return std::unique_ptr<GrFragmentProcessor>(new GrMagnifierEffect(*this));
diff --git a/src/gpu/effects/generated/GrMagnifierEffect.h b/src/gpu/effects/generated/GrMagnifierEffect.h
index aff287f..3723be4 100644
--- a/src/gpu/effects/generated/GrMagnifierEffect.h
+++ b/src/gpu/effects/generated/GrMagnifierEffect.h
@@ -57,6 +57,7 @@
             , yInvZoom(yInvZoom)
             , xInvInset(xInvInset)
             , yInvInset(yInvInset) {
+        this->setUsesSampleCoordsDirectly();
         SkASSERT(src);
         src_index = this->registerExplicitlySampledChild(std::move(src));
         this->addCoordTransform(&fCoordTransform0);
diff --git a/src/gpu/effects/generated/GrRRectBlurEffect.cpp b/src/gpu/effects/generated/GrRRectBlurEffect.cpp
index 968ab1e..45fa428 100644
--- a/src/gpu/effects/generated/GrRRectBlurEffect.cpp
+++ b/src/gpu/effects/generated/GrRRectBlurEffect.cpp
@@ -105,8 +105,8 @@
                 R"SkSL(
 half4 inputColor = %s;)SkSL",
                 _sample9604.c_str());
-        SkString _sample9664;
         SkString _coords9664("float2(texCoord)");
+        SkString _sample9664;
         _sample9664 = this->invokeChild(_outer.ninePatchFP_index, args, _coords9664.c_str());
         fragBuilder->codeAppendf(
                 R"SkSL(
diff --git a/src/gpu/effects/generated/GrRectBlurEffect.cpp b/src/gpu/effects/generated/GrRectBlurEffect.cpp
index 689c69b..20ca269 100644
--- a/src/gpu/effects/generated/GrRectBlurEffect.cpp
+++ b/src/gpu/effects/generated/GrRectBlurEffect.cpp
@@ -53,15 +53,15 @@
                 rectFVar.isValid() ? args.fUniformHandler->getUniformCStr(rectFVar) : "float4(0)",
                 rectHVar.isValid() ? args.fUniformHandler->getUniformCStr(rectHVar) : "half4(0)",
                 rectHVar.isValid() ? args.fUniformHandler->getUniformCStr(rectHVar) : "half4(0)");
-        SkString _sample7215;
         SkString _coords7215("float2(half2(xy.x, 0.5))");
+        SkString _sample7215;
         _sample7215 = this->invokeChild(_outer.integral_index, args, _coords7215.c_str());
         fragBuilder->codeAppendf(
                 R"SkSL(
     xCoverage = %s.w;)SkSL",
                 _sample7215.c_str());
-        SkString _sample7273;
         SkString _coords7273("float2(half2(xy.y, 0.5))");
+        SkString _sample7273;
         _sample7273 = this->invokeChild(_outer.integral_index, args, _coords7273.c_str());
         fragBuilder->codeAppendf(
                 R"SkSL(
@@ -80,21 +80,21 @@
                 rectFVar.isValid() ? args.fUniformHandler->getUniformCStr(rectFVar) : "float4(0)",
                 rectHVar.isValid() ? args.fUniformHandler->getUniformCStr(rectHVar) : "half4(0)",
                 rectHVar.isValid() ? args.fUniformHandler->getUniformCStr(rectHVar) : "half4(0)");
-        SkString _sample8640;
         SkString _coords8640("float2(half2(rect.x, 0.5))");
+        SkString _sample8640;
         _sample8640 = this->invokeChild(_outer.integral_index, args, _coords8640.c_str());
-        SkString _sample8703;
         SkString _coords8703("float2(half2(rect.z, 0.5))");
+        SkString _sample8703;
         _sample8703 = this->invokeChild(_outer.integral_index, args, _coords8703.c_str());
         fragBuilder->codeAppendf(
                 R"SkSL(
     xCoverage = (1.0 - %s.w) - %s.w;)SkSL",
                 _sample8640.c_str(), _sample8703.c_str());
-        SkString _sample8767;
         SkString _coords8767("float2(half2(rect.y, 0.5))");
+        SkString _sample8767;
         _sample8767 = this->invokeChild(_outer.integral_index, args, _coords8767.c_str());
-        SkString _sample8830;
         SkString _coords8830("float2(half2(rect.w, 0.5))");
+        SkString _sample8830;
         _sample8830 = this->invokeChild(_outer.integral_index, args, _coords8830.c_str());
         fragBuilder->codeAppendf(
                 R"SkSL(
diff --git a/src/gpu/gl/GrGLUniformHandler.h b/src/gpu/gl/GrGLUniformHandler.h
index 9263cab..607835b 100644
--- a/src/gpu/gl/GrGLUniformHandler.h
+++ b/src/gpu/gl/GrGLUniformHandler.h
@@ -34,6 +34,9 @@
     UniformInfo& uniform(int idx) override {
         return fUniforms.item(idx);
     }
+    const UniformInfo& uniform(int idx) const override {
+        return fUniforms.item(idx);
+    }
 
 private:
     explicit GrGLUniformHandler(GrGLSLProgramBuilder* program)
diff --git a/src/gpu/glsl/GrGLSLFragmentProcessor.cpp b/src/gpu/glsl/GrGLSLFragmentProcessor.cpp
index 90f50c2..dc42cf9 100644
--- a/src/gpu/glsl/GrGLSLFragmentProcessor.cpp
+++ b/src/gpu/glsl/GrGLSLFragmentProcessor.cpp
@@ -24,33 +24,13 @@
         fFunctionNames.emplace_back();
     }
 
-    // Subtle bug workaround: If an FP (this) has a child, and wishes to sample it, but does not
-    // want to *force* explicit coord sampling, then the obvious solution is to call it with
-    // invokeChild and no coords. However, if this FP is then adopted as a child of another FP that
-    // does want to sample with explicit coords, that property is propagated (recursively) to all
-    // children, and we need to supply explicit coords. So we propagate our own "_coords" (this is
-    // the name of our explicit coords parameter generated in the helper function).
-    if (args.fFp.isSampledWithExplicitCoords() && skslCoords.length() == 0) {
-        skslCoords = "_coords";
+    if (skslCoords.empty()) {
+        // Empty coords means passing through the coords of the parent
+        skslCoords = args.fSampleCoord;
     }
 
     const GrFragmentProcessor& childProc = args.fFp.childProcessor(childIndex);
 
-    // If the fragment processor is invoked with overridden coordinates, it must *always* be invoked
-    // with overridden coords.
-    SkASSERT(childProc.isSampledWithExplicitCoords() == !skslCoords.empty());
-
-    if (skslCoords.length() == 0) {
-        switch (childProc.sampleMatrix().fKind) {
-            case SkSL::SampleMatrix::Kind::kMixed:
-            case SkSL::SampleMatrix::Kind::kVariable:
-                skslCoords = "_matrix";
-                break;
-            default:
-                break;
-        }
-    }
-
     // Emit the child's helper function if this is the first time we've seen a call
     if (fFunctionNames[childIndex].size() == 0) {
         TransformedCoordVars coordVars = args.fTransformedCoords.childInputs(childIndex);
@@ -62,20 +42,28 @@
                            childProc,
                            "_output",
                            "_input",
+                           "_coords",
                            coordVars,
                            textureSamplers);
         fFunctionNames[childIndex] =
                 fragBuilder->writeProcessorFunction(this->childProcessor(childIndex), childArgs);
     }
 
-    // Produce a string containing the call to the helper function
-    SkString result = SkStringPrintf("%s(%s", fFunctionNames[childIndex].c_str(),
-                                              inputColor ? inputColor : "half4(1)");
-    if (skslCoords.length()) {
-        result.appendf(", %s", skslCoords.c_str());
+    if (childProc.isSampledWithExplicitCoords()) {
+        // The child's function takes a half4 color and a float2 coordinate
+        return SkStringPrintf("%s(%s, %s)", fFunctionNames[childIndex].c_str(),
+                                            inputColor ? inputColor : "half4(1)",
+                                            skslCoords.c_str());
+    } else {
+        // The child's function just takes a color; we should only get here for a call to
+        // sample(color) without explicit coordinates, so assert that the child has no sample matrix
+        // and skslCoords is _coords (a const/uniform sample call would go through
+        // invokeChildWithMatrix, and if a child was sampled with sample(matrix) and sample(), it
+        // should have been flagged as variable and hit the branch above).
+        SkASSERT(skslCoords == args.fSampleCoord && childProc.sampleMatrix().isNoOp());
+        return SkStringPrintf("%s(%s)", fFunctionNames[childIndex].c_str(),
+                                        inputColor ? inputColor : "half4(1)");
     }
-    result.append(")");
-    return result;
 }
 
 SkString GrGLSLFragmentProcessor::invokeChildWithMatrix(int childIndex, const char* inputColor,
@@ -99,16 +87,73 @@
                            childProc,
                            "_output",
                            "_input",
+                           "_coords",
                            coordVars,
                            textureSamplers);
         fFunctionNames[childIndex] =
                 fragBuilder->writeProcessorFunction(this->childProcessor(childIndex), childArgs);
     }
 
-    // Produce a string containing the call to the helper function
-    return SkStringPrintf("%s(%s, %s)", fFunctionNames[childIndex].c_str(),
-                                        inputColor ? inputColor : "half4(1)",
-                                        skslMatrix.c_str());
+    // Since this is const/uniform, the provided sksl expression should exactly match the
+    // expression stored on the FP, or it should match the mangled uniform name.
+    if (skslMatrix.empty()) {
+        // Empty matrix expression replaces with the sampleMatrix expression stored on the FP, but
+        // that is only valid for const/uniform sampled FPs
+        SkASSERT(childProc.sampleMatrix().isConstUniform());
+        skslMatrix = childProc.sampleMatrix().fExpression;
+    }
+
+    if (childProc.sampleMatrix().isConstUniform()) {
+        // Attempt to resolve the uniform name from the raw name that was stored in the sample
+        // matrix. Since this is const/uniform, the provided expression better match what was given
+        // to the FP.
+        SkASSERT(childProc.sampleMatrix().fExpression == skslMatrix);
+        GrShaderVar uniform = args.fUniformHandler->getUniformMapping(
+                args.fFp, childProc.sampleMatrix().fExpression);
+        if (uniform.getType() != kVoid_GrSLType) {
+            // Found the uniform, so replace the expression with the actual uniform name
+            SkASSERT(uniform.getType() == kFloat3x3_GrSLType);
+            skslMatrix = uniform.getName().c_str();
+        } // else assume it's a constant expression
+    }
+
+    // Produce a string containing the call to the helper function. sample(matrix) is special where
+    // the provided skslMatrix expression means that the child FP should be invoked with coords
+    // equal to matrix * parent coords. However, if matrix is a constant/uniform AND the parent
+    // coords were produced by const/uniform transforms, then this expression is lifted to a vertex
+    // shader and is stored in a varying. In that case, childProc will not have a variable sample
+    // matrix and will not be sampled explicitly, so its function signature will not take in coords.
+    //
+    // In all other cases, we need to insert sksl to compute matrix * parent coords and then invoke
+    // the function.
+    if (childProc.isSampledWithExplicitCoords()) {
+        SkASSERT(!childProc.sampleMatrix().isNoOp());
+        // Only check perspective for this specific matrix transform, not the aggregate FP property.
+        // Any parent perspective will have already been applied when evaluated in the FS.
+        if (childProc.sampleMatrix().fHasPerspective) {
+            SkString coords3 = fragBuilder->newTmpVarName("coords3");
+            fragBuilder->codeAppendf("float3 %s = (%s) * %s.xy1;\n",
+                                     coords3.c_str(), skslMatrix.c_str(), args.fSampleCoord);
+            return SkStringPrintf("%s(%s, %s.xy / %s.z)",
+                                  fFunctionNames[childIndex].c_str(),
+                                  inputColor ? inputColor : "half4(1)",
+                                  coords3.c_str(), coords3.c_str());
+        } else {
+            return SkStringPrintf("%s(%s, ((%s) * %s.xy1).xy)",
+                                  fFunctionNames[childIndex].c_str(),
+                                  inputColor ? inputColor : "half4(1)",
+                                  skslMatrix.c_str(), args.fSampleCoord);
+        }
+    } else {
+        // A variable matrix expression should mark the child as explicitly sampled. A no-op
+        // matrix should match sample(color), not sample(color, matrix).
+        SkASSERT(childProc.sampleMatrix().isConstUniform());
+
+        // Since this is const/uniform and not explicitly sampled, its transform has been
+        // promoted to the vertex shader and the signature doesn't take a float2 coord.
+        return SkStringPrintf("%s(%s)", fFunctionNames[childIndex].c_str(),
+                                        inputColor ? inputColor : "half4(1)");
+    }
 }
 
 //////////////////////////////////////////////////////////////////////////////
diff --git a/src/gpu/glsl/GrGLSLFragmentProcessor.h b/src/gpu/glsl/GrGLSLFragmentProcessor.h
index 88f3fa7..b68d09b 100644
--- a/src/gpu/glsl/GrGLSLFragmentProcessor.h
+++ b/src/gpu/glsl/GrGLSLFragmentProcessor.h
@@ -93,6 +93,7 @@
                                  (e.g. input color is solid white, trans black, known to be opaque,
                                  etc.) that allows the processor to communicate back similar known
                                  info about its output.
+        @param sampleCoord       The name of a local coord reference to a float2 variable.
         @param transformedCoords Fragment shader variables containing the coords computed using
                                  each of the GrFragmentProcessor's GrCoordTransforms.
         @param texSamplers       Contains one entry for each TextureSampler  of the GrProcessor.
@@ -106,6 +107,7 @@
                  const GrFragmentProcessor& fp,
                  const char* outputColor,
                  const char* inputColor,
+                 const char* sampleCoord,
                  const TransformedCoordVars& transformedCoordVars,
                  const TextureSamplers& textureSamplers)
                 : fFragBuilder(fragBuilder)
@@ -114,6 +116,7 @@
                 , fFp(fp)
                 , fOutputColor(outputColor)
                 , fInputColor(inputColor ? inputColor : "half4(1.0)")
+                , fSampleCoord(sampleCoord)
                 , fTransformedCoords(transformedCoordVars)
                 , fTexSamplers(textureSamplers) {}
         GrGLSLFPFragmentBuilder* fFragBuilder;
@@ -122,6 +125,7 @@
         const GrFragmentProcessor& fFp;
         const char* fOutputColor;
         const char* fInputColor;
+        const char* fSampleCoord;
         const TransformedCoordVars& fTransformedCoords;
         const TextureSamplers& fTexSamplers;
     };
@@ -143,7 +147,7 @@
     }
 
     inline SkString invokeChildWithMatrix(int childIndex, EmitArgs& parentArgs,
-                                          SkSL::String skslMatrix) {
+                                          SkSL::String skslMatrix = "") {
         return this->invokeChildWithMatrix(childIndex, nullptr, parentArgs, skslMatrix);
     }
 
@@ -153,16 +157,28 @@
      *  mangled to prevent redefinitions. The returned string contains the output color (as a call
      *  to the child's helper function). It is legal to pass nullptr as inputColor, since all
      *  fragment processors are required to work without an input color.
+     *
+     *  When skslCoords is empty, invokeChild corresponds to a call to "sample(child, color)"
+     *  in SkSL. When skslCoords is not empty, invokeChild corresponds to a call to
+     *  "sample(child, color, float2)", where skslCoords is an SkSL expression that evaluates to a
+     *  float2 and is passed in as the 3rd argument.
      */
     SkString invokeChild(int childIndex, const char* inputColor, EmitArgs& parentArgs,
                          SkSL::String skslCoords = "");
 
     /**
-     * As invokeChild, but transforms the coordinates according to the provided matrix. The matrix
-     * must be a snippet of SkSL code which evaluates to a float3x3.
+     * As invokeChild, but transforms the coordinates according to the provided matrix. This variant
+     * corresponds to a call of "sample(child, color, matrix)" in SkSL, where skslMatrix is an SkSL
+     * expression that evaluates to a float3x3 and is passed in as the 3rd argument.
+     *
+     * If skslMatrix is the empty string, then it is automatically replaced with the expression
+     * attached to the child's SampleMatrix object. This is only valid if the child is sampled with
+     * a const-or-uniform matrix. If the expression names a uniform, it is automatically resolved
+     * to the mangled uniform name; otherwise it is treated as a constant expression.
      */
     SkString invokeChildWithMatrix(int childIndex, const char* inputColor, EmitArgs& parentArgs,
-                                   SkSL::String skslMatrix);
+                                   SkSL::String skslMatrix = "");
+
     /**
      * Pre-order traversal of a GLSLFP hierarchy, or of multiple trees with roots in an array of
      * GLSLFPS. If initialized with an array color followed by coverage processors installed in a
diff --git a/src/gpu/glsl/GrGLSLFragmentShaderBuilder.cpp b/src/gpu/glsl/GrGLSLFragmentShaderBuilder.cpp
index 68ba303..6ae76f9 100644
--- a/src/gpu/glsl/GrGLSLFragmentShaderBuilder.cpp
+++ b/src/gpu/glsl/GrGLSLFragmentShaderBuilder.cpp
@@ -71,39 +71,6 @@
     fSubstageIndices.push_back(0);
 }
 
-SkString GrGLSLFragmentShaderBuilder::ensureCoords2D(const GrShaderVar& coords,
-                                                     const SkSL::SampleMatrix& matrix) {
-    SkString result;
-    if (!coords.getName().size()) {
-        result = "_coords";
-    } else if (kFloat3_GrSLType != coords.getType() && kHalf3_GrSLType != coords.getType()) {
-        SkASSERT(kFloat2_GrSLType == coords.getType() || kHalf2_GrSLType == coords.getType());
-        result = coords.getName();
-    } else {
-        SkString coords2D;
-        coords2D.printf("%s_ensure2D", coords.c_str());
-        this->codeAppendf("\tfloat2 %s = %s.xy / %s.z;", coords2D.c_str(), coords.c_str(),
-                          coords.c_str());
-        result = coords2D;
-    }
-    switch (matrix.fKind) {
-        case SkSL::SampleMatrix::Kind::kMixed:
-        case SkSL::SampleMatrix::Kind::kVariable: {
-            SkString sampleCoords2D;
-            sampleCoords2D.printf("%s_sample", coords.c_str());
-            this->codeAppendf("\tfloat3 %s_3d = _matrix * %s.xy1;\n",
-                              sampleCoords2D.c_str(), result.c_str());
-            this->codeAppendf("\tfloat2 %s = %s_3d.xy / %s_3d.z;\n",
-                              sampleCoords2D.c_str(), sampleCoords2D.c_str(),
-                              sampleCoords2D.c_str());
-            result = sampleCoords2D;
-            break; }
-        default:
-            break;
-    }
-    return result;
-}
-
 const char* GrGLSLFragmentShaderBuilder::sampleOffsets() {
     SkASSERT(CustomFeatures::kSampleLocations & fProgramBuilder->processorFeatures());
     SkDEBUGCODE(fUsedProcessorFeaturesThisStage_DebugOnly |= CustomFeatures::kSampleLocations);
@@ -180,56 +147,86 @@
                                                          GrGLSLFragmentProcessor::EmitArgs& args) {
     this->onBeforeChildProcEmitCode();
     this->nextStage();
-    bool hasVariableMatrix = args.fFp.sampleMatrix().fKind == SkSL::SampleMatrix::Kind::kVariable ||
-                             args.fFp.sampleMatrix().fKind == SkSL::SampleMatrix::Kind::kMixed;
-    if (args.fFp.isSampledWithExplicitCoords() && args.fTransformedCoords.count() > 0) {
-        // we currently only support overriding a single coordinate pair
-        SkASSERT(args.fTransformedCoords.count() == 1);
-        const GrShaderVar& transform = args.fTransformedCoords[0].fTransform;
-        switch (transform.getType()) {
-            case kFloat4_GrSLType:
-                // This is a scale+translate, so there's no perspective division needed
-                this->codeAppendf("_coords = _coords * %s.xz + %s.yw;\n", transform.c_str(),
-                                  transform.c_str());
-                break;
-            case kFloat3x3_GrSLType:
-                this->codeAppend("{\n");
-                this->codeAppendf("float3 _coords3 = (%s * _coords.xy1);\n", transform.c_str());
-                this->codeAppend("_coords = _coords3.xy / _coords3.z;\n");
-                this->codeAppend("}\n");
-                break;
-            default:
-                SkASSERT(transform.getType() == kVoid_GrSLType);
-                break;
+
+    // An FP's function signature is theoretically always main(half4 color, float2 _coords).
+    // However, if it is only sampled by a chain of const/uniform matrices (or legacy coord
+    // transforms), the value that would have been passed to _coords is lifted to the vertex shader
+    // and stored in a unique varying. In that case it uses that variable and does not have a
+    // second actual argument for _coords.
+    // FIXME: Once GrCoordTransforms are gone, and we can more easily associate this varying with
+    // the sample call site, then invokeChild() can pass the varying in, instead of requiring this
+    // dynamic signature.
+    int paramCount;
+    GrShaderVar params[] = { GrShaderVar(args.fInputColor, kHalf4_GrSLType),
+                             GrShaderVar(args.fSampleCoord, kFloat2_GrSLType) };
+
+    if (args.fFp.isSampledWithExplicitCoords()) {
+        // All invokeChild() that point to 'fp' will evaluate these expressions and pass the float2
+        // in, so we need the 2nd argument.
+        paramCount = 2;
+
+        // FIXME: This is only needed for the short term until FPs no longer put transformation
+        // data in a GrCoordTransform (and we can then mark the parameter as read-only)
+        if (args.fTransformedCoords.count() > 0) {
+            SkASSERT(args.fTransformedCoords.count() == 1);
+
+            const GrShaderVar& transform = args.fTransformedCoords[0].fTransform;
+            switch (transform.getType()) {
+                case kFloat4_GrSLType:
+                    // This is a scale+translate, so there's no perspective division needed
+                    this->codeAppendf("%s = %s * %s.xz + %s.yw;\n", args.fSampleCoord,
+                                                                    args.fSampleCoord,
+                                                                    transform.c_str(),
+                                                                    transform.c_str());
+                    break;
+                case kFloat3x3_GrSLType:
+                    this->codeAppend("{\n");
+                    this->codeAppendf("float3 _coords3 = (%s * %s.xy1);\n",
+                                      transform.c_str(), args.fSampleCoord);
+                    this->codeAppendf("%s = _coords3.xy / _coords3.z;\n", args.fSampleCoord);
+                    this->codeAppend("}\n");
+                    break;
+                default:
+                    SkASSERT(transform.getType() == kVoid_GrSLType);
+                    break;
+            }
         }
-        if (args.fFp.sampleMatrix().fKind != SkSL::SampleMatrix::Kind::kNone) {
-            SkASSERT(!hasVariableMatrix);
-            this->codeAppend("{\n");
-            args.fUniformHandler->writeUniformMappings(args.fFp.sampleMatrix().fOwner, this);
-            // FIXME This is not a variable matrix, we could key on the matrix type and skip
-            // perspective division; it may also be worth detecting if it was scale+translate and
-            // evaluating this similarly to the kFloat4 explicit coord case.
-            this->codeAppendf("float3 _coords3 = (%s * _coords.xy1);\n",
-                              args.fFp.sampleMatrix().fExpression.c_str());
-            this->codeAppend("_coords = _coords3.xy / _coords3.z;\n");
-            this->codeAppend("}\n");
+    } else {
+        // Sampled with a const/uniform matrix and/or a legacy coord transform. The actual
+        // transformation code is emitted in the vertex shader, so this only has to access it.
+        // Add a float2 _coords variable that maps to the associated varying and replaces the
+        // absent 2nd argument to the fp's function.
+        paramCount = 1;
+
+        if (args.fFp.referencesSampleCoords()) {
+            const GrShaderVar& varying = args.fTransformedCoords[0].fVaryingPoint;
+            switch (varying.getType()) {
+                case kFloat2_GrSLType:
+                    // Just point the local coords to the varying
+                    args.fSampleCoord = varying.getName().c_str();
+                    break;
+                case kFloat3_GrSLType:
+                    // Must perform the perspective divide in the frag shader based on the varying,
+                    // and since we won't actually have a function parameter for local coords, add
+                    // it as a local variable.
+                    this->codeAppendf("float2 %s = %s.xy / %s.z;\n", args.fSampleCoord,
+                                    varying.getName().c_str(), varying.getName().c_str());
+                    break;
+                default:
+                    SkDEBUGFAILF("Unexpected varying type for coord: %s %d\n",
+                                 varying.getName().c_str(), (int) varying.getType());
+                    break;
+            }
         }
     }
 
     this->codeAppendf("half4 %s;\n", args.fOutputColor);
     fp->emitCode(args);
     this->codeAppendf("return %s;\n", args.fOutputColor);
-    GrShaderVar params[] = { GrShaderVar(args.fInputColor, kHalf4_GrSLType),
-                             hasVariableMatrix ? GrShaderVar("_matrix", kFloat3x3_GrSLType)
-                                               : GrShaderVar("_coords", kFloat2_GrSLType) };
+
     SkString result;
-    this->emitFunction(kHalf4_GrSLType,
-                       args.fFp.name(),
-                       args.fFp.isSampledWithExplicitCoords() || hasVariableMatrix ? 2
-                                                                                   : 1,
-                       params,
-                       this->code().c_str(),
-                       &result);
+    this->emitFunction(kHalf4_GrSLType, args.fFp.name(), paramCount, params,
+                       this->code().c_str(), &result);
     this->deleteStage();
     this->onAfterChildProcEmitCode();
     return result;
diff --git a/src/gpu/glsl/GrGLSLFragmentShaderBuilder.h b/src/gpu/glsl/GrGLSLFragmentShaderBuilder.h
index dedfee1..d189c7a 100644
--- a/src/gpu/glsl/GrGLSLFragmentShaderBuilder.h
+++ b/src/gpu/glsl/GrGLSLFragmentShaderBuilder.h
@@ -25,14 +25,6 @@
     GrGLSLFragmentBuilder(GrGLSLProgramBuilder* program) : INHERITED(program) {}
     virtual ~GrGLSLFragmentBuilder() {}
 
-    /**
-     * This returns a variable name to access the 2D, perspective correct version of the coords in
-     * the fragment shader. The passed in coordinates must either be of type kHalf2 or kHalf3. If
-     * the coordinates are 3-dimensional, it a perspective divide into is emitted into the
-     * fragment shader (xy / z) to convert them to 2D.
-     */
-    virtual SkString ensureCoords2D(const GrShaderVar&, const SkSL::SampleMatrix& matrix) = 0;
-
     // TODO: remove this method.
     void declAppendf(const char* fmt, ...);
 
@@ -150,10 +142,6 @@
 
     GrGLSLFragmentShaderBuilder(GrGLSLProgramBuilder* program);
 
-    // Shared GrGLSLFragmentBuilder interface.
-    virtual SkString ensureCoords2D(const GrShaderVar&,
-                                    const SkSL::SampleMatrix& matrix) override;
-
     // GrGLSLFPFragmentBuilder interface.
     const char* sampleOffsets() override;
     void maskOffMultisampleCoverage(const char* mask, ScopeFlags) override;
diff --git a/src/gpu/glsl/GrGLSLGeometryProcessor.cpp b/src/gpu/glsl/GrGLSLGeometryProcessor.cpp
index 4d2ee7e..b774b18 100644
--- a/src/gpu/glsl/GrGLSLGeometryProcessor.cpp
+++ b/src/gpu/glsl/GrGLSLGeometryProcessor.cpp
@@ -74,117 +74,247 @@
                                                 GrGLSLUniformHandler* uniformHandler,
                                                 const GrShaderVar& localCoordsVar,
                                                 FPCoordTransformHandler* handler) {
-    // We only require localCoordsVar to be valid if there is a coord transform that needs
-    // it. CTs on FPs called with explicit coords do not require a local coord.
-    auto getLocalCoords = [&localCoordsVar,
-                           localCoords = SkString(),
-                           localCoordLength = int()]() mutable {
-        if (localCoords.isEmpty()) {
-            localCoordLength = GrSLTypeVecLength(localCoordsVar.getType());
-            SkASSERT(GrSLTypeIsFloatType(localCoordsVar.getType()));
-            SkASSERT(localCoordLength == 2 || localCoordLength == 3);
-            if (localCoordLength == 3) {
-                localCoords = localCoordsVar.getName();
-            } else {
-                localCoords.printf("float3(%s, 1)", localCoordsVar.c_str());
-            }
+    SkASSERT(localCoordsVar.getType() == kFloat2_GrSLType ||
+             localCoordsVar.getType() == kFloat3_GrSLType ||
+             localCoordsVar.getType() == kVoid_GrSLType /* until coord transforms are gone */);
+    // Cached varyings produced by parent FPs. If parent FPs introduce transformations, but all
+    // subsequent children are not transformed, they should share the same varying.
+    std::unordered_map<const GrFragmentProcessor*, GrShaderVar> localCoordsMap;
+
+    GrGLSLVarying baseLocalCoord;
+    auto getBaseLocalCoord = [&baseLocalCoord, &localCoordsVar, vb, varyingHandler]() {
+        SkASSERT(GrSLTypeIsFloatType(localCoordsVar.getType()));
+        if (baseLocalCoord.type() == kVoid_GrSLType) {
+            // Initialize to the GP provided coordinate
+            SkString baseLocalCoordName = SkStringPrintf("LocalCoord");
+            baseLocalCoord = GrGLSLVarying(localCoordsVar.getType());
+            varyingHandler->addVarying(baseLocalCoordName.c_str(), &baseLocalCoord);
+            vb->codeAppendf("%s = %s;\n", baseLocalCoord.vsOut(),
+                            localCoordsVar.getName().c_str());
         }
-        return std::make_tuple(localCoords, localCoordLength);
+        return GrShaderVar(SkString(baseLocalCoord.fsIn()), baseLocalCoord.type(),
+                           GrShaderVar::TypeModifier::In);
     };
 
-    GrShaderVar transformVar;
     for (int i = 0; *handler; ++*handler, ++i) {
         auto [coordTransform, fp] = handler->get();
-        // Add uniform for coord transform matrix.
-        SkString matrix;
-        if (!fp.isSampledWithExplicitCoords() || !coordTransform.isNoOp()) {
+
+        // FPs that use the legacy coord transform system will need a uniform registered for them
+        // to hold the coord transform's matrix.
+        GrShaderVar transformVar;
+        // FPs that use local coordinates need a varying to convey the coordinate. This may be the
+        // base GP's local coord if transforms have to be computed in the FS, or it may be a unique
+        // varying that computes the equivalent transformation hierarchy in the VS.
+        GrShaderVar varyingVar;
+
+        // If this is true, the FP's signature takes a float2 local coordinate. Otherwise, it
+        // doesn't use local coordinates, or it can be lifted to a varying and referenced directly.
+        bool localCoordComputedInFS = fp.isSampledWithExplicitCoords();
+        if (!coordTransform.isNoOp()) {
+            // Legacy coord transform that actually is doing something. This matrix is the last
+            // transformation to affect the local coordinate.
             SkString strUniName;
             strUniName.printf("CoordTransformMatrix_%d", i);
-            auto flag = fp.isSampledWithExplicitCoords() ? kFragment_GrShaderFlag
-                                                         : kVertex_GrShaderFlag;
+            auto flag = localCoordComputedInFS ? kFragment_GrShaderFlag
+                                               : kVertex_GrShaderFlag;
             auto& uni = fInstalledTransforms.push_back();
             if (fp.isSampledWithExplicitCoords() && coordTransform.matrix().isScaleTranslate()) {
                 uni.fType = kFloat4_GrSLType;
             } else {
                 uni.fType = kFloat3x3_GrSLType;
             }
-            const char* matrixName;
             uni.fHandle =
-                    uniformHandler->addUniform(&fp, flag, uni.fType, strUniName.c_str(),
-                                               &matrixName);
-            matrix = matrixName;
+                    uniformHandler->addUniform(&fp, flag, uni.fType, strUniName.c_str());
             transformVar = uniformHandler->getUniformVariable(uni.fHandle);
         } else {
-            // Install a coord transform that will be skipped.
+            // Must stay parallel with calls to handler
             fInstalledTransforms.push_back();
-            handler->omitCoordsForCurrCoordTransform();
-            continue;
         }
 
-        GrShaderVar fsVar;
-        // Add varying if required and register varying and matrix uniform.
-        if (!fp.isSampledWithExplicitCoords()) {
-            auto [localCoordsStr, localCoordLength] = getLocalCoords();
-            GrGLSLVarying v(kFloat2_GrSLType);
-            if (coordTransform.matrix().hasPerspective() || localCoordLength == 3) {
-                v = GrGLSLVarying(kFloat3_GrSLType);
-            }
-            SkString strVaryingName;
-            strVaryingName.printf("TransformedCoords_%d", i);
-            varyingHandler->addVarying(strVaryingName.c_str(), &v);
+        // If the FP references local coords, we need to make sure the vertex shader sets up the
+        // right transforms or pass-through variables for the FP to evaluate in the fragment shader
+        if (fp.referencesSampleCoords()) {
+            if (localCoordComputedInFS) {
+                // If the FP local coords are evaluated in the fragment shader, we only need to
+                // produce the original local coordinate to pass into the root; otherwise,
+                // the FP will have a 2nd parameter to its function and the caller sends the coords
+                if (!fp.parent()) {
+                    varyingVar = getBaseLocalCoord();
+                }
+            } else {
+                // The FP's local coordinates are determined by the const/uniform transform
+                // hierarchy from this FP to the root, and can be computed in the vertex shader.
+                // If this hierarchy would be the identity transform, then we should use the
+                // original local coordinate.
+                // NOTE: The actual transform logic is handled in emitTransformCode(), this just
+                // needs to determine if a unique varying should be added for the FP.
+                GrShaderVar transformedLocalCoord;
+                const GrFragmentProcessor* coordOwner = nullptr;
 
-            SkASSERT(fInstalledTransforms.back().fType == kFloat3x3_GrSLType);
-            if (fp.sampleMatrix().fKind != SkSL::SampleMatrix::Kind::kConstantOrUniform) {
-                if (v.type() == kFloat2_GrSLType) {
-                    vb->codeAppendf("%s = (%s * %s).xy;", v.vsOut(), matrix.c_str(),
-                                    localCoordsStr.c_str());
+                const GrFragmentProcessor* node = &fp;
+                while(node) {
+                    SkASSERT(!node->isSampledWithExplicitCoords() &&
+                             (node->sampleMatrix().isNoOp() ||
+                              node->sampleMatrix().isConstUniform()));
+
+                    if (node->sampleMatrix().isConstUniform()) {
+                        // We can stop once we hit an FP that adds transforms; this FP can reuse
+                        // that FP's varying (possibly vivifying it if this was the first use).
+                        transformedLocalCoord = localCoordsMap[node];
+                        coordOwner = node;
+                        break;
+                    } // else intervening FP is an identity transform so skip past it
+
+                    node = node->parent();
+                }
+
+                // Legacy coord transform workaround (if the transform hierarchy appears identity
+                // but we have GrCoordTransform that does something, we still need to record a
+                // varying for it).
+                if (!coordOwner && !coordTransform.isNoOp()) {
+                    coordOwner = &fp;
+                }
+
+                if (coordOwner) {
+                    // The FP will use coordOwner's varying; add varying if this was the first use
+                    if (transformedLocalCoord.getType() == kVoid_GrSLType) {
+                        GrGLSLVarying v(kFloat2_GrSLType);
+                        if (coordTransform.matrix().hasPerspective() ||
+                            GrSLTypeVecLength(localCoordsVar.getType()) == 3 ||
+                            coordOwner->hasPerspectiveTransform()) {
+                            v = GrGLSLVarying(kFloat3_GrSLType);
+                        }
+                        SkString strVaryingName;
+                        strVaryingName.printf("TransformedCoords_%d", i);
+                        varyingHandler->addVarying(strVaryingName.c_str(), &v);
+
+                        fTransformInfos.push_back({GrShaderVar(v.vsOut(), v.type()),
+                                                   transformVar.getName(),
+                                                   localCoordsVar,
+                                                   coordOwner});
+                        transformedLocalCoord = GrShaderVar(SkString(v.fsIn()), v.type(),
+                                                            GrShaderVar::TypeModifier::In);
+                        if (coordOwner->numCoordTransforms() < 1 ||
+                            coordOwner->coordTransform(0).isNoOp()) {
+                            // As long as a legacy coord transform doesn't get in the way, we can
+                            // reuse this expression for children (see comment in emitTransformCode)
+                            localCoordsMap[coordOwner] = transformedLocalCoord;
+                        }
+                    }
+
+                    varyingVar = transformedLocalCoord;
                 } else {
-                    vb->codeAppendf("%s = %s * %s;", v.vsOut(), matrix.c_str(),
-                                    localCoordsStr.c_str());
+                    // The FP transform hierarchy is the identity, so use the original local coord
+                    varyingVar = getBaseLocalCoord();
                 }
             }
-            fsVar = GrShaderVar(SkString(v.fsIn()), v.type(), GrShaderVar::TypeModifier::In);
-            fTransformInfos.push_back({ v.vsOut(), v.type(), matrix, localCoordsStr, &fp });
         }
-        handler->specifyCoordsForCurrCoordTransform(transformVar, fsVar);
+
+        if (varyingVar.getType() != kVoid_GrSLType || transformVar.getType() != kVoid_GrSLType) {
+            handler->specifyCoordsForCurrCoordTransform(transformVar, varyingVar);
+        } else {
+            handler->omitCoordsForCurrCoordTransform();
+        }
     }
 }
 
 void GrGLSLGeometryProcessor::emitTransformCode(GrGLSLVertexBuilder* vb,
                                                 GrGLSLUniformHandler* uniformHandler) {
-    std::unordered_map<const GrFragmentProcessor*, const char*> localCoordsMap;
+    std::unordered_map<const GrFragmentProcessor*, GrShaderVar> localCoordsMap;
     for (const auto& tr : fTransformInfos) {
-        switch (tr.fFP->sampleMatrix().fKind) {
-            case SkSL::SampleMatrix::Kind::kConstantOrUniform: {
-                SkString localCoords;
-                localCoordsMap.insert({ tr.fFP, tr.fName });
-                if (tr.fFP->sampleMatrix().fBase) {
-                    SkASSERT(localCoordsMap[tr.fFP->sampleMatrix().fBase]);
-                    localCoords = SkStringPrintf("float3(%s, 1)",
-                                                 localCoordsMap[tr.fFP->sampleMatrix().fBase]);
+        // If we recorded a transform info, its sample matrix must be const/uniform, or we have a
+        // legacy coord transform that actually does something.
+        SkASSERT(tr.fFP->sampleMatrix().isConstUniform() ||
+                 (tr.fFP->sampleMatrix().isNoOp() && !tr.fMatrix.isEmpty()));
+
+        SkString localCoords;
+        // Build a concatenated matrix expression that we apply to the root local coord.
+        // If we have an expression cached from an earlier FP in the hierarchy chain, we can stop
+        // there instead of going all the way to the GP.
+        SkString transformExpression;
+        if (!tr.fMatrix.isEmpty()) {
+            // We have both a const/uniform sample matrix and a legacy coord transform
+            transformExpression.printf("%s", tr.fMatrix.c_str());
+        }
+
+        // If the sample matrix is kNone, then the current transform expression of just the
+        // coord transform matrix is sufficient.
+        if (tr.fFP->sampleMatrix().isConstUniform()) {
+            const auto* base = tr.fFP;
+            while(base) {
+                GrShaderVar cachedBaseCoord = localCoordsMap[base];
+                if (cachedBaseCoord.getType() != kVoid_GrSLType) {
+                    // Can stop here, as this varying already holds all transforms from higher FPs
+                    if (cachedBaseCoord.getType() == kFloat3_GrSLType) {
+                        localCoords = cachedBaseCoord.getName();
+                    } else {
+                        localCoords = SkStringPrintf("%s.xy1", cachedBaseCoord.getName().c_str());
+                    }
+                    break;
+                } else if (base->sampleMatrix().isConstUniform()) {
+                    // The FP knows the matrix expression it's sampled with, but its parent defined
+                    // the uniform (when the expression is not a constant).
+                    GrShaderVar uniform = uniformHandler->liftUniformToVertexShader(
+                            *base->parent(), base->sampleMatrix().fExpression);
+
+                    // Accumulate the base matrix expression as a preConcat
+                    SkString matrix;
+                    if (uniform.getType() != kVoid_GrSLType) {
+                        SkASSERT(uniform.getType() == kFloat3x3_GrSLType);
+                        matrix = uniform.getName();
+                    } else {
+                        // No uniform found, so presumably this is a constant
+                        matrix = base->sampleMatrix().fExpression;
+                    }
+
+                    if (!transformExpression.isEmpty()) {
+                        transformExpression.append(" * ");
+                    }
+                    transformExpression.appendf("(%s)", matrix.c_str());
                 } else {
-                    localCoords = tr.fLocalCoords.c_str();
+                    // This intermediate FP is just a pass through and doesn't need to be built
+                    // in to the expression, but must visit its parents in case they add transforms
+                    SkASSERT(base->sampleMatrix().isNoOp());
                 }
-                vb->codeAppend("{\n");
-                if (tr.fFP->sampleMatrix().fOwner) {
-                    uniformHandler->writeUniformMappings(tr.fFP->sampleMatrix().fOwner, vb);
-                }
-                if (tr.fType == kFloat2_GrSLType) {
-                    vb->codeAppendf("%s = (%s * %s * %s).xy", tr.fName,
-                                    tr.fFP->sampleMatrix().fExpression.c_str(), tr.fMatrix.c_str(),
-                                    localCoords.c_str());
-                } else {
-                    SkASSERT(tr.fType == kFloat3_GrSLType);
-                    vb->codeAppendf("%s = %s * %s * %s", tr.fName,
-                                    tr.fFP->sampleMatrix().fExpression.c_str(), tr.fMatrix.c_str(),
-                                    localCoords.c_str());
-                }
-                vb->codeAppend(";\n");
-                vb->codeAppend("}\n");
-                break;
+
+                base = base->parent();
             }
-            default:
-                break;
+        }
+
+        if (localCoords.isEmpty()) {
+            // Must use GP's local coords
+            if (tr.fLocalCoords.getType() == kFloat3_GrSLType) {
+                localCoords = tr.fLocalCoords.getName();
+            } else {
+                localCoords = SkStringPrintf("%s.xy1", tr.fLocalCoords.getName().c_str());
+            }
+        }
+
+        vb->codeAppend("{\n");
+        if (tr.fOutputCoords.getType() == kFloat2_GrSLType) {
+            vb->codeAppendf("%s = ((%s) * %s).xy", tr.fOutputCoords.getName().c_str(),
+                                                   transformExpression.c_str(),
+                                                   localCoords.c_str());
+        } else {
+            SkASSERT(tr.fOutputCoords.getType() == kFloat3_GrSLType);
+            vb->codeAppendf("%s = (%s) * %s", tr.fOutputCoords.getName().c_str(),
+                                              transformExpression.c_str(),
+                                              localCoords.c_str());
+        }
+        vb->codeAppend(";\n");
+        vb->codeAppend("}\n");
+
+        if (tr.fMatrix.isEmpty()) {
+            // Subtle workaround: only cache the intermediate varying when there's no extra
+            // coord transform. If the FP uses a coord transform for a legacy effect, but also
+            // delegates to a child FP, we want the coordinates pre-GrCoordTransform to be sent
+            // to the child FP, but have the FP use the post-coordtransform legacy values
+            // (e.g. sampling a texture and relying on the GrCoordTransform for normalization
+            //  and mixing with a child FP that should not be normalized).
+            // FIXME: It's not really possible to apply this logic cleanly when transforms
+            // have been moved to the FS; in practice this doesn't seem to occur in our tests and
+            // the issue will go away once legacy coord transforms only have no-op matrices.
+            localCoordsMap.insert({ tr.fFP, tr.fOutputCoords });
         }
     }
 }
diff --git a/src/gpu/glsl/GrGLSLGeometryProcessor.h b/src/gpu/glsl/GrGLSLGeometryProcessor.h
index 10e2137..a0a35df 100644
--- a/src/gpu/glsl/GrGLSLGeometryProcessor.h
+++ b/src/gpu/glsl/GrGLSLGeometryProcessor.h
@@ -122,10 +122,15 @@
     SkTArray<TransformUniform, true> fInstalledTransforms;
 
     struct TransformInfo {
-        const char* fName;
-        GrSLType fType;
-        SkString fMatrix;
-        SkString fLocalCoords;
+        // The vertex-shader output variable to assign the transformed coordinates to
+        GrShaderVar                fOutputCoords;
+        // The name of a coord transform uniform to apply
+        SkString                   fMatrix;
+        // The coordinate to be transformed
+        GrShaderVar                fLocalCoords;
+        // The leaf FP of a transform hierarchy to be evaluated in the vertex shader;
+        // this FP will be const-uniform sampled, and all of its parents will have a sample matrix
+        // type of none or const-uniform.
         const GrFragmentProcessor* fFP;
     };
     SkTArray<TransformInfo> fTransformInfos;
diff --git a/src/gpu/glsl/GrGLSLProgramBuilder.cpp b/src/gpu/glsl/GrGLSLProgramBuilder.cpp
index 851bf4d..5965514 100644
--- a/src/gpu/glsl/GrGLSLProgramBuilder.cpp
+++ b/src/gpu/glsl/GrGLSLProgramBuilder.cpp
@@ -195,9 +195,34 @@
                                            fp,
                                            output.c_str(),
                                            input.c_str(),
+                                           "_coords",
                                            coords,
                                            textureSamplers);
 
+    if (fp.referencesSampleCoords()) {
+        // The fp's generated code expects a _coords variable, but we're at the root so _coords
+        // is just the local coordinates produced by the primitive processor.
+        SkASSERT(coords.count() > 0);
+
+        const GrShaderVar& varying = coordVars[0].fVaryingPoint;
+        switch(varying.getType()) {
+            case kFloat2_GrSLType:
+                fFS.codeAppendf("float2 %s = %s.xy;\n",
+                                args.fSampleCoord, varying.getName().c_str());
+                break;
+            case kFloat3_GrSLType:
+                fFS.codeAppendf("float2 %s = %s.xy / %s.z;\n",
+                                args.fSampleCoord,
+                                varying.getName().c_str(),
+                                varying.getName().c_str());
+                break;
+            default:
+                SkDEBUGFAILF("Unexpected type for varying: %d named %s\n",
+                             (int) varying.getType(), varying.getName().c_str());
+                break;
+        }
+    }
+
     fragProc->emitCode(args);
 
     // We have to check that effects and the code they emit are consistent, ie if an effect
diff --git a/src/gpu/glsl/GrGLSLUniformHandler.cpp b/src/gpu/glsl/GrGLSLUniformHandler.cpp
index ace0311..0d7a299 100644
--- a/src/gpu/glsl/GrGLSLUniformHandler.cpp
+++ b/src/gpu/glsl/GrGLSLUniformHandler.cpp
@@ -10,14 +10,28 @@
 #include "src/gpu/glsl/GrGLSL.h"
 #include "src/gpu/glsl/GrGLSLShaderBuilder.h"
 
-void GrGLSLUniformHandler::writeUniformMappings(GrFragmentProcessor* owner,
-                                                GrGLSLShaderBuilder* b) {
+GrShaderVar GrGLSLUniformHandler::getUniformMapping(const GrFragmentProcessor& owner,
+                                                    SkString rawName) const {
     for (int i = this->numUniforms() - 1; i >= 0; i--) {
-        UniformInfo& u = this->uniform(i);
-        if (u.fOwner == owner) {
-            u.fVisibility |= kVertex_GrShaderFlag;
-            b->codeAppendf("%s %s = %s;\n", GrGLSLTypeString(u.fVariable.getType()),
-                           u.fRawName.c_str(), u.fVariable.getName().c_str());
+        const UniformInfo& u = this->uniform(i);
+        if (u.fOwner == &owner && u.fRawName == rawName) {
+            return u.fVariable;
         }
     }
+    return GrShaderVar();
+}
+
+GrShaderVar GrGLSLUniformHandler::liftUniformToVertexShader(const GrFragmentProcessor& owner,
+                                                            SkString rawName) {
+    for (int i = this->numUniforms() - 1; i >= 0; i--) {
+        UniformInfo& u = this->uniform(i);
+        if (u.fOwner == &owner && u.fRawName == rawName) {
+            u.fVisibility |= kVertex_GrShaderFlag;
+            return u.fVariable;
+        }
+    }
+    // Uniform not found; it's better to return a void variable than to assert because sample
+    // matrices that are const/uniform are treated the same for most of the code. When the sample
+    // matrix expression can't be found as a uniform, we can infer it's a constant.
+    return GrShaderVar();
 }
diff --git a/src/gpu/glsl/GrGLSLUniformHandler.h b/src/gpu/glsl/GrGLSLUniformHandler.h
index 15fbec9..1cd8d56 100644
--- a/src/gpu/glsl/GrGLSLUniformHandler.h
+++ b/src/gpu/glsl/GrGLSLUniformHandler.h
@@ -82,8 +82,15 @@
     virtual int numUniforms() const = 0;
 
     virtual UniformInfo& uniform(int idx) = 0;
+    virtual const UniformInfo& uniform(int idx) const = 0;
 
-    void writeUniformMappings(GrFragmentProcessor* owner, GrGLSLShaderBuilder* b);
+    // Looks up a uniform that was added by 'owner' with the given 'rawName' (pre-mangling).
+    // If there is no such uniform, a variable with type kVoid is returned.
+    GrShaderVar getUniformMapping(const GrFragmentProcessor& owner, SkString rawName) const;
+
+    // Like getUniformMapping(), but if the uniform is found it also marks it as accessible in
+    // the vertex shader.
+    GrShaderVar liftUniformToVertexShader(const GrFragmentProcessor& owner, SkString rawName);
 
 protected:
     struct UniformMapping {
diff --git a/src/gpu/gradients/generated/GrLinearGradientLayout.cpp b/src/gpu/gradients/generated/GrLinearGradientLayout.cpp
index 5bc1732..9d3ea4e 100644
--- a/src/gpu/gradients/generated/GrLinearGradientLayout.cpp
+++ b/src/gpu/gradients/generated/GrLinearGradientLayout.cpp
@@ -25,13 +25,11 @@
         (void)_outer;
         auto gradientMatrix = _outer.gradientMatrix;
         (void)gradientMatrix;
-        SkString sk_TransformedCoords2D_0 = fragBuilder->ensureCoords2D(
-                args.fTransformedCoords[0].fVaryingPoint, _outer.sampleMatrix());
         fragBuilder->codeAppendf(
                 R"SkSL(half t = half(%s.x) + 9.9999997473787516e-06;
 %s = half4(t, 1.0, 0.0, 0.0);
 )SkSL",
-                sk_TransformedCoords2D_0.c_str(), args.fOutputColor);
+                args.fSampleCoord, args.fOutputColor);
     }
 
 private:
@@ -54,6 +52,7 @@
         , fCoordTransform0(src.fCoordTransform0)
         , gradientMatrix(src.gradientMatrix) {
     this->addCoordTransform(&fCoordTransform0);
+    this->setUsesSampleCoordsDirectly();
 }
 std::unique_ptr<GrFragmentProcessor> GrLinearGradientLayout::clone() const {
     return std::unique_ptr<GrFragmentProcessor>(new GrLinearGradientLayout(*this));
diff --git a/src/gpu/gradients/generated/GrLinearGradientLayout.h b/src/gpu/gradients/generated/GrLinearGradientLayout.h
index 5c3323d..58bfe02 100644
--- a/src/gpu/gradients/generated/GrLinearGradientLayout.h
+++ b/src/gpu/gradients/generated/GrLinearGradientLayout.h
@@ -36,6 +36,7 @@
                         (OptimizationFlags)kPreservesOpaqueInput_OptimizationFlag)
             , fCoordTransform0(gradientMatrix)
             , gradientMatrix(gradientMatrix) {
+        this->setUsesSampleCoordsDirectly();
         this->addCoordTransform(&fCoordTransform0);
     }
     GrGLSLFragmentProcessor* onCreateGLSLInstance() const override;
diff --git a/src/gpu/gradients/generated/GrRadialGradientLayout.cpp b/src/gpu/gradients/generated/GrRadialGradientLayout.cpp
index c17f77a..00b15c5 100644
--- a/src/gpu/gradients/generated/GrRadialGradientLayout.cpp
+++ b/src/gpu/gradients/generated/GrRadialGradientLayout.cpp
@@ -25,13 +25,11 @@
         (void)_outer;
         auto gradientMatrix = _outer.gradientMatrix;
         (void)gradientMatrix;
-        SkString sk_TransformedCoords2D_0 = fragBuilder->ensureCoords2D(
-                args.fTransformedCoords[0].fVaryingPoint, _outer.sampleMatrix());
         fragBuilder->codeAppendf(
                 R"SkSL(half t = half(length(%s));
 %s = half4(t, 1.0, 0.0, 0.0);
 )SkSL",
-                sk_TransformedCoords2D_0.c_str(), args.fOutputColor);
+                args.fSampleCoord, args.fOutputColor);
     }
 
 private:
@@ -54,6 +52,7 @@
         , fCoordTransform0(src.fCoordTransform0)
         , gradientMatrix(src.gradientMatrix) {
     this->addCoordTransform(&fCoordTransform0);
+    this->setUsesSampleCoordsDirectly();
 }
 std::unique_ptr<GrFragmentProcessor> GrRadialGradientLayout::clone() const {
     return std::unique_ptr<GrFragmentProcessor>(new GrRadialGradientLayout(*this));
diff --git a/src/gpu/gradients/generated/GrRadialGradientLayout.h b/src/gpu/gradients/generated/GrRadialGradientLayout.h
index 897a97f..1e8b6b3 100644
--- a/src/gpu/gradients/generated/GrRadialGradientLayout.h
+++ b/src/gpu/gradients/generated/GrRadialGradientLayout.h
@@ -36,6 +36,7 @@
                         (OptimizationFlags)kPreservesOpaqueInput_OptimizationFlag)
             , fCoordTransform0(gradientMatrix)
             , gradientMatrix(gradientMatrix) {
+        this->setUsesSampleCoordsDirectly();
         this->addCoordTransform(&fCoordTransform0);
     }
     GrGLSLFragmentProcessor* onCreateGLSLInstance() const override;
diff --git a/src/gpu/gradients/generated/GrSweepGradientLayout.cpp b/src/gpu/gradients/generated/GrSweepGradientLayout.cpp
index 0776671..85a4cc7 100644
--- a/src/gpu/gradients/generated/GrSweepGradientLayout.cpp
+++ b/src/gpu/gradients/generated/GrSweepGradientLayout.cpp
@@ -33,8 +33,6 @@
                                                    "bias");
         scaleVar = args.fUniformHandler->addUniform(&_outer, kFragment_GrShaderFlag, kHalf_GrSLType,
                                                     "scale");
-        SkString sk_TransformedCoords2D_0 = fragBuilder->ensureCoords2D(
-                args.fTransformedCoords[0].fVaryingPoint, _outer.sampleMatrix());
         fragBuilder->codeAppendf(
                 R"SkSL(half angle;
 if (sk_Caps.atan2ImplementedAsAtanYOverX) {
@@ -45,9 +43,8 @@
 half t = ((angle * 0.15915493667125702 + 0.5) + %s) * %s;
 %s = half4(t, 1.0, 0.0, 0.0);
 )SkSL",
-                sk_TransformedCoords2D_0.c_str(), sk_TransformedCoords2D_0.c_str(),
-                sk_TransformedCoords2D_0.c_str(), sk_TransformedCoords2D_0.c_str(),
-                sk_TransformedCoords2D_0.c_str(), args.fUniformHandler->getUniformCStr(biasVar),
+                args.fSampleCoord, args.fSampleCoord, args.fSampleCoord, args.fSampleCoord,
+                args.fSampleCoord, args.fUniformHandler->getUniformCStr(biasVar),
                 args.fUniformHandler->getUniformCStr(scaleVar), args.fOutputColor);
     }
 
@@ -93,6 +90,7 @@
         , bias(src.bias)
         , scale(src.scale) {
     this->addCoordTransform(&fCoordTransform0);
+    this->setUsesSampleCoordsDirectly();
 }
 std::unique_ptr<GrFragmentProcessor> GrSweepGradientLayout::clone() const {
     return std::unique_ptr<GrFragmentProcessor>(new GrSweepGradientLayout(*this));
diff --git a/src/gpu/gradients/generated/GrSweepGradientLayout.h b/src/gpu/gradients/generated/GrSweepGradientLayout.h
index b547ec1..ad323f6 100644
--- a/src/gpu/gradients/generated/GrSweepGradientLayout.h
+++ b/src/gpu/gradients/generated/GrSweepGradientLayout.h
@@ -40,6 +40,7 @@
             , gradientMatrix(gradientMatrix)
             , bias(bias)
             , scale(scale) {
+        this->setUsesSampleCoordsDirectly();
         this->addCoordTransform(&fCoordTransform0);
     }
     GrGLSLFragmentProcessor* onCreateGLSLInstance() const override;
diff --git a/src/gpu/gradients/generated/GrTextureGradientColorizer.cpp b/src/gpu/gradients/generated/GrTextureGradientColorizer.cpp
index c2f0533..cef854e 100644
--- a/src/gpu/gradients/generated/GrTextureGradientColorizer.cpp
+++ b/src/gpu/gradients/generated/GrTextureGradientColorizer.cpp
@@ -25,8 +25,8 @@
         (void)_outer;
         fragBuilder->codeAppendf(
                 R"SkSL(half2 coord = half2(%s.x, 0.5);)SkSL", args.fInputColor);
-        SkString _sample327;
         SkString _coords327("float2(coord)");
+        SkString _sample327;
         _sample327 = this->invokeChild(_outer.textureFP_index, args, _coords327.c_str());
         fragBuilder->codeAppendf(
                 R"SkSL(
diff --git a/src/gpu/gradients/generated/GrTwoPointConicalGradientLayout.cpp b/src/gpu/gradients/generated/GrTwoPointConicalGradientLayout.cpp
index fa172c7..adde160 100644
--- a/src/gpu/gradients/generated/GrTwoPointConicalGradientLayout.cpp
+++ b/src/gpu/gradients/generated/GrTwoPointConicalGradientLayout.cpp
@@ -42,8 +42,6 @@
         (void)focalParams;
         focalParamsVar = args.fUniformHandler->addUniform(&_outer, kFragment_GrShaderFlag,
                                                           kHalf2_GrSLType, "focalParams");
-        SkString sk_TransformedCoords2D_0 = fragBuilder->ensureCoords2D(
-                args.fTransformedCoords[0].fVaryingPoint, _outer.sampleMatrix());
         fragBuilder->codeAppendf(
                 R"SkSL(float2 p = %s;
 float t = -1.0;
@@ -115,7 +113,7 @@
 }
 %s = half4(half(t), v, 0.0, 0.0);
 )SkSL",
-                sk_TransformedCoords2D_0.c_str(), (int)_outer.type,
+                args.fSampleCoord, (int)_outer.type,
                 args.fUniformHandler->getUniformCStr(focalParamsVar),
                 args.fUniformHandler->getUniformCStr(focalParamsVar),
                 (_outer.isRadiusIncreasing ? "true" : "false"),
@@ -185,6 +183,7 @@
         , isNativelyFocal(src.isNativelyFocal)
         , focalParams(src.focalParams) {
     this->addCoordTransform(&fCoordTransform0);
+    this->setUsesSampleCoordsDirectly();
 }
 std::unique_ptr<GrFragmentProcessor> GrTwoPointConicalGradientLayout::clone() const {
     return std::unique_ptr<GrFragmentProcessor>(new GrTwoPointConicalGradientLayout(*this));
diff --git a/src/gpu/gradients/generated/GrTwoPointConicalGradientLayout.h b/src/gpu/gradients/generated/GrTwoPointConicalGradientLayout.h
index afd93db..77a8e69 100644
--- a/src/gpu/gradients/generated/GrTwoPointConicalGradientLayout.h
+++ b/src/gpu/gradients/generated/GrTwoPointConicalGradientLayout.h
@@ -59,6 +59,7 @@
             , isSwapped(isSwapped)
             , isNativelyFocal(isNativelyFocal)
             , focalParams(focalParams) {
+        this->setUsesSampleCoordsDirectly();
         this->addCoordTransform(&fCoordTransform0);
     }
     GrGLSLFragmentProcessor* onCreateGLSLInstance() const override;
diff --git a/src/gpu/mtl/GrMtlUniformHandler.h b/src/gpu/mtl/GrMtlUniformHandler.h
index 719fa1e..eb8f751 100644
--- a/src/gpu/mtl/GrMtlUniformHandler.h
+++ b/src/gpu/mtl/GrMtlUniformHandler.h
@@ -48,6 +48,9 @@
     UniformInfo& uniform(int idx) override {
         return fUniforms.item(idx);
     }
+    const UniformInfo& uniform(int idx) const override {
+        return fUniforms.item(idx);
+    }
 
 private:
     explicit GrMtlUniformHandler(GrGLSLProgramBuilder* program)
diff --git a/src/gpu/vk/GrVkUniformHandler.h b/src/gpu/vk/GrVkUniformHandler.h
index 135d6ad..19572d8 100644
--- a/src/gpu/vk/GrVkUniformHandler.h
+++ b/src/gpu/vk/GrVkUniformHandler.h
@@ -63,6 +63,9 @@
     UniformInfo& uniform(int idx) override {
         return fUniforms.item(idx);
     }
+    const UniformInfo& uniform(int idx) const override {
+        return fUniforms.item(idx);
+    }
 
 private:
     explicit GrVkUniformHandler(GrGLSLProgramBuilder* program)
diff --git a/src/shaders/SkPerlinNoiseShader.cpp b/src/shaders/SkPerlinNoiseShader.cpp
index a17d750..eb65e03 100644
--- a/src/shaders/SkPerlinNoiseShader.cpp
+++ b/src/shaders/SkPerlinNoiseShader.cpp
@@ -846,8 +846,6 @@
 
     GrGLSLFragmentBuilder* fragBuilder = args.fFragBuilder;
     GrGLSLUniformHandler* uniformHandler = args.fUniformHandler;
-    SkString vCoords = fragBuilder->ensureCoords2D(args.fTransformedCoords[0].fVaryingPoint,
-                                                   pne.sampleMatrix());
 
     fBaseFrequencyUni = uniformHandler->addUniform(&pne, kFragment_GrShaderFlag, kHalf2_GrSLType,
                                                    "baseFrequency");
@@ -960,7 +958,7 @@
 
     // There are rounding errors if the floor operation is not performed here
     fragBuilder->codeAppendf("half2 noiseVec = half2(floor(%s.xy) * %s);",
-                             vCoords.c_str(), baseFrequencyUni);
+                             args.fSampleCoord, baseFrequencyUni);
 
     // Clear the color accumulator
     fragBuilder->codeAppendf("%s = half4(0.0);", args.fOutputColor);
@@ -1206,8 +1204,6 @@
     const GrImprovedPerlinNoiseEffect& pne = args.fFp.cast<GrImprovedPerlinNoiseEffect>();
     GrGLSLFragmentBuilder* fragBuilder = args.fFragBuilder;
     GrGLSLUniformHandler* uniformHandler = args.fUniformHandler;
-    SkString vCoords = fragBuilder->ensureCoords2D(args.fTransformedCoords[0].fVaryingPoint,
-                                                   pne.sampleMatrix());
 
     fBaseFrequencyUni = uniformHandler->addUniform(&pne, kFragment_GrShaderFlag, kHalf2_GrSLType,
                                                    "baseFrequency");
@@ -1312,7 +1308,7 @@
     fragBuilder->emitFunction(kHalf_GrSLType, "noiseOctaves", SK_ARRAY_COUNT(noiseOctavesArgs),
                               noiseOctavesArgs, noiseOctavesCode.c_str(), &noiseOctavesFuncName);
 
-    fragBuilder->codeAppendf("half2 coords = half2(%s * %s);", vCoords.c_str(), baseFrequencyUni);
+    fragBuilder->codeAppendf("half2 coords = half2(%s * %s);", args.fSampleCoord, baseFrequencyUni);
     fragBuilder->codeAppendf("half r = %s(half3(coords, %s));", noiseOctavesFuncName.c_str(),
                              zUni);
     fragBuilder->codeAppendf("half g = %s(half3(coords, %s + 0000.0));",
diff --git a/src/sksl/SkSLCPPCodeGenerator.cpp b/src/sksl/SkSLCPPCodeGenerator.cpp
index 66b020c..90a2046 100644
--- a/src/sksl/SkSLCPPCodeGenerator.cpp
+++ b/src/sksl/SkSLCPPCodeGenerator.cpp
@@ -126,14 +126,8 @@
                 fErrors.error(i.fIndex->fOffset, "Only sk_TransformedCoords2D[0] is allowed");
                 return;
             }
-            String name = "sk_TransformedCoords2D_" + to_string(index);
-            fFormatArgs.push_back(name + ".c_str()");
-            if (!fAccessLocalCoordsDirectly) {
-                fAccessLocalCoordsDirectly = true;
-                addExtraEmitCodeLine("SkString " + name +
-                                     " = fragBuilder->ensureCoords2D(args.fTransformedCoords[" +
-                                     to_string(index) + "].fVaryingPoint, _outer.sampleMatrix());");
-            }
+            fAccessSampleCoordsDirectly = true;
+            fFormatArgs.push_back("args.fSampleCoord");
             return;
         } else if (SK_TEXTURESAMPLERS_BUILTIN == builtin) {
             this->write("%s");
@@ -431,57 +425,52 @@
         // sksl variables defined in earlier sksl code.
         this->newExtraEmitCodeBlock();
 
-        // Set to the empty string when no input color parameter should be emitted, which means this
-        // must be properly formatted with a prefixed comma when the parameter should be inserted
-        // into the invokeChild() parameter list.
-        String inputArg;
-        String inputColorName;
+        String inputColorName; // the sksl variable/expression, referenced later for null child FPs
+        String inputColor;
         if (c.fArguments.size() > 1 && c.fArguments[1]->fType.name() == "half4") {
             // Use the invokeChild() variant that accepts an input color, so convert the 2nd
             // argument's expression into C++ code that produces sksl stored in an SkString.
             inputColorName = "_input" + to_string(c.fOffset);
             addExtraEmitCodeLine(convertSKSLExpressionToCPP(*c.fArguments[1], inputColorName));
 
-            // invokeChild() needs a char*
-            inputArg = ", " + inputColorName + ".c_str()";
+            // invokeChild() needs a char* and a prepended comma
+            inputColor = ", " + inputColorName + ".c_str()";
         }
 
-        bool hasCoords = c.fArguments.back()->fType.name() == "float2";
-        SampleMatrix matrix = SampleMatrix::Make(fProgram, child);
+        String inputCoord;
+        String invokeFunction = "invokeChild";
+        if (c.fArguments.back()->fType.name() == "float2") {
+            // Invoking child with explicit coordinates at this call site
+            inputCoord = "_coords" + to_string(c.fOffset);
+            addExtraEmitCodeLine(convertSKSLExpressionToCPP(*c.fArguments.back(), inputCoord));
+            inputCoord.append(".c_str()");
+        } else if (c.fArguments.back()->fType.name() == "float3x3") {
+            // Invoking child with a matrix, sampling relative to the input coords.
+            invokeFunction = "invokeChildWithMatrix";
+            SampleMatrix matrix = SampleMatrix::Make(fProgram, child);
+
+            if (!matrix.isConstUniform()) {
+                inputCoord = "_matrix" + to_string(c.fOffset);
+                addExtraEmitCodeLine(convertSKSLExpressionToCPP(*c.fArguments.back(), inputCoord));
+                inputCoord.append(".c_str()");
+            }
+            // else pass in the empty string to rely on invokeChildWithMatrix's automatic uniform
+            // resolution
+        }
+        if (!inputCoord.empty()) {
+            inputCoord = ", " + inputCoord;
+        }
+
         // Write the output handling after the possible input handling
         String childName = "_sample" + to_string(c.fOffset);
         addExtraEmitCodeLine("SkString " + childName + ";");
-        String coordsName;
-        String matrixName;
-        if (hasCoords) {
-            coordsName = "_coords" + to_string(c.fOffset);
-            addExtraEmitCodeLine(convertSKSLExpressionToCPP(*c.fArguments.back(), coordsName));
-        }
-        if (matrix.fKind == SampleMatrix::Kind::kVariable) {
-            matrixName = "_matrix" + to_string(c.fOffset);
-            addExtraEmitCodeLine(convertSKSLExpressionToCPP(*c.fArguments.back(), matrixName));
-        }
+
         if (c.fArguments[0]->fType.kind() == Type::kNullable_Kind) {
             addExtraEmitCodeLine("if (_outer." + String(child.fName) + "_index >= 0) {\n    ");
         }
-        if (hasCoords) {
-            addExtraEmitCodeLine(childName + " = this->invokeChild(_outer." + String(child.fName) +
-                                 "_index" + inputArg + ", args, " + coordsName + ".c_str());");
-        } else {
-            switch (matrix.fKind) {
-                case SampleMatrix::Kind::kMixed:
-                case SampleMatrix::Kind::kVariable:
-                    addExtraEmitCodeLine(childName + " = this->invokeChildWithMatrix(_outer." +
-                                         String(child.fName) + "_index" + inputArg + ", args, " +
-                                         matrixName + ".c_str());");
-                    break;
-                case SampleMatrix::Kind::kConstantOrUniform:
-                case SampleMatrix::Kind::kNone:
-                    addExtraEmitCodeLine(childName + " = this->invokeChild(_outer." +
-                                         String(child.fName) + "_index" + inputArg + ", args);");
-                    break;
-            }
-        }
+        addExtraEmitCodeLine(childName + " = this->" + invokeFunction + "(_outer." +
+                             String(child.fName) + "_index" + inputColor + ", args" +
+                             inputCoord + ");");
 
         if (c.fArguments[0]->fType.kind() == Type::kNullable_Kind) {
             // Null FPs are not emitted, but their output can still be referenced in dependent
@@ -1160,6 +1149,9 @@
             String fieldName = HCodeGenerator::CoordTransformName(s.fArgument, i);
             this->writef("    this->addCoordTransform(&%s);\n", fieldName.c_str());
         }
+        if (fAccessSampleCoordsDirectly) {
+            this->writef("    this->setUsesSampleCoordsDirectly();\n");
+        }
         this->write("}\n");
         this->writef("std::unique_ptr<GrFragmentProcessor> %s::clone() const {\n",
                      fFullName.c_str());
diff --git a/src/sksl/SkSLCPPCodeGenerator.h b/src/sksl/SkSLCPPCodeGenerator.h
index 357e7c1..ecf7f63 100644
--- a/src/sksl/SkSLCPPCodeGenerator.h
+++ b/src/sksl/SkSLCPPCodeGenerator.h
@@ -124,7 +124,7 @@
 
     std::vector<String> fFormatArgs;
     // true if the sksl referenced sk_TransformedCoords[0]
-    bool fAccessLocalCoordsDirectly = false;
+    bool fAccessSampleCoordsDirectly = false;
 
     // if true, we are writing a C++ expression instead of a GLSL expression
     bool fCPPMode = false;
diff --git a/src/sksl/SkSLHCodeGenerator.cpp b/src/sksl/SkSLHCodeGenerator.cpp
index 5d4195b..7fe9cf6 100644
--- a/src/sksl/SkSLHCodeGenerator.cpp
+++ b/src/sksl/SkSLHCodeGenerator.cpp
@@ -272,6 +272,12 @@
     }
     this->writef(" {\n");
     this->writeSection(CONSTRUCTOR_CODE_SECTION);
+
+    int usesSampleCoordsDirectly = fProgram.fSource->find("sk_TransformedCoords", 0);
+    if (usesSampleCoordsDirectly >= 0) {
+        this->writef("        this->setUsesSampleCoordsDirectly();\n");
+    }
+
     int samplerCount = 0;
     for (const Variable* param : fSectionAndParameterHelper.getParameters()) {
         if (param->fType.kind() == Type::kSampler_Kind) {
@@ -299,16 +305,26 @@
                 }
                 switch(matrix.fKind) {
                     case SampleMatrix::Kind::kVariable:
-                        matrixArg.appendf(", SkSL::SampleMatrix::MakeVariable()");
+                        // FIXME As it stands, matrix.fHasPerspective will always be true. Ideally
+                        // we could build an expression from all const/uniform sample matrices used
+                        // in the sksl, e.g. m1.hasPerspective() || m2.hasPerspective(), where each
+                        // term was the type expression for the original const/uniform sample
+                        // matrices before they were merged during sksl analysis.
+                        matrixArg.appendf(", SkSL::SampleMatrix::MakeVariable(%s)",
+                                          matrix.fHasPerspective ? "true" : "false");
                         break;
-                    case SampleMatrix::Kind::kConstantOrUniform:
-                        matrixArg.appendf(", SkSL::SampleMatrix::MakeConstUniform(\"%s\")",
-                                          matrix.fExpression.c_str());
-                        break;
-                    case SampleMatrix::Kind::kMixed:
-                        // Mixed is only produced when combining FPs, not from analysis of sksl
-                        SkASSERT(false);
-                        break;
+                    case SampleMatrix::Kind::kConstantOrUniform: {
+                        String perspExpression = matrix.fHasPerspective ? "true" : "false";
+                        for (const Variable* p : fSectionAndParameterHelper.getParameters()) {
+                            if ((p->fModifiers.fFlags & Modifiers::kIn_Flag) &&
+                                matrix.fExpression == String(p->fName)) {
+                                perspExpression = matrix.fExpression + ".hasPerspective()";
+                                break;
+                            }
+                        }
+                        matrixArg.appendf(", SkSL::SampleMatrix::MakeConstUniform(\"%s\", %s)",
+                                          matrix.fExpression.c_str(), perspExpression.c_str());
+                        break; }
                     case SampleMatrix::Kind::kNone:
                         break;
                 }
diff --git a/src/sksl/SkSLSampleMatrix.cpp b/src/sksl/SkSLSampleMatrix.cpp
index 9b07f59..c12159f 100644
--- a/src/sksl/SkSLSampleMatrix.cpp
+++ b/src/sksl/SkSLSampleMatrix.cpp
@@ -32,7 +32,7 @@
 
 SampleMatrix SampleMatrix::merge(const SampleMatrix& other) {
     if (fKind == Kind::kVariable || other.fKind == Kind::kVariable) {
-        *this = SampleMatrix::MakeVariable();
+        *this = SampleMatrix::MakeVariable(this->fHasPerspective || other.fHasPerspective);
         return *this;
     }
     if (other.fKind == Kind::kConstantOrUniform) {
@@ -40,7 +40,7 @@
             if (fExpression == other.fExpression) {
                 return *this;
             }
-            *this = SampleMatrix::MakeVariable();
+            *this = SampleMatrix::MakeVariable(this->fHasPerspective || other.fHasPerspective);
             return *this;
         }
         SkASSERT(fKind == Kind::kNone);
@@ -94,7 +94,19 @@
                 fc.fArguments[0]->fKind == Expression::kVariableReference_Kind &&
                 &((VariableReference&) *fc.fArguments[0]).fVariable == &fFP) {
                 if (fc.fArguments.back()->isConstantOrUniform()) {
-                    return SampleMatrix::MakeConstUniform(fc.fArguments.back()->description());
+                    if (fc.fArguments.back()->fKind == Expression::Kind::kVariableReference_Kind ||
+                        fc.fArguments.back()->fKind == Expression::Kind::kConstructor_Kind) {
+                        // FIXME if this is a constant, we should parse the float3x3 constructor and
+                        // determine if the resulting matrix introduces perspective.
+                        return SampleMatrix::MakeConstUniform(fc.fArguments.back()->description());
+                    } else {
+                        // FIXME this is really to work around a restriction of the downstream code
+                        // that relies on the SampleMatrix's fExpression to identify uniform names.
+                        // Once they are tracked separately, any constant/uniform expression can
+                        // work, but right now this avoids issues from '0.5 * matrix' that is both
+                        // a constant AND a uniform.
+                        return SampleMatrix::MakeVariable();
+                    }
                 } else {
                     return SampleMatrix::MakeVariable();
                 }
diff --git a/src/sksl/SkSLSampleMatrix.h b/src/sksl/SkSLSampleMatrix.h
index 41c0cb6..076b251 100644
--- a/src/sksl/SkSLSampleMatrix.h
+++ b/src/sksl/SkSLSampleMatrix.h
@@ -32,24 +32,25 @@
         kConstantOrUniform,
         // The FP is sampled with a non-constant/uniform value, or sampled multiple times, and
         // thus the transform cannot be hoisted to the vertex shader.
-        kVariable,
-        // The FP is sampled with a constant or uniform value, *and* also inherits a variable
-        // transform from an ancestor. The transform cannot be hoisted to the vertex shader, and
-        // both matrices need to be applied.
-        kMixed,
+        kVariable
     };
 
     // Make a SampleMatrix with kNone for its kind. Will not have an expression or have perspective.
+    // Represents sample(child, color) and sample(child, color, float2) calls.
     SampleMatrix()
-            : fOwner(nullptr)
-            , fKind(Kind::kNone) {}
+            : fKind(Kind::kNone)
+            , fHasPerspective(false) {}
 
-    static SampleMatrix MakeConstUniform(String expression) {
-        return SampleMatrix(Kind::kConstantOrUniform, expression);
+    // This corresponds to sample(child, color, matrix) calls where every call site in the FP has
+    // the same constant or uniform.
+    static SampleMatrix MakeConstUniform(String expression, bool hasPerspective=true) {
+        return SampleMatrix(Kind::kConstantOrUniform, expression, hasPerspective);
     }
 
-    static SampleMatrix MakeVariable() {
-        return SampleMatrix(Kind::kVariable, "");
+    // This corresponds to sample(child, color, matrix) where the 3rd argument is an expression,
+    // or where the constants/uniforms are not the same at all call sites in the FP.
+    static SampleMatrix MakeVariable(bool hasPerspective=true) {
+        return SampleMatrix(Kind::kVariable, "", hasPerspective);
     }
 
     static SampleMatrix Make(const Program& program, const Variable& fp);
@@ -57,9 +58,14 @@
     SampleMatrix merge(const SampleMatrix& other);
 
     bool operator==(const SampleMatrix& other) const {
-        return fKind == other.fKind && fExpression == other.fExpression && fOwner == other.fOwner;
+        return fKind == other.fKind && fExpression == other.fExpression &&
+               fHasPerspective == other.fHasPerspective;
     }
 
+    bool isNoOp() const { return fKind == Kind::kNone; }
+    bool isConstUniform() const { return fKind == Kind::kConstantOrUniform; }
+    bool isVariable() const { return fKind == Kind::kVariable; }
+
 #ifdef SK_DEBUG
     String description() {
         switch (fKind) {
@@ -69,26 +75,24 @@
                 return "SampleMatrix<ConstantOrUniform(" + fExpression + ")>";
             case Kind::kVariable:
                 return "SampleMatrix<Variable>";
-            case Kind::kMixed:
-                return "SampleMatrix<Mixed(" + fExpression + ")>";
         }
     }
 #endif
 
-    // TODO(michaelludwig): fOwner and fBase are going away; owner is filled in automatically when
-    // a matrix-sampled FP is registered as a child.
-    GrFragmentProcessor* fOwner;
     Kind fKind;
     // The constant or uniform expression representing the matrix (will be the empty string when
     // kind == kNone or kVariable)
     String fExpression;
-    const GrFragmentProcessor* fBase = nullptr;
+
+    // FIXME: We can expand this to track a more general matrix type to allow for optimizations on
+    // identity or scale+translate matrices too.
+    bool fHasPerspective;
 
 private:
-    SampleMatrix(Kind kind, String expression)
-            : fOwner(nullptr)
-            , fKind(kind)
-            , fExpression(expression) {}
+    SampleMatrix(Kind kind, String expression, bool hasPerspective)
+            : fKind(kind)
+            , fExpression(expression)
+            , fHasPerspective(hasPerspective) {}
 };
 
 } // namespace
diff --git a/tests/ProcessorTest.cpp b/tests/ProcessorTest.cpp
index dda51e3..3ed6a83 100644
--- a/tests/ProcessorTest.cpp
+++ b/tests/ProcessorTest.cpp
@@ -787,7 +787,11 @@
                                       "%s\n", describe_fp(*fp).c_str());
             REPORTER_ASSERT(reporter, fp->numChildProcessors() == clone->numChildProcessors(),
                                       "%s\n", describe_fp(*fp).c_str());
-            REPORTER_ASSERT(reporter, fp->usesLocalCoords() == clone->usesLocalCoords(),
+            REPORTER_ASSERT(reporter, fp->sampleCoordsDependOnLocalCoords() ==
+                                      clone->sampleCoordsDependOnLocalCoords(),
+                                      "%s\n", describe_fp(*fp).c_str());
+            REPORTER_ASSERT(reporter, fp->referencesSampleCoords() ==
+                                      clone->referencesSampleCoords(),
                                       "%s\n", describe_fp(*fp).c_str());
             // Draw with original and read back the results.
             render_fp(context, rtc.get(), std::move(fp), inputTexture, kPremul_SkAlphaType,
diff --git a/tests/SkSLFPTest.cpp b/tests/SkSLFPTest.cpp
index ef80854..bcace4b 100644
--- a/tests/SkSLFPTest.cpp
+++ b/tests/SkSLFPTest.cpp
@@ -507,15 +507,14 @@
                  sk_OutColor = half4(sk_TransformedCoords2D[0], sk_TransformedCoords2D[0]);
              }
          )__SkSL__",
-         /*expectedH=*/{},
+         /*expectedH=*/{
+             "this->setUsesSampleCoordsDirectly();"
+         },
          /*expectedCPP=*/{
-            "SkString sk_TransformedCoords2D_0 = "
-                           "fragBuilder->ensureCoords2D(args.fTransformedCoords[0].fVaryingPoint, "
-                                                       "_outer.sampleMatrix());",
             "fragBuilder->codeAppendf(\n"
             "R\"SkSL(%s = half4(%s, %s);\n"
             ")SkSL\"\n"
-            ", args.fOutputColor, sk_TransformedCoords2D_0.c_str(), sk_TransformedCoords2D_0.c_str());"
+            ", args.fOutputColor, args.fSampleCoord, args.fSampleCoord);"
          });
 }
 
@@ -815,17 +814,14 @@
              }
          )__SkSL__",
          /*expectedH=*/{
-             "child_index = this->registerExplicitlySampledChild(std::move(child));"
+             "child_index = this->registerExplicitlySampledChild(std::move(child));",
+             "this->setUsesSampleCoordsDirectly();"
          },
          /*expectedCPP=*/{
             "SkString _sample150;\n",
             "_sample150 = this->invokeChild(_outer.child_index, args);\n",
             "SkString _sample166;\n",
-            "SkString sk_TransformedCoords2D_0 = fragBuilder->ensureCoords2D("
-                                                     "args.fTransformedCoords[0].fVaryingPoint, "
-                                                     "_outer.sampleMatrix());\n",
-            "SkString _coords166 = SkStringPrintf(\"%s / 2.0\", "
-                "sk_TransformedCoords2D_0.c_str());\n",
+            "SkString _coords166 = SkStringPrintf(\"%s / 2.0\", args.fSampleCoord);\n",
             "_sample166 = this->invokeChild(_outer.child_index, args, _coords166.c_str());\n",
             "fragBuilder->codeAppendf(\n"
             "R\"SkSL(%s = %s + %s;\n"
@@ -864,7 +860,7 @@
          });
 }
 
-DEF_TEST(SkSLFPMatrixSample, r) {
+DEF_TEST(SkSLFPMatrixSampleConstant, r) {
     test(r,
          *SkSL::ShaderCapsFactory::Default(),
          R"__SkSL__(
@@ -873,6 +869,145 @@
                  sk_OutColor = sample(child, float3x3(2));
              }
          )__SkSL__",
-         /*expectedH=*/{},
-         /*expectedCPP=*/{});
+         /*expectedH=*/{
+             "this->registerChild(std::move(child), "
+                    "SkSL::SampleMatrix::MakeConstUniform(\"float3x3(2.0)\", true));"
+         },
+         /*expectedCPP=*/{
+             "this->invokeChildWithMatrix(_outer.child_index, args)"
+         });
+}
+
+DEF_TEST(SkSLFPMatrixSampleUniform, r) {
+    test(r,
+         *SkSL::ShaderCapsFactory::Default(),
+         R"__SkSL__(
+             in fragmentProcessor? child;
+             uniform float3x3 matrix;
+             void main() {
+                 sk_OutColor = sample(child, matrix);
+             }
+         )__SkSL__",
+         /*expectedH=*/{
+             // Since 'matrix' is just a uniform, the generated code can't determine perspective.
+             "this->registerChild(std::move(child), "
+                    "SkSL::SampleMatrix::MakeConstUniform(\"matrix\", true));"
+         },
+         /*expectedCPP=*/{
+             "this->invokeChildWithMatrix(_outer.child_index, args)"
+         });
+}
+
+DEF_TEST(SkSLFPMatrixSampleInUniform, r) {
+    test(r,
+         *SkSL::ShaderCapsFactory::Default(),
+         R"__SkSL__(
+             in fragmentProcessor? child;
+             in uniform float3x3 matrix;
+             void main() {
+                 sk_OutColor = sample(child, matrix);
+             }
+         )__SkSL__",
+         /*expectedH=*/{
+             // Since 'matrix' is marked 'in', we can detect perspective at runtime
+             "this->registerChild(std::move(child), "
+                    "SkSL::SampleMatrix::MakeConstUniform(\"matrix\", matrix.hasPerspective()));"
+         },
+         /*expectedCPP=*/{
+             "this->invokeChildWithMatrix(_outer.child_index, args)"
+         });
+}
+
+DEF_TEST(SkSLFPMatrixSampleMultipleInUniforms, r) {
+    test(r,
+         *SkSL::ShaderCapsFactory::Default(),
+         R"__SkSL__(
+             in fragmentProcessor? child;
+             in uniform float3x3 matrixA;
+             in uniform float3x3 matrixB;
+             void main() {
+                 sk_OutColor = sample(child, matrixA);
+                 sk_OutColor += sample(child, matrixB);
+             }
+         )__SkSL__",
+         /*expectedH=*/{
+             // FIXME it would be nice if codegen can produce
+             // (matrixA.hasPerspective() || matrixB.hasPerspective()) even though it's variable.
+             "this->registerChild(std::move(child), "
+                    "SkSL::SampleMatrix::MakeVariable(true));"
+         },
+         /*expectedCPP=*/{
+             "SkString _matrix191(args.fUniformHandler->getUniformCStr(matrixAVar));",
+             "this->invokeChildWithMatrix(_outer.child_index, args, _matrix191.c_str());",
+             "SkString _matrix247(args.fUniformHandler->getUniformCStr(matrixBVar));",
+             "this->invokeChildWithMatrix(_outer.child_index, args, _matrix247.c_str());"
+         });
+}
+
+DEF_TEST(SkSLFPMatrixSampleConstUniformExpression, r) {
+    test(r,
+         *SkSL::ShaderCapsFactory::Default(),
+         R"__SkSL__(
+             in fragmentProcessor? child;
+             uniform float3x3 matrix;
+             void main() {
+                 sk_OutColor = sample(child, 0.5 * matrix);
+             }
+         )__SkSL__",
+         /*expectedH=*/{
+             // FIXME: "0.5 * matrix" is a constant/uniform expression and could be lifted to
+             // the vertex shader, once downstream code is able to properly map 'matrix' within the
+             // expression.
+             "this->registerChild(std::move(child), "
+                    "SkSL::SampleMatrix::MakeVariable(true));"
+         },
+         /*expectedCPP=*/{
+            "SkString _matrix145 = SkStringPrintf(\"0.5 * %s\", "
+                    "args.fUniformHandler->getUniformCStr(matrixVar));",
+             "this->invokeChildWithMatrix(_outer.child_index, args, _matrix145.c_str());"
+         });
+}
+
+DEF_TEST(SkSLFPMatrixSampleConstantAndExplicitly, r) {
+    test(r,
+         *SkSL::ShaderCapsFactory::Default(),
+         R"__SkSL__(
+             in fragmentProcessor? child;
+             void main() {
+                 sk_OutColor = sample(child, float3x3(0.5));
+                 sk_OutColor = sample(child, sk_TransformedCoords2D[0].xy / 2);
+             }
+         )__SkSL__",
+         /*expectedH=*/{
+             "this->registerChild(std::move(child), "
+                    "SkSL::SampleMatrix::MakeConstUniform(\"float3x3(0.5)\", true), true);"
+         },
+         /*expectedCPP=*/{
+             "this->invokeChildWithMatrix(_outer.child_index, args)",
+             "SkString _coords168 = SkStringPrintf(\"%s / 2.0\", args.fSampleCoord);",
+             "this->invokeChild(_outer.child_index, args, _coords168.c_str())",
+         });
+}
+
+DEF_TEST(SkSLFPMatrixSampleVariableAndExplicitly, r) {
+    test(r,
+         *SkSL::ShaderCapsFactory::Default(),
+         R"__SkSL__(
+             in fragmentProcessor? child;
+             void main() {
+                 float3x3 matrix = float3x3(sk_InColor.a);
+                 sk_OutColor = sample(child, matrix);
+                 sk_OutColor = sample(child, sk_TransformedCoords2D[0].xy / 2);
+             }
+         )__SkSL__",
+         /*expectedH=*/{
+             "this->registerChild(std::move(child), "
+                    "SkSL::SampleMatrix::MakeVariable(true), true);"
+         },
+         /*expectedCPP=*/{
+             "SkString _matrix166(\"matrix\");",
+             "this->invokeChildWithMatrix(_outer.child_index, args, _matrix166.c_str())",
+             "SkString _coords220 = SkStringPrintf(\"%s / 2.0\", args.fSampleCoord);",
+             "this->invokeChild(_outer.child_index, args, _coords220.c_str()",
+         });
 }