drm/nv40: implement ctxprog/state generation

The context programs are *very* simple compared to the ones used by
the binary driver.  There's notes in nv40_grctx.c explaining most of
the things we don't implement.  If we discover if/why any of it is
required further down the track, we'll handle it then.

The PGRAPH state generated for each chipset should match what NVIDIA
do almost exactly (there's a couple of exceptions).  If someone has
a lot of time on their hands, they could figure out the mapping of
object/method to PGRAPH register and demagic the initial state a little,
it's not terribly important however.

At time of commit, confirmed to be working at least well enough for
accelerated X (and where tested, for 3D apps) on NV40, NV43, NV44, NV46,
NV49, NV4A, NV4B and NV4E.

A module option has been added to force the use of external firmware
blobs if it becomes required.

Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
diff --git a/drivers/gpu/drm/nouveau/nv50_graph.c b/drivers/gpu/drm/nouveau/nv50_graph.c
index 177d822..ca79f32 100644
--- a/drivers/gpu/drm/nouveau/nv50_graph.c
+++ b/drivers/gpu/drm/nouveau/nv50_graph.c
@@ -107,9 +107,13 @@
 static int
 nv50_graph_init_ctxctl(struct drm_device *dev)
 {
+	struct drm_nouveau_private *dev_priv = dev->dev_private;
+
 	NV_DEBUG(dev, "\n");
 
-	nv40_grctx_init(dev);
+	nouveau_grctx_prog_load(dev);
+	if (!dev_priv->engine.graph.ctxprog)
+		dev_priv->engine.graph.accel_blocked = true;
 
 	nv_wr32(dev, 0x400320, 4);
 	nv_wr32(dev, NV40_PGRAPH_CTXCTL_CUR, 0);
@@ -140,7 +144,7 @@
 nv50_graph_takedown(struct drm_device *dev)
 {
 	NV_DEBUG(dev, "\n");
-	nv40_grctx_fini(dev);
+	nouveau_grctx_fini(dev);
 }
 
 void
@@ -207,7 +211,7 @@
 	dev_priv->engine.instmem.finish_access(dev);
 
 	dev_priv->engine.instmem.prepare_access(dev, true);
-	nv40_grctx_vals_load(dev, ctx);
+	nouveau_grctx_vals_load(dev, ctx);
 	nv_wo32(dev, ctx, 0x00000/4, chan->ramin->instance >> 12);
 	if ((dev_priv->chipset & 0xf0) == 0xa0)
 		nv_wo32(dev, ctx, 0x00004/4, 0x00000000);