Fix regression: container pointers not castable

PR #936 broke the ability to return a pointer to a stl container (and,
likewise, to a tuple) because the added deduced type matched a
non-const pointer argument: the pointer-accepting `cast` in
PYBIND11_TYPE_CASTER had a `const type *`, which is a worse match for a
non-const pointer than the universal reference template #936 added.

This changes the provided TYPE_CASTER cast(ptr) to take the pointer by
template arg (so that it will accept either const or non-const pointer).
It has two other effects: it slightly reduces .so size (because many
type casters never actually need the pointer cast at all), and it allows
type casters to provide their untemplated pointer `cast()` that will
take precedence over the templated version provided in the macro.
diff --git a/tests/test_stl.py b/tests/test_stl.py
index 0f01982..c2fa7db 100644
--- a/tests/test_stl.py
+++ b/tests/test_stl.py
@@ -15,6 +15,9 @@
     assert doc(m.cast_vector) == "cast_vector() -> List[int]"
     assert doc(m.load_vector) == "load_vector(arg0: List[int]) -> bool"
 
+    # Test regression caused by 936: pointers to stl containers weren't castable
+    assert m.cast_ptr_vector() == ["lvalue", "lvalue"]
+
 
 def test_array(doc):
     """std::array <-> list"""