Move support for return values of called Python functions

Currently pybind11 always copies when converting values returned by
Python functions invoked from C++ code, even when moving is
feasible--and, more importantly, even when moving is required.

The first, relatively minor, concern is that moving may be considerably
more efficient for some types.  The second problem, however, is more
serious: there is currently no way for Python code to return a
non-copyable type to C++ code.

I ran into this while trying to add a PYBIND11_OVERLOAD of a virtual
method that returns just such a type: it simply fails to compile because
this:

    overload = ...
    overload(args).template cast<ret_type>();

involves a copy: overload(args) returns an object instance, and the
invoked object::cast() loads the returned value, then returns a copy of
the loaded value.

We can, however, safely move that returned value *if* the object has the
only reference to it (i.e. if ref_count() == 1) and the object is
itself temporary (i.e. if it's an rvalue).

This commit does that by adding an rvalue-qualified object::cast()
method that allows the returned value to be move-constructed out of the
stored instance when feasible.
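
As a rough illustration of the idea (not the actual pybind11 code), the
sketch below uses a hypothetical Handle class in which
std::shared_ptr::use_count() stands in for the Python reference count.
Handle and Tracked are made-up names for this example only; the point is
the pair of ref-qualified cast() overloads.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-in for a Python object handle: the shared_ptr's
// use_count() plays the role of the Python reference count.
template <typename T>
class Handle {
public:
    explicit Handle(std::shared_ptr<T> p) : ptr_(std::move(p)) {}

    long ref_count() const { return ptr_.use_count(); }

    // Lvalue cast: the handle outlives the call, so the value is copied.
    T cast() const & { return *ptr_; }

    // Rvalue cast: the handle is a temporary, so if it also holds the only
    // reference, nothing else can observe the stored value afterwards and
    // it is safe to move out of it. Otherwise fall back to a copy.
    T cast() && {
        if (ref_count() == 1)
            return std::move(*ptr_);
        return *ptr_;
    }

private:
    std::shared_ptr<T> ptr_;
};

// Hypothetical payload that records how each instance was constructed.
struct Tracked {
    std::string data;
    int copies = 0;
    int moves = 0;
    explicit Tracked(std::string d) : data(std::move(d)) {}
    Tracked(const Tracked &o)
        : data(o.data), copies(o.copies + 1), moves(o.moves) {}
    Tracked(Tracked &&o) noexcept
        : data(std::move(o.data)), copies(o.copies), moves(o.moves + 1) {}
};
```

Calling cast() on a temporary Handle that holds the only reference
selects the && overload and moves; calling it on a named Handle, or on
one whose value is shared, copies instead.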

This basically comes down to three cases:

- For objects that are movable but not copyable, we always try the move,
  with a runtime exception raised if this would involve moving a value
  with multiple references.
- When the type is both movable and non-trivially copyable, the move
  happens only if the invoked object has a ref_count of 1, otherwise the
  object is copied.  (Trivially copyable types are excluded from this
  case because they are typically just collections of primitive types,
  which can be copied just as easily as they can be moved.)
- Objects that are non-movable or trivially copy constructible are
  simply copied.
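
The three cases above can be sketched with standard type traits and tag
dispatch.  This is only an illustration under assumptions: Policy,
policy_for, policy_tag and extract are hypothetical names, and a
std::shared_ptr use count again stands in for the Python reference
count; the real implementation may select the cases differently.

```cpp
#include <cassert>
#include <memory>
#include <stdexcept>
#include <string>
#include <type_traits>
#include <utility>

// Hypothetical names for the three cases.
enum class Policy { MoveAlways, MoveIfUnreferenced, CopyAlways };

// Pick a case from the type's copy/move properties.
template <typename T>
constexpr Policy policy_for() {
    return (std::is_move_constructible<T>::value &&
            !std::is_copy_constructible<T>::value)
               ? Policy::MoveAlways
           : (std::is_move_constructible<T>::value &&
              !std::is_trivially_copy_constructible<T>::value)
               ? Policy::MoveIfUnreferenced
               : Policy::CopyAlways;
}

template <Policy P>
using policy_tag = std::integral_constant<Policy, P>;

// Case 1: movable but not copyable. Always move; refuse if shared.
template <typename T>
T extract(const std::shared_ptr<T> &holder, policy_tag<Policy::MoveAlways>) {
    if (holder.use_count() != 1)
        throw std::runtime_error("cannot move a value with multiple references");
    return std::move(*holder);
}

// Case 2: movable and non-trivially copyable. Move only if unreferenced.
template <typename T>
T extract(const std::shared_ptr<T> &holder,
          policy_tag<Policy::MoveIfUnreferenced>) {
    if (holder.use_count() == 1)
        return std::move(*holder);
    return *holder;
}

// Case 3: non-movable or trivially copy constructible. Always copy.
template <typename T>
T extract(const std::shared_ptr<T> &holder, policy_tag<Policy::CopyAlways>) {
    return *holder;
}

// Entry point: dispatch on the policy computed at compile time.
template <typename T>
T extract(const std::shared_ptr<T> &holder) {
    return extract(holder, policy_tag<policy_for<T>()>{});
}

static_assert(policy_for<std::unique_ptr<int>>() == Policy::MoveAlways,
              "move-only type");
static_assert(policy_for<std::string>() == Policy::MoveIfUnreferenced,
              "movable, non-trivially copyable type");
static_assert(policy_for<int>() == Policy::CopyAlways,
              "trivially copyable type");
```

Tag dispatch is used here rather than a runtime branch so that the copy
path is never instantiated for a non-copyable type.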

This also adds examples to example-virtual-functions that show both a
non-copyable object and a movable/copyable object in action: the former
raises an exception if returned while a reference to it is still held,
the latter invokes a move constructor if unreferenced, or a copy
constructor if referenced.

Basically this allows code such as:

    class MyClass(Pybind11Class):
        def somemethod(self, whatever):
            mt = MovableType(whatever)
            # ...
            return mt

which allows the MovableType instance to be returned to the C++ code
via its move constructor.

Of course if you attempt to violate this by doing something like:

    self.value = MovableType(whatever)
    return self.value

you get a runtime exception (assuming MovableType is not copyable)--but
right now, the pybind11 side of that code won't compile at all.
diff --git a/example/example-virtual-functions.py b/example/example-virtual-functions.py
index 1f71965..121f330 100644
--- a/example/example-virtual-functions.py
+++ b/example/example-virtual-functions.py
@@ -5,6 +5,8 @@
 
 from example import ExampleVirt, runExampleVirt, runExampleVirtVirtual, runExampleVirtBool
 from example import A_Repeat, B_Repeat, C_Repeat, D_Repeat, A_Tpl, B_Tpl, C_Tpl, D_Tpl
+from example import NCVirt, NonCopyable, Movable
+
 
 class ExtendedExampleVirt(ExampleVirt):
     def __init__(self, state):
@@ -87,3 +89,36 @@
     if hasattr(obj, "lucky_number"):
         print("Lucky = %.2f" % obj.lucky_number())
 
+class NCVirtExt(NCVirt):
+    def get_noncopyable(self, a, b):
+        # Constructs and returns a new instance:
+        nc = NonCopyable(a*a, b*b)
+        return nc
+    def get_movable(self, a, b):
+        # Keep a reference: the C++ side copies instead of moving
+        self.movable = Movable(a, b)
+        return self.movable
+
+class NCVirtExt2(NCVirt):
+    def get_noncopyable(self, a, b):
+        # Keep a reference: this is going to throw an exception
+        self.nc = NonCopyable(a, b)
+        return self.nc
+    def get_movable(self, a, b):
+        # Return a new instance without storing it
+        return Movable(a, b)
+
+ncv1 = NCVirtExt()
+print("2^2 * 3^2 =")
+ncv1.print_nc(2, 3)
+print("4 + 5 =")
+ncv1.print_movable(4, 5)
+ncv2 = NCVirtExt2()
+print("7 + 7 =")
+ncv2.print_movable(7, 7)
+try:
+    ncv2.print_nc(9, 9)
+    print("Something's wrong: exception not raised!")
+except RuntimeError as e:
+    # Don't print the exception message here because it differs under debug/non-debug mode
+    print("Caught expected exception")