in wide builds, avoid storing non-BMP Unicode characters from source code as surrogate pairs

This is done by decoding with UTF-32 instead of UTF-16 on all builds, so that
each character above U+FFFF corresponds to a single code unit rather than a
surrogate pair. Patch by Adam Olsen.
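A small Python illustration of why the choice of intermediate encoding matters. This is not the patch itself, which changes C code. Both `code_units` and `escape_non_ascii` are hypothetical helpers written only for this sketch. UTF-16 needs two 16-bit code units (a surrogate pair) for any code point above U+FFFF. UTF-32 stores every code point in exactly one 32-bit unit, so each unit can be turned directly into a single `\UXXXXXXXX` escape without ever producing surrogates.

```python
from __future__ import annotations

import struct


def code_units(text: str, encoding: str, unit_size: int) -> list[int]:
    """Split the big-endian encoded bytes of `text` into integer code units."""
    data = text.encode(encoding)
    fmt = ">" + {2: "H", 4: "I"}[unit_size] * (len(data) // unit_size)
    return list(struct.unpack(fmt, data))


def escape_non_ascii(text: str) -> str:
    """Hypothetical helper: rewrite each non-ASCII character as one \\U escape.

    Reading the character's UTF-32-BE code unit yields the full code point,
    so characters outside the BMP become a single escape, not two surrogates.
    """
    out = []
    for ch in text:
        if ord(ch) < 0x80:
            out.append(ch)
        else:
            (cp,) = struct.unpack(">I", ch.encode("utf-32-be"))
            out.append("\\U%08x" % cp)
    return "".join(out)


if __name__ == "__main__":
    clef = "\U0001D11E"  # MUSICAL SYMBOL G CLEF, outside the BMP
    print([hex(u) for u in code_units(clef, "utf-16-be", 2)])  # ['0xd834', '0xdd1e']
    print([hex(u) for u in code_units(clef, "utf-32-be", 4)])  # ['0x1d11e']
    print(escape_non_ascii("a" + clef))  # a\U0001d11e
```

Decoding the escaped output with the `unicode_escape` codec gives back the original one-character string, which is the property the UTF-32 route preserves.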
3 files changed