SkOnce: 2 bytes -> 1 byte

This uses the same logic we worked out for SkOncePtr to reduce
the memory footprint of SkOnce from a done byte and lock byte
to a single 3-state byte:

  - NotStarted: no thread has tried to run fn() yet
  - Active:     a thread is running fn()
  - Done:       fn() is complete

Threads which see Done return immediately.
Threads which see NotStarted try to move to Active, run fn(), then move to Done.
Threads which see Active spin until the active thread moves to Done.

This additionally fixes a too-weak memory order bug in SkOncePtr,
and adds a big note to explain.

BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1904483003

Review URL: https://codereview.chromium.org/1904483003
2 files changed