This document is intended for GPU IHVs writing Vulkan drivers for Android, and OEMs integrating them for specific devices. It describes how a Vulkan driver interacts with the system, how GPU-specific tools should be installed, and Android-specific requirements.

This is still a fairly rough draft; details will be filled in over time.

1. Architecture

The primary interface between Vulkan applications and a device’s Vulkan driver is the loader, which is part of AOSP and installed at /system/lib[64]/libvulkan.so. The loader provides the core Vulkan API entry points, as well as entry points of a few extensions that are required on Android and always present. In particular, the window system integration (WSI) extensions are exported by the loader and primarily implemented in it rather than the driver. The loader also supports enumerating and loading layers which can expose additional extensions and/or intercept core API calls on their way to the driver.

The NDK will include a stub libvulkan.so exporting the same symbols as the loader. Calling the Vulkan functions exported from libvulkan.so will enter trampoline functions in the loader which will dispatch to the appropriate layer or driver based on their first argument. The vkGet*ProcAddr calls will return the function pointers that the trampolines would dispatch to, so calling through these function pointers rather than the exported symbols will be slightly more efficient since it skips the trampoline and dispatch.
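As a rough illustration of the trampoline dispatch described above, the sketch below assumes that every dispatchable object begins with a pointer to a table of function pointers. That layout, and all Example*/example_* names, are assumptions made for illustration and are not taken from the loader source.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical dispatch table; a real table would have one slot per entry point. */
typedef struct ExampleDispatchTable {
    int (*QueueWaitIdle)(void* queue);
} ExampleDispatchTable;

/* Assumption: each dispatchable object starts with its dispatch table pointer. */
typedef struct ExampleDispatchable {
    const ExampleDispatchTable* dispatch;
} ExampleDispatchable;

/* Stand-in for the next layer or the driver; returns a recognizable value. */
int example_driver_QueueWaitIdle(void* queue) {
    (void)queue;
    return 7;
}

const ExampleDispatchTable example_driver_table = {
    example_driver_QueueWaitIdle,
};

/* Exported-symbol path: find the table through the first argument, then call. */
int example_trampoline_QueueWaitIdle(void* queue) {
    assert(queue != NULL);
    const ExampleDispatchable* object = (const ExampleDispatchable*)queue;
    return object->dispatch->QueueWaitIdle(queue);
}

/* GetProcAddr-style path: return the final target so callers skip the lookup. */
typedef int (*ExampleQueueWaitIdleFn)(void* queue);

ExampleQueueWaitIdleFn example_get_proc_addr(const ExampleDispatchable* object) {
    assert(object != NULL);
    return object->dispatch->QueueWaitIdle;
}
```

Both paths reach the same function; the second simply avoids the table lookup on every call, which is the saving the paragraph above refers to.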

1.1. Driver Enumeration and Loading

Android expects the GPUs available to the system to be known when the system image is built, so its driver enumeration process isn’t as elaborate as on other platforms. The loader will use the existing HAL mechanism (see hardware.h) for discovering and loading the driver. As of this writing, the preferred paths for 32-bit and 64-bit Vulkan drivers are:

/vendor/lib/hw/vulkan.<ro.product.platform>.so
/vendor/lib64/hw/vulkan.<ro.product.platform>.so

where <ro.product.platform> is replaced by the value of the system property of that name. See libhardware/hardware.c for details and supported alternative locations.
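The path substitution above could be sketched as follows. The function name and the "exampleboard" value used in the usage note are hypothetical; how the property value is read is left to the caller, and the alternative locations handled by libhardware/hardware.c are not covered.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Builds the preferred driver path for a given ro.product.platform value.
 * Returns 0 on success, or -1 if the value is missing/empty or the
 * result does not fit in the output buffer.
 */
int example_vulkan_driver_path(const char* platform, int is_64bit,
                               char* out, size_t out_size) {
    assert(out != NULL && out_size > 0);
    if (platform == NULL || strlen(platform) == 0) {
        return -1;  /* property missing or empty */
    }
    const char* lib_dir = is_64bit ? "lib64" : "lib";
    int written = snprintf(out, out_size, "/vendor/%s/hw/vulkan.%s.so",
                           lib_dir, platform);
    if (written < 0 || (size_t)written >= out_size) {
        return -1;  /* formatting error or truncated output */
    }
    return 0;
}
```

With a property value of "exampleboard" and is_64bit set, this yields /vendor/lib64/hw/vulkan.exampleboard.so.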

The Vulkan hw_module_t derivative is currently trivial. If support for multiple drivers is ever added, the HAL module will export a list of strings that can be passed to the module open call. For the time being, only one driver is supported, and the constant string HWVULKAN_DEVICE_0 is passed to open.

The Vulkan hw_device_t derivative corresponds to a single driver, though that driver can support multiple physical devices. The hw_device_t structure will be extended to export vkGetGlobalExtensionProperties, vkCreateInstance, and vkGetInstanceProcAddr functions. The loader will find all other VkInstance, VkPhysicalDevice, and vkGetDeviceProcAddr functions by calling vkGetInstanceProcAddr.

1.2. Layer Discovery and Loading

Android’s security model and policies differ significantly from other platforms. In particular, Android does not allow loading external code into a non-debuggable process on production (non-rooted) devices, nor does it allow external code to inspect or control the process’s memory/state/etc. This includes a prohibition on saving core dumps, API traces, etc. to disk for later inspection. So only layers delivered as part of the application will be enabled on production devices, and drivers must also not provide functionality that violates these policies.

There are three major use cases for layers:

  1. Development-time layers: validation layers, shims for tracing/profiling/debugging tools, etc. These shouldn’t be installed on the system image of production devices: they would be a waste of space for most users, and they should be updateable without requiring a system update. A developer wishing to use one of these during development has the ability to modify their application package (e.g. adding a file to their native libraries directory). IHV and OEM engineers who are trying to diagnose failures in shipping, unmodifiable apps are assumed to have access to non-production (rooted) builds of the system image.

  2. Utility layers, such as a layer that implements a heap for device memory. These layers will almost always expose extensions. Developers choose which layers, and which versions of those layers, to use in their application; different applications that use the same layer may still use different versions. Developers will choose which of these layers to ship in their application package.

  3. Injected layers, like framerate, social network, or game launcher overlays, which are provided by the user or some other application without the application’s knowledge or consent. These violate Android’s security policies and will not be supported.

In the normal state the loader will only search in the application’s native library directory for layers; details are TBD but it will probably just try to load any library with a name matching a particular pattern (e.g. libvklayer_foo.so). It will probably not need a separate manifest file; the developer deliberately included these layers, so the reasons to avoid loading libraries before enabling them don’t apply.
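Since the naming pattern is still TBD, the following is only a sketch of what such a filter might look like, assuming the pattern is a "libvklayer_" prefix and a ".so" suffix as in the example name above. The function name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Returns true if filename looks like libvklayer_<name>.so (assumed pattern). */
bool example_is_layer_library(const char* filename) {
    static const char prefix[] = "libvklayer_";
    static const char suffix[] = ".so";
    assert(filename != NULL);

    size_t len = strlen(filename);
    size_t prefix_len = sizeof(prefix) - 1;
    size_t suffix_len = sizeof(suffix) - 1;

    /* Require at least one character between the prefix and the suffix. */
    if (len <= prefix_len + suffix_len) {
        return false;
    }
    return strncmp(filename, prefix, prefix_len) == 0 &&
           strcmp(filename + len - suffix_len, suffix) == 0;
}
```

A loader following this approach would apply the filter to each file in the application’s native library directory and attempt to load only the matches.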

On debuggable devices (ro.debuggable property exists and is non-zero, generally rooted or engineering builds) or debuggable processes (prctl(PR_GET_DUMPABLE)==1, based on the application’s manifest), the loader may also search an adb-writeable location on /data for layers. It’s not clear whether this is useful; in all the cases it could be used, the layer could just as easily be put in the application’s native library directory.

Finally, the loader may include a built-in validation layer that it will enable based on settings in the Developer Options menu, which would send validation errors or warnings to the system log. Drivers may be able to emit additional hardware-specific errors/warnings through this mechanism. This layer would not be enumerated through the API. This is intended to allow cooperative end-users to collect extra information about failures from unmodified applications on unmodified devices to aid triage/diagnosis of difficult-to-reproduce problems. The functionality would be intentionally limited to minimize security and privacy risk.

Our goal is to allow layers to be ported with only build-environment changes between Android and other platforms. This means the interface between layers and the loader must match the interface used by the LunarG loader. Currently, the LunarG interface has a few deficiencies and is largely unspecified. We intend to work with LunarG to correct as many deficiencies as we can and to specify the interface in detail so that layers can be implemented without referring to the loader source code.

2. Window System Integration

The vk_wsi_swapchain and vk_wsi_device_swapchain extensions will primarily be implemented by the platform and live in libvulkan.so. The VkSwapchain object and all interaction with ANativeWindow will be handled by the platform and not exposed to drivers. The WSI implementation will rely on a few private interfaces to the driver for this implementation. These will be loaded through the driver’s vkGetDeviceProcAddr functions, after passing through any enabled layers.

Implementations may need swapchain buffers to be allocated with implementation-defined private gralloc usage flags. When creating a swapchain, the platform will ask the driver to translate the requested format and image usage flags into gralloc usage flags by calling

VkResult VKAPI vkGetSwapchainGrallocUsageANDROID(
    VkDevice            device,
    VkFormat            format,
    VkImageUsageFlags   imageUsage,
    int*                grallocUsage
);

The format and imageUsage parameters are taken from the VkSwapchainCreateInfoKHR structure. The driver should fill *grallocUsage with the gralloc usage flags it requires for that format and usage. These will be combined with the usage flags requested by the swapchain consumer when allocating buffers.
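The division of work described above might look roughly like the sketch below. It uses placeholder flag values and plain integer types instead of the Vulkan and gralloc headers, and it omits the device and format parameters for brevity. Every EXAMPLE_*/example_* name, and the mapping from image usage to private gralloc flags, is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Placeholder bit values; real code would use the Vulkan and gralloc headers. */
#define EXAMPLE_IMAGE_USAGE_COLOR_ATTACHMENT 0x1u
#define EXAMPLE_IMAGE_USAGE_SAMPLED          0x2u
#define EXAMPLE_GRALLOC_PRIVATE_COMPRESSED   0x100  /* hypothetical private flag */
#define EXAMPLE_GRALLOC_PRIVATE_TILED        0x200  /* hypothetical private flag */
#define EXAMPLE_GRALLOC_CONSUMER_COMPOSER    0x1    /* hypothetical consumer flag */

/* Driver side: translate image usage into the gralloc flags the driver needs. */
int example_get_swapchain_gralloc_usage(uint32_t image_usage, int* gralloc_usage) {
    assert(gralloc_usage != NULL);
    int usage = 0;
    if (image_usage & EXAMPLE_IMAGE_USAGE_COLOR_ATTACHMENT) {
        usage |= EXAMPLE_GRALLOC_PRIVATE_COMPRESSED;
    }
    if (image_usage & EXAMPLE_IMAGE_USAGE_SAMPLED) {
        usage |= EXAMPLE_GRALLOC_PRIVATE_TILED;
    }
    *gralloc_usage = usage;
    return 0;
}

/* Platform side: combine the driver's flags with the consumer's flags. */
int example_allocation_usage(uint32_t image_usage, int consumer_usage,
                             int* allocation_usage) {
    assert(allocation_usage != NULL);
    int driver_usage = 0;
    int result = example_get_swapchain_gralloc_usage(image_usage, &driver_usage);
    if (result != 0) {
        return result;
    }
    *allocation_usage = driver_usage | consumer_usage;
    return 0;
}
```

The driver only reports what it needs; the bitwise OR with the consumer’s flags happens on the platform side when buffers are allocated.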

VkNativeBufferANDROID is a vkCreateImage extension structure for creating an image backed by a gralloc buffer. This structure is provided to vkCreateImage in the VkImageCreateInfo structure chain. Calls to vkCreateImage with this structure will happen during the first call to vkGetSwapChainInfoWSI(.. VK_SWAP_CHAIN_INFO_TYPE_IMAGES_WSI ..). The WSI implementation will allocate the number of native buffers requested for the swapchain, then create a VkImage for each one.

typedef struct {
    VkStructureType             sType; // must be VK_STRUCTURE_TYPE_NATIVE_BUFFER_ANDROID
    const void*                 pNext;

    // Buffer handle and stride returned from gralloc alloc()
    buffer_handle_t             handle;
    int                         stride;

    // Gralloc format and usage requested when the buffer was allocated.
    int                         format;
    int                         usage;
} VkNativeBufferANDROID;

TBD: During swapchain re-creation (using oldSwapChain), we may have to defer allocation of new gralloc buffers until old buffers have been released. If so, the vkCreateImage calls will be deferred until the first vkAcquireNextImageWSI that would return the new image.

When creating a gralloc-backed image, the VkImageCreateInfo will have:

  .imageType           = VK_IMAGE_TYPE_2D
  .format              = a VkFormat matching the format requested for the gralloc buffer
  .extent              = the 2D dimensions requested for the gralloc buffer
  .mipLevels           = 1
  .arraySize           = 1
  .samples             = 1
  .tiling              = VK_IMAGE_TILING_OPTIMAL
  .usage               = VkSwapChainCreateInfoWSI::imageUsageFlags
  .flags               = 0
  .sharingMode         = VkSwapChainCreateInfoWSI::sharingMode
  .queueFamilyCount    = VkSwapChainCreateInfoWSI::queueFamilyCount
  .pQueueFamilyIndices = VkSwapChainCreateInfoWSI::pQueueFamilyIndices

vkAcquireImageANDROID acquires ownership of a swapchain image and imports an externally-signalled native fence into both an existing VkSemaphore object and an existing VkFence object:

VkResult VKAPI vkAcquireImageANDROID(
    VkDevice            device,
    VkImage             image,
    int                 nativeFenceFd,
    VkSemaphore         semaphore,
    VkFence             fence
);

This function is called during vkAcquireNextImageWSI to import a native fence into the VkSemaphore and VkFence objects provided by the application. Both semaphore and fence objects are optional in this call. The driver may also use this opportunity to recognize and handle any external changes to the gralloc buffer state; many drivers won’t need to do anything here. This call puts the VkSemaphore and VkFence into the same "pending" state as vkQueueSignalSemaphore and vkQueueSubmit respectively, so queues can wait on the semaphore and the application can wait on the fence. Both objects become signalled when the underlying native fence signals; if the native fence has already signalled, then the semaphore will be in the signalled state when this function returns. The driver takes ownership of the fence fd and is responsible for closing it when no longer needed. It must do so even if neither a semaphore nor a fence object is provided, or even if vkAcquireImageANDROID fails and returns an error. If nativeFenceFd is -1, it is as if the native fence was already signalled.
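The fd ownership rules above are easy to get wrong on error paths, so here is a minimal sketch of one way a driver might honor them. The ExampleSync type stands in for the driver’s internal semaphore and fence state and is hypothetical, as is the choice to give each object its own dup() of the fd.

```c
#include <assert.h>
#include <stddef.h>
#include <unistd.h>

typedef enum {
    EXAMPLE_UNSIGNALLED,
    EXAMPLE_PENDING,
    EXAMPLE_SIGNALLED
} ExampleSyncState;

/* Hypothetical stand-in for a driver's semaphore or fence object. */
typedef struct ExampleSync {
    ExampleSyncState state;
    int wait_fd;  /* fd this object waits on, or -1 */
} ExampleSync;

/* Attach an fd to an optional sync object; fd < 0 means already signalled. */
static void example_import(ExampleSync* sync, int fd) {
    if (sync == NULL) {
        return;
    }
    sync->wait_fd = fd;
    sync->state = (fd < 0) ? EXAMPLE_SIGNALLED : EXAMPLE_PENDING;
}

/* Returns 0 on success, -1 on failure; always consumes native_fence_fd. */
int example_acquire_image(int native_fence_fd, ExampleSync* semaphore,
                          ExampleSync* fence) {
    int semaphore_fd = -1;
    int fence_fd = -1;
    int result = 0;

    if (native_fence_fd >= 0) {
        if (semaphore != NULL && (semaphore_fd = dup(native_fence_fd)) < 0) {
            result = -1;
        }
        if (result == 0 && fence != NULL && (fence_fd = dup(native_fence_fd)) < 0) {
            result = -1;
        }
        /* The driver owns the original fd on every path, including errors. */
        close(native_fence_fd);
    }
    if (result != 0) {
        if (semaphore_fd >= 0) {
            close(semaphore_fd);
        }
        return result;
    }
    example_import(semaphore, semaphore_fd);
    example_import(fence, fence_fd);
    return 0;
}
```

Duplicating the fd per object lets the semaphore and fence be destroyed independently; a driver that tracks the native fence in a shared, reference-counted object could avoid the extra descriptors.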

vkQueueSignalReleaseImageANDROID prepares a swapchain image for external use, and creates a native fence and schedules it to be signalled when prior work on the queue has completed.

VkResult VKAPI vkQueueSignalReleaseImageANDROID(
    VkQueue             queue,
    uint32_t            waitSemaphoreCount,
    const VkSemaphore*  pWaitSemaphores,
    VkImage             image,
    int*                pNativeFenceFd
);

This will be called during vkQueuePresentWSI on the provided queue. Effects are similar to vkQueueSignalSemaphore, except with a native fence instead of a semaphore. The native fence must not signal until the waitSemaphoreCount semaphores in pWaitSemaphores have signalled. Unlike vkQueueSignalSemaphore, however, this call creates and returns the synchronization object that will be signalled rather than having it provided as input. If the queue is already idle when this function is called, it is allowed but not required to set *pNativeFenceFd to -1. The file descriptor returned in *pNativeFenceFd is owned and will be closed by the caller. Many drivers will be able to ignore the image parameter, but some may need to prepare CPU-side data structures associated with a gralloc buffer for use by external image consumers. Preparing buffer contents for use by external consumers should have been done asynchronously as part of transitioning the image to VK_IMAGE_LAYOUT_PRESENT_SRC_KHR.
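On the caller’s side, the returned descriptor has to be treated as possibly -1 and closed exactly once. The sketch below shows one hypothetical way a consumer could wait on it, assuming the native fence fd becomes readable to poll() when it signals. In practice the platform would more likely hand the fd to the buffer consumer than wait on it directly, and the function name is made up for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <unistd.h>

/*
 * Waits for a release fence and closes it, since the caller owns the fd.
 * Returns 0 if signalled, 1 on timeout, -1 on error.
 * A negative fd is treated as an already-signalled fence.
 */
int example_wait_and_close_release_fence(int native_fence_fd, int timeout_ms) {
    assert(timeout_ms >= 0);  /* this sketch does not support infinite waits */
    if (native_fence_fd < 0) {
        return 0;  /* queue was idle; nothing to wait for or close */
    }

    struct pollfd pfd = { .fd = native_fence_fd, .events = POLLIN, .revents = 0 };
    int rc;
    do {
        rc = poll(&pfd, 1, timeout_ms);
    } while (rc < 0 && errno == EINTR);

    close(native_fence_fd);  /* closed on every path */

    if (rc < 0) {
        return -1;
    }
    if (rc == 0) {
        return 1;
    }
    return (pfd.revents & (POLLERR | POLLNVAL)) ? -1 : 0;
}
```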

3. History

  1. 2015-07-08 Initial version

  2. 2015-08-16

    • Renamed to Implementor’s Guide

    • Wording and formatting changes

    • Updated based on resolution of Khronos bug 14265

    • Deferred support for multiple drivers

  3. 2015-11-04

    • Added vkGetSwapchainGrallocUsageANDROID

    • Replaced vkImportNativeFenceANDROID and vkQueueSignalNativeFenceANDROID with vkAcquireImageANDROID and vkQueueSignalReleaseImageANDROID, to allow drivers to know the ownership state of swapchain images.

  4. 2015-12-03

    • Added a VkFence parameter to vkAcquireImageANDROID corresponding to the parameter added to vkAcquireNextImageKHR.

  5. 2016-01-08

    • Added waitSemaphoreCount and pWaitSemaphores parameters to vkQueueSignalReleaseImageANDROID.