docs: update generated docs (#981)
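
The regenerated signatures only reorder the optional keyword parameters (for example, pageSize now precedes filter in aggregated()), so callers that pass arguments by keyword are unaffected. A minimal sketch of keyword-based paging, assuming `jobs` is the resource returned by `service.projects().jobs()` on a Dataflow v1b3 client; `iter_all_jobs` is a hypothetical helper name, not part of the library:

```python
def iter_all_jobs(jobs, project_id, page_size=None):
    """Yield every job dict from jobs.aggregated(), following next pages.

    `jobs` is assumed to expose aggregated() and aggregated_next() as
    documented in dataflow_v1b3.projects.jobs.html.
    """
    # Pass everything by keyword so the documented parameter order is irrelevant.
    kwargs = {"projectId": project_id}
    if page_size is not None:
        kwargs["pageSize"] = page_size

    request = jobs.aggregated(**kwargs)
    while request is not None:
        response = request.execute()
        # "jobs" holds a subset of the requested job information per page.
        for job in response.get("jobs", []):
            yield job
        # aggregated_next() is expected to return None once no pages remain.
        request = jobs.aggregated_next(
            previous_request=request, previous_response=response
        )
```

With a real client, `jobs` would typically come from `googleapiclient.discovery.build("dataflow", "v1b3").projects().jobs()`; credential setup is omitted here.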
diff --git a/docs/dyn/dataflow_v1b3.projects.jobs.html b/docs/dyn/dataflow_v1b3.projects.jobs.html
index 8287947..df7d7bd 100644
--- a/docs/dyn/dataflow_v1b3.projects.jobs.html
+++ b/docs/dyn/dataflow_v1b3.projects.jobs.html
@@ -90,7 +90,7 @@
<p class="firstline">Returns the workItems Resource.</p>
<p class="toc_element">
- <code><a href="#aggregated">aggregated(projectId, filter=None, location=None, pageToken=None, pageSize=None, view=None, x__xgafv=None)</a></code></p>
+ <code><a href="#aggregated">aggregated(projectId, pageSize=None, filter=None, location=None, pageToken=None, view=None, x__xgafv=None)</a></code></p>
<p class="firstline">List the jobs of a project across all regions.</p>
<p class="toc_element">
<code><a href="#aggregated_next">aggregated_next(previous_request, previous_response)</a></code></p>
@@ -99,13 +99,13 @@
<code><a href="#create">create(projectId, body=None, location=None, replaceJobId=None, view=None, x__xgafv=None)</a></code></p>
<p class="firstline">Creates a Cloud Dataflow job.</p>
<p class="toc_element">
- <code><a href="#get">get(projectId, jobId, view=None, location=None, x__xgafv=None)</a></code></p>
+ <code><a href="#get">get(projectId, jobId, location=None, view=None, x__xgafv=None)</a></code></p>
<p class="firstline">Gets the state of the specified Cloud Dataflow job.</p>
<p class="toc_element">
<code><a href="#getMetrics">getMetrics(projectId, jobId, startTime=None, location=None, x__xgafv=None)</a></code></p>
<p class="firstline">Request the job status.</p>
<p class="toc_element">
- <code><a href="#list">list(projectId, filter=None, location=None, pageToken=None, pageSize=None, view=None, x__xgafv=None)</a></code></p>
+ <code><a href="#list">list(projectId, filter=None, pageSize=None, location=None, view=None, pageToken=None, x__xgafv=None)</a></code></p>
<p class="firstline">List the jobs of a project.</p>
<p class="toc_element">
<code><a href="#list_next">list_next(previous_request, previous_response)</a></code></p>
@@ -118,20 +118,20 @@
<p class="firstline">Updates the state of an existing Cloud Dataflow job.</p>
<h3>Method Details</h3>
<div class="method">
- <code class="details" id="aggregated">aggregated(projectId, filter=None, location=None, pageToken=None, pageSize=None, view=None, x__xgafv=None)</code>
+ <code class="details" id="aggregated">aggregated(projectId, pageSize=None, filter=None, location=None, pageToken=None, view=None, x__xgafv=None)</code>
<pre>List the jobs of a project across all regions.
Args:
projectId: string, The project which owns the jobs. (required)
+ pageSize: integer, If there are many jobs, limit response to at most this many.
+The actual number of jobs returned will be the lesser of max_responses
+and an unspecified server-defined limit.
filter: string, The kind of filter to use.
location: string, The [regional endpoint]
(https://cloud.google.com/dataflow/docs/concepts/regional-endpoints) that
contains this job.
pageToken: string, Set this to the 'next_page_token' field of a previous response
to request additional results in a long list.
- pageSize: integer, If there are many jobs, limit response to at most this many.
-The actual number of jobs returned will be the lesser of max_responses
-and an unspecified server-defined limit.
view: string, Level of information requested in response. Default is `JOB_VIEW_SUMMARY`.
x__xgafv: string, V1 error format.
Allowed values
@@ -148,243 +148,285 @@
# body is empty {}.
"jobs": [ # A subset of the requested job information.
{ # Defines a job to be run by the Cloud Dataflow service.
- "clientRequestId": "A String", # The client's unique identifier of the job, re-used across retried attempts.
- # If this field is set, the service will ensure its uniqueness.
- # The request to create a job will fail if the service has knowledge of a
- # previously submitted job with the same client's ID and job name.
- # The caller may use this field to ensure idempotence of job
- # creation across retried attempts to create a job.
- # By default, the field is empty and, in that case, the service ignores it.
- "id": "A String", # The unique ID of this job.
- #
- # This field is set by the Cloud Dataflow service when the Job is
- # created, and is immutable for the life of the job.
- "currentStateTime": "A String", # The timestamp associated with the current state.
- "transformNameMapping": { # The map of transform name prefixes of the job to be replaced to the
- # corresponding name prefixes of the new job.
- "a_key": "A String",
- },
- "environment": { # Describes the environment in which a Dataflow Job runs. # The environment for the job.
- "internalExperiments": { # Experimental settings.
- "a_key": "", # Properties of the object. Contains field @type with type URL.
- },
- "workerRegion": "A String", # The Compute Engine region
- # (https://cloud.google.com/compute/docs/regions-zones/regions-zones) in
- # which worker processing should occur, e.g. "us-west1". Mutually exclusive
- # with worker_zone. If neither worker_region nor worker_zone is specified,
- # default to the control plane's region.
- "serviceKmsKeyName": "A String", # If set, contains the Cloud KMS key identifier used to encrypt data
- # at rest, AKA a Customer Managed Encryption Key (CMEK).
- #
- # Format:
- # projects/PROJECT_ID/locations/LOCATION/keyRings/KEY_RING/cryptoKeys/KEY
- "userAgent": { # A description of the process that generated the request.
- "a_key": "", # Properties of the object.
- },
- "workerZone": "A String", # The Compute Engine zone
- # (https://cloud.google.com/compute/docs/regions-zones/regions-zones) in
- # which worker processing should occur, e.g. "us-west1-a". Mutually exclusive
- # with worker_region. If neither worker_region nor worker_zone is specified,
- # a zone in the control plane's region is chosen based on available capacity.
- "clusterManagerApiService": "A String", # The type of cluster manager API to use. If unknown or
- # unspecified, the service will attempt to choose a reasonable
- # default. This should be in the form of the API service name,
- # e.g. "compute.googleapis.com".
- "tempStoragePrefix": "A String", # The prefix of the resources the system should use for temporary
- # storage. The system will append the suffix "/temp-{JOBNAME} to
- # this resource prefix, where {JOBNAME} is the value of the
- # job_name field. The resulting bucket and object prefix is used
- # as the prefix of the resources used to store temporary data
- # needed during the job execution. NOTE: This will override the
- # value in taskrunner_settings.
- # The supported resource type is:
- #
- # Google Cloud Storage:
- #
- # storage.googleapis.com/{bucket}/{object}
- # bucket.storage.googleapis.com/{object}
- "experiments": [ # The list of experiments to enable.
- "A String",
- ],
- "version": { # A structure describing which components and their versions of the service
- # are required in order to run the job.
- "a_key": "", # Properties of the object.
- },
- "serviceAccountEmail": "A String", # Identity to run virtual machines as. Defaults to the default account.
- "sdkPipelineOptions": { # The Cloud Dataflow SDK pipeline options specified by the user. These
- # options are passed through the service and are used to recreate the
- # SDK pipeline options on the worker in a language agnostic and platform
- # independent way.
- "a_key": "", # Properties of the object.
- },
- "flexResourceSchedulingGoal": "A String", # Which Flexible Resource Scheduling mode to run in.
- "workerPools": [ # The worker pools. At least one "harness" worker pool must be
- # specified in order for the job to have workers.
- { # Describes one particular pool of Cloud Dataflow workers to be
- # instantiated by the Cloud Dataflow service in order to perform the
- # computations required by a job. Note that a workflow job may use
- # multiple pools, in order to match the various computational
- # requirements of the various stages of the job.
- "numThreadsPerWorker": 42, # The number of threads per worker harness. If empty or unspecified, the
- # service will choose a number of threads (according to the number of cores
- # on the selected machine type for batch, or 1 by convention for streaming).
- "numWorkers": 42, # Number of Google Compute Engine workers in this pool needed to
- # execute the job. If zero or unspecified, the service will
- # attempt to choose a reasonable default.
- "zone": "A String", # Zone to run the worker pools in. If empty or unspecified, the service
- # will attempt to choose a reasonable default.
- "diskSourceImage": "A String", # Fully qualified source image for disks.
- "packages": [ # Packages to be installed on workers.
- { # The packages that must be installed in order for a worker to run the
- # steps of the Cloud Dataflow job that will be assigned to its worker
- # pool.
- #
- # This is the mechanism by which the Cloud Dataflow SDK causes code to
- # be loaded onto the workers. For example, the Cloud Dataflow Java SDK
- # might use this to install jars containing the user's code and all of the
- # various dependencies (libraries, data files, etc.) required in order
- # for that code to run.
- "name": "A String", # The name of the package.
- "location": "A String", # The resource to read the package from. The supported resource type is:
- #
- # Google Cloud Storage:
- #
- # storage.googleapis.com/{bucket}
- # bucket.storage.googleapis.com/
- },
- ],
- "teardownPolicy": "A String", # Sets the policy for determining when to turndown worker pool.
- # Allowed values are: `TEARDOWN_ALWAYS`, `TEARDOWN_ON_SUCCESS`, and
- # `TEARDOWN_NEVER`.
- # `TEARDOWN_ALWAYS` means workers are always torn down regardless of whether
- # the job succeeds. `TEARDOWN_ON_SUCCESS` means workers are torn down
- # if the job succeeds. `TEARDOWN_NEVER` means the workers are never torn
- # down.
- #
- # If the workers are not torn down by the service, they will
- # continue to run and use Google Compute Engine VM resources in the
- # user's project until they are explicitly terminated by the user.
- # Because of this, Google recommends using the `TEARDOWN_ALWAYS`
- # policy except for small, manually supervised test jobs.
- #
- # If unknown or unspecified, the service will attempt to choose a reasonable
- # default.
- "onHostMaintenance": "A String", # The action to take on host maintenance, as defined by the Google
- # Compute Engine API.
- "poolArgs": { # Extra arguments for this worker pool.
- "a_key": "", # Properties of the object. Contains field @type with type URL.
+ "pipelineDescription": { # A descriptive representation of submitted pipeline as well as the executed # Preliminary field: The format of this data may change at any time.
+ # A description of the user pipeline and stages through which it is executed.
+ # Created by Cloud Dataflow service. Only retrieved with
+ # JOB_VIEW_DESCRIPTION or JOB_VIEW_ALL.
+ # form. This data is provided by the Dataflow service for ease of visualizing
+ # the pipeline and interpreting Dataflow provided metrics.
+ "displayData": [ # Pipeline level display data.
+ { # Data provided with a pipeline or transform to provide descriptive info.
+ "url": "A String", # An optional full URL.
+ "javaClassValue": "A String", # Contains value if the data is of java class type.
+ "timestampValue": "A String", # Contains value if the data is of timestamp type.
+ "durationValue": "A String", # Contains value if the data is of duration type.
+ "label": "A String", # An optional label to display in a dax UI for the element.
+ "key": "A String", # The key identifying the display data.
+ # This is intended to be used as a label for the display data
+ # when viewed in a dax monitoring system.
+ "namespace": "A String", # The namespace for the key. This is usually a class name or programming
+ # language namespace (i.e. python module) which defines the display data.
+ # This allows a dax monitoring system to specially handle the data
+ # and perform custom rendering.
+ "floatValue": 3.14, # Contains value if the data is of float type.
+ "strValue": "A String", # Contains value if the data is of string type.
+ "int64Value": "A String", # Contains value if the data is of int64 type.
+ "boolValue": True or False, # Contains value if the data is of a boolean type.
+ "shortStrValue": "A String", # A possible additional shorter value to display.
+ # For example a java_class_name_value of com.mypackage.MyDoFn
+ # will be stored with MyDoFn as the short_str_value and
+ # com.mypackage.MyDoFn as the java_class_name value.
+ # short_str_value can be displayed and java_class_name_value
+ # will be displayed as a tooltip.
},
- "diskSizeGb": 42, # Size of root disk for VMs, in GB. If zero or unspecified, the service will
- # attempt to choose a reasonable default.
- "workerHarnessContainerImage": "A String", # Required. Docker container image that executes the Cloud Dataflow worker
- # harness, residing in Google Container Registry.
- #
- # Deprecated for the Fn API path. Use sdk_harness_container_images instead.
- "diskType": "A String", # Type of root disk for VMs. If empty or unspecified, the service will
- # attempt to choose a reasonable default.
- "machineType": "A String", # Machine type (e.g. "n1-standard-1"). If empty or unspecified, the
- # service will attempt to choose a reasonable default.
- "kind": "A String", # The kind of the worker pool; currently only `harness` and `shuffle`
- # are supported.
- "sdkHarnessContainerImages": [ # Set of SDK harness containers needed to execute this pipeline. This will
- # only be set in the Fn API path. For non-cross-language pipelines this
- # should have only one entry. Cross-language pipelines will have two or more
- # entries.
- { # Defines a SDK harness container for executing Dataflow pipelines.
- "containerImage": "A String", # A docker container image that resides in Google Container Registry.
- "useSingleCorePerContainer": True or False, # If true, recommends the Dataflow service to use only one core per SDK
- # container instance with this image. If false (or unset) recommends using
- # more than one core per SDK container instance with this image for
- # efficiency. Note that Dataflow service may choose to override this property
- # if needed.
- },
- ],
- "dataDisks": [ # Data disks that are used by a VM in this workflow.
- { # Describes the data disk used by a workflow job.
- "diskType": "A String", # Disk storage type, as defined by Google Compute Engine. This
- # must be a disk type appropriate to the project and zone in which
- # the workers will run. If unknown or unspecified, the service
- # will attempt to choose a reasonable default.
- #
- # For example, the standard persistent disk type is a resource name
- # typically ending in "pd-standard". If SSD persistent disks are
- # available, the resource name typically ends with "pd-ssd". The
- # actual valid values are defined the Google Compute Engine API,
- # not by the Cloud Dataflow API; consult the Google Compute Engine
- # documentation for more information about determining the set of
- # available disk types for a particular project and zone.
- #
- # Google Compute Engine Disk types are local to a particular
- # project in a particular zone, and so the resource name will
- # typically look something like this:
- #
- # compute.googleapis.com/projects/project-id/zones/zone/diskTypes/pd-standard
- "sizeGb": 42, # Size of disk in GB. If zero or unspecified, the service will
- # attempt to choose a reasonable default.
- "mountPoint": "A String", # Directory in a VM where disk is mounted.
- },
- ],
- "subnetwork": "A String", # Subnetwork to which VMs will be assigned, if desired. Expected to be of
- # the form "regions/REGION/subnetworks/SUBNETWORK".
- "ipConfiguration": "A String", # Configuration for VM IPs.
- "taskrunnerSettings": { # Taskrunner configuration settings. # Settings passed through to Google Compute Engine workers when
- # using the standard Dataflow task runner. Users should ignore
- # this field.
- "alsologtostderr": True or False, # Whether to also send taskrunner log info to stderr.
- "taskGroup": "A String", # The UNIX group ID on the worker VM to use for tasks launched by
- # taskrunner; e.g. "wheel".
- "harnessCommand": "A String", # The command to launch the worker harness.
- "logDir": "A String", # The directory on the VM to store logs.
- "oauthScopes": [ # The OAuth2 scopes to be requested by the taskrunner in order to
- # access the Cloud Dataflow API.
+ ],
+ "originalPipelineTransform": [ # Description of each transform in the pipeline and collections between them.
+ { # Description of the type, names/ids, and input/outputs for a transform.
+ "outputCollectionName": [ # User names for all collection outputs to this transform.
"A String",
],
- "dataflowApiVersion": "A String", # The API version of endpoint, e.g. "v1b3"
- "logUploadLocation": "A String", # Indicates where to put logs. If this is not specified, the logs
- # will not be uploaded.
- #
- # The supported resource type is:
- #
- # Google Cloud Storage:
- # storage.googleapis.com/{bucket}/{object}
- # bucket.storage.googleapis.com/{object}
- "streamingWorkerMainClass": "A String", # The streaming worker main class name.
- "workflowFileName": "A String", # The file to store the workflow in.
- "languageHint": "A String", # The suggested backend language.
- "commandlinesFileName": "A String", # The file to store preprocessing commands in.
- "baseTaskDir": "A String", # The location on the worker for task-specific subdirectories.
- "tempStoragePrefix": "A String", # The prefix of the resources the taskrunner should use for
- # temporary storage.
- #
- # The supported resource type is:
- #
- # Google Cloud Storage:
- # storage.googleapis.com/{bucket}/{object}
- # bucket.storage.googleapis.com/{object}
- "baseUrl": "A String", # The base URL for the taskrunner to use when accessing Google Cloud APIs.
- #
- # When workers access Google Cloud APIs, they logically do so via
- # relative URLs. If this field is specified, it supplies the base
- # URL to use for resolving these relative URLs. The normative
- # algorithm used is defined by RFC 1808, "Relative Uniform Resource
- # Locators".
- #
- # If not specified, the default value is "http://www.googleapis.com/"
- "logToSerialconsole": True or False, # Whether to send taskrunner log info to Google Compute Engine VM serial
- # console.
- "continueOnException": True or False, # Whether to continue taskrunner if an exception is hit.
- "parallelWorkerSettings": { # Provides data to pass through to the worker harness. # The settings to pass to the parallel worker harness.
- "tempStoragePrefix": "A String", # The prefix of the resources the system should use for temporary
- # storage.
+ "displayData": [ # Transform-specific display data.
+ { # Data provided with a pipeline or transform to provide descriptive info.
+ "url": "A String", # An optional full URL.
+ "javaClassValue": "A String", # Contains value if the data is of java class type.
+ "timestampValue": "A String", # Contains value if the data is of timestamp type.
+ "durationValue": "A String", # Contains value if the data is of duration type.
+ "label": "A String", # An optional label to display in a dax UI for the element.
+ "key": "A String", # The key identifying the display data.
+ # This is intended to be used as a label for the display data
+ # when viewed in a dax monitoring system.
+ "namespace": "A String", # The namespace for the key. This is usually a class name or programming
+ # language namespace (i.e. python module) which defines the display data.
+ # This allows a dax monitoring system to specially handle the data
+ # and perform custom rendering.
+ "floatValue": 3.14, # Contains value if the data is of float type.
+ "strValue": "A String", # Contains value if the data is of string type.
+ "int64Value": "A String", # Contains value if the data is of int64 type.
+ "boolValue": True or False, # Contains value if the data is of a boolean type.
+ "shortStrValue": "A String", # A possible additional shorter value to display.
+ # For example a java_class_name_value of com.mypackage.MyDoFn
+ # will be stored with MyDoFn as the short_str_value and
+ # com.mypackage.MyDoFn as the java_class_name value.
+ # short_str_value can be displayed and java_class_name_value
+ # will be displayed as a tooltip.
+ },
+ ],
+ "id": "A String", # SDK generated id of this transform instance.
+ "inputCollectionName": [ # User names for all collection inputs to this transform.
+ "A String",
+ ],
+ "name": "A String", # User provided name for this transform instance.
+ "kind": "A String", # Type of transform.
+ },
+ ],
+ "executionPipelineStage": [ # Description of each stage of execution of the pipeline.
+ { # Description of the composing transforms, names/ids, and input/outputs of a
+ # stage of execution. Some composing transforms and sources may have been
+ # generated by the Dataflow service during execution planning.
+ "componentSource": [ # Collections produced and consumed by component transforms of this stage.
+ { # Description of an interstitial value between transforms in an execution
+ # stage.
+ "userName": "A String", # Human-readable name for this transform; may be user or system generated.
+ "name": "A String", # Dataflow service generated name for this source.
+ "originalTransformOrCollection": "A String", # User name for the original user transform or collection with which this
+ # source is most closely associated.
+ },
+ ],
+ "inputSource": [ # Input sources for this stage.
+ { # Description of an input or output of an execution stage.
+ "userName": "A String", # Human-readable name for this source; may be user or system generated.
+ "originalTransformOrCollection": "A String", # User name for the original user transform or collection with which this
+ # source is most closely associated.
+ "sizeBytes": "A String", # Size of the source, if measurable.
+ "name": "A String", # Dataflow service generated name for this source.
+ },
+ ],
+ "name": "A String", # Dataflow service generated name for this stage.
+ "componentTransform": [ # Transforms that comprise this execution stage.
+ { # Description of a transform executed as part of an execution stage.
+ "name": "A String", # Dataflow service generated name for this source.
+ "userName": "A String", # Human-readable name for this transform; may be user or system generated.
+ "originalTransform": "A String", # User name for the original user transform with which this transform is
+ # most closely associated.
+ },
+ ],
+ "id": "A String", # Dataflow service generated id for this stage.
+ "outputSource": [ # Output sources for this stage.
+ { # Description of an input or output of an execution stage.
+ "userName": "A String", # Human-readable name for this source; may be user or system generated.
+ "originalTransformOrCollection": "A String", # User name for the original user transform or collection with which this
+ # source is most closely associated.
+ "sizeBytes": "A String", # Size of the source, if measurable.
+ "name": "A String", # Dataflow service generated name for this source.
+ },
+ ],
+ "kind": "A String", # Type of transform this stage is executing.
+ },
+ ],
+ },
+ "labels": { # User-defined labels for this job.
+ #
+ # The labels map can contain no more than 64 entries. Entries of the labels
+ # map are UTF8 strings that comply with the following restrictions:
+ #
+ # * Keys must conform to regexp: \p{Ll}\p{Lo}{0,62}
+ # * Values must conform to regexp: [\p{Ll}\p{Lo}\p{N}_-]{0,63}
+ # * Both keys and values are additionally constrained to be <= 128 bytes in
+ # size.
+ "a_key": "A String",
+ },
+ "projectId": "A String", # The ID of the Cloud Platform project that the job belongs to.
+ "environment": { # Describes the environment in which a Dataflow Job runs. # The environment for the job.
+ "flexResourceSchedulingGoal": "A String", # Which Flexible Resource Scheduling mode to run in.
+ "workerRegion": "A String", # The Compute Engine region
+ # (https://cloud.google.com/compute/docs/regions-zones/regions-zones) in
+ # which worker processing should occur, e.g. "us-west1". Mutually exclusive
+ # with worker_zone. If neither worker_region nor worker_zone is specified,
+ # default to the control plane's region.
+ "userAgent": { # A description of the process that generated the request.
+ "a_key": "", # Properties of the object.
+ },
+ "serviceAccountEmail": "A String", # Identity to run virtual machines as. Defaults to the default account.
+ "version": { # A structure describing which components and their versions of the service
+ # are required in order to run the job.
+ "a_key": "", # Properties of the object.
+ },
+ "serviceKmsKeyName": "A String", # If set, contains the Cloud KMS key identifier used to encrypt data
+ # at rest, AKA a Customer Managed Encryption Key (CMEK).
+ #
+ # Format:
+ # projects/PROJECT_ID/locations/LOCATION/keyRings/KEY_RING/cryptoKeys/KEY
+ "experiments": [ # The list of experiments to enable.
+ "A String",
+ ],
+ "workerZone": "A String", # The Compute Engine zone
+ # (https://cloud.google.com/compute/docs/regions-zones/regions-zones) in
+ # which worker processing should occur, e.g. "us-west1-a". Mutually exclusive
+ # with worker_region. If neither worker_region nor worker_zone is specified,
+ # a zone in the control plane's region is chosen based on available capacity.
+ "workerPools": [ # The worker pools. At least one "harness" worker pool must be
+ # specified in order for the job to have workers.
+ { # Describes one particular pool of Cloud Dataflow workers to be
+ # instantiated by the Cloud Dataflow service in order to perform the
+ # computations required by a job. Note that a workflow job may use
+ # multiple pools, in order to match the various computational
+ # requirements of the various stages of the job.
+ "onHostMaintenance": "A String", # The action to take on host maintenance, as defined by the Google
+ # Compute Engine API.
+ "sdkHarnessContainerImages": [ # Set of SDK harness containers needed to execute this pipeline. This will
+ # only be set in the Fn API path. For non-cross-language pipelines this
+ # should have only one entry. Cross-language pipelines will have two or more
+ # entries.
+ { # Defines a SDK harness container for executing Dataflow pipelines.
+ "containerImage": "A String", # A docker container image that resides in Google Container Registry.
+ "useSingleCorePerContainer": True or False, # If true, recommends the Dataflow service to use only one core per SDK
+ # container instance with this image. If false (or unset) recommends using
+ # more than one core per SDK container instance with this image for
+ # efficiency. Note that Dataflow service may choose to override this property
+ # if needed.
+ },
+ ],
+ "zone": "A String", # Zone to run the worker pools in. If empty or unspecified, the service
+ # will attempt to choose a reasonable default.
+ "kind": "A String", # The kind of the worker pool; currently only `harness` and `shuffle`
+ # are supported.
+ "metadata": { # Metadata to set on the Google Compute Engine VMs.
+ "a_key": "A String",
+ },
+ "diskSourceImage": "A String", # Fully qualified source image for disks.
+ "dataDisks": [ # Data disks that are used by a VM in this workflow.
+ { # Describes the data disk used by a workflow job.
+ "sizeGb": 42, # Size of disk in GB. If zero or unspecified, the service will
+ # attempt to choose a reasonable default.
+ "diskType": "A String", # Disk storage type, as defined by Google Compute Engine. This
+ # must be a disk type appropriate to the project and zone in which
+ # the workers will run. If unknown or unspecified, the service
+ # will attempt to choose a reasonable default.
+ #
+ # For example, the standard persistent disk type is a resource name
+ # typically ending in "pd-standard". If SSD persistent disks are
+ # available, the resource name typically ends with "pd-ssd". The
+ # actual valid values are defined by the Google Compute Engine API,
+ # not by the Cloud Dataflow API; consult the Google Compute Engine
+ # documentation for more information about determining the set of
+ # available disk types for a particular project and zone.
+ #
+ # Google Compute Engine Disk types are local to a particular
+ # project in a particular zone, and so the resource name will
+ # typically look something like this:
+ #
+ # compute.googleapis.com/projects/project-id/zones/zone/diskTypes/pd-standard
+ "mountPoint": "A String", # Directory in a VM where disk is mounted.
+ },
+ ],
+ "packages": [ # Packages to be installed on workers.
+ { # The packages that must be installed in order for a worker to run the
+ # steps of the Cloud Dataflow job that will be assigned to its worker
+ # pool.
#
- # The supported resource type is:
- #
- # Google Cloud Storage:
- #
- # storage.googleapis.com/{bucket}/{object}
- # bucket.storage.googleapis.com/{object}
- "reportingEnabled": True or False, # Whether to send work progress updates to the service.
- "baseUrl": "A String", # The base URL for accessing Google Cloud APIs.
+ # This is the mechanism by which the Cloud Dataflow SDK causes code to
+ # be loaded onto the workers. For example, the Cloud Dataflow Java SDK
+ # might use this to install jars containing the user's code and all of the
+ # various dependencies (libraries, data files, etc.) required in order
+ # for that code to run.
+ "name": "A String", # The name of the package.
+ "location": "A String", # The resource to read the package from. The supported resource type is:
+ #
+ # Google Cloud Storage:
+ #
+ # storage.googleapis.com/{bucket}
+ # bucket.storage.googleapis.com/
+ },
+ ],
+ "teardownPolicy": "A String", # Sets the policy for determining when to turndown worker pool.
+ # Allowed values are: `TEARDOWN_ALWAYS`, `TEARDOWN_ON_SUCCESS`, and
+ # `TEARDOWN_NEVER`.
+ # `TEARDOWN_ALWAYS` means workers are always torn down regardless of whether
+ # the job succeeds. `TEARDOWN_ON_SUCCESS` means workers are torn down
+ # if the job succeeds. `TEARDOWN_NEVER` means the workers are never torn
+ # down.
+ #
+ # If the workers are not torn down by the service, they will
+ # continue to run and use Google Compute Engine VM resources in the
+ # user's project until they are explicitly terminated by the user.
+ # Because of this, Google recommends using the `TEARDOWN_ALWAYS`
+ # policy except for small, manually supervised test jobs.
+ #
+ # If unknown or unspecified, the service will attempt to choose a reasonable
+ # default.
+ "network": "A String", # Network to which VMs will be assigned. If empty or unspecified,
+ # the service will use the network "default".
+ "ipConfiguration": "A String", # Configuration for VM IPs.
+ "diskSizeGb": 42, # Size of root disk for VMs, in GB. If zero or unspecified, the service will
+ # attempt to choose a reasonable default.
+ "autoscalingSettings": { # Settings for WorkerPool autoscaling. # Settings for autoscaling of this WorkerPool.
+ "maxNumWorkers": 42, # The maximum number of workers to cap scaling at.
+ "algorithm": "A String", # The algorithm to use for autoscaling.
+ },
+ "poolArgs": { # Extra arguments for this worker pool.
+ "a_key": "", # Properties of the object. Contains field @type with type URL.
+ },
+ "subnetwork": "A String", # Subnetwork to which VMs will be assigned, if desired. Expected to be of
+ # the form "regions/REGION/subnetworks/SUBNETWORK".
+ "numWorkers": 42, # Number of Google Compute Engine workers in this pool needed to
+ # execute the job. If zero or unspecified, the service will
+ # attempt to choose a reasonable default.
+ "numThreadsPerWorker": 42, # The number of threads per worker harness. If empty or unspecified, the
+ # service will choose a number of threads (according to the number of cores
+ # on the selected machine type for batch, or 1 by convention for streaming).
+ "workerHarnessContainerImage": "A String", # Required. Docker container image that executes the Cloud Dataflow worker
+ # harness, residing in Google Container Registry.
+ #
+ # Deprecated for the Fn API path. Use sdk_harness_container_images instead.
+ "taskrunnerSettings": { # Taskrunner configuration settings. # Settings passed through to Google Compute Engine workers when
+ # using the standard Dataflow task runner. Users should ignore
+ # this field.
+ "dataflowApiVersion": "A String", # The API version of the endpoint, e.g. "v1b3"
+ "oauthScopes": [ # The OAuth2 scopes to be requested by the taskrunner in order to
+ # access the Cloud Dataflow API.
+ "A String",
+ ],
+ "baseUrl": "A String", # The base URL for the taskrunner to use when accessing Google Cloud APIs.
#
# When workers access Google Cloud APIs, they logically do so via
# relative URLs. If this field is specified, it supplies the base
@@ -393,340 +435,299 @@
# Locators".
#
# If not specified, the default value is "http://www.googleapis.com/"
- "servicePath": "A String", # The Cloud Dataflow service path relative to the root URL, for example,
- # "dataflow/v1b3/projects".
- "shuffleServicePath": "A String", # The Shuffle service path relative to the root URL, for example,
- # "shuffle/v1beta1".
- "workerId": "A String", # The ID of the worker running this pipeline.
+ "workflowFileName": "A String", # The file to store the workflow in.
+ "logToSerialconsole": True or False, # Whether to send taskrunner log info to Google Compute Engine VM serial
+ # console.
+ "baseTaskDir": "A String", # The location on the worker for task-specific subdirectories.
+ "taskUser": "A String", # The UNIX user ID on the worker VM to use for tasks launched by
+ # taskrunner; e.g. "root".
+ "vmId": "A String", # The ID string of the VM.
+ "alsologtostderr": True or False, # Whether to also send taskrunner log info to stderr.
+ "parallelWorkerSettings": { # Provides data to pass through to the worker harness. # The settings to pass to the parallel worker harness.
+ "shuffleServicePath": "A String", # The Shuffle service path relative to the root URL, for example,
+ # "shuffle/v1beta1".
+ "tempStoragePrefix": "A String", # The prefix of the resources the system should use for temporary
+ # storage.
+ #
+ # The supported resource type is:
+ #
+ # Google Cloud Storage:
+ #
+ # storage.googleapis.com/{bucket}/{object}
+ # bucket.storage.googleapis.com/{object}
+ "reportingEnabled": True or False, # Whether to send work progress updates to the service.
+ "servicePath": "A String", # The Cloud Dataflow service path relative to the root URL, for example,
+ # "dataflow/v1b3/projects".
+ "baseUrl": "A String", # The base URL for accessing Google Cloud APIs.
+ #
+ # When workers access Google Cloud APIs, they logically do so via
+ # relative URLs. If this field is specified, it supplies the base
+ # URL to use for resolving these relative URLs. The normative
+ # algorithm used is defined by RFC 1808, "Relative Uniform Resource
+ # Locators".
+ #
+ # If not specified, the default value is "http://www.googleapis.com/"
+ "workerId": "A String", # The ID of the worker running this pipeline.
+ },
+ "harnessCommand": "A String", # The command to launch the worker harness.
+ "logDir": "A String", # The directory on the VM to store logs.
+ "streamingWorkerMainClass": "A String", # The streaming worker main class name.
+ "languageHint": "A String", # The suggested backend language.
+ "taskGroup": "A String", # The UNIX group ID on the worker VM to use for tasks launched by
+ # taskrunner; e.g. "wheel".
+ "logUploadLocation": "A String", # Indicates where to put logs. If this is not specified, the logs
+ # will not be uploaded.
+ #
+ # The supported resource type is:
+ #
+ # Google Cloud Storage:
+ # storage.googleapis.com/{bucket}/{object}
+ # bucket.storage.googleapis.com/{object}
+ "commandlinesFileName": "A String", # The file to store preprocessing commands in.
+ "continueOnException": True or False, # Whether to continue taskrunner if an exception is hit.
+ "tempStoragePrefix": "A String", # The prefix of the resources the taskrunner should use for
+ # temporary storage.
+ #
+ # The supported resource type is:
+ #
+ # Google Cloud Storage:
+ # storage.googleapis.com/{bucket}/{object}
+ # bucket.storage.googleapis.com/{object}
},
- "taskUser": "A String", # The UNIX user ID on the worker VM to use for tasks launched by
- # taskrunner; e.g. "root".
- "vmId": "A String", # The ID string of the VM.
+ "diskType": "A String", # Type of root disk for VMs. If empty or unspecified, the service will
+ # attempt to choose a reasonable default.
+ "defaultPackageSet": "A String", # The default package set to install. This allows the service to
+ # select a default set of packages which are useful to worker
+ # harnesses written in a particular language.
+ "machineType": "A String", # Machine type (e.g. "n1-standard-1"). If empty or unspecified, the
+ # service will attempt to choose a reasonable default.
},
- "autoscalingSettings": { # Settings for WorkerPool autoscaling. # Settings for autoscaling of this WorkerPool.
- "algorithm": "A String", # The algorithm to use for autoscaling.
- "maxNumWorkers": 42, # The maximum number of workers to cap scaling at.
- },
- "metadata": { # Metadata to set on the Google Compute Engine VMs.
- "a_key": "A String",
- },
- "defaultPackageSet": "A String", # The default package set to install. This allows the service to
- # select a default set of packages which are useful to worker
- # harnesses written in a particular language.
- "network": "A String", # Network to which VMs will be assigned. If empty or unspecified,
- # the service will use the network "default".
+ ],
+ "tempStoragePrefix": "A String", # The prefix of the resources the system should use for temporary
+ # storage. The system will append the suffix "/temp-{JOBNAME}" to
+ # this resource prefix, where {JOBNAME} is the value of the
+ # job_name field. The resulting bucket and object prefix is used
+ # as the prefix of the resources used to store temporary data
+ # needed during the job execution. NOTE: This will override the
+ # value in taskrunner_settings.
+ # The supported resource type is:
+ #
+ # Google Cloud Storage:
+ #
+ # storage.googleapis.com/{bucket}/{object}
+ # bucket.storage.googleapis.com/{object}
+ "internalExperiments": { # Experimental settings.
+ "a_key": "", # Properties of the object. Contains field @type with type URL.
},
- ],
- "dataset": "A String", # The dataset for the current project where various workflow
- # related tables are stored.
- #
- # The supported resource type is:
- #
- # Google BigQuery:
- # bigquery.googleapis.com/{dataset}
- },
- "stageStates": [ # This field may be mutated by the Cloud Dataflow service;
- # callers cannot mutate it.
- { # A message describing the state of a particular execution stage.
- "currentStateTime": "A String", # The time at which the stage transitioned to this state.
- "executionStageState": "A String", # Executions stage states allow the same set of values as JobState.
- "executionStageName": "A String", # The name of the execution stage.
- },
- ],
- "jobMetadata": { # Metadata available primarily for filtering jobs. Will be included in the # This field is populated by the Dataflow service to support filtering jobs
- # by the metadata values provided here. Populated for ListJobs and all GetJob
- # views SUMMARY and higher.
- # ListJob response and Job SUMMARY view.
- "datastoreDetails": [ # Identification of a Datastore source used in the Dataflow job.
- { # Metadata for a Datastore connector used by the job.
- "namespace": "A String", # Namespace used in the connection.
- "projectId": "A String", # ProjectId accessed in the connection.
- },
- ],
- "sdkVersion": { # The version of the SDK used to run the job. # The SDK version used to run the job.
- "version": "A String", # The version of the SDK used to run the job.
- "sdkSupportStatus": "A String", # The support status for this SDK version.
- "versionDisplayName": "A String", # A readable string describing the version of the SDK.
- },
- "bigqueryDetails": [ # Identification of a BigQuery source used in the Dataflow job.
- { # Metadata for a BigQuery connector used by the job.
- "table": "A String", # Table accessed in the connection.
- "dataset": "A String", # Dataset accessed in the connection.
- "query": "A String", # Query used to access data in the connection.
- "projectId": "A String", # Project accessed in the connection.
- },
- ],
- "fileDetails": [ # Identification of a File source used in the Dataflow job.
- { # Metadata for a File connector used by the job.
- "filePattern": "A String", # File Pattern used to access files by the connector.
- },
- ],
- "pubsubDetails": [ # Identification of a PubSub source used in the Dataflow job.
- { # Metadata for a PubSub connector used by the job.
- "topic": "A String", # Topic accessed in the connection.
- "subscription": "A String", # Subscription used in the connection.
- },
- ],
- "bigTableDetails": [ # Identification of a BigTable source used in the Dataflow job.
- { # Metadata for a BigTable connector used by the job.
- "projectId": "A String", # ProjectId accessed in the connection.
- "instanceId": "A String", # InstanceId accessed in the connection.
- "tableId": "A String", # TableId accessed in the connection.
- },
- ],
- "spannerDetails": [ # Identification of a Spanner source used in the Dataflow job.
- { # Metadata for a Spanner connector used by the job.
- "instanceId": "A String", # InstanceId accessed in the connection.
- "projectId": "A String", # ProjectId accessed in the connection.
- "databaseId": "A String", # DatabaseId accessed in the connection.
- },
- ],
- },
- "type": "A String", # The type of Cloud Dataflow job.
- "projectId": "A String", # The ID of the Cloud Platform project that the job belongs to.
- "createdFromSnapshotId": "A String", # If this is specified, the job's initial state is populated from the given
- # snapshot.
- "pipelineDescription": { # A descriptive representation of submitted pipeline as well as the executed # Preliminary field: The format of this data may change at any time.
- # A description of the user pipeline and stages through which it is executed.
- # Created by Cloud Dataflow service. Only retrieved with
- # JOB_VIEW_DESCRIPTION or JOB_VIEW_ALL.
- # form. This data is provided by the Dataflow service for ease of visualizing
- # the pipeline and interpreting Dataflow provided metrics.
- "executionPipelineStage": [ # Description of each stage of execution of the pipeline.
- { # Description of the composing transforms, names/ids, and input/outputs of a
- # stage of execution. Some composing transforms and sources may have been
- # generated by the Dataflow service during execution planning.
- "outputSource": [ # Output sources for this stage.
- { # Description of an input or output of an execution stage.
- "sizeBytes": "A String", # Size of the source, if measurable.
- "name": "A String", # Dataflow service generated name for this source.
- "userName": "A String", # Human-readable name for this source; may be user or system generated.
- "originalTransformOrCollection": "A String", # User name for the original user transform or collection with which this
- # source is most closely associated.
- },
- ],
- "name": "A String", # Dataflow service generated name for this stage.
- "inputSource": [ # Input sources for this stage.
- { # Description of an input or output of an execution stage.
- "sizeBytes": "A String", # Size of the source, if measurable.
- "name": "A String", # Dataflow service generated name for this source.
- "userName": "A String", # Human-readable name for this source; may be user or system generated.
- "originalTransformOrCollection": "A String", # User name for the original user transform or collection with which this
- # source is most closely associated.
- },
- ],
- "id": "A String", # Dataflow service generated id for this stage.
- "componentTransform": [ # Transforms that comprise this execution stage.
- { # Description of a transform executed as part of an execution stage.
- "originalTransform": "A String", # User name for the original user transform with which this transform is
- # most closely associated.
- "name": "A String", # Dataflow service generated name for this source.
- "userName": "A String", # Human-readable name for this transform; may be user or system generated.
- },
- ],
- "componentSource": [ # Collections produced and consumed by component transforms of this stage.
- { # Description of an interstitial value between transforms in an execution
- # stage.
- "name": "A String", # Dataflow service generated name for this source.
- "userName": "A String", # Human-readable name for this transform; may be user or system generated.
- "originalTransformOrCollection": "A String", # User name for the original user transform or collection with which this
- # source is most closely associated.
- },
- ],
- "kind": "A String", # Type of tranform this stage is executing.
- },
- ],
- "originalPipelineTransform": [ # Description of each transform in the pipeline and collections between them.
- { # Description of the type, names/ids, and input/outputs for a transform.
- "kind": "A String", # Type of transform.
- "inputCollectionName": [ # User names for all collection inputs to this transform.
- "A String",
- ],
- "name": "A String", # User provided name for this transform instance.
- "id": "A String", # SDK generated id of this transform instance.
- "displayData": [ # Transform-specific display data.
- { # Data provided with a pipeline or transform to provide descriptive info.
- "durationValue": "A String", # Contains value if the data is of duration type.
- "int64Value": "A String", # Contains value if the data is of int64 type.
- "namespace": "A String", # The namespace for the key. This is usually a class name or programming
- # language namespace (i.e. python module) which defines the display data.
- # This allows a dax monitoring system to specially handle the data
- # and perform custom rendering.
- "floatValue": 3.14, # Contains value if the data is of float type.
- "key": "A String", # The key identifying the display data.
- # This is intended to be used as a label for the display data
- # when viewed in a dax monitoring system.
- "shortStrValue": "A String", # A possible additional shorter value to display.
- # For example a java_class_name_value of com.mypackage.MyDoFn
- # will be stored with MyDoFn as the short_str_value and
- # com.mypackage.MyDoFn as the java_class_name value.
- # short_str_value can be displayed and java_class_name_value
- # will be displayed as a tooltip.
- "url": "A String", # An optional full URL.
- "label": "A String", # An optional label to display in a dax UI for the element.
- "timestampValue": "A String", # Contains value if the data is of timestamp type.
- "boolValue": True or False, # Contains value if the data is of a boolean type.
- "javaClassValue": "A String", # Contains value if the data is of java class type.
- "strValue": "A String", # Contains value if the data is of string type.
- },
- ],
- "outputCollectionName": [ # User names for all collection outputs to this transform.
- "A String",
- ],
- },
- ],
- "displayData": [ # Pipeline level display data.
- { # Data provided with a pipeline or transform to provide descriptive info.
- "durationValue": "A String", # Contains value if the data is of duration type.
- "int64Value": "A String", # Contains value if the data is of int64 type.
- "namespace": "A String", # The namespace for the key. This is usually a class name or programming
- # language namespace (i.e. python module) which defines the display data.
- # This allows a dax monitoring system to specially handle the data
- # and perform custom rendering.
- "floatValue": 3.14, # Contains value if the data is of float type.
- "key": "A String", # The key identifying the display data.
- # This is intended to be used as a label for the display data
- # when viewed in a dax monitoring system.
- "shortStrValue": "A String", # A possible additional shorter value to display.
- # For example a java_class_name_value of com.mypackage.MyDoFn
- # will be stored with MyDoFn as the short_str_value and
- # com.mypackage.MyDoFn as the java_class_name value.
- # short_str_value can be displayed and java_class_name_value
- # will be displayed as a tooltip.
- "url": "A String", # An optional full URL.
- "label": "A String", # An optional label to display in a dax UI for the element.
- "timestampValue": "A String", # Contains value if the data is of timestamp type.
- "boolValue": True or False, # Contains value if the data is of a boolean type.
- "javaClassValue": "A String", # Contains value if the data is of java class type.
- "strValue": "A String", # Contains value if the data is of string type.
- },
- ],
- },
- "replaceJobId": "A String", # If this job is an update of an existing job, this field is the job ID
- # of the job it replaced.
- #
- # When sending a `CreateJobRequest`, you can update a job by specifying it
- # here. The job named here is stopped, and its intermediate state is
- # transferred to this job.
- "tempFiles": [ # A set of files the system should be aware of that are used
- # for temporary storage. These temporary files will be
- # removed on job completion.
- # No duplicates are allowed.
- # No file patterns are supported.
- #
- # The supported files are:
- #
- # Google Cloud Storage:
- #
- # storage.googleapis.com/{bucket}/{object}
- # bucket.storage.googleapis.com/{object}
- "A String",
- ],
- "name": "A String", # The user-specified Cloud Dataflow job name.
- #
- # Only one Job with a given name may exist in a project at any
- # given time. If a caller attempts to create a Job with the same
- # name as an already-existing Job, the attempt returns the
- # existing Job.
- #
- # The name must match the regular expression
- # `[a-z]([-a-z0-9]{0,38}[a-z0-9])?`
- "steps": [ # Exactly one of step or steps_location should be specified.
- #
- # The top-level steps that constitute the entire job.
- { # Defines a particular step within a Cloud Dataflow job.
- #
- # A job consists of multiple steps, each of which performs some
- # specific operation as part of the overall job. Data is typically
- # passed from one step to another as part of the job.
- #
- # Here's an example of a sequence of steps which together implement a
- # Map-Reduce job:
- #
- # * Read a collection of data from some source, parsing the
- # collection's elements.
- #
- # * Validate the elements.
- #
- # * Apply a user-defined function to map each element to some value
- # and extract an element-specific key value.
- #
- # * Group elements with the same key into a single element with
- # that key, transforming a multiply-keyed collection into a
- # uniquely-keyed collection.
- #
- # * Write the elements out to some data sink.
- #
- # Note that the Cloud Dataflow service may be used to run many different
- # types of jobs, not just Map-Reduce.
- "name": "A String", # The name that identifies the step. This must be unique for each
- # step with respect to all other steps in the Cloud Dataflow job.
- "kind": "A String", # The kind of step in the Cloud Dataflow job.
- "properties": { # Named properties associated with the step. Each kind of
- # predefined step has its own required set of properties.
- # Must be provided on Create. Only retrieved with JOB_VIEW_ALL.
+ "sdkPipelineOptions": { # The Cloud Dataflow SDK pipeline options specified by the user. These
+ # options are passed through the service and are used to recreate the
+ # SDK pipeline options on the worker in a language agnostic and platform
+ # independent way.
"a_key": "", # Properties of the object.
},
+ "dataset": "A String", # The dataset for the current project where various workflow
+ # related tables are stored.
+ #
+ # The supported resource type is:
+ #
+ # Google BigQuery:
+ # bigquery.googleapis.com/{dataset}
+ "clusterManagerApiService": "A String", # The type of cluster manager API to use. If unknown or
+ # unspecified, the service will attempt to choose a reasonable
+ # default. This should be in the form of the API service name,
+ # e.g. "compute.googleapis.com".
},
- ],
- "replacedByJobId": "A String", # If another job is an update of this job (and thus, this job is in
- # `JOB_STATE_UPDATED`), this field contains the ID of that job.
- "executionInfo": { # Additional information about how a Cloud Dataflow job will be executed that # Deprecated.
- # isn't contained in the submitted job.
- "stages": { # A mapping from each stage to the information about that stage.
- "a_key": { # Contains information about how a particular
- # google.dataflow.v1beta3.Step will be executed.
- "stepName": [ # The steps associated with the execution stage.
- # Note that stages may have several steps, and that a given step
- # might be run by more than one stage.
- "A String",
- ],
+ "stepsLocation": "A String", # The GCS location where the steps are stored.
+ "steps": [ # Exactly one of steps or steps_location should be specified.
+ #
+ # The top-level steps that constitute the entire job.
+ { # Defines a particular step within a Cloud Dataflow job.
+ #
+ # A job consists of multiple steps, each of which performs some
+ # specific operation as part of the overall job. Data is typically
+ # passed from one step to another as part of the job.
+ #
+ # Here's an example of a sequence of steps which together implement a
+ # Map-Reduce job:
+ #
+ # * Read a collection of data from some source, parsing the
+ # collection's elements.
+ #
+ # * Validate the elements.
+ #
+ # * Apply a user-defined function to map each element to some value
+ # and extract an element-specific key value.
+ #
+ # * Group elements with the same key into a single element with
+ # that key, transforming a multiply-keyed collection into a
+ # uniquely-keyed collection.
+ #
+ # * Write the elements out to some data sink.
+ #
+ # Note that the Cloud Dataflow service may be used to run many different
+ # types of jobs, not just Map-Reduce.
+ "kind": "A String", # The kind of step in the Cloud Dataflow job.
+ "properties": { # Named properties associated with the step. Each kind of
+ # predefined step has its own required set of properties.
+ # Must be provided on Create. Only retrieved with JOB_VIEW_ALL.
+ "a_key": "", # Properties of the object.
+ },
+ "name": "A String", # The name that identifies the step. This must be unique for each
+ # step with respect to all other steps in the Cloud Dataflow job.
+ },
+ ],
+ "stageStates": [ # This field may be mutated by the Cloud Dataflow service;
+ # callers cannot mutate it.
+ { # A message describing the state of a particular execution stage.
+ "executionStageState": "A String", # Execution stage states allow the same set of values as JobState.
+ "executionStageName": "A String", # The name of the execution stage.
+ "currentStateTime": "A String", # The time at which the stage transitioned to this state.
+ },
+ ],
+ "replacedByJobId": "A String", # If another job is an update of this job (and thus, this job is in
+ # `JOB_STATE_UPDATED`), this field contains the ID of that job.
+ "jobMetadata": { # Metadata available primarily for filtering jobs. Will be included in the # This field is populated by the Dataflow service to support filtering jobs
+ # by the metadata values provided here. Populated for ListJobs and all GetJob
+ # views SUMMARY and higher.
+ # ListJob response and Job SUMMARY view.
+ "sdkVersion": { # The version of the SDK used to run the job. # The SDK version used to run the job.
+ "sdkSupportStatus": "A String", # The support status for this SDK version.
+ "versionDisplayName": "A String", # A readable string describing the version of the SDK.
+ "version": "A String", # The version of the SDK used to run the job.
+ },
+ "bigTableDetails": [ # Identification of a BigTable source used in the Dataflow job.
+ { # Metadata for a BigTable connector used by the job.
+ "instanceId": "A String", # InstanceId accessed in the connection.
+ "tableId": "A String", # TableId accessed in the connection.
+ "projectId": "A String", # ProjectId accessed in the connection.
+ },
+ ],
+ "pubsubDetails": [ # Identification of a PubSub source used in the Dataflow job.
+ { # Metadata for a PubSub connector used by the job.
+ "subscription": "A String", # Subscription used in the connection.
+ "topic": "A String", # Topic accessed in the connection.
+ },
+ ],
+ "bigqueryDetails": [ # Identification of a BigQuery source used in the Dataflow job.
+ { # Metadata for a BigQuery connector used by the job.
+ "dataset": "A String", # Dataset accessed in the connection.
+ "projectId": "A String", # Project accessed in the connection.
+ "query": "A String", # Query used to access data in the connection.
+ "table": "A String", # Table accessed in the connection.
+ },
+ ],
+ "fileDetails": [ # Identification of a File source used in the Dataflow job.
+ { # Metadata for a File connector used by the job.
+ "filePattern": "A String", # File Pattern used to access files by the connector.
+ },
+ ],
+ "datastoreDetails": [ # Identification of a Datastore source used in the Dataflow job.
+ { # Metadata for a Datastore connector used by the job.
+ "namespace": "A String", # Namespace used in the connection.
+ "projectId": "A String", # ProjectId accessed in the connection.
+ },
+ ],
+ "spannerDetails": [ # Identification of a Spanner source used in the Dataflow job.
+ { # Metadata for a Spanner connector used by the job.
+ "instanceId": "A String", # InstanceId accessed in the connection.
+ "databaseId": "A String", # DatabaseId accessed in the connection.
+ "projectId": "A String", # ProjectId accessed in the connection.
+ },
+ ],
+ },
+ "location": "A String", # The [regional endpoint]
+ # (https://cloud.google.com/dataflow/docs/concepts/regional-endpoints) that
+ # contains this job.
+ "transformNameMapping": { # The map of transform name prefixes of the job to be replaced to the
+ # corresponding name prefixes of the new job.
+ "a_key": "A String",
+ },
+ "startTime": "A String", # The timestamp when the job was started (transitioned to JOB_STATE_PENDING).
+ # Flexible resource scheduling jobs are started with some delay after job
+ # creation, so start_time is unset before start and is updated when the
+ # job is started by the Cloud Dataflow service. For other jobs, start_time
+ # always equals create_time and is immutable and set by the Cloud Dataflow
+ # service.
+ "clientRequestId": "A String", # The client's unique identifier of the job, re-used across retried attempts.
+ # If this field is set, the service will ensure its uniqueness.
+ # The request to create a job will fail if the service has knowledge of a
+ # previously submitted job with the same client's ID and job name.
+ # The caller may use this field to ensure idempotence of job
+ # creation across retried attempts to create a job.
+ # By default, the field is empty and, in that case, the service ignores it.
+ "executionInfo": { # Additional information about how a Cloud Dataflow job will be executed that # Deprecated.
+ # isn't contained in the submitted job.
+ "stages": { # A mapping from each stage to the information about that stage.
+ "a_key": { # Contains information about how a particular
+ # google.dataflow.v1beta3.Step will be executed.
+ "stepName": [ # The steps associated with the execution stage.
+ # Note that stages may have several steps, and that a given step
+ # might be run by more than one stage.
+ "A String",
+ ],
+ },
},
},
+ "type": "A String", # The type of Cloud Dataflow job.
+ "createTime": "A String", # The timestamp when the job was initially created. Immutable and set by the
+ # Cloud Dataflow service.
+ "tempFiles": [ # A set of files the system should be aware of that are used
+ # for temporary storage. These temporary files will be
+ # removed on job completion.
+ # No duplicates are allowed.
+ # No file patterns are supported.
+ #
+ # The supported files are:
+ #
+ # Google Cloud Storage:
+ #
+ # storage.googleapis.com/{bucket}/{object}
+ # bucket.storage.googleapis.com/{object}
+ "A String",
+ ],
+ "id": "A String", # The unique ID of this job.
+ #
+ # This field is set by the Cloud Dataflow service when the Job is
+ # created, and is immutable for the life of the job.
+ "requestedState": "A String", # The job's requested state.
+ #
+ # `UpdateJob` may be used to switch between the `JOB_STATE_STOPPED` and
+ # `JOB_STATE_RUNNING` states, by setting requested_state. `UpdateJob` may
+ # also be used to directly set a job's requested state to
+ # `JOB_STATE_CANCELLED` or `JOB_STATE_DONE`, irrevocably terminating the
+ # job if it has not already reached a terminal state.
+ "replaceJobId": "A String", # If this job is an update of an existing job, this field is the job ID
+ # of the job it replaced.
+ #
+ # When sending a `CreateJobRequest`, you can update a job by specifying it
+ # here. The job named here is stopped, and its intermediate state is
+ # transferred to this job.
+ "createdFromSnapshotId": "A String", # If this is specified, the job's initial state is populated from the given
+ # snapshot.
+ "currentState": "A String", # The current state of the job.
+ #
+ # Jobs are created in the `JOB_STATE_STOPPED` state unless otherwise
+ # specified.
+ #
+ # A job in the `JOB_STATE_RUNNING` state may asynchronously enter a
+ # terminal state. After a job has reached a terminal state, no
+ # further state updates may be made.
+ #
+ # This field may be mutated by the Cloud Dataflow service;
+ # callers cannot mutate it.
+ "name": "A String", # The user-specified Cloud Dataflow job name.
+ #
+ # Only one Job with a given name may exist in a project at any
+ # given time. If a caller attempts to create a Job with the same
+ # name as an already-existing Job, the attempt returns the
+ # existing Job.
+ #
+ # The name must match the regular expression
+ # `[a-z]([-a-z0-9]{0,38}[a-z0-9])?`
+ "currentStateTime": "A String", # The timestamp associated with the current state.
},
- "currentState": "A String", # The current state of the job.
- #
- # Jobs are created in the `JOB_STATE_STOPPED` state unless otherwise
- # specified.
- #
- # A job in the `JOB_STATE_RUNNING` state may asynchronously enter a
- # terminal state. After a job has reached a terminal state, no
- # further state updates may be made.
- #
- # This field may be mutated by the Cloud Dataflow service;
- # callers cannot mutate it.
- "location": "A String", # The [regional endpoint]
- # (https://cloud.google.com/dataflow/docs/concepts/regional-endpoints) that
- # contains this job.
- "startTime": "A String", # The timestamp when the job was started (transitioned to JOB_STATE_PENDING).
- # Flexible resource scheduling jobs are started with some delay after job
- # creation, so start_time is unset before start and is updated when the
- # job is started by the Cloud Dataflow service. For other jobs, start_time
- # always equals to create_time and is immutable and set by the Cloud Dataflow
- # service.
- "stepsLocation": "A String", # The GCS location where the steps are stored.
- "labels": { # User-defined labels for this job.
- #
- # The labels map can contain no more than 64 entries. Entries of the labels
- # map are UTF8 strings that comply with the following restrictions:
- #
- # * Keys must conform to regexp: \p{Ll}\p{Lo}{0,62}
- # * Values must conform to regexp: [\p{Ll}\p{Lo}\p{N}_-]{0,63}
- # * Both keys and values are additionally constrained to be <= 128 bytes in
- # size.
- "a_key": "A String",
- },
- "createTime": "A String", # The timestamp when the job was initially created. Immutable and set by the
- # Cloud Dataflow service.
- "requestedState": "A String", # The job's requested state.
- #
- # `UpdateJob` may be used to switch between the `JOB_STATE_STOPPED` and
- # `JOB_STATE_RUNNING` states, by setting requested_state. `UpdateJob` may
- # also be used to directly set a job's requested state to
- # `JOB_STATE_CANCELLED` or `JOB_STATE_DONE`, irrevocably terminating the
- # job if it has not already reached a terminal state.
- },
],
+ "nextPageToken": "A String", # Set if there may be more results than fit in this response.
"failedLocation": [ # Zero or more messages describing the [regional endpoints]
# (https://cloud.google.com/dataflow/docs/concepts/regional-endpoints) that
# failed to respond.
@@ -738,7 +739,6 @@
# failed to respond.
},
],
- "nextPageToken": "A String", # Set if there may be more results than fit in this response.
}</pre>
</div>
@@ -772,243 +772,285 @@
The object takes the form of:
{ # Defines a job to be run by the Cloud Dataflow service.
- "clientRequestId": "A String", # The client's unique identifier of the job, re-used across retried attempts.
- # If this field is set, the service will ensure its uniqueness.
- # The request to create a job will fail if the service has knowledge of a
- # previously submitted job with the same client's ID and job name.
- # The caller may use this field to ensure idempotence of job
- # creation across retried attempts to create a job.
- # By default, the field is empty and, in that case, the service ignores it.
- "id": "A String", # The unique ID of this job.
- #
- # This field is set by the Cloud Dataflow service when the Job is
- # created, and is immutable for the life of the job.
- "currentStateTime": "A String", # The timestamp associated with the current state.
- "transformNameMapping": { # The map of transform name prefixes of the job to be replaced to the
- # corresponding name prefixes of the new job.
- "a_key": "A String",
- },
- "environment": { # Describes the environment in which a Dataflow Job runs. # The environment for the job.
- "internalExperiments": { # Experimental settings.
- "a_key": "", # Properties of the object. Contains field @type with type URL.
- },
- "workerRegion": "A String", # The Compute Engine region
- # (https://cloud.google.com/compute/docs/regions-zones/regions-zones) in
- # which worker processing should occur, e.g. "us-west1". Mutually exclusive
- # with worker_zone. If neither worker_region nor worker_zone is specified,
- # default to the control plane's region.
- "serviceKmsKeyName": "A String", # If set, contains the Cloud KMS key identifier used to encrypt data
- # at rest, AKA a Customer Managed Encryption Key (CMEK).
- #
- # Format:
- # projects/PROJECT_ID/locations/LOCATION/keyRings/KEY_RING/cryptoKeys/KEY
- "userAgent": { # A description of the process that generated the request.
- "a_key": "", # Properties of the object.
- },
- "workerZone": "A String", # The Compute Engine zone
- # (https://cloud.google.com/compute/docs/regions-zones/regions-zones) in
- # which worker processing should occur, e.g. "us-west1-a". Mutually exclusive
- # with worker_region. If neither worker_region nor worker_zone is specified,
- # a zone in the control plane's region is chosen based on available capacity.
- "clusterManagerApiService": "A String", # The type of cluster manager API to use. If unknown or
- # unspecified, the service will attempt to choose a reasonable
- # default. This should be in the form of the API service name,
- # e.g. "compute.googleapis.com".
- "tempStoragePrefix": "A String", # The prefix of the resources the system should use for temporary
- # storage. The system will append the suffix "/temp-{JOBNAME} to
- # this resource prefix, where {JOBNAME} is the value of the
- # job_name field. The resulting bucket and object prefix is used
- # as the prefix of the resources used to store temporary data
- # needed during the job execution. NOTE: This will override the
- # value in taskrunner_settings.
- # The supported resource type is:
- #
- # Google Cloud Storage:
- #
- # storage.googleapis.com/{bucket}/{object}
- # bucket.storage.googleapis.com/{object}
- "experiments": [ # The list of experiments to enable.
- "A String",
- ],
- "version": { # A structure describing which components and their versions of the service
- # are required in order to run the job.
- "a_key": "", # Properties of the object.
- },
- "serviceAccountEmail": "A String", # Identity to run virtual machines as. Defaults to the default account.
- "sdkPipelineOptions": { # The Cloud Dataflow SDK pipeline options specified by the user. These
- # options are passed through the service and are used to recreate the
- # SDK pipeline options on the worker in a language agnostic and platform
- # independent way.
- "a_key": "", # Properties of the object.
- },
- "flexResourceSchedulingGoal": "A String", # Which Flexible Resource Scheduling mode to run in.
- "workerPools": [ # The worker pools. At least one "harness" worker pool must be
- # specified in order for the job to have workers.
- { # Describes one particular pool of Cloud Dataflow workers to be
- # instantiated by the Cloud Dataflow service in order to perform the
- # computations required by a job. Note that a workflow job may use
- # multiple pools, in order to match the various computational
- # requirements of the various stages of the job.
- "numThreadsPerWorker": 42, # The number of threads per worker harness. If empty or unspecified, the
- # service will choose a number of threads (according to the number of cores
- # on the selected machine type for batch, or 1 by convention for streaming).
- "numWorkers": 42, # Number of Google Compute Engine workers in this pool needed to
- # execute the job. If zero or unspecified, the service will
- # attempt to choose a reasonable default.
- "zone": "A String", # Zone to run the worker pools in. If empty or unspecified, the service
- # will attempt to choose a reasonable default.
- "diskSourceImage": "A String", # Fully qualified source image for disks.
- "packages": [ # Packages to be installed on workers.
- { # The packages that must be installed in order for a worker to run the
- # steps of the Cloud Dataflow job that will be assigned to its worker
- # pool.
- #
- # This is the mechanism by which the Cloud Dataflow SDK causes code to
- # be loaded onto the workers. For example, the Cloud Dataflow Java SDK
- # might use this to install jars containing the user's code and all of the
- # various dependencies (libraries, data files, etc.) required in order
- # for that code to run.
- "name": "A String", # The name of the package.
- "location": "A String", # The resource to read the package from. The supported resource type is:
- #
- # Google Cloud Storage:
- #
- # storage.googleapis.com/{bucket}
- # bucket.storage.googleapis.com/
- },
- ],
- "teardownPolicy": "A String", # Sets the policy for determining when to turndown worker pool.
- # Allowed values are: `TEARDOWN_ALWAYS`, `TEARDOWN_ON_SUCCESS`, and
- # `TEARDOWN_NEVER`.
- # `TEARDOWN_ALWAYS` means workers are always torn down regardless of whether
- # the job succeeds. `TEARDOWN_ON_SUCCESS` means workers are torn down
- # if the job succeeds. `TEARDOWN_NEVER` means the workers are never torn
- # down.
- #
- # If the workers are not torn down by the service, they will
- # continue to run and use Google Compute Engine VM resources in the
- # user's project until they are explicitly terminated by the user.
- # Because of this, Google recommends using the `TEARDOWN_ALWAYS`
- # policy except for small, manually supervised test jobs.
- #
- # If unknown or unspecified, the service will attempt to choose a reasonable
- # default.
- "onHostMaintenance": "A String", # The action to take on host maintenance, as defined by the Google
- # Compute Engine API.
- "poolArgs": { # Extra arguments for this worker pool.
- "a_key": "", # Properties of the object. Contains field @type with type URL.
+ "pipelineDescription": { # A descriptive representation of submitted pipeline as well as the executed
+ # form. This data is provided by the Dataflow service for ease of visualizing
+ # the pipeline and interpreting Dataflow provided metrics.
+ # Preliminary field: The format of this data may change at any time.
+ # A description of the user pipeline and stages through which it is executed.
+ # Created by Cloud Dataflow service. Only retrieved with JOB_VIEW_DESCRIPTION or JOB_VIEW_ALL.
+ "displayData": [ # Pipeline level display data.
+ { # Data provided with a pipeline or transform to provide descriptive info.
+ "url": "A String", # An optional full URL.
+ "javaClassValue": "A String", # Contains value if the data is of java class type.
+ "timestampValue": "A String", # Contains value if the data is of timestamp type.
+ "durationValue": "A String", # Contains value if the data is of duration type.
+ "label": "A String", # An optional label to display in a dax UI for the element.
+ "key": "A String", # The key identifying the display data.
+ # This is intended to be used as a label for the display data
+ # when viewed in a dax monitoring system.
+ "namespace": "A String", # The namespace for the key. This is usually a class name or programming
+ # language namespace (e.g. a Python module) which defines the display data.
+ # This allows a dax monitoring system to specially handle the data
+ # and perform custom rendering.
+ "floatValue": 3.14, # Contains value if the data is of float type.
+ "strValue": "A String", # Contains value if the data is of string type.
+ "int64Value": "A String", # Contains value if the data is of int64 type.
+ "boolValue": True or False, # Contains value if the data is of a boolean type.
+ "shortStrValue": "A String", # A possible additional shorter value to display.
+ # For example, a java_class_name_value of com.mypackage.MyDoFn
+ # will be stored with MyDoFn as the short_str_value and
+ # com.mypackage.MyDoFn as the java_class_name_value.
+ # short_str_value can be displayed and java_class_name_value
+ # will be displayed as a tooltip.
},
- "diskSizeGb": 42, # Size of root disk for VMs, in GB. If zero or unspecified, the service will
- # attempt to choose a reasonable default.
- "workerHarnessContainerImage": "A String", # Required. Docker container image that executes the Cloud Dataflow worker
- # harness, residing in Google Container Registry.
- #
- # Deprecated for the Fn API path. Use sdk_harness_container_images instead.
- "diskType": "A String", # Type of root disk for VMs. If empty or unspecified, the service will
- # attempt to choose a reasonable default.
- "machineType": "A String", # Machine type (e.g. "n1-standard-1"). If empty or unspecified, the
- # service will attempt to choose a reasonable default.
- "kind": "A String", # The kind of the worker pool; currently only `harness` and `shuffle`
- # are supported.
- "sdkHarnessContainerImages": [ # Set of SDK harness containers needed to execute this pipeline. This will
- # only be set in the Fn API path. For non-cross-language pipelines this
- # should have only one entry. Cross-language pipelines will have two or more
- # entries.
- { # Defines a SDK harness container for executing Dataflow pipelines.
- "containerImage": "A String", # A docker container image that resides in Google Container Registry.
- "useSingleCorePerContainer": True or False, # If true, recommends the Dataflow service to use only one core per SDK
- # container instance with this image. If false (or unset) recommends using
- # more than one core per SDK container instance with this image for
- # efficiency. Note that Dataflow service may choose to override this property
- # if needed.
- },
- ],
- "dataDisks": [ # Data disks that are used by a VM in this workflow.
- { # Describes the data disk used by a workflow job.
- "diskType": "A String", # Disk storage type, as defined by Google Compute Engine. This
- # must be a disk type appropriate to the project and zone in which
- # the workers will run. If unknown or unspecified, the service
- # will attempt to choose a reasonable default.
- #
- # For example, the standard persistent disk type is a resource name
- # typically ending in "pd-standard". If SSD persistent disks are
- # available, the resource name typically ends with "pd-ssd". The
- # actual valid values are defined the Google Compute Engine API,
- # not by the Cloud Dataflow API; consult the Google Compute Engine
- # documentation for more information about determining the set of
- # available disk types for a particular project and zone.
- #
- # Google Compute Engine Disk types are local to a particular
- # project in a particular zone, and so the resource name will
- # typically look something like this:
- #
- # compute.googleapis.com/projects/project-id/zones/zone/diskTypes/pd-standard
- "sizeGb": 42, # Size of disk in GB. If zero or unspecified, the service will
- # attempt to choose a reasonable default.
- "mountPoint": "A String", # Directory in a VM where disk is mounted.
- },
- ],
- "subnetwork": "A String", # Subnetwork to which VMs will be assigned, if desired. Expected to be of
- # the form "regions/REGION/subnetworks/SUBNETWORK".
- "ipConfiguration": "A String", # Configuration for VM IPs.
- "taskrunnerSettings": { # Taskrunner configuration settings. # Settings passed through to Google Compute Engine workers when
- # using the standard Dataflow task runner. Users should ignore
- # this field.
- "alsologtostderr": True or False, # Whether to also send taskrunner log info to stderr.
- "taskGroup": "A String", # The UNIX group ID on the worker VM to use for tasks launched by
- # taskrunner; e.g. "wheel".
- "harnessCommand": "A String", # The command to launch the worker harness.
- "logDir": "A String", # The directory on the VM to store logs.
- "oauthScopes": [ # The OAuth2 scopes to be requested by the taskrunner in order to
- # access the Cloud Dataflow API.
+ ],
+ "originalPipelineTransform": [ # Description of each transform in the pipeline and collections between them.
+ { # Description of the type, names/ids, and input/outputs for a transform.
+ "outputCollectionName": [ # User names for all collection outputs to this transform.
"A String",
],
- "dataflowApiVersion": "A String", # The API version of endpoint, e.g. "v1b3"
- "logUploadLocation": "A String", # Indicates where to put logs. If this is not specified, the logs
- # will not be uploaded.
- #
- # The supported resource type is:
- #
- # Google Cloud Storage:
- # storage.googleapis.com/{bucket}/{object}
- # bucket.storage.googleapis.com/{object}
- "streamingWorkerMainClass": "A String", # The streaming worker main class name.
- "workflowFileName": "A String", # The file to store the workflow in.
- "languageHint": "A String", # The suggested backend language.
- "commandlinesFileName": "A String", # The file to store preprocessing commands in.
- "baseTaskDir": "A String", # The location on the worker for task-specific subdirectories.
- "tempStoragePrefix": "A String", # The prefix of the resources the taskrunner should use for
- # temporary storage.
- #
- # The supported resource type is:
- #
- # Google Cloud Storage:
- # storage.googleapis.com/{bucket}/{object}
- # bucket.storage.googleapis.com/{object}
- "baseUrl": "A String", # The base URL for the taskrunner to use when accessing Google Cloud APIs.
- #
- # When workers access Google Cloud APIs, they logically do so via
- # relative URLs. If this field is specified, it supplies the base
- # URL to use for resolving these relative URLs. The normative
- # algorithm used is defined by RFC 1808, "Relative Uniform Resource
- # Locators".
- #
- # If not specified, the default value is "http://www.googleapis.com/"
- "logToSerialconsole": True or False, # Whether to send taskrunner log info to Google Compute Engine VM serial
- # console.
- "continueOnException": True or False, # Whether to continue taskrunner if an exception is hit.
- "parallelWorkerSettings": { # Provides data to pass through to the worker harness. # The settings to pass to the parallel worker harness.
- "tempStoragePrefix": "A String", # The prefix of the resources the system should use for temporary
- # storage.
+ "displayData": [ # Transform-specific display data.
+ { # Data provided with a pipeline or transform to provide descriptive info.
+ "url": "A String", # An optional full URL.
+ "javaClassValue": "A String", # Contains value if the data is of java class type.
+ "timestampValue": "A String", # Contains value if the data is of timestamp type.
+ "durationValue": "A String", # Contains value if the data is of duration type.
+ "label": "A String", # An optional label to display in a dax UI for the element.
+ "key": "A String", # The key identifying the display data.
+ # This is intended to be used as a label for the display data
+ # when viewed in a dax monitoring system.
+ "namespace": "A String", # The namespace for the key. This is usually a class name or programming
+ # language namespace (e.g. a Python module) which defines the display data.
+ # This allows a dax monitoring system to specially handle the data
+ # and perform custom rendering.
+ "floatValue": 3.14, # Contains value if the data is of float type.
+ "strValue": "A String", # Contains value if the data is of string type.
+ "int64Value": "A String", # Contains value if the data is of int64 type.
+ "boolValue": True or False, # Contains value if the data is of a boolean type.
+ "shortStrValue": "A String", # A possible additional shorter value to display.
+ # For example, a java_class_name_value of com.mypackage.MyDoFn
+ # will be stored with MyDoFn as the short_str_value and
+ # com.mypackage.MyDoFn as the java_class_name_value.
+ # short_str_value can be displayed and java_class_name_value
+ # will be displayed as a tooltip.
+ },
+ ],
+ "id": "A String", # SDK generated id of this transform instance.
+ "inputCollectionName": [ # User names for all collection inputs to this transform.
+ "A String",
+ ],
+ "name": "A String", # User provided name for this transform instance.
+ "kind": "A String", # Type of transform.
+ },
+ ],
+ "executionPipelineStage": [ # Description of each stage of execution of the pipeline.
+ { # Description of the composing transforms, names/ids, and input/outputs of a
+ # stage of execution. Some composing transforms and sources may have been
+ # generated by the Dataflow service during execution planning.
+ "componentSource": [ # Collections produced and consumed by component transforms of this stage.
+ { # Description of an interstitial value between transforms in an execution
+ # stage.
+ "userName": "A String", # Human-readable name for this source; may be user or system generated.
+ "name": "A String", # Dataflow service generated name for this source.
+ "originalTransformOrCollection": "A String", # User name for the original user transform or collection with which this
+ # source is most closely associated.
+ },
+ ],
+ "inputSource": [ # Input sources for this stage.
+ { # Description of an input or output of an execution stage.
+ "userName": "A String", # Human-readable name for this source; may be user or system generated.
+ "originalTransformOrCollection": "A String", # User name for the original user transform or collection with which this
+ # source is most closely associated.
+ "sizeBytes": "A String", # Size of the source, if measurable.
+ "name": "A String", # Dataflow service generated name for this source.
+ },
+ ],
+ "name": "A String", # Dataflow service generated name for this stage.
+ "componentTransform": [ # Transforms that comprise this execution stage.
+ { # Description of a transform executed as part of an execution stage.
+ "name": "A String", # Dataflow service generated name for this transform.
+ "userName": "A String", # Human-readable name for this transform; may be user or system generated.
+ "originalTransform": "A String", # User name for the original user transform with which this transform is
+ # most closely associated.
+ },
+ ],
+ "id": "A String", # Dataflow service generated id for this stage.
+ "outputSource": [ # Output sources for this stage.
+ { # Description of an input or output of an execution stage.
+ "userName": "A String", # Human-readable name for this source; may be user or system generated.
+ "originalTransformOrCollection": "A String", # User name for the original user transform or collection with which this
+ # source is most closely associated.
+ "sizeBytes": "A String", # Size of the source, if measurable.
+ "name": "A String", # Dataflow service generated name for this source.
+ },
+ ],
+ "kind": "A String", # Type of transform this stage is executing.
+ },
+ ],
+ },
+ "labels": { # User-defined labels for this job.
+ #
+ # The labels map can contain no more than 64 entries. Entries of the labels
+ # map are UTF8 strings that comply with the following restrictions:
+ #
+ # * Keys must conform to regexp: \p{Ll}\p{Lo}{0,62}
+ # * Values must conform to regexp: [\p{Ll}\p{Lo}\p{N}_-]{0,63}
+ # * Both keys and values are additionally constrained to be <= 128 bytes in
+ # size.
+ "a_key": "A String",
+ },
+ "projectId": "A String", # The ID of the Cloud Platform project that the job belongs to.
+ "environment": { # Describes the environment in which a Dataflow Job runs. # The environment for the job.
+ "flexResourceSchedulingGoal": "A String", # Which Flexible Resource Scheduling mode to run in.
+ "workerRegion": "A String", # The Compute Engine region
+ # (https://cloud.google.com/compute/docs/regions-zones/regions-zones) in
+ # which worker processing should occur, e.g. "us-west1". Mutually exclusive
+ # with worker_zone. If neither worker_region nor worker_zone is specified,
+ # default to the control plane's region.
+ "userAgent": { # A description of the process that generated the request.
+ "a_key": "", # Properties of the object.
+ },
+ "serviceAccountEmail": "A String", # Identity to run virtual machines as. Defaults to the default account.
+ "version": { # A structure describing which components and their versions of the service
+ # are required in order to run the job.
+ "a_key": "", # Properties of the object.
+ },
+ "serviceKmsKeyName": "A String", # If set, contains the Cloud KMS key identifier used to encrypt data
+ # at rest, AKA a Customer Managed Encryption Key (CMEK).
+ #
+ # Format:
+ # projects/PROJECT_ID/locations/LOCATION/keyRings/KEY_RING/cryptoKeys/KEY
+ "experiments": [ # The list of experiments to enable.
+ "A String",
+ ],
+ "workerZone": "A String", # The Compute Engine zone
+ # (https://cloud.google.com/compute/docs/regions-zones/regions-zones) in
+ # which worker processing should occur, e.g. "us-west1-a". Mutually exclusive
+ # with worker_region. If neither worker_region nor worker_zone is specified,
+ # a zone in the control plane's region is chosen based on available capacity.
+ "workerPools": [ # The worker pools. At least one "harness" worker pool must be
+ # specified in order for the job to have workers.
+ { # Describes one particular pool of Cloud Dataflow workers to be
+ # instantiated by the Cloud Dataflow service in order to perform the
+ # computations required by a job. Note that a workflow job may use
+ # multiple pools, in order to match the various computational
+ # requirements of the various stages of the job.
+ "onHostMaintenance": "A String", # The action to take on host maintenance, as defined by the Google
+ # Compute Engine API.
+ "sdkHarnessContainerImages": [ # Set of SDK harness containers needed to execute this pipeline. This will
+ # only be set in the Fn API path. For non-cross-language pipelines this
+ # should have only one entry. Cross-language pipelines will have two or more
+ # entries.
+ { # Defines a SDK harness container for executing Dataflow pipelines.
+ "containerImage": "A String", # A docker container image that resides in Google Container Registry.
+ "useSingleCorePerContainer": True or False, # If true, recommends the Dataflow service to use only one core per SDK
+ # container instance with this image. If false (or unset) recommends using
+ # more than one core per SDK container instance with this image for
+ # efficiency. Note that Dataflow service may choose to override this property
+ # if needed.
+ },
+ ],
+ "zone": "A String", # Zone to run the worker pools in. If empty or unspecified, the service
+ # will attempt to choose a reasonable default.
+ "kind": "A String", # The kind of the worker pool; currently only `harness` and `shuffle`
+ # are supported.
+ "metadata": { # Metadata to set on the Google Compute Engine VMs.
+ "a_key": "A String",
+ },
+ "diskSourceImage": "A String", # Fully qualified source image for disks.
+ "dataDisks": [ # Data disks that are used by a VM in this workflow.
+ { # Describes the data disk used by a workflow job.
+ "sizeGb": 42, # Size of disk in GB. If zero or unspecified, the service will
+ # attempt to choose a reasonable default.
+ "diskType": "A String", # Disk storage type, as defined by Google Compute Engine. This
+ # must be a disk type appropriate to the project and zone in which
+ # the workers will run. If unknown or unspecified, the service
+ # will attempt to choose a reasonable default.
+ #
+ # For example, the standard persistent disk type is a resource name
+ # typically ending in "pd-standard". If SSD persistent disks are
+ # available, the resource name typically ends with "pd-ssd". The
+ # actual valid values are defined by the Google Compute Engine API,
+ # not by the Cloud Dataflow API; consult the Google Compute Engine
+ # documentation for more information about determining the set of
+ # available disk types for a particular project and zone.
+ #
+ # Google Compute Engine Disk types are local to a particular
+ # project in a particular zone, and so the resource name will
+ # typically look something like this:
+ #
+ # compute.googleapis.com/projects/project-id/zones/zone/diskTypes/pd-standard
+ "mountPoint": "A String", # Directory in a VM where disk is mounted.
+ },
+ ],
+ "packages": [ # Packages to be installed on workers.
+ { # The packages that must be installed in order for a worker to run the
+ # steps of the Cloud Dataflow job that will be assigned to its worker
+ # pool.
#
- # The supported resource type is:
- #
- # Google Cloud Storage:
- #
- # storage.googleapis.com/{bucket}/{object}
- # bucket.storage.googleapis.com/{object}
- "reportingEnabled": True or False, # Whether to send work progress updates to the service.
- "baseUrl": "A String", # The base URL for accessing Google Cloud APIs.
+ # This is the mechanism by which the Cloud Dataflow SDK causes code to
+ # be loaded onto the workers. For example, the Cloud Dataflow Java SDK
+ # might use this to install jars containing the user's code and all of the
+ # various dependencies (libraries, data files, etc.) required in order
+ # for that code to run.
+ "name": "A String", # The name of the package.
+ "location": "A String", # The resource to read the package from. The supported resource type is:
+ #
+ # Google Cloud Storage:
+ #
+ # storage.googleapis.com/{bucket}
+ # bucket.storage.googleapis.com/
+ },
+ ],
+ "teardownPolicy": "A String", # Sets the policy for determining when to tear down the worker pool.
+ # Allowed values are: `TEARDOWN_ALWAYS`, `TEARDOWN_ON_SUCCESS`, and
+ # `TEARDOWN_NEVER`.
+ # `TEARDOWN_ALWAYS` means workers are always torn down regardless of whether
+ # the job succeeds. `TEARDOWN_ON_SUCCESS` means workers are torn down
+ # if the job succeeds. `TEARDOWN_NEVER` means the workers are never torn
+ # down.
+ #
+ # If the workers are not torn down by the service, they will
+ # continue to run and use Google Compute Engine VM resources in the
+ # user's project until they are explicitly terminated by the user.
+ # Because of this, Google recommends using the `TEARDOWN_ALWAYS`
+ # policy except for small, manually supervised test jobs.
+ #
+ # If unknown or unspecified, the service will attempt to choose a reasonable
+ # default.
+ "network": "A String", # Network to which VMs will be assigned. If empty or unspecified,
+ # the service will use the network "default".
+ "ipConfiguration": "A String", # Configuration for VM IPs.
+ "diskSizeGb": 42, # Size of root disk for VMs, in GB. If zero or unspecified, the service will
+ # attempt to choose a reasonable default.
+ "autoscalingSettings": { # Settings for WorkerPool autoscaling. # Settings for autoscaling of this WorkerPool.
+ "maxNumWorkers": 42, # The maximum number of workers to cap scaling at.
+ "algorithm": "A String", # The algorithm to use for autoscaling.
+ },
+ "poolArgs": { # Extra arguments for this worker pool.
+ "a_key": "", # Properties of the object. Contains field @type with type URL.
+ },
+ "subnetwork": "A String", # Subnetwork to which VMs will be assigned, if desired. Expected to be of
+ # the form "regions/REGION/subnetworks/SUBNETWORK".
+ "numWorkers": 42, # Number of Google Compute Engine workers in this pool needed to
+ # execute the job. If zero or unspecified, the service will
+ # attempt to choose a reasonable default.
+ "numThreadsPerWorker": 42, # The number of threads per worker harness. If empty or unspecified, the
+ # service will choose a number of threads (according to the number of cores
+ # on the selected machine type for batch, or 1 by convention for streaming).
+ "workerHarnessContainerImage": "A String", # Required. Docker container image that executes the Cloud Dataflow worker
+ # harness, residing in Google Container Registry.
+ #
+ # Deprecated for the Fn API path. Use sdk_harness_container_images instead.
+ "taskrunnerSettings": { # Taskrunner configuration settings. # Settings passed through to Google Compute Engine workers when
+ # using the standard Dataflow task runner. Users should ignore
+ # this field.
+ "dataflowApiVersion": "A String", # The API version of endpoint, e.g. "v1b3"
+ "oauthScopes": [ # The OAuth2 scopes to be requested by the taskrunner in order to
+ # access the Cloud Dataflow API.
+ "A String",
+ ],
+ "baseUrl": "A String", # The base URL for the taskrunner to use when accessing Google Cloud APIs.
#
# When workers access Google Cloud APIs, they logically do so via
# relative URLs. If this field is specified, it supplies the base
@@ -1017,339 +1059,297 @@
# Locators".
#
# If not specified, the default value is "http://www.googleapis.com/"
- "servicePath": "A String", # The Cloud Dataflow service path relative to the root URL, for example,
- # "dataflow/v1b3/projects".
- "shuffleServicePath": "A String", # The Shuffle service path relative to the root URL, for example,
- # "shuffle/v1beta1".
- "workerId": "A String", # The ID of the worker running this pipeline.
+ "workflowFileName": "A String", # The file to store the workflow in.
+ "logToSerialconsole": True or False, # Whether to send taskrunner log info to Google Compute Engine VM serial
+ # console.
+ "baseTaskDir": "A String", # The location on the worker for task-specific subdirectories.
+ "taskUser": "A String", # The UNIX user ID on the worker VM to use for tasks launched by
+ # taskrunner; e.g. "root".
+ "vmId": "A String", # The ID string of the VM.
+ "alsologtostderr": True or False, # Whether to also send taskrunner log info to stderr.
+ "parallelWorkerSettings": { # Provides data to pass through to the worker harness. # The settings to pass to the parallel worker harness.
+ "shuffleServicePath": "A String", # The Shuffle service path relative to the root URL, for example,
+ # "shuffle/v1beta1".
+ "tempStoragePrefix": "A String", # The prefix of the resources the system should use for temporary
+ # storage.
+ #
+ # The supported resource type is:
+ #
+ # Google Cloud Storage:
+ #
+ # storage.googleapis.com/{bucket}/{object}
+ # bucket.storage.googleapis.com/{object}
+ "reportingEnabled": True or False, # Whether to send work progress updates to the service.
+ "servicePath": "A String", # The Cloud Dataflow service path relative to the root URL, for example,
+ # "dataflow/v1b3/projects".
+ "baseUrl": "A String", # The base URL for accessing Google Cloud APIs.
+ #
+ # When workers access Google Cloud APIs, they logically do so via
+ # relative URLs. If this field is specified, it supplies the base
+ # URL to use for resolving these relative URLs. The normative
+ # algorithm used is defined by RFC 1808, "Relative Uniform Resource
+ # Locators".
+ #
+ # If not specified, the default value is "http://www.googleapis.com/"
+ "workerId": "A String", # The ID of the worker running this pipeline.
+ },
+ "harnessCommand": "A String", # The command to launch the worker harness.
+ "logDir": "A String", # The directory on the VM to store logs.
+ "streamingWorkerMainClass": "A String", # The streaming worker main class name.
+ "languageHint": "A String", # The suggested backend language.
+ "taskGroup": "A String", # The UNIX group ID on the worker VM to use for tasks launched by
+ # taskrunner; e.g. "wheel".
+ "logUploadLocation": "A String", # Indicates where to put logs. If this is not specified, the logs
+ # will not be uploaded.
+ #
+ # The supported resource type is:
+ #
+ # Google Cloud Storage:
+ # storage.googleapis.com/{bucket}/{object}
+ # bucket.storage.googleapis.com/{object}
+ "commandlinesFileName": "A String", # The file to store preprocessing commands in.
+ "continueOnException": True or False, # Whether to continue taskrunner if an exception is hit.
+ "tempStoragePrefix": "A String", # The prefix of the resources the taskrunner should use for
+ # temporary storage.
+ #
+ # The supported resource type is:
+ #
+ # Google Cloud Storage:
+ # storage.googleapis.com/{bucket}/{object}
+ # bucket.storage.googleapis.com/{object}
},
- "taskUser": "A String", # The UNIX user ID on the worker VM to use for tasks launched by
- # taskrunner; e.g. "root".
- "vmId": "A String", # The ID string of the VM.
+ "diskType": "A String", # Type of root disk for VMs. If empty or unspecified, the service will
+ # attempt to choose a reasonable default.
+ "defaultPackageSet": "A String", # The default package set to install. This allows the service to
+ # select a default set of packages which are useful to worker
+ # harnesses written in a particular language.
+ "machineType": "A String", # Machine type (e.g. "n1-standard-1"). If empty or unspecified, the
+ # service will attempt to choose a reasonable default.
},
- "autoscalingSettings": { # Settings for WorkerPool autoscaling. # Settings for autoscaling of this WorkerPool.
- "algorithm": "A String", # The algorithm to use for autoscaling.
- "maxNumWorkers": 42, # The maximum number of workers to cap scaling at.
- },
- "metadata": { # Metadata to set on the Google Compute Engine VMs.
- "a_key": "A String",
- },
- "defaultPackageSet": "A String", # The default package set to install. This allows the service to
- # select a default set of packages which are useful to worker
- # harnesses written in a particular language.
- "network": "A String", # Network to which VMs will be assigned. If empty or unspecified,
- # the service will use the network "default".
+ ],
+ "tempStoragePrefix": "A String", # The prefix of the resources the system should use for temporary
+ # storage. The system will append the suffix "/temp-{JOBNAME}" to
+ # this resource prefix, where {JOBNAME} is the value of the
+ # job_name field. The resulting bucket and object prefix is used
+ # as the prefix of the resources used to store temporary data
+ # needed during the job execution. NOTE: This will override the
+ # value in taskrunner_settings.
+ # The supported resource type is:
+ #
+ # Google Cloud Storage:
+ #
+ # storage.googleapis.com/{bucket}/{object}
+ # bucket.storage.googleapis.com/{object}
+ "internalExperiments": { # Experimental settings.
+ "a_key": "", # Properties of the object. Contains field @type with type URL.
},
- ],
- "dataset": "A String", # The dataset for the current project where various workflow
- # related tables are stored.
- #
- # The supported resource type is:
- #
- # Google BigQuery:
- # bigquery.googleapis.com/{dataset}
- },
- "stageStates": [ # This field may be mutated by the Cloud Dataflow service;
- # callers cannot mutate it.
- { # A message describing the state of a particular execution stage.
- "currentStateTime": "A String", # The time at which the stage transitioned to this state.
- "executionStageState": "A String", # Executions stage states allow the same set of values as JobState.
- "executionStageName": "A String", # The name of the execution stage.
- },
- ],
- "jobMetadata": { # Metadata available primarily for filtering jobs. Will be included in the # This field is populated by the Dataflow service to support filtering jobs
- # by the metadata values provided here. Populated for ListJobs and all GetJob
- # views SUMMARY and higher.
- # ListJob response and Job SUMMARY view.
- "datastoreDetails": [ # Identification of a Datastore source used in the Dataflow job.
- { # Metadata for a Datastore connector used by the job.
- "namespace": "A String", # Namespace used in the connection.
- "projectId": "A String", # ProjectId accessed in the connection.
- },
- ],
- "sdkVersion": { # The version of the SDK used to run the job. # The SDK version used to run the job.
- "version": "A String", # The version of the SDK used to run the job.
- "sdkSupportStatus": "A String", # The support status for this SDK version.
- "versionDisplayName": "A String", # A readable string describing the version of the SDK.
- },
- "bigqueryDetails": [ # Identification of a BigQuery source used in the Dataflow job.
- { # Metadata for a BigQuery connector used by the job.
- "table": "A String", # Table accessed in the connection.
- "dataset": "A String", # Dataset accessed in the connection.
- "query": "A String", # Query used to access data in the connection.
- "projectId": "A String", # Project accessed in the connection.
- },
- ],
- "fileDetails": [ # Identification of a File source used in the Dataflow job.
- { # Metadata for a File connector used by the job.
- "filePattern": "A String", # File Pattern used to access files by the connector.
- },
- ],
- "pubsubDetails": [ # Identification of a PubSub source used in the Dataflow job.
- { # Metadata for a PubSub connector used by the job.
- "topic": "A String", # Topic accessed in the connection.
- "subscription": "A String", # Subscription used in the connection.
- },
- ],
- "bigTableDetails": [ # Identification of a BigTable source used in the Dataflow job.
- { # Metadata for a BigTable connector used by the job.
- "projectId": "A String", # ProjectId accessed in the connection.
- "instanceId": "A String", # InstanceId accessed in the connection.
- "tableId": "A String", # TableId accessed in the connection.
- },
- ],
- "spannerDetails": [ # Identification of a Spanner source used in the Dataflow job.
- { # Metadata for a Spanner connector used by the job.
- "instanceId": "A String", # InstanceId accessed in the connection.
- "projectId": "A String", # ProjectId accessed in the connection.
- "databaseId": "A String", # DatabaseId accessed in the connection.
- },
- ],
- },
- "type": "A String", # The type of Cloud Dataflow job.
- "projectId": "A String", # The ID of the Cloud Platform project that the job belongs to.
- "createdFromSnapshotId": "A String", # If this is specified, the job's initial state is populated from the given
- # snapshot.
- "pipelineDescription": { # A descriptive representation of submitted pipeline as well as the executed # Preliminary field: The format of this data may change at any time.
- # A description of the user pipeline and stages through which it is executed.
- # Created by Cloud Dataflow service. Only retrieved with
- # JOB_VIEW_DESCRIPTION or JOB_VIEW_ALL.
- # form. This data is provided by the Dataflow service for ease of visualizing
- # the pipeline and interpreting Dataflow provided metrics.
- "executionPipelineStage": [ # Description of each stage of execution of the pipeline.
- { # Description of the composing transforms, names/ids, and input/outputs of a
- # stage of execution. Some composing transforms and sources may have been
- # generated by the Dataflow service during execution planning.
- "outputSource": [ # Output sources for this stage.
- { # Description of an input or output of an execution stage.
- "sizeBytes": "A String", # Size of the source, if measurable.
- "name": "A String", # Dataflow service generated name for this source.
- "userName": "A String", # Human-readable name for this source; may be user or system generated.
- "originalTransformOrCollection": "A String", # User name for the original user transform or collection with which this
- # source is most closely associated.
- },
- ],
- "name": "A String", # Dataflow service generated name for this stage.
- "inputSource": [ # Input sources for this stage.
- { # Description of an input or output of an execution stage.
- "sizeBytes": "A String", # Size of the source, if measurable.
- "name": "A String", # Dataflow service generated name for this source.
- "userName": "A String", # Human-readable name for this source; may be user or system generated.
- "originalTransformOrCollection": "A String", # User name for the original user transform or collection with which this
- # source is most closely associated.
- },
- ],
- "id": "A String", # Dataflow service generated id for this stage.
- "componentTransform": [ # Transforms that comprise this execution stage.
- { # Description of a transform executed as part of an execution stage.
- "originalTransform": "A String", # User name for the original user transform with which this transform is
- # most closely associated.
- "name": "A String", # Dataflow service generated name for this source.
- "userName": "A String", # Human-readable name for this transform; may be user or system generated.
- },
- ],
- "componentSource": [ # Collections produced and consumed by component transforms of this stage.
- { # Description of an interstitial value between transforms in an execution
- # stage.
- "name": "A String", # Dataflow service generated name for this source.
- "userName": "A String", # Human-readable name for this transform; may be user or system generated.
- "originalTransformOrCollection": "A String", # User name for the original user transform or collection with which this
- # source is most closely associated.
- },
- ],
- "kind": "A String", # Type of tranform this stage is executing.
- },
- ],
- "originalPipelineTransform": [ # Description of each transform in the pipeline and collections between them.
- { # Description of the type, names/ids, and input/outputs for a transform.
- "kind": "A String", # Type of transform.
- "inputCollectionName": [ # User names for all collection inputs to this transform.
- "A String",
- ],
- "name": "A String", # User provided name for this transform instance.
- "id": "A String", # SDK generated id of this transform instance.
- "displayData": [ # Transform-specific display data.
- { # Data provided with a pipeline or transform to provide descriptive info.
- "durationValue": "A String", # Contains value if the data is of duration type.
- "int64Value": "A String", # Contains value if the data is of int64 type.
- "namespace": "A String", # The namespace for the key. This is usually a class name or programming
- # language namespace (i.e. python module) which defines the display data.
- # This allows a dax monitoring system to specially handle the data
- # and perform custom rendering.
- "floatValue": 3.14, # Contains value if the data is of float type.
- "key": "A String", # The key identifying the display data.
- # This is intended to be used as a label for the display data
- # when viewed in a dax monitoring system.
- "shortStrValue": "A String", # A possible additional shorter value to display.
- # For example a java_class_name_value of com.mypackage.MyDoFn
- # will be stored with MyDoFn as the short_str_value and
- # com.mypackage.MyDoFn as the java_class_name value.
- # short_str_value can be displayed and java_class_name_value
- # will be displayed as a tooltip.
- "url": "A String", # An optional full URL.
- "label": "A String", # An optional label to display in a dax UI for the element.
- "timestampValue": "A String", # Contains value if the data is of timestamp type.
- "boolValue": True or False, # Contains value if the data is of a boolean type.
- "javaClassValue": "A String", # Contains value if the data is of java class type.
- "strValue": "A String", # Contains value if the data is of string type.
- },
- ],
- "outputCollectionName": [ # User names for all collection outputs to this transform.
- "A String",
- ],
- },
- ],
- "displayData": [ # Pipeline level display data.
- { # Data provided with a pipeline or transform to provide descriptive info.
- "durationValue": "A String", # Contains value if the data is of duration type.
- "int64Value": "A String", # Contains value if the data is of int64 type.
- "namespace": "A String", # The namespace for the key. This is usually a class name or programming
- # language namespace (i.e. python module) which defines the display data.
- # This allows a dax monitoring system to specially handle the data
- # and perform custom rendering.
- "floatValue": 3.14, # Contains value if the data is of float type.
- "key": "A String", # The key identifying the display data.
- # This is intended to be used as a label for the display data
- # when viewed in a dax monitoring system.
- "shortStrValue": "A String", # A possible additional shorter value to display.
- # For example a java_class_name_value of com.mypackage.MyDoFn
- # will be stored with MyDoFn as the short_str_value and
- # com.mypackage.MyDoFn as the java_class_name value.
- # short_str_value can be displayed and java_class_name_value
- # will be displayed as a tooltip.
- "url": "A String", # An optional full URL.
- "label": "A String", # An optional label to display in a dax UI for the element.
- "timestampValue": "A String", # Contains value if the data is of timestamp type.
- "boolValue": True or False, # Contains value if the data is of a boolean type.
- "javaClassValue": "A String", # Contains value if the data is of java class type.
- "strValue": "A String", # Contains value if the data is of string type.
- },
- ],
- },
- "replaceJobId": "A String", # If this job is an update of an existing job, this field is the job ID
- # of the job it replaced.
- #
- # When sending a `CreateJobRequest`, you can update a job by specifying it
- # here. The job named here is stopped, and its intermediate state is
- # transferred to this job.
- "tempFiles": [ # A set of files the system should be aware of that are used
- # for temporary storage. These temporary files will be
- # removed on job completion.
- # No duplicates are allowed.
- # No file patterns are supported.
- #
- # The supported files are:
- #
- # Google Cloud Storage:
- #
- # storage.googleapis.com/{bucket}/{object}
- # bucket.storage.googleapis.com/{object}
- "A String",
- ],
- "name": "A String", # The user-specified Cloud Dataflow job name.
- #
- # Only one Job with a given name may exist in a project at any
- # given time. If a caller attempts to create a Job with the same
- # name as an already-existing Job, the attempt returns the
- # existing Job.
- #
- # The name must match the regular expression
- # `[a-z]([-a-z0-9]{0,38}[a-z0-9])?`
- "steps": [ # Exactly one of step or steps_location should be specified.
- #
- # The top-level steps that constitute the entire job.
- { # Defines a particular step within a Cloud Dataflow job.
- #
- # A job consists of multiple steps, each of which performs some
- # specific operation as part of the overall job. Data is typically
- # passed from one step to another as part of the job.
- #
- # Here's an example of a sequence of steps which together implement a
- # Map-Reduce job:
- #
- # * Read a collection of data from some source, parsing the
- # collection's elements.
- #
- # * Validate the elements.
- #
- # * Apply a user-defined function to map each element to some value
- # and extract an element-specific key value.
- #
- # * Group elements with the same key into a single element with
- # that key, transforming a multiply-keyed collection into a
- # uniquely-keyed collection.
- #
- # * Write the elements out to some data sink.
- #
- # Note that the Cloud Dataflow service may be used to run many different
- # types of jobs, not just Map-Reduce.
- "name": "A String", # The name that identifies the step. This must be unique for each
- # step with respect to all other steps in the Cloud Dataflow job.
- "kind": "A String", # The kind of step in the Cloud Dataflow job.
- "properties": { # Named properties associated with the step. Each kind of
- # predefined step has its own required set of properties.
- # Must be provided on Create. Only retrieved with JOB_VIEW_ALL.
+ "sdkPipelineOptions": { # The Cloud Dataflow SDK pipeline options specified by the user. These
+ # options are passed through the service and are used to recreate the
+ # SDK pipeline options on the worker in a language-agnostic and
+ # platform-independent way.
"a_key": "", # Properties of the object.
},
+ "dataset": "A String", # The dataset for the current project where various workflow
+ # related tables are stored.
+ #
+ # The supported resource type is:
+ #
+ # Google BigQuery:
+ # bigquery.googleapis.com/{dataset}
+ "clusterManagerApiService": "A String", # The type of cluster manager API to use. If unknown or
+ # unspecified, the service will attempt to choose a reasonable
+ # default. This should be in the form of the API service name,
+ # e.g. "compute.googleapis.com".
},
- ],
- "replacedByJobId": "A String", # If another job is an update of this job (and thus, this job is in
- # `JOB_STATE_UPDATED`), this field contains the ID of that job.
- "executionInfo": { # Additional information about how a Cloud Dataflow job will be executed that # Deprecated.
- # isn't contained in the submitted job.
- "stages": { # A mapping from each stage to the information about that stage.
- "a_key": { # Contains information about how a particular
- # google.dataflow.v1beta3.Step will be executed.
- "stepName": [ # The steps associated with the execution stage.
- # Note that stages may have several steps, and that a given step
- # might be run by more than one stage.
- "A String",
- ],
+ "stepsLocation": "A String", # The GCS location where the steps are stored.
+ "steps": [ # Exactly one of step or steps_location should be specified.
+ #
+ # The top-level steps that constitute the entire job.
+ { # Defines a particular step within a Cloud Dataflow job.
+ #
+ # A job consists of multiple steps, each of which performs some
+ # specific operation as part of the overall job. Data is typically
+ # passed from one step to another as part of the job.
+ #
+ # Here's an example of a sequence of steps which together implement a
+ # Map-Reduce job:
+ #
+ # * Read a collection of data from some source, parsing the
+ # collection's elements.
+ #
+ # * Validate the elements.
+ #
+ # * Apply a user-defined function to map each element to some value
+ # and extract an element-specific key value.
+ #
+ # * Group elements with the same key into a single element with
+ # that key, transforming a multiply-keyed collection into a
+ # uniquely-keyed collection.
+ #
+ # * Write the elements out to some data sink.
+ #
+ # Note that the Cloud Dataflow service may be used to run many different
+ # types of jobs, not just Map-Reduce.
+ "kind": "A String", # The kind of step in the Cloud Dataflow job.
+ "properties": { # Named properties associated with the step. Each kind of
+ # predefined step has its own required set of properties.
+ # Must be provided on Create. Only retrieved with JOB_VIEW_ALL.
+ "a_key": "", # Properties of the object.
+ },
+ "name": "A String", # The name that identifies the step. This must be unique for each
+ # step with respect to all other steps in the Cloud Dataflow job.
+ },
+ ],
+ "stageStates": [ # This field may be mutated by the Cloud Dataflow service;
+ # callers cannot mutate it.
+ { # A message describing the state of a particular execution stage.
+ "executionStageState": "A String", # Execution stage states allow the same set of values as JobState.
+ "executionStageName": "A String", # The name of the execution stage.
+ "currentStateTime": "A String", # The time at which the stage transitioned to this state.
+ },
+ ],
+ "replacedByJobId": "A String", # If another job is an update of this job (and thus, this job is in
+ # `JOB_STATE_UPDATED`), this field contains the ID of that job.
+ "jobMetadata": { # Metadata available primarily for filtering jobs. Will be included in the # This field is populated by the Dataflow service to support filtering jobs
+ # by the metadata values provided here. Populated for ListJobs and all GetJob
+ # views SUMMARY and higher.
+ # ListJob response and Job SUMMARY view.
+ "sdkVersion": { # The version of the SDK used to run the job. # The SDK version used to run the job.
+ "sdkSupportStatus": "A String", # The support status for this SDK version.
+ "versionDisplayName": "A String", # A readable string describing the version of the SDK.
+ "version": "A String", # The version of the SDK used to run the job.
+ },
+ "bigTableDetails": [ # Identification of a BigTable source used in the Dataflow job.
+ { # Metadata for a BigTable connector used by the job.
+ "instanceId": "A String", # InstanceId accessed in the connection.
+ "tableId": "A String", # TableId accessed in the connection.
+ "projectId": "A String", # ProjectId accessed in the connection.
+ },
+ ],
+ "pubsubDetails": [ # Identification of a PubSub source used in the Dataflow job.
+ { # Metadata for a PubSub connector used by the job.
+ "subscription": "A String", # Subscription used in the connection.
+ "topic": "A String", # Topic accessed in the connection.
+ },
+ ],
+ "bigqueryDetails": [ # Identification of a BigQuery source used in the Dataflow job.
+ { # Metadata for a BigQuery connector used by the job.
+ "dataset": "A String", # Dataset accessed in the connection.
+ "projectId": "A String", # Project accessed in the connection.
+ "query": "A String", # Query used to access data in the connection.
+ "table": "A String", # Table accessed in the connection.
+ },
+ ],
+ "fileDetails": [ # Identification of a File source used in the Dataflow job.
+ { # Metadata for a File connector used by the job.
+ "filePattern": "A String", # File Pattern used to access files by the connector.
+ },
+ ],
+ "datastoreDetails": [ # Identification of a Datastore source used in the Dataflow job.
+ { # Metadata for a Datastore connector used by the job.
+ "namespace": "A String", # Namespace used in the connection.
+ "projectId": "A String", # ProjectId accessed in the connection.
+ },
+ ],
+ "spannerDetails": [ # Identification of a Spanner source used in the Dataflow job.
+ { # Metadata for a Spanner connector used by the job.
+ "instanceId": "A String", # InstanceId accessed in the connection.
+ "databaseId": "A String", # DatabaseId accessed in the connection.
+ "projectId": "A String", # ProjectId accessed in the connection.
+ },
+ ],
+ },
+ "location": "A String", # The [regional endpoint]
+ # (https://cloud.google.com/dataflow/docs/concepts/regional-endpoints) that
+ # contains this job.
+ "transformNameMapping": { # The map of transform name prefixes of the job to be replaced to the
+ # corresponding name prefixes of the new job.
+ "a_key": "A String",
+ },
+ "startTime": "A String", # The timestamp when the job was started (transitioned to JOB_STATE_PENDING).
+ # Flexible resource scheduling jobs are started with some delay after job
+ # creation, so start_time is unset before start and is updated when the
+ # job is started by the Cloud Dataflow service. For other jobs, start_time
+ # always equals create_time and is immutable and set by the Cloud Dataflow
+ # service.
+ "clientRequestId": "A String", # The client's unique identifier of the job, re-used across retried attempts.
+ # If this field is set, the service will ensure its uniqueness.
+ # The request to create a job will fail if the service has knowledge of a
+ # previously submitted job with the same client's ID and job name.
+ # The caller may use this field to ensure idempotence of job
+ # creation across retried attempts to create a job.
+ # By default, the field is empty and, in that case, the service ignores it.
+ "executionInfo": { # Additional information about how a Cloud Dataflow job will be executed that # Deprecated.
+ # isn't contained in the submitted job.
+ "stages": { # A mapping from each stage to the information about that stage.
+ "a_key": { # Contains information about how a particular
+ # google.dataflow.v1beta3.Step will be executed.
+ "stepName": [ # The steps associated with the execution stage.
+ # Note that stages may have several steps, and that a given step
+ # might be run by more than one stage.
+ "A String",
+ ],
+ },
},
},
- },
- "currentState": "A String", # The current state of the job.
- #
- # Jobs are created in the `JOB_STATE_STOPPED` state unless otherwise
- # specified.
- #
- # A job in the `JOB_STATE_RUNNING` state may asynchronously enter a
- # terminal state. After a job has reached a terminal state, no
- # further state updates may be made.
- #
- # This field may be mutated by the Cloud Dataflow service;
- # callers cannot mutate it.
- "location": "A String", # The [regional endpoint]
- # (https://cloud.google.com/dataflow/docs/concepts/regional-endpoints) that
- # contains this job.
- "startTime": "A String", # The timestamp when the job was started (transitioned to JOB_STATE_PENDING).
- # Flexible resource scheduling jobs are started with some delay after job
- # creation, so start_time is unset before start and is updated when the
- # job is started by the Cloud Dataflow service. For other jobs, start_time
- # always equals to create_time and is immutable and set by the Cloud Dataflow
- # service.
- "stepsLocation": "A String", # The GCS location where the steps are stored.
- "labels": { # User-defined labels for this job.
- #
- # The labels map can contain no more than 64 entries. Entries of the labels
- # map are UTF8 strings that comply with the following restrictions:
- #
- # * Keys must conform to regexp: \p{Ll}\p{Lo}{0,62}
- # * Values must conform to regexp: [\p{Ll}\p{Lo}\p{N}_-]{0,63}
- # * Both keys and values are additionally constrained to be <= 128 bytes in
- # size.
- "a_key": "A String",
- },
- "createTime": "A String", # The timestamp when the job was initially created. Immutable and set by the
- # Cloud Dataflow service.
- "requestedState": "A String", # The job's requested state.
- #
- # `UpdateJob` may be used to switch between the `JOB_STATE_STOPPED` and
- # `JOB_STATE_RUNNING` states, by setting requested_state. `UpdateJob` may
- # also be used to directly set a job's requested state to
- # `JOB_STATE_CANCELLED` or `JOB_STATE_DONE`, irrevocably terminating the
- # job if it has not already reached a terminal state.
-}
+ "type": "A String", # The type of Cloud Dataflow job.
+ "createTime": "A String", # The timestamp when the job was initially created. Immutable and set by the
+ # Cloud Dataflow service.
+ "tempFiles": [ # A set of files the system should be aware of that are used
+ # for temporary storage. These temporary files will be
+ # removed on job completion.
+ # No duplicates are allowed.
+ # No file patterns are supported.
+ #
+ # The supported files are:
+ #
+ # Google Cloud Storage:
+ #
+ # storage.googleapis.com/{bucket}/{object}
+ # bucket.storage.googleapis.com/{object}
+ "A String",
+ ],
+ "id": "A String", # The unique ID of this job.
+ #
+ # This field is set by the Cloud Dataflow service when the Job is
+ # created, and is immutable for the life of the job.
+ "requestedState": "A String", # The job's requested state.
+ #
+ # `UpdateJob` may be used to switch between the `JOB_STATE_STOPPED` and
+ # `JOB_STATE_RUNNING` states, by setting requested_state. `UpdateJob` may
+ # also be used to directly set a job's requested state to
+ # `JOB_STATE_CANCELLED` or `JOB_STATE_DONE`, irrevocably terminating the
+ # job if it has not already reached a terminal state.
+ "replaceJobId": "A String", # If this job is an update of an existing job, this field is the job ID
+ # of the job it replaced.
+ #
+ # When sending a `CreateJobRequest`, you can update a job by specifying it
+ # here. The job named here is stopped, and its intermediate state is
+ # transferred to this job.
+ "createdFromSnapshotId": "A String", # If this is specified, the job's initial state is populated from the given
+ # snapshot.
+ "currentState": "A String", # The current state of the job.
+ #
+ # Jobs are created in the `JOB_STATE_STOPPED` state unless otherwise
+ # specified.
+ #
+ # A job in the `JOB_STATE_RUNNING` state may asynchronously enter a
+ # terminal state. After a job has reached a terminal state, no
+ # further state updates may be made.
+ #
+ # This field may be mutated by the Cloud Dataflow service;
+ # callers cannot mutate it.
+ "name": "A String", # The user-specified Cloud Dataflow job name.
+ #
+ # Only one Job with a given name may exist in a project at any
+ # given time. If a caller attempts to create a Job with the same
+ # name as an already-existing Job, the attempt returns the
+ # existing Job.
+ #
+ # The name must match the regular expression
+ # `[a-z]([-a-z0-9]{0,38}[a-z0-9])?`
+ "currentStateTime": "A String", # The timestamp associated with the current state.
+ }
location: string, The [regional endpoint]
(https://cloud.google.com/dataflow/docs/concepts/regional-endpoints) that
@@ -1365,243 +1365,285 @@
An object of the form:
{ # Defines a job to be run by the Cloud Dataflow service.
- "clientRequestId": "A String", # The client's unique identifier of the job, re-used across retried attempts.
- # If this field is set, the service will ensure its uniqueness.
- # The request to create a job will fail if the service has knowledge of a
- # previously submitted job with the same client's ID and job name.
- # The caller may use this field to ensure idempotence of job
- # creation across retried attempts to create a job.
- # By default, the field is empty and, in that case, the service ignores it.
- "id": "A String", # The unique ID of this job.
- #
- # This field is set by the Cloud Dataflow service when the Job is
- # created, and is immutable for the life of the job.
- "currentStateTime": "A String", # The timestamp associated with the current state.
- "transformNameMapping": { # The map of transform name prefixes of the job to be replaced to the
- # corresponding name prefixes of the new job.
- "a_key": "A String",
- },
- "environment": { # Describes the environment in which a Dataflow Job runs. # The environment for the job.
- "internalExperiments": { # Experimental settings.
- "a_key": "", # Properties of the object. Contains field @type with type URL.
- },
- "workerRegion": "A String", # The Compute Engine region
- # (https://cloud.google.com/compute/docs/regions-zones/regions-zones) in
- # which worker processing should occur, e.g. "us-west1". Mutually exclusive
- # with worker_zone. If neither worker_region nor worker_zone is specified,
- # default to the control plane's region.
- "serviceKmsKeyName": "A String", # If set, contains the Cloud KMS key identifier used to encrypt data
- # at rest, AKA a Customer Managed Encryption Key (CMEK).
- #
- # Format:
- # projects/PROJECT_ID/locations/LOCATION/keyRings/KEY_RING/cryptoKeys/KEY
- "userAgent": { # A description of the process that generated the request.
- "a_key": "", # Properties of the object.
- },
- "workerZone": "A String", # The Compute Engine zone
- # (https://cloud.google.com/compute/docs/regions-zones/regions-zones) in
- # which worker processing should occur, e.g. "us-west1-a". Mutually exclusive
- # with worker_region. If neither worker_region nor worker_zone is specified,
- # a zone in the control plane's region is chosen based on available capacity.
- "clusterManagerApiService": "A String", # The type of cluster manager API to use. If unknown or
- # unspecified, the service will attempt to choose a reasonable
- # default. This should be in the form of the API service name,
- # e.g. "compute.googleapis.com".
- "tempStoragePrefix": "A String", # The prefix of the resources the system should use for temporary
- # storage. The system will append the suffix "/temp-{JOBNAME} to
- # this resource prefix, where {JOBNAME} is the value of the
- # job_name field. The resulting bucket and object prefix is used
- # as the prefix of the resources used to store temporary data
- # needed during the job execution. NOTE: This will override the
- # value in taskrunner_settings.
- # The supported resource type is:
- #
- # Google Cloud Storage:
- #
- # storage.googleapis.com/{bucket}/{object}
- # bucket.storage.googleapis.com/{object}
- "experiments": [ # The list of experiments to enable.
- "A String",
- ],
- "version": { # A structure describing which components and their versions of the service
- # are required in order to run the job.
- "a_key": "", # Properties of the object.
- },
- "serviceAccountEmail": "A String", # Identity to run virtual machines as. Defaults to the default account.
- "sdkPipelineOptions": { # The Cloud Dataflow SDK pipeline options specified by the user. These
- # options are passed through the service and are used to recreate the
- # SDK pipeline options on the worker in a language agnostic and platform
- # independent way.
- "a_key": "", # Properties of the object.
- },
- "flexResourceSchedulingGoal": "A String", # Which Flexible Resource Scheduling mode to run in.
- "workerPools": [ # The worker pools. At least one "harness" worker pool must be
- # specified in order for the job to have workers.
- { # Describes one particular pool of Cloud Dataflow workers to be
- # instantiated by the Cloud Dataflow service in order to perform the
- # computations required by a job. Note that a workflow job may use
- # multiple pools, in order to match the various computational
- # requirements of the various stages of the job.
- "numThreadsPerWorker": 42, # The number of threads per worker harness. If empty or unspecified, the
- # service will choose a number of threads (according to the number of cores
- # on the selected machine type for batch, or 1 by convention for streaming).
- "numWorkers": 42, # Number of Google Compute Engine workers in this pool needed to
- # execute the job. If zero or unspecified, the service will
- # attempt to choose a reasonable default.
- "zone": "A String", # Zone to run the worker pools in. If empty or unspecified, the service
- # will attempt to choose a reasonable default.
- "diskSourceImage": "A String", # Fully qualified source image for disks.
- "packages": [ # Packages to be installed on workers.
- { # The packages that must be installed in order for a worker to run the
- # steps of the Cloud Dataflow job that will be assigned to its worker
- # pool.
- #
- # This is the mechanism by which the Cloud Dataflow SDK causes code to
- # be loaded onto the workers. For example, the Cloud Dataflow Java SDK
- # might use this to install jars containing the user's code and all of the
- # various dependencies (libraries, data files, etc.) required in order
- # for that code to run.
- "name": "A String", # The name of the package.
- "location": "A String", # The resource to read the package from. The supported resource type is:
- #
- # Google Cloud Storage:
- #
- # storage.googleapis.com/{bucket}
- # bucket.storage.googleapis.com/
- },
- ],
- "teardownPolicy": "A String", # Sets the policy for determining when to turndown worker pool.
- # Allowed values are: `TEARDOWN_ALWAYS`, `TEARDOWN_ON_SUCCESS`, and
- # `TEARDOWN_NEVER`.
- # `TEARDOWN_ALWAYS` means workers are always torn down regardless of whether
- # the job succeeds. `TEARDOWN_ON_SUCCESS` means workers are torn down
- # if the job succeeds. `TEARDOWN_NEVER` means the workers are never torn
- # down.
- #
- # If the workers are not torn down by the service, they will
- # continue to run and use Google Compute Engine VM resources in the
- # user's project until they are explicitly terminated by the user.
- # Because of this, Google recommends using the `TEARDOWN_ALWAYS`
- # policy except for small, manually supervised test jobs.
- #
- # If unknown or unspecified, the service will attempt to choose a reasonable
- # default.
- "onHostMaintenance": "A String", # The action to take on host maintenance, as defined by the Google
- # Compute Engine API.
- "poolArgs": { # Extra arguments for this worker pool.
- "a_key": "", # Properties of the object. Contains field @type with type URL.
+ "pipelineDescription": { # A descriptive representation of submitted pipeline as well as the executed
+ # form. This data is provided by the Dataflow service for ease of visualizing
+ # the pipeline and interpreting Dataflow provided metrics. # Preliminary field: The format of this data may change at any time.
+ # A description of the user pipeline and stages through which it is executed.
+ # Created by Cloud Dataflow service. Only retrieved with
+ # JOB_VIEW_DESCRIPTION or JOB_VIEW_ALL.
+ "displayData": [ # Pipeline level display data.
+ { # Data provided with a pipeline or transform to provide descriptive info.
+ "url": "A String", # An optional full URL.
+ "javaClassValue": "A String", # Contains value if the data is of java class type.
+ "timestampValue": "A String", # Contains value if the data is of timestamp type.
+ "durationValue": "A String", # Contains value if the data is of duration type.
+ "label": "A String", # An optional label to display in a dax UI for the element.
+ "key": "A String", # The key identifying the display data.
+ # This is intended to be used as a label for the display data
+ # when viewed in a dax monitoring system.
+ "namespace": "A String", # The namespace for the key. This is usually a class name or programming
+ # language namespace (e.g. a Python module) which defines the display data.
+ # This allows a dax monitoring system to specially handle the data
+ # and perform custom rendering.
+ "floatValue": 3.14, # Contains value if the data is of float type.
+ "strValue": "A String", # Contains value if the data is of string type.
+ "int64Value": "A String", # Contains value if the data is of int64 type.
+ "boolValue": True or False, # Contains value if the data is of a boolean type.
+ "shortStrValue": "A String", # A possible additional shorter value to display.
+ # For example, a java_class_name_value of com.mypackage.MyDoFn
+ # will be stored with MyDoFn as the short_str_value and
+ # com.mypackage.MyDoFn as the java_class_name_value.
+ # short_str_value can be displayed and java_class_name_value
+ # will be displayed as a tooltip.
},
- "diskSizeGb": 42, # Size of root disk for VMs, in GB. If zero or unspecified, the service will
- # attempt to choose a reasonable default.
- "workerHarnessContainerImage": "A String", # Required. Docker container image that executes the Cloud Dataflow worker
- # harness, residing in Google Container Registry.
- #
- # Deprecated for the Fn API path. Use sdk_harness_container_images instead.
- "diskType": "A String", # Type of root disk for VMs. If empty or unspecified, the service will
- # attempt to choose a reasonable default.
- "machineType": "A String", # Machine type (e.g. "n1-standard-1"). If empty or unspecified, the
- # service will attempt to choose a reasonable default.
- "kind": "A String", # The kind of the worker pool; currently only `harness` and `shuffle`
- # are supported.
- "sdkHarnessContainerImages": [ # Set of SDK harness containers needed to execute this pipeline. This will
- # only be set in the Fn API path. For non-cross-language pipelines this
- # should have only one entry. Cross-language pipelines will have two or more
- # entries.
- { # Defines a SDK harness container for executing Dataflow pipelines.
- "containerImage": "A String", # A docker container image that resides in Google Container Registry.
- "useSingleCorePerContainer": True or False, # If true, recommends the Dataflow service to use only one core per SDK
- # container instance with this image. If false (or unset) recommends using
- # more than one core per SDK container instance with this image for
- # efficiency. Note that Dataflow service may choose to override this property
- # if needed.
- },
- ],
- "dataDisks": [ # Data disks that are used by a VM in this workflow.
- { # Describes the data disk used by a workflow job.
- "diskType": "A String", # Disk storage type, as defined by Google Compute Engine. This
- # must be a disk type appropriate to the project and zone in which
- # the workers will run. If unknown or unspecified, the service
- # will attempt to choose a reasonable default.
- #
- # For example, the standard persistent disk type is a resource name
- # typically ending in "pd-standard". If SSD persistent disks are
- # available, the resource name typically ends with "pd-ssd". The
- # actual valid values are defined the Google Compute Engine API,
- # not by the Cloud Dataflow API; consult the Google Compute Engine
- # documentation for more information about determining the set of
- # available disk types for a particular project and zone.
- #
- # Google Compute Engine Disk types are local to a particular
- # project in a particular zone, and so the resource name will
- # typically look something like this:
- #
- # compute.googleapis.com/projects/project-id/zones/zone/diskTypes/pd-standard
- "sizeGb": 42, # Size of disk in GB. If zero or unspecified, the service will
- # attempt to choose a reasonable default.
- "mountPoint": "A String", # Directory in a VM where disk is mounted.
- },
- ],
- "subnetwork": "A String", # Subnetwork to which VMs will be assigned, if desired. Expected to be of
- # the form "regions/REGION/subnetworks/SUBNETWORK".
- "ipConfiguration": "A String", # Configuration for VM IPs.
- "taskrunnerSettings": { # Taskrunner configuration settings. # Settings passed through to Google Compute Engine workers when
- # using the standard Dataflow task runner. Users should ignore
- # this field.
- "alsologtostderr": True or False, # Whether to also send taskrunner log info to stderr.
- "taskGroup": "A String", # The UNIX group ID on the worker VM to use for tasks launched by
- # taskrunner; e.g. "wheel".
- "harnessCommand": "A String", # The command to launch the worker harness.
- "logDir": "A String", # The directory on the VM to store logs.
- "oauthScopes": [ # The OAuth2 scopes to be requested by the taskrunner in order to
- # access the Cloud Dataflow API.
+ ],
+ "originalPipelineTransform": [ # Description of each transform in the pipeline and collections between them.
+ { # Description of the type, names/ids, and input/outputs for a transform.
+ "outputCollectionName": [ # User names for all collection outputs to this transform.
"A String",
],
- "dataflowApiVersion": "A String", # The API version of endpoint, e.g. "v1b3"
- "logUploadLocation": "A String", # Indicates where to put logs. If this is not specified, the logs
- # will not be uploaded.
- #
- # The supported resource type is:
- #
- # Google Cloud Storage:
- # storage.googleapis.com/{bucket}/{object}
- # bucket.storage.googleapis.com/{object}
- "streamingWorkerMainClass": "A String", # The streaming worker main class name.
- "workflowFileName": "A String", # The file to store the workflow in.
- "languageHint": "A String", # The suggested backend language.
- "commandlinesFileName": "A String", # The file to store preprocessing commands in.
- "baseTaskDir": "A String", # The location on the worker for task-specific subdirectories.
- "tempStoragePrefix": "A String", # The prefix of the resources the taskrunner should use for
- # temporary storage.
- #
- # The supported resource type is:
- #
- # Google Cloud Storage:
- # storage.googleapis.com/{bucket}/{object}
- # bucket.storage.googleapis.com/{object}
- "baseUrl": "A String", # The base URL for the taskrunner to use when accessing Google Cloud APIs.
- #
- # When workers access Google Cloud APIs, they logically do so via
- # relative URLs. If this field is specified, it supplies the base
- # URL to use for resolving these relative URLs. The normative
- # algorithm used is defined by RFC 1808, "Relative Uniform Resource
- # Locators".
- #
- # If not specified, the default value is "http://www.googleapis.com/"
- "logToSerialconsole": True or False, # Whether to send taskrunner log info to Google Compute Engine VM serial
- # console.
- "continueOnException": True or False, # Whether to continue taskrunner if an exception is hit.
- "parallelWorkerSettings": { # Provides data to pass through to the worker harness. # The settings to pass to the parallel worker harness.
- "tempStoragePrefix": "A String", # The prefix of the resources the system should use for temporary
- # storage.
+ "displayData": [ # Transform-specific display data.
+ { # Data provided with a pipeline or transform to provide descriptive info.
+ "url": "A String", # An optional full URL.
+ "javaClassValue": "A String", # Contains value if the data is of java class type.
+ "timestampValue": "A String", # Contains value if the data is of timestamp type.
+ "durationValue": "A String", # Contains value if the data is of duration type.
+ "label": "A String", # An optional label to display in a dax UI for the element.
+ "key": "A String", # The key identifying the display data.
+ # This is intended to be used as a label for the display data
+ # when viewed in a dax monitoring system.
+ "namespace": "A String", # The namespace for the key. This is usually a class name or programming
+ # language namespace (e.g. a Python module) which defines the display data.
+ # This allows a dax monitoring system to specially handle the data
+ # and perform custom rendering.
+ "floatValue": 3.14, # Contains value if the data is of float type.
+ "strValue": "A String", # Contains value if the data is of string type.
+ "int64Value": "A String", # Contains value if the data is of int64 type.
+ "boolValue": True or False, # Contains value if the data is of a boolean type.
+ "shortStrValue": "A String", # A possible additional shorter value to display.
+ # For example, a java_class_name_value of com.mypackage.MyDoFn
+ # will be stored with MyDoFn as the short_str_value and
+ # com.mypackage.MyDoFn as the java_class_name_value.
+ # short_str_value can be displayed and java_class_name_value
+ # will be displayed as a tooltip.
+ },
+ ],
+ "id": "A String", # SDK generated id of this transform instance.
+ "inputCollectionName": [ # User names for all collection inputs to this transform.
+ "A String",
+ ],
+ "name": "A String", # User provided name for this transform instance.
+ "kind": "A String", # Type of transform.
+ },
+ ],
+ "executionPipelineStage": [ # Description of each stage of execution of the pipeline.
+ { # Description of the composing transforms, names/ids, and input/outputs of a
+ # stage of execution. Some composing transforms and sources may have been
+ # generated by the Dataflow service during execution planning.
+ "componentSource": [ # Collections produced and consumed by component transforms of this stage.
+ { # Description of an interstitial value between transforms in an execution
+ # stage.
+ "userName": "A String", # Human-readable name for this source; may be user or system generated.
+ "name": "A String", # Dataflow service generated name for this source.
+ "originalTransformOrCollection": "A String", # User name for the original user transform or collection with which this
+ # source is most closely associated.
+ },
+ ],
+ "inputSource": [ # Input sources for this stage.
+ { # Description of an input or output of an execution stage.
+ "userName": "A String", # Human-readable name for this source; may be user or system generated.
+ "originalTransformOrCollection": "A String", # User name for the original user transform or collection with which this
+ # source is most closely associated.
+ "sizeBytes": "A String", # Size of the source, if measurable.
+ "name": "A String", # Dataflow service generated name for this source.
+ },
+ ],
+ "name": "A String", # Dataflow service generated name for this stage.
+ "componentTransform": [ # Transforms that comprise this execution stage.
+ { # Description of a transform executed as part of an execution stage.
+ "name": "A String", # Dataflow service generated name for this transform.
+ "userName": "A String", # Human-readable name for this transform; may be user or system generated.
+ "originalTransform": "A String", # User name for the original user transform with which this transform is
+ # most closely associated.
+ },
+ ],
+ "id": "A String", # Dataflow service generated id for this stage.
+ "outputSource": [ # Output sources for this stage.
+ { # Description of an input or output of an execution stage.
+ "userName": "A String", # Human-readable name for this source; may be user or system generated.
+ "originalTransformOrCollection": "A String", # User name for the original user transform or collection with which this
+ # source is most closely associated.
+ "sizeBytes": "A String", # Size of the source, if measurable.
+ "name": "A String", # Dataflow service generated name for this source.
+ },
+ ],
+ "kind": "A String", # Type of transform this stage is executing.
+ },
+ ],
+ },
+ "labels": { # User-defined labels for this job.
+ #
+ # The labels map can contain no more than 64 entries. Entries of the labels
+ # map are UTF8 strings that comply with the following restrictions:
+ #
+ # * Keys must conform to regexp: \p{Ll}\p{Lo}{0,62}
+ # * Values must conform to regexp: [\p{Ll}\p{Lo}\p{N}_-]{0,63}
+ # * Both keys and values are additionally constrained to be <= 128 bytes in
+ # size.
+ "a_key": "A String",
+ },
+ "projectId": "A String", # The ID of the Cloud Platform project that the job belongs to.
+ "environment": { # Describes the environment in which a Dataflow Job runs. # The environment for the job.
+ "flexResourceSchedulingGoal": "A String", # Which Flexible Resource Scheduling mode to run in.
+ "workerRegion": "A String", # The Compute Engine region
+ # (https://cloud.google.com/compute/docs/regions-zones/regions-zones) in
+ # which worker processing should occur, e.g. "us-west1". Mutually exclusive
+ # with worker_zone. If neither worker_region nor worker_zone is specified,
+ # default to the control plane's region.
+ "userAgent": { # A description of the process that generated the request.
+ "a_key": "", # Properties of the object.
+ },
+ "serviceAccountEmail": "A String", # Identity to run virtual machines as. Defaults to the default account.
+ "version": { # A structure describing which components and their versions of the service
+ # are required in order to run the job.
+ "a_key": "", # Properties of the object.
+ },
+ "serviceKmsKeyName": "A String", # If set, contains the Cloud KMS key identifier used to encrypt data
+ # at rest, AKA a Customer Managed Encryption Key (CMEK).
+ #
+ # Format:
+ # projects/PROJECT_ID/locations/LOCATION/keyRings/KEY_RING/cryptoKeys/KEY
+ "experiments": [ # The list of experiments to enable.
+ "A String",
+ ],
+ "workerZone": "A String", # The Compute Engine zone
+ # (https://cloud.google.com/compute/docs/regions-zones/regions-zones) in
+ # which worker processing should occur, e.g. "us-west1-a". Mutually exclusive
+ # with worker_region. If neither worker_region nor worker_zone is specified,
+ # a zone in the control plane's region is chosen based on available capacity.
+ "workerPools": [ # The worker pools. At least one "harness" worker pool must be
+ # specified in order for the job to have workers.
+ { # Describes one particular pool of Cloud Dataflow workers to be
+ # instantiated by the Cloud Dataflow service in order to perform the
+ # computations required by a job. Note that a workflow job may use
+ # multiple pools, in order to match the various computational
+ # requirements of the various stages of the job.
+ "onHostMaintenance": "A String", # The action to take on host maintenance, as defined by the Google
+ # Compute Engine API.
+ "sdkHarnessContainerImages": [ # Set of SDK harness containers needed to execute this pipeline. This will
+ # only be set in the Fn API path. For non-cross-language pipelines this
+ # should have only one entry. Cross-language pipelines will have two or more
+ # entries.
+ { # Defines a SDK harness container for executing Dataflow pipelines.
+ "containerImage": "A String", # A docker container image that resides in Google Container Registry.
+ "useSingleCorePerContainer": True or False, # If true, recommends the Dataflow service to use only one core per SDK
+ # container instance with this image. If false (or unset) recommends using
+ # more than one core per SDK container instance with this image for
+ # efficiency. Note that Dataflow service may choose to override this property
+ # if needed.
+ },
+ ],
+ "zone": "A String", # Zone to run the worker pools in. If empty or unspecified, the service
+ # will attempt to choose a reasonable default.
+ "kind": "A String", # The kind of the worker pool; currently only `harness` and `shuffle`
+ # are supported.
+ "metadata": { # Metadata to set on the Google Compute Engine VMs.
+ "a_key": "A String",
+ },
+ "diskSourceImage": "A String", # Fully qualified source image for disks.
+ "dataDisks": [ # Data disks that are used by a VM in this workflow.
+ { # Describes the data disk used by a workflow job.
+ "sizeGb": 42, # Size of disk in GB. If zero or unspecified, the service will
+ # attempt to choose a reasonable default.
+ "diskType": "A String", # Disk storage type, as defined by Google Compute Engine. This
+ # must be a disk type appropriate to the project and zone in which
+ # the workers will run. If unknown or unspecified, the service
+ # will attempt to choose a reasonable default.
+ #
+ # For example, the standard persistent disk type is a resource name
+ # typically ending in "pd-standard". If SSD persistent disks are
+ # available, the resource name typically ends with "pd-ssd". The
+ # actual valid values are defined by the Google Compute Engine API,
+ # not by the Cloud Dataflow API; consult the Google Compute Engine
+ # documentation for more information about determining the set of
+ # available disk types for a particular project and zone.
+ #
+ # Google Compute Engine Disk types are local to a particular
+ # project in a particular zone, and so the resource name will
+ # typically look something like this:
+ #
+ # compute.googleapis.com/projects/project-id/zones/zone/diskTypes/pd-standard
+ "mountPoint": "A String", # Directory in a VM where disk is mounted.
+ },
+ ],
+ "packages": [ # Packages to be installed on workers.
+ { # The packages that must be installed in order for a worker to run the
+ # steps of the Cloud Dataflow job that will be assigned to its worker
+ # pool.
#
- # The supported resource type is:
- #
- # Google Cloud Storage:
- #
- # storage.googleapis.com/{bucket}/{object}
- # bucket.storage.googleapis.com/{object}
- "reportingEnabled": True or False, # Whether to send work progress updates to the service.
- "baseUrl": "A String", # The base URL for accessing Google Cloud APIs.
+ # This is the mechanism by which the Cloud Dataflow SDK causes code to
+ # be loaded onto the workers. For example, the Cloud Dataflow Java SDK
+ # might use this to install jars containing the user's code and all of the
+ # various dependencies (libraries, data files, etc.) required in order
+ # for that code to run.
+ "name": "A String", # The name of the package.
+ "location": "A String", # The resource to read the package from. The supported resource type is:
+ #
+ # Google Cloud Storage:
+ #
+ # storage.googleapis.com/{bucket}
+ # bucket.storage.googleapis.com/
+ },
+ ],
+ "teardownPolicy": "A String", # Sets the policy for determining when to tear down the worker pool.
+ # Allowed values are: `TEARDOWN_ALWAYS`, `TEARDOWN_ON_SUCCESS`, and
+ # `TEARDOWN_NEVER`.
+ # `TEARDOWN_ALWAYS` means workers are always torn down regardless of whether
+ # the job succeeds. `TEARDOWN_ON_SUCCESS` means workers are torn down
+ # if the job succeeds. `TEARDOWN_NEVER` means the workers are never torn
+ # down.
+ #
+ # If the workers are not torn down by the service, they will
+ # continue to run and use Google Compute Engine VM resources in the
+ # user's project until they are explicitly terminated by the user.
+ # Because of this, Google recommends using the `TEARDOWN_ALWAYS`
+ # policy except for small, manually supervised test jobs.
+ #
+ # If unknown or unspecified, the service will attempt to choose a reasonable
+ # default.
+ "network": "A String", # Network to which VMs will be assigned. If empty or unspecified,
+ # the service will use the network "default".
+ "ipConfiguration": "A String", # Configuration for VM IPs.
+ "diskSizeGb": 42, # Size of root disk for VMs, in GB. If zero or unspecified, the service will
+ # attempt to choose a reasonable default.
+ "autoscalingSettings": { # Settings for WorkerPool autoscaling. # Settings for autoscaling of this WorkerPool.
+ "maxNumWorkers": 42, # The maximum number of workers to cap scaling at.
+ "algorithm": "A String", # The algorithm to use for autoscaling.
+ },
+ "poolArgs": { # Extra arguments for this worker pool.
+ "a_key": "", # Properties of the object. Contains field @type with type URL.
+ },
+ "subnetwork": "A String", # Subnetwork to which VMs will be assigned, if desired. Expected to be of
+ # the form "regions/REGION/subnetworks/SUBNETWORK".
+ "numWorkers": 42, # Number of Google Compute Engine workers in this pool needed to
+ # execute the job. If zero or unspecified, the service will
+ # attempt to choose a reasonable default.
+ "numThreadsPerWorker": 42, # The number of threads per worker harness. If empty or unspecified, the
+ # service will choose a number of threads (according to the number of cores
+ # on the selected machine type for batch, or 1 by convention for streaming).
+ "workerHarnessContainerImage": "A String", # Required. Docker container image that executes the Cloud Dataflow worker
+ # harness, residing in Google Container Registry.
+ #
+ # Deprecated for the Fn API path. Use sdk_harness_container_images instead.
+ "taskrunnerSettings": { # Taskrunner configuration settings. # Settings passed through to Google Compute Engine workers when
+ # using the standard Dataflow task runner. Users should ignore
+ # this field.
+ "dataflowApiVersion": "A String", # The API version of endpoint, e.g. "v1b3"
+ "oauthScopes": [ # The OAuth2 scopes to be requested by the taskrunner in order to
+ # access the Cloud Dataflow API.
+ "A String",
+ ],
+ "baseUrl": "A String", # The base URL for the taskrunner to use when accessing Google Cloud APIs.
#
# When workers access Google Cloud APIs, they logically do so via
# relative URLs. If this field is specified, it supplies the base
@@ -1610,343 +1652,301 @@
# Locators".
#
# If not specified, the default value is "http://www.googleapis.com/"
- "servicePath": "A String", # The Cloud Dataflow service path relative to the root URL, for example,
- # "dataflow/v1b3/projects".
- "shuffleServicePath": "A String", # The Shuffle service path relative to the root URL, for example,
- # "shuffle/v1beta1".
- "workerId": "A String", # The ID of the worker running this pipeline.
+ "workflowFileName": "A String", # The file to store the workflow in.
+ "logToSerialconsole": True or False, # Whether to send taskrunner log info to Google Compute Engine VM serial
+ # console.
+ "baseTaskDir": "A String", # The location on the worker for task-specific subdirectories.
+ "taskUser": "A String", # The UNIX user ID on the worker VM to use for tasks launched by
+ # taskrunner; e.g. "root".
+ "vmId": "A String", # The ID string of the VM.
+ "alsologtostderr": True or False, # Whether to also send taskrunner log info to stderr.
+ "parallelWorkerSettings": { # Provides data to pass through to the worker harness. # The settings to pass to the parallel worker harness.
+ "shuffleServicePath": "A String", # The Shuffle service path relative to the root URL, for example,
+ # "shuffle/v1beta1".
+ "tempStoragePrefix": "A String", # The prefix of the resources the system should use for temporary
+ # storage.
+ #
+ # The supported resource type is:
+ #
+ # Google Cloud Storage:
+ #
+ # storage.googleapis.com/{bucket}/{object}
+ # bucket.storage.googleapis.com/{object}
+ "reportingEnabled": True or False, # Whether to send work progress updates to the service.
+ "servicePath": "A String", # The Cloud Dataflow service path relative to the root URL, for example,
+ # "dataflow/v1b3/projects".
+ "baseUrl": "A String", # The base URL for accessing Google Cloud APIs.
+ #
+ # When workers access Google Cloud APIs, they logically do so via
+ # relative URLs. If this field is specified, it supplies the base
+ # URL to use for resolving these relative URLs. The normative
+ # algorithm used is defined by RFC 1808, "Relative Uniform Resource
+ # Locators".
+ #
+ # If not specified, the default value is "http://www.googleapis.com/"
+ "workerId": "A String", # The ID of the worker running this pipeline.
+ },
+ "harnessCommand": "A String", # The command to launch the worker harness.
+ "logDir": "A String", # The directory on the VM to store logs.
+ "streamingWorkerMainClass": "A String", # The streaming worker main class name.
+ "languageHint": "A String", # The suggested backend language.
+ "taskGroup": "A String", # The UNIX group ID on the worker VM to use for tasks launched by
+ # taskrunner; e.g. "wheel".
+ "logUploadLocation": "A String", # Indicates where to put logs. If this is not specified, the logs
+ # will not be uploaded.
+ #
+ # The supported resource type is:
+ #
+ # Google Cloud Storage:
+ # storage.googleapis.com/{bucket}/{object}
+ # bucket.storage.googleapis.com/{object}
+ "commandlinesFileName": "A String", # The file to store preprocessing commands in.
+ "continueOnException": True or False, # Whether to continue taskrunner if an exception is hit.
+ "tempStoragePrefix": "A String", # The prefix of the resources the taskrunner should use for
+ # temporary storage.
+ #
+ # The supported resource type is:
+ #
+ # Google Cloud Storage:
+ # storage.googleapis.com/{bucket}/{object}
+ # bucket.storage.googleapis.com/{object}
},
- "taskUser": "A String", # The UNIX user ID on the worker VM to use for tasks launched by
- # taskrunner; e.g. "root".
- "vmId": "A String", # The ID string of the VM.
+ "diskType": "A String", # Type of root disk for VMs. If empty or unspecified, the service will
+ # attempt to choose a reasonable default.
+ "defaultPackageSet": "A String", # The default package set to install. This allows the service to
+ # select a default set of packages which are useful to worker
+ # harnesses written in a particular language.
+ "machineType": "A String", # Machine type (e.g. "n1-standard-1"). If empty or unspecified, the
+ # service will attempt to choose a reasonable default.
},
- "autoscalingSettings": { # Settings for WorkerPool autoscaling. # Settings for autoscaling of this WorkerPool.
- "algorithm": "A String", # The algorithm to use for autoscaling.
- "maxNumWorkers": 42, # The maximum number of workers to cap scaling at.
- },
- "metadata": { # Metadata to set on the Google Compute Engine VMs.
- "a_key": "A String",
- },
- "defaultPackageSet": "A String", # The default package set to install. This allows the service to
- # select a default set of packages which are useful to worker
- # harnesses written in a particular language.
- "network": "A String", # Network to which VMs will be assigned. If empty or unspecified,
- # the service will use the network "default".
+ ],
+ "tempStoragePrefix": "A String", # The prefix of the resources the system should use for temporary
+ # storage. The system will append the suffix "/temp-{JOBNAME}" to
+ # this resource prefix, where {JOBNAME} is the value of the
+ # job_name field. The resulting bucket and object prefix is used
+ # as the prefix of the resources used to store temporary data
+ # needed during the job execution. NOTE: This will override the
+ # value in taskrunner_settings.
+ # The supported resource type is:
+ #
+ # Google Cloud Storage:
+ #
+ # storage.googleapis.com/{bucket}/{object}
+ # bucket.storage.googleapis.com/{object}
+ "internalExperiments": { # Experimental settings.
+ "a_key": "", # Properties of the object. Contains field @type with type URL.
},
- ],
- "dataset": "A String", # The dataset for the current project where various workflow
- # related tables are stored.
- #
- # The supported resource type is:
- #
- # Google BigQuery:
- # bigquery.googleapis.com/{dataset}
- },
- "stageStates": [ # This field may be mutated by the Cloud Dataflow service;
- # callers cannot mutate it.
- { # A message describing the state of a particular execution stage.
- "currentStateTime": "A String", # The time at which the stage transitioned to this state.
- "executionStageState": "A String", # Executions stage states allow the same set of values as JobState.
- "executionStageName": "A String", # The name of the execution stage.
- },
- ],
- "jobMetadata": { # Metadata available primarily for filtering jobs. Will be included in the # This field is populated by the Dataflow service to support filtering jobs
- # by the metadata values provided here. Populated for ListJobs and all GetJob
- # views SUMMARY and higher.
- # ListJob response and Job SUMMARY view.
- "datastoreDetails": [ # Identification of a Datastore source used in the Dataflow job.
- { # Metadata for a Datastore connector used by the job.
- "namespace": "A String", # Namespace used in the connection.
- "projectId": "A String", # ProjectId accessed in the connection.
- },
- ],
- "sdkVersion": { # The version of the SDK used to run the job. # The SDK version used to run the job.
- "version": "A String", # The version of the SDK used to run the job.
- "sdkSupportStatus": "A String", # The support status for this SDK version.
- "versionDisplayName": "A String", # A readable string describing the version of the SDK.
- },
- "bigqueryDetails": [ # Identification of a BigQuery source used in the Dataflow job.
- { # Metadata for a BigQuery connector used by the job.
- "table": "A String", # Table accessed in the connection.
- "dataset": "A String", # Dataset accessed in the connection.
- "query": "A String", # Query used to access data in the connection.
- "projectId": "A String", # Project accessed in the connection.
- },
- ],
- "fileDetails": [ # Identification of a File source used in the Dataflow job.
- { # Metadata for a File connector used by the job.
- "filePattern": "A String", # File Pattern used to access files by the connector.
- },
- ],
- "pubsubDetails": [ # Identification of a PubSub source used in the Dataflow job.
- { # Metadata for a PubSub connector used by the job.
- "topic": "A String", # Topic accessed in the connection.
- "subscription": "A String", # Subscription used in the connection.
- },
- ],
- "bigTableDetails": [ # Identification of a BigTable source used in the Dataflow job.
- { # Metadata for a BigTable connector used by the job.
- "projectId": "A String", # ProjectId accessed in the connection.
- "instanceId": "A String", # InstanceId accessed in the connection.
- "tableId": "A String", # TableId accessed in the connection.
- },
- ],
- "spannerDetails": [ # Identification of a Spanner source used in the Dataflow job.
- { # Metadata for a Spanner connector used by the job.
- "instanceId": "A String", # InstanceId accessed in the connection.
- "projectId": "A String", # ProjectId accessed in the connection.
- "databaseId": "A String", # DatabaseId accessed in the connection.
- },
- ],
- },
- "type": "A String", # The type of Cloud Dataflow job.
- "projectId": "A String", # The ID of the Cloud Platform project that the job belongs to.
- "createdFromSnapshotId": "A String", # If this is specified, the job's initial state is populated from the given
- # snapshot.
- "pipelineDescription": { # A descriptive representation of submitted pipeline as well as the executed # Preliminary field: The format of this data may change at any time.
- # A description of the user pipeline and stages through which it is executed.
- # Created by Cloud Dataflow service. Only retrieved with
- # JOB_VIEW_DESCRIPTION or JOB_VIEW_ALL.
- # form. This data is provided by the Dataflow service for ease of visualizing
- # the pipeline and interpreting Dataflow provided metrics.
- "executionPipelineStage": [ # Description of each stage of execution of the pipeline.
- { # Description of the composing transforms, names/ids, and input/outputs of a
- # stage of execution. Some composing transforms and sources may have been
- # generated by the Dataflow service during execution planning.
- "outputSource": [ # Output sources for this stage.
- { # Description of an input or output of an execution stage.
- "sizeBytes": "A String", # Size of the source, if measurable.
- "name": "A String", # Dataflow service generated name for this source.
- "userName": "A String", # Human-readable name for this source; may be user or system generated.
- "originalTransformOrCollection": "A String", # User name for the original user transform or collection with which this
- # source is most closely associated.
- },
- ],
- "name": "A String", # Dataflow service generated name for this stage.
- "inputSource": [ # Input sources for this stage.
- { # Description of an input or output of an execution stage.
- "sizeBytes": "A String", # Size of the source, if measurable.
- "name": "A String", # Dataflow service generated name for this source.
- "userName": "A String", # Human-readable name for this source; may be user or system generated.
- "originalTransformOrCollection": "A String", # User name for the original user transform or collection with which this
- # source is most closely associated.
- },
- ],
- "id": "A String", # Dataflow service generated id for this stage.
- "componentTransform": [ # Transforms that comprise this execution stage.
- { # Description of a transform executed as part of an execution stage.
- "originalTransform": "A String", # User name for the original user transform with which this transform is
- # most closely associated.
- "name": "A String", # Dataflow service generated name for this source.
- "userName": "A String", # Human-readable name for this transform; may be user or system generated.
- },
- ],
- "componentSource": [ # Collections produced and consumed by component transforms of this stage.
- { # Description of an interstitial value between transforms in an execution
- # stage.
- "name": "A String", # Dataflow service generated name for this source.
- "userName": "A String", # Human-readable name for this transform; may be user or system generated.
- "originalTransformOrCollection": "A String", # User name for the original user transform or collection with which this
- # source is most closely associated.
- },
- ],
- "kind": "A String", # Type of tranform this stage is executing.
- },
- ],
- "originalPipelineTransform": [ # Description of each transform in the pipeline and collections between them.
- { # Description of the type, names/ids, and input/outputs for a transform.
- "kind": "A String", # Type of transform.
- "inputCollectionName": [ # User names for all collection inputs to this transform.
- "A String",
- ],
- "name": "A String", # User provided name for this transform instance.
- "id": "A String", # SDK generated id of this transform instance.
- "displayData": [ # Transform-specific display data.
- { # Data provided with a pipeline or transform to provide descriptive info.
- "durationValue": "A String", # Contains value if the data is of duration type.
- "int64Value": "A String", # Contains value if the data is of int64 type.
- "namespace": "A String", # The namespace for the key. This is usually a class name or programming
- # language namespace (i.e. python module) which defines the display data.
- # This allows a dax monitoring system to specially handle the data
- # and perform custom rendering.
- "floatValue": 3.14, # Contains value if the data is of float type.
- "key": "A String", # The key identifying the display data.
- # This is intended to be used as a label for the display data
- # when viewed in a dax monitoring system.
- "shortStrValue": "A String", # A possible additional shorter value to display.
- # For example a java_class_name_value of com.mypackage.MyDoFn
- # will be stored with MyDoFn as the short_str_value and
- # com.mypackage.MyDoFn as the java_class_name value.
- # short_str_value can be displayed and java_class_name_value
- # will be displayed as a tooltip.
- "url": "A String", # An optional full URL.
- "label": "A String", # An optional label to display in a dax UI for the element.
- "timestampValue": "A String", # Contains value if the data is of timestamp type.
- "boolValue": True or False, # Contains value if the data is of a boolean type.
- "javaClassValue": "A String", # Contains value if the data is of java class type.
- "strValue": "A String", # Contains value if the data is of string type.
- },
- ],
- "outputCollectionName": [ # User names for all collection outputs to this transform.
- "A String",
- ],
- },
- ],
- "displayData": [ # Pipeline level display data.
- { # Data provided with a pipeline or transform to provide descriptive info.
- "durationValue": "A String", # Contains value if the data is of duration type.
- "int64Value": "A String", # Contains value if the data is of int64 type.
- "namespace": "A String", # The namespace for the key. This is usually a class name or programming
- # language namespace (i.e. python module) which defines the display data.
- # This allows a dax monitoring system to specially handle the data
- # and perform custom rendering.
- "floatValue": 3.14, # Contains value if the data is of float type.
- "key": "A String", # The key identifying the display data.
- # This is intended to be used as a label for the display data
- # when viewed in a dax monitoring system.
- "shortStrValue": "A String", # A possible additional shorter value to display.
- # For example a java_class_name_value of com.mypackage.MyDoFn
- # will be stored with MyDoFn as the short_str_value and
- # com.mypackage.MyDoFn as the java_class_name value.
- # short_str_value can be displayed and java_class_name_value
- # will be displayed as a tooltip.
- "url": "A String", # An optional full URL.
- "label": "A String", # An optional label to display in a dax UI for the element.
- "timestampValue": "A String", # Contains value if the data is of timestamp type.
- "boolValue": True or False, # Contains value if the data is of a boolean type.
- "javaClassValue": "A String", # Contains value if the data is of java class type.
- "strValue": "A String", # Contains value if the data is of string type.
- },
- ],
- },
- "replaceJobId": "A String", # If this job is an update of an existing job, this field is the job ID
- # of the job it replaced.
- #
- # When sending a `CreateJobRequest`, you can update a job by specifying it
- # here. The job named here is stopped, and its intermediate state is
- # transferred to this job.
- "tempFiles": [ # A set of files the system should be aware of that are used
- # for temporary storage. These temporary files will be
- # removed on job completion.
- # No duplicates are allowed.
- # No file patterns are supported.
- #
- # The supported files are:
- #
- # Google Cloud Storage:
- #
- # storage.googleapis.com/{bucket}/{object}
- # bucket.storage.googleapis.com/{object}
- "A String",
- ],
- "name": "A String", # The user-specified Cloud Dataflow job name.
- #
- # Only one Job with a given name may exist in a project at any
- # given time. If a caller attempts to create a Job with the same
- # name as an already-existing Job, the attempt returns the
- # existing Job.
- #
- # The name must match the regular expression
- # `[a-z]([-a-z0-9]{0,38}[a-z0-9])?`
- "steps": [ # Exactly one of step or steps_location should be specified.
- #
- # The top-level steps that constitute the entire job.
- { # Defines a particular step within a Cloud Dataflow job.
- #
- # A job consists of multiple steps, each of which performs some
- # specific operation as part of the overall job. Data is typically
- # passed from one step to another as part of the job.
- #
- # Here's an example of a sequence of steps which together implement a
- # Map-Reduce job:
- #
- # * Read a collection of data from some source, parsing the
- # collection's elements.
- #
- # * Validate the elements.
- #
- # * Apply a user-defined function to map each element to some value
- # and extract an element-specific key value.
- #
- # * Group elements with the same key into a single element with
- # that key, transforming a multiply-keyed collection into a
- # uniquely-keyed collection.
- #
- # * Write the elements out to some data sink.
- #
- # Note that the Cloud Dataflow service may be used to run many different
- # types of jobs, not just Map-Reduce.
- "name": "A String", # The name that identifies the step. This must be unique for each
- # step with respect to all other steps in the Cloud Dataflow job.
- "kind": "A String", # The kind of step in the Cloud Dataflow job.
- "properties": { # Named properties associated with the step. Each kind of
- # predefined step has its own required set of properties.
- # Must be provided on Create. Only retrieved with JOB_VIEW_ALL.
+ "sdkPipelineOptions": { # The Cloud Dataflow SDK pipeline options specified by the user. These
+ # options are passed through the service and are used to recreate the
+ # SDK pipeline options on the worker in a language agnostic and platform
+ # independent way.
"a_key": "", # Properties of the object.
},
+ "dataset": "A String", # The dataset for the current project where various workflow
+ # related tables are stored.
+ #
+ # The supported resource type is:
+ #
+ # Google BigQuery:
+ # bigquery.googleapis.com/{dataset}
+ "clusterManagerApiService": "A String", # The type of cluster manager API to use. If unknown or
+ # unspecified, the service will attempt to choose a reasonable
+ # default. This should be in the form of the API service name,
+ # e.g. "compute.googleapis.com".
},
- ],
- "replacedByJobId": "A String", # If another job is an update of this job (and thus, this job is in
- # `JOB_STATE_UPDATED`), this field contains the ID of that job.
- "executionInfo": { # Additional information about how a Cloud Dataflow job will be executed that # Deprecated.
- # isn't contained in the submitted job.
- "stages": { # A mapping from each stage to the information about that stage.
- "a_key": { # Contains information about how a particular
- # google.dataflow.v1beta3.Step will be executed.
- "stepName": [ # The steps associated with the execution stage.
- # Note that stages may have several steps, and that a given step
- # might be run by more than one stage.
- "A String",
- ],
+ "stepsLocation": "A String", # The GCS location where the steps are stored.
+ "steps": [ # Exactly one of step or steps_location should be specified.
+ #
+ # The top-level steps that constitute the entire job.
+ { # Defines a particular step within a Cloud Dataflow job.
+ #
+ # A job consists of multiple steps, each of which performs some
+ # specific operation as part of the overall job. Data is typically
+ # passed from one step to another as part of the job.
+ #
+ # Here's an example of a sequence of steps which together implement a
+ # Map-Reduce job:
+ #
+ # * Read a collection of data from some source, parsing the
+ # collection's elements.
+ #
+ # * Validate the elements.
+ #
+ # * Apply a user-defined function to map each element to some value
+ # and extract an element-specific key value.
+ #
+ # * Group elements with the same key into a single element with
+ # that key, transforming a multiply-keyed collection into a
+ # uniquely-keyed collection.
+ #
+ # * Write the elements out to some data sink.
+ #
+ # Note that the Cloud Dataflow service may be used to run many different
+ # types of jobs, not just Map-Reduce.
+ "kind": "A String", # The kind of step in the Cloud Dataflow job.
+ "properties": { # Named properties associated with the step. Each kind of
+ # predefined step has its own required set of properties.
+ # Must be provided on Create. Only retrieved with JOB_VIEW_ALL.
+ "a_key": "", # Properties of the object.
+ },
+ "name": "A String", # The name that identifies the step. This must be unique for each
+ # step with respect to all other steps in the Cloud Dataflow job.
+ },
+ ],
+ "stageStates": [ # This field may be mutated by the Cloud Dataflow service;
+ # callers cannot mutate it.
+ { # A message describing the state of a particular execution stage.
+ "executionStageState": "A String", # Execution stage states allow the same set of values as JobState.
+ "executionStageName": "A String", # The name of the execution stage.
+ "currentStateTime": "A String", # The time at which the stage transitioned to this state.
+ },
+ ],
+ "replacedByJobId": "A String", # If another job is an update of this job (and thus, this job is in
+ # `JOB_STATE_UPDATED`), this field contains the ID of that job.
+ "jobMetadata": { # Metadata available primarily for filtering jobs. Will be included in the
+ # ListJob response and Job SUMMARY view.
+ # This field is populated by the Dataflow service to support filtering jobs
+ # by the metadata values provided here. Populated for ListJobs and all GetJob views SUMMARY and higher.
+ "sdkVersion": { # The version of the SDK used to run the job.
+ "sdkSupportStatus": "A String", # The support status for this SDK version.
+ "versionDisplayName": "A String", # A readable string describing the version of the SDK.
+ "version": "A String", # The version of the SDK used to run the job.
+ },
+ "bigTableDetails": [ # Identification of a BigTable source used in the Dataflow job.
+ { # Metadata for a BigTable connector used by the job.
+ "instanceId": "A String", # InstanceId accessed in the connection.
+ "tableId": "A String", # TableId accessed in the connection.
+ "projectId": "A String", # ProjectId accessed in the connection.
+ },
+ ],
+ "pubsubDetails": [ # Identification of a PubSub source used in the Dataflow job.
+ { # Metadata for a PubSub connector used by the job.
+ "subscription": "A String", # Subscription used in the connection.
+ "topic": "A String", # Topic accessed in the connection.
+ },
+ ],
+ "bigqueryDetails": [ # Identification of a BigQuery source used in the Dataflow job.
+ { # Metadata for a BigQuery connector used by the job.
+ "dataset": "A String", # Dataset accessed in the connection.
+ "projectId": "A String", # Project accessed in the connection.
+ "query": "A String", # Query used to access data in the connection.
+ "table": "A String", # Table accessed in the connection.
+ },
+ ],
+ "fileDetails": [ # Identification of a File source used in the Dataflow job.
+ { # Metadata for a File connector used by the job.
+ "filePattern": "A String", # File Pattern used to access files by the connector.
+ },
+ ],
+ "datastoreDetails": [ # Identification of a Datastore source used in the Dataflow job.
+ { # Metadata for a Datastore connector used by the job.
+ "namespace": "A String", # Namespace used in the connection.
+ "projectId": "A String", # ProjectId accessed in the connection.
+ },
+ ],
+ "spannerDetails": [ # Identification of a Spanner source used in the Dataflow job.
+ { # Metadata for a Spanner connector used by the job.
+ "instanceId": "A String", # InstanceId accessed in the connection.
+ "databaseId": "A String", # DatabaseId accessed in the connection.
+ "projectId": "A String", # ProjectId accessed in the connection.
+ },
+ ],
+ },
+ "location": "A String", # The [regional endpoint]
+ # (https://cloud.google.com/dataflow/docs/concepts/regional-endpoints) that
+ # contains this job.
+ "transformNameMapping": { # The map of transform name prefixes of the job to be replaced to the
+ # corresponding name prefixes of the new job.
+ "a_key": "A String",
+ },
+ "startTime": "A String", # The timestamp when the job was started (transitioned to JOB_STATE_PENDING).
+ # Flexible resource scheduling jobs are started with some delay after job
+ # creation, so start_time is unset before start and is updated when the
+ # job is started by the Cloud Dataflow service. For other jobs, start_time
+ # always equals create_time and is immutable and set by the Cloud Dataflow
+ # service.
+ "clientRequestId": "A String", # The client's unique identifier of the job, re-used across retried attempts.
+ # If this field is set, the service will ensure its uniqueness.
+ # The request to create a job will fail if the service has knowledge of a
+ # previously submitted job with the same client's ID and job name.
+ # The caller may use this field to ensure idempotence of job
+ # creation across retried attempts to create a job.
+ # By default, the field is empty and, in that case, the service ignores it.
+ "executionInfo": { # Deprecated. Additional information about how a Cloud Dataflow job will be executed that
+ # isn't contained in the submitted job.
+ "stages": { # A mapping from each stage to the information about that stage.
+ "a_key": { # Contains information about how a particular
+ # google.dataflow.v1beta3.Step will be executed.
+ "stepName": [ # The steps associated with the execution stage.
+ # Note that stages may have several steps, and that a given step
+ # might be run by more than one stage.
+ "A String",
+ ],
+ },
},
},
- },
- "currentState": "A String", # The current state of the job.
- #
- # Jobs are created in the `JOB_STATE_STOPPED` state unless otherwise
- # specified.
- #
- # A job in the `JOB_STATE_RUNNING` state may asynchronously enter a
- # terminal state. After a job has reached a terminal state, no
- # further state updates may be made.
- #
- # This field may be mutated by the Cloud Dataflow service;
- # callers cannot mutate it.
- "location": "A String", # The [regional endpoint]
- # (https://cloud.google.com/dataflow/docs/concepts/regional-endpoints) that
- # contains this job.
- "startTime": "A String", # The timestamp when the job was started (transitioned to JOB_STATE_PENDING).
- # Flexible resource scheduling jobs are started with some delay after job
- # creation, so start_time is unset before start and is updated when the
- # job is started by the Cloud Dataflow service. For other jobs, start_time
- # always equals to create_time and is immutable and set by the Cloud Dataflow
- # service.
- "stepsLocation": "A String", # The GCS location where the steps are stored.
- "labels": { # User-defined labels for this job.
- #
- # The labels map can contain no more than 64 entries. Entries of the labels
- # map are UTF8 strings that comply with the following restrictions:
- #
- # * Keys must conform to regexp: \p{Ll}\p{Lo}{0,62}
- # * Values must conform to regexp: [\p{Ll}\p{Lo}\p{N}_-]{0,63}
- # * Both keys and values are additionally constrained to be <= 128 bytes in
- # size.
- "a_key": "A String",
- },
- "createTime": "A String", # The timestamp when the job was initially created. Immutable and set by the
- # Cloud Dataflow service.
- "requestedState": "A String", # The job's requested state.
- #
- # `UpdateJob` may be used to switch between the `JOB_STATE_STOPPED` and
- # `JOB_STATE_RUNNING` states, by setting requested_state. `UpdateJob` may
- # also be used to directly set a job's requested state to
- # `JOB_STATE_CANCELLED` or `JOB_STATE_DONE`, irrevocably terminating the
- # job if it has not already reached a terminal state.
- }</pre>
+ "type": "A String", # The type of Cloud Dataflow job.
+ "createTime": "A String", # The timestamp when the job was initially created. Immutable and set by the
+ # Cloud Dataflow service.
+ "tempFiles": [ # A set of files the system should be aware of that are used
+ # for temporary storage. These temporary files will be
+ # removed on job completion.
+ # No duplicates are allowed.
+ # No file patterns are supported.
+ #
+ # The supported files are:
+ #
+ # Google Cloud Storage:
+ #
+ # storage.googleapis.com/{bucket}/{object}
+ # bucket.storage.googleapis.com/{object}
+ "A String",
+ ],
+ "id": "A String", # The unique ID of this job.
+ #
+ # This field is set by the Cloud Dataflow service when the Job is
+ # created, and is immutable for the life of the job.
+ "requestedState": "A String", # The job's requested state.
+ #
+ # `UpdateJob` may be used to switch between the `JOB_STATE_STOPPED` and
+ # `JOB_STATE_RUNNING` states, by setting requested_state. `UpdateJob` may
+ # also be used to directly set a job's requested state to
+ # `JOB_STATE_CANCELLED` or `JOB_STATE_DONE`, irrevocably terminating the
+ # job if it has not already reached a terminal state.
+ "replaceJobId": "A String", # If this job is an update of an existing job, this field is the job ID
+ # of the job it replaced.
+ #
+ # When sending a `CreateJobRequest`, you can update a job by specifying it
+ # here. The job named here is stopped, and its intermediate state is
+ # transferred to this job.
+ "createdFromSnapshotId": "A String", # If this is specified, the job's initial state is populated from the given
+ # snapshot.
+ "currentState": "A String", # The current state of the job.
+ #
+ # Jobs are created in the `JOB_STATE_STOPPED` state unless otherwise
+ # specified.
+ #
+ # A job in the `JOB_STATE_RUNNING` state may asynchronously enter a
+ # terminal state. After a job has reached a terminal state, no
+ # further state updates may be made.
+ #
+ # This field may be mutated by the Cloud Dataflow service;
+ # callers cannot mutate it.
+ "name": "A String", # The user-specified Cloud Dataflow job name.
+ #
+ # Only one Job with a given name may exist in a project at any
+ # given time. If a caller attempts to create a Job with the same
+ # name as an already-existing Job, the attempt returns the
+ # existing Job.
+ #
+ # The name must match the regular expression
+ # `[a-z]([-a-z0-9]{0,38}[a-z0-9])?`
+ "currentStateTime": "A String", # The timestamp associated with the current state.
+ }</pre>
</div>
<div class="method">
- <code class="details" id="get">get(projectId, jobId, view=None, location=None, x__xgafv=None)</code>
+ <code class="details" id="get">get(projectId, jobId, location=None, view=None, x__xgafv=None)</code>
<pre>Gets the state of the specified Cloud Dataflow job.
To get the state of a job, we recommend using `projects.locations.jobs.get`
@@ -1958,10 +1958,10 @@
Args:
projectId: string, The ID of the Cloud Platform project that the job belongs to. (required)
jobId: string, The job ID. (required)
- view: string, The level of information requested in response.
location: string, The [regional endpoint]
(https://cloud.google.com/dataflow/docs/concepts/regional-endpoints) that
contains this job.
+ view: string, The level of information requested in response.
x__xgafv: string, V1 error format.
Allowed values
1 - v1 error format
@@ -1971,243 +1971,285 @@
An object of the form:
{ # Defines a job to be run by the Cloud Dataflow service.
- "clientRequestId": "A String", # The client's unique identifier of the job, re-used across retried attempts.
- # If this field is set, the service will ensure its uniqueness.
- # The request to create a job will fail if the service has knowledge of a
- # previously submitted job with the same client's ID and job name.
- # The caller may use this field to ensure idempotence of job
- # creation across retried attempts to create a job.
- # By default, the field is empty and, in that case, the service ignores it.
- "id": "A String", # The unique ID of this job.
- #
- # This field is set by the Cloud Dataflow service when the Job is
- # created, and is immutable for the life of the job.
- "currentStateTime": "A String", # The timestamp associated with the current state.
- "transformNameMapping": { # The map of transform name prefixes of the job to be replaced to the
- # corresponding name prefixes of the new job.
- "a_key": "A String",
- },
- "environment": { # Describes the environment in which a Dataflow Job runs. # The environment for the job.
- "internalExperiments": { # Experimental settings.
- "a_key": "", # Properties of the object. Contains field @type with type URL.
- },
- "workerRegion": "A String", # The Compute Engine region
- # (https://cloud.google.com/compute/docs/regions-zones/regions-zones) in
- # which worker processing should occur, e.g. "us-west1". Mutually exclusive
- # with worker_zone. If neither worker_region nor worker_zone is specified,
- # default to the control plane's region.
- "serviceKmsKeyName": "A String", # If set, contains the Cloud KMS key identifier used to encrypt data
- # at rest, AKA a Customer Managed Encryption Key (CMEK).
- #
- # Format:
- # projects/PROJECT_ID/locations/LOCATION/keyRings/KEY_RING/cryptoKeys/KEY
- "userAgent": { # A description of the process that generated the request.
- "a_key": "", # Properties of the object.
- },
- "workerZone": "A String", # The Compute Engine zone
- # (https://cloud.google.com/compute/docs/regions-zones/regions-zones) in
- # which worker processing should occur, e.g. "us-west1-a". Mutually exclusive
- # with worker_region. If neither worker_region nor worker_zone is specified,
- # a zone in the control plane's region is chosen based on available capacity.
- "clusterManagerApiService": "A String", # The type of cluster manager API to use. If unknown or
- # unspecified, the service will attempt to choose a reasonable
- # default. This should be in the form of the API service name,
- # e.g. "compute.googleapis.com".
- "tempStoragePrefix": "A String", # The prefix of the resources the system should use for temporary
- # storage. The system will append the suffix "/temp-{JOBNAME} to
- # this resource prefix, where {JOBNAME} is the value of the
- # job_name field. The resulting bucket and object prefix is used
- # as the prefix of the resources used to store temporary data
- # needed during the job execution. NOTE: This will override the
- # value in taskrunner_settings.
- # The supported resource type is:
- #
- # Google Cloud Storage:
- #
- # storage.googleapis.com/{bucket}/{object}
- # bucket.storage.googleapis.com/{object}
- "experiments": [ # The list of experiments to enable.
- "A String",
- ],
- "version": { # A structure describing which components and their versions of the service
- # are required in order to run the job.
- "a_key": "", # Properties of the object.
- },
- "serviceAccountEmail": "A String", # Identity to run virtual machines as. Defaults to the default account.
- "sdkPipelineOptions": { # The Cloud Dataflow SDK pipeline options specified by the user. These
- # options are passed through the service and are used to recreate the
- # SDK pipeline options on the worker in a language agnostic and platform
- # independent way.
- "a_key": "", # Properties of the object.
- },
- "flexResourceSchedulingGoal": "A String", # Which Flexible Resource Scheduling mode to run in.
- "workerPools": [ # The worker pools. At least one "harness" worker pool must be
- # specified in order for the job to have workers.
- { # Describes one particular pool of Cloud Dataflow workers to be
- # instantiated by the Cloud Dataflow service in order to perform the
- # computations required by a job. Note that a workflow job may use
- # multiple pools, in order to match the various computational
- # requirements of the various stages of the job.
- "numThreadsPerWorker": 42, # The number of threads per worker harness. If empty or unspecified, the
- # service will choose a number of threads (according to the number of cores
- # on the selected machine type for batch, or 1 by convention for streaming).
- "numWorkers": 42, # Number of Google Compute Engine workers in this pool needed to
- # execute the job. If zero or unspecified, the service will
- # attempt to choose a reasonable default.
- "zone": "A String", # Zone to run the worker pools in. If empty or unspecified, the service
- # will attempt to choose a reasonable default.
- "diskSourceImage": "A String", # Fully qualified source image for disks.
- "packages": [ # Packages to be installed on workers.
- { # The packages that must be installed in order for a worker to run the
- # steps of the Cloud Dataflow job that will be assigned to its worker
- # pool.
- #
- # This is the mechanism by which the Cloud Dataflow SDK causes code to
- # be loaded onto the workers. For example, the Cloud Dataflow Java SDK
- # might use this to install jars containing the user's code and all of the
- # various dependencies (libraries, data files, etc.) required in order
- # for that code to run.
- "name": "A String", # The name of the package.
- "location": "A String", # The resource to read the package from. The supported resource type is:
- #
- # Google Cloud Storage:
- #
- # storage.googleapis.com/{bucket}
- # bucket.storage.googleapis.com/
- },
- ],
- "teardownPolicy": "A String", # Sets the policy for determining when to turndown worker pool.
- # Allowed values are: `TEARDOWN_ALWAYS`, `TEARDOWN_ON_SUCCESS`, and
- # `TEARDOWN_NEVER`.
- # `TEARDOWN_ALWAYS` means workers are always torn down regardless of whether
- # the job succeeds. `TEARDOWN_ON_SUCCESS` means workers are torn down
- # if the job succeeds. `TEARDOWN_NEVER` means the workers are never torn
- # down.
- #
- # If the workers are not torn down by the service, they will
- # continue to run and use Google Compute Engine VM resources in the
- # user's project until they are explicitly terminated by the user.
- # Because of this, Google recommends using the `TEARDOWN_ALWAYS`
- # policy except for small, manually supervised test jobs.
- #
- # If unknown or unspecified, the service will attempt to choose a reasonable
- # default.
- "onHostMaintenance": "A String", # The action to take on host maintenance, as defined by the Google
- # Compute Engine API.
- "poolArgs": { # Extra arguments for this worker pool.
- "a_key": "", # Properties of the object. Contains field @type with type URL.
+ "pipelineDescription": { # A descriptive representation of the submitted pipeline as well as its executed
+ # form. This data is provided by the Dataflow service for ease of visualizing
+ # the pipeline and interpreting Dataflow provided metrics.
+ # Preliminary field: The format of this data may change at any time.
+ # A description of the user pipeline and stages through which it is executed.
+ # Created by Cloud Dataflow service. Only retrieved with JOB_VIEW_DESCRIPTION or JOB_VIEW_ALL.
+ "displayData": [ # Pipeline level display data.
+ { # Data provided with a pipeline or transform to provide descriptive info.
+ "url": "A String", # An optional full URL.
+ "javaClassValue": "A String", # Contains value if the data is of java class type.
+ "timestampValue": "A String", # Contains value if the data is of timestamp type.
+ "durationValue": "A String", # Contains value if the data is of duration type.
+ "label": "A String", # An optional label to display in a dax UI for the element.
+ "key": "A String", # The key identifying the display data.
+ # This is intended to be used as a label for the display data
+ # when viewed in a dax monitoring system.
+ "namespace": "A String", # The namespace for the key. This is usually a class name or programming
+ # language namespace (i.e. python module) which defines the display data.
+ # This allows a dax monitoring system to specially handle the data
+ # and perform custom rendering.
+ "floatValue": 3.14, # Contains value if the data is of float type.
+ "strValue": "A String", # Contains value if the data is of string type.
+ "int64Value": "A String", # Contains value if the data is of int64 type.
+ "boolValue": True or False, # Contains value if the data is of a boolean type.
+ "shortStrValue": "A String", # A possible additional shorter value to display.
+ # For example a java_class_name_value of com.mypackage.MyDoFn
+ # will be stored with MyDoFn as the short_str_value and
+ # com.mypackage.MyDoFn as the java_class_name value.
+ # short_str_value can be displayed and java_class_name_value
+ # will be displayed as a tooltip.
},
- "diskSizeGb": 42, # Size of root disk for VMs, in GB. If zero or unspecified, the service will
- # attempt to choose a reasonable default.
- "workerHarnessContainerImage": "A String", # Required. Docker container image that executes the Cloud Dataflow worker
- # harness, residing in Google Container Registry.
- #
- # Deprecated for the Fn API path. Use sdk_harness_container_images instead.
- "diskType": "A String", # Type of root disk for VMs. If empty or unspecified, the service will
- # attempt to choose a reasonable default.
- "machineType": "A String", # Machine type (e.g. "n1-standard-1"). If empty or unspecified, the
- # service will attempt to choose a reasonable default.
- "kind": "A String", # The kind of the worker pool; currently only `harness` and `shuffle`
- # are supported.
- "sdkHarnessContainerImages": [ # Set of SDK harness containers needed to execute this pipeline. This will
- # only be set in the Fn API path. For non-cross-language pipelines this
- # should have only one entry. Cross-language pipelines will have two or more
- # entries.
- { # Defines a SDK harness container for executing Dataflow pipelines.
- "containerImage": "A String", # A docker container image that resides in Google Container Registry.
- "useSingleCorePerContainer": True or False, # If true, recommends the Dataflow service to use only one core per SDK
- # container instance with this image. If false (or unset) recommends using
- # more than one core per SDK container instance with this image for
- # efficiency. Note that Dataflow service may choose to override this property
- # if needed.
- },
- ],
- "dataDisks": [ # Data disks that are used by a VM in this workflow.
- { # Describes the data disk used by a workflow job.
- "diskType": "A String", # Disk storage type, as defined by Google Compute Engine. This
- # must be a disk type appropriate to the project and zone in which
- # the workers will run. If unknown or unspecified, the service
- # will attempt to choose a reasonable default.
- #
- # For example, the standard persistent disk type is a resource name
- # typically ending in "pd-standard". If SSD persistent disks are
- # available, the resource name typically ends with "pd-ssd". The
- # actual valid values are defined the Google Compute Engine API,
- # not by the Cloud Dataflow API; consult the Google Compute Engine
- # documentation for more information about determining the set of
- # available disk types for a particular project and zone.
- #
- # Google Compute Engine Disk types are local to a particular
- # project in a particular zone, and so the resource name will
- # typically look something like this:
- #
- # compute.googleapis.com/projects/project-id/zones/zone/diskTypes/pd-standard
- "sizeGb": 42, # Size of disk in GB. If zero or unspecified, the service will
- # attempt to choose a reasonable default.
- "mountPoint": "A String", # Directory in a VM where disk is mounted.
- },
- ],
- "subnetwork": "A String", # Subnetwork to which VMs will be assigned, if desired. Expected to be of
- # the form "regions/REGION/subnetworks/SUBNETWORK".
- "ipConfiguration": "A String", # Configuration for VM IPs.
- "taskrunnerSettings": { # Taskrunner configuration settings. # Settings passed through to Google Compute Engine workers when
- # using the standard Dataflow task runner. Users should ignore
- # this field.
- "alsologtostderr": True or False, # Whether to also send taskrunner log info to stderr.
- "taskGroup": "A String", # The UNIX group ID on the worker VM to use for tasks launched by
- # taskrunner; e.g. "wheel".
- "harnessCommand": "A String", # The command to launch the worker harness.
- "logDir": "A String", # The directory on the VM to store logs.
- "oauthScopes": [ # The OAuth2 scopes to be requested by the taskrunner in order to
- # access the Cloud Dataflow API.
+ ],
+ "originalPipelineTransform": [ # Description of each transform in the pipeline and collections between them.
+ { # Description of the type, names/ids, and input/outputs for a transform.
+ "outputCollectionName": [ # User names for all collection outputs to this transform.
"A String",
],
- "dataflowApiVersion": "A String", # The API version of endpoint, e.g. "v1b3"
- "logUploadLocation": "A String", # Indicates where to put logs. If this is not specified, the logs
- # will not be uploaded.
- #
- # The supported resource type is:
- #
- # Google Cloud Storage:
- # storage.googleapis.com/{bucket}/{object}
- # bucket.storage.googleapis.com/{object}
- "streamingWorkerMainClass": "A String", # The streaming worker main class name.
- "workflowFileName": "A String", # The file to store the workflow in.
- "languageHint": "A String", # The suggested backend language.
- "commandlinesFileName": "A String", # The file to store preprocessing commands in.
- "baseTaskDir": "A String", # The location on the worker for task-specific subdirectories.
- "tempStoragePrefix": "A String", # The prefix of the resources the taskrunner should use for
- # temporary storage.
- #
- # The supported resource type is:
- #
- # Google Cloud Storage:
- # storage.googleapis.com/{bucket}/{object}
- # bucket.storage.googleapis.com/{object}
- "baseUrl": "A String", # The base URL for the taskrunner to use when accessing Google Cloud APIs.
- #
- # When workers access Google Cloud APIs, they logically do so via
- # relative URLs. If this field is specified, it supplies the base
- # URL to use for resolving these relative URLs. The normative
- # algorithm used is defined by RFC 1808, "Relative Uniform Resource
- # Locators".
- #
- # If not specified, the default value is "http://www.googleapis.com/"
- "logToSerialconsole": True or False, # Whether to send taskrunner log info to Google Compute Engine VM serial
- # console.
- "continueOnException": True or False, # Whether to continue taskrunner if an exception is hit.
- "parallelWorkerSettings": { # Provides data to pass through to the worker harness. # The settings to pass to the parallel worker harness.
- "tempStoragePrefix": "A String", # The prefix of the resources the system should use for temporary
- # storage.
+ "displayData": [ # Transform-specific display data.
+ { # Data provided with a pipeline or transform to provide descriptive info.
+ "url": "A String", # An optional full URL.
+ "javaClassValue": "A String", # Contains value if the data is of java class type.
+ "timestampValue": "A String", # Contains value if the data is of timestamp type.
+ "durationValue": "A String", # Contains value if the data is of duration type.
+ "label": "A String", # An optional label to display in a dax UI for the element.
+ "key": "A String", # The key identifying the display data.
+ # This is intended to be used as a label for the display data
+ # when viewed in a dax monitoring system.
+ "namespace": "A String", # The namespace for the key. This is usually a class name or programming
+ # language namespace (i.e. python module) which defines the display data.
+ # This allows a dax monitoring system to specially handle the data
+ # and perform custom rendering.
+ "floatValue": 3.14, # Contains value if the data is of float type.
+ "strValue": "A String", # Contains value if the data is of string type.
+ "int64Value": "A String", # Contains value if the data is of int64 type.
+ "boolValue": True or False, # Contains value if the data is of a boolean type.
+ "shortStrValue": "A String", # A possible additional shorter value to display.
+ # For example a java_class_name_value of com.mypackage.MyDoFn
+ # will be stored with MyDoFn as the short_str_value and
+ # com.mypackage.MyDoFn as the java_class_name value.
+ # short_str_value can be displayed and java_class_name_value
+ # will be displayed as a tooltip.
+ },
+ ],
+ "id": "A String", # SDK generated id of this transform instance.
+ "inputCollectionName": [ # User names for all collection inputs to this transform.
+ "A String",
+ ],
+ "name": "A String", # User provided name for this transform instance.
+ "kind": "A String", # Type of transform.
+ },
+ ],
+ "executionPipelineStage": [ # Description of each stage of execution of the pipeline.
+ { # Description of the composing transforms, names/ids, and input/outputs of a
+ # stage of execution. Some composing transforms and sources may have been
+ # generated by the Dataflow service during execution planning.
+ "componentSource": [ # Collections produced and consumed by component transforms of this stage.
+ { # Description of an interstitial value between transforms in an execution
+ # stage.
+ "userName": "A String", # Human-readable name for this source; may be user or system generated.
+ "name": "A String", # Dataflow service generated name for this source.
+ "originalTransformOrCollection": "A String", # User name for the original user transform or collection with which this
+ # source is most closely associated.
+ },
+ ],
+ "inputSource": [ # Input sources for this stage.
+ { # Description of an input or output of an execution stage.
+ "userName": "A String", # Human-readable name for this source; may be user or system generated.
+ "originalTransformOrCollection": "A String", # User name for the original user transform or collection with which this
+ # source is most closely associated.
+ "sizeBytes": "A String", # Size of the source, if measurable.
+ "name": "A String", # Dataflow service generated name for this source.
+ },
+ ],
+ "name": "A String", # Dataflow service generated name for this stage.
+ "componentTransform": [ # Transforms that comprise this execution stage.
+ { # Description of a transform executed as part of an execution stage.
+ "name": "A String", # Dataflow service generated name for this transform.
+ "userName": "A String", # Human-readable name for this transform; may be user or system generated.
+ "originalTransform": "A String", # User name for the original user transform with which this transform is
+ # most closely associated.
+ },
+ ],
+ "id": "A String", # Dataflow service generated id for this stage.
+ "outputSource": [ # Output sources for this stage.
+ { # Description of an input or output of an execution stage.
+ "userName": "A String", # Human-readable name for this source; may be user or system generated.
+ "originalTransformOrCollection": "A String", # User name for the original user transform or collection with which this
+ # source is most closely associated.
+ "sizeBytes": "A String", # Size of the source, if measurable.
+ "name": "A String", # Dataflow service generated name for this source.
+ },
+ ],
+ "kind": "A String", # Type of transform this stage is executing.
+ },
+ ],
+ },
+ "labels": { # User-defined labels for this job.
+ #
+ # The labels map can contain no more than 64 entries. Entries of the labels
+ # map are UTF8 strings that comply with the following restrictions:
+ #
+ # * Keys must conform to regexp: \p{Ll}\p{Lo}{0,62}
+ # * Values must conform to regexp: [\p{Ll}\p{Lo}\p{N}_-]{0,63}
+ # * Both keys and values are additionally constrained to be <= 128 bytes in
+ # size.
+ "a_key": "A String",
+ },
+ "projectId": "A String", # The ID of the Cloud Platform project that the job belongs to.
+ "environment": { # Describes the environment in which a Dataflow Job runs. # The environment for the job.
+ "flexResourceSchedulingGoal": "A String", # Which Flexible Resource Scheduling mode to run in.
+ "workerRegion": "A String", # The Compute Engine region
+ # (https://cloud.google.com/compute/docs/regions-zones/regions-zones) in
+ # which worker processing should occur, e.g. "us-west1". Mutually exclusive
+ # with worker_zone. If neither worker_region nor worker_zone is specified,
+ # default to the control plane's region.
+ "userAgent": { # A description of the process that generated the request.
+ "a_key": "", # Properties of the object.
+ },
+ "serviceAccountEmail": "A String", # Identity to run virtual machines as. Defaults to the default account.
+ "version": { # A structure describing which components and their versions of the service
+ # are required in order to run the job.
+ "a_key": "", # Properties of the object.
+ },
+ "serviceKmsKeyName": "A String", # If set, contains the Cloud KMS key identifier used to encrypt data
+ # at rest, AKA a Customer Managed Encryption Key (CMEK).
+ #
+ # Format:
+ # projects/PROJECT_ID/locations/LOCATION/keyRings/KEY_RING/cryptoKeys/KEY
+ "experiments": [ # The list of experiments to enable.
+ "A String",
+ ],
+ "workerZone": "A String", # The Compute Engine zone
+ # (https://cloud.google.com/compute/docs/regions-zones/regions-zones) in
+ # which worker processing should occur, e.g. "us-west1-a". Mutually exclusive
+ # with worker_region. If neither worker_region nor worker_zone is specified,
+ # a zone in the control plane's region is chosen based on available capacity.
+ "workerPools": [ # The worker pools. At least one "harness" worker pool must be
+ # specified in order for the job to have workers.
+ { # Describes one particular pool of Cloud Dataflow workers to be
+ # instantiated by the Cloud Dataflow service in order to perform the
+ # computations required by a job. Note that a workflow job may use
+ # multiple pools, in order to match the various computational
+ # requirements of the various stages of the job.
+ "onHostMaintenance": "A String", # The action to take on host maintenance, as defined by the Google
+ # Compute Engine API.
+ "sdkHarnessContainerImages": [ # Set of SDK harness containers needed to execute this pipeline. This will
+ # only be set in the Fn API path. For non-cross-language pipelines this
+ # should have only one entry. Cross-language pipelines will have two or more
+ # entries.
+ { # Defines a SDK harness container for executing Dataflow pipelines.
+ "containerImage": "A String", # A docker container image that resides in Google Container Registry.
+ "useSingleCorePerContainer": True or False, # If true, recommends the Dataflow service to use only one core per SDK
+ # container instance with this image. If false (or unset) recommends using
+ # more than one core per SDK container instance with this image for
+ # efficiency. Note that Dataflow service may choose to override this property
+ # if needed.
+ },
+ ],
+ "zone": "A String", # Zone to run the worker pools in. If empty or unspecified, the service
+ # will attempt to choose a reasonable default.
+ "kind": "A String", # The kind of the worker pool; currently only `harness` and `shuffle`
+ # are supported.
+ "metadata": { # Metadata to set on the Google Compute Engine VMs.
+ "a_key": "A String",
+ },
+ "diskSourceImage": "A String", # Fully qualified source image for disks.
+ "dataDisks": [ # Data disks that are used by a VM in this workflow.
+ { # Describes the data disk used by a workflow job.
+ "sizeGb": 42, # Size of disk in GB. If zero or unspecified, the service will
+ # attempt to choose a reasonable default.
+ "diskType": "A String", # Disk storage type, as defined by Google Compute Engine. This
+ # must be a disk type appropriate to the project and zone in which
+ # the workers will run. If unknown or unspecified, the service
+ # will attempt to choose a reasonable default.
+ #
+ # For example, the standard persistent disk type is a resource name
+ # typically ending in "pd-standard". If SSD persistent disks are
+ # available, the resource name typically ends with "pd-ssd". The
+ # actual valid values are defined by the Google Compute Engine API,
+ # not by the Cloud Dataflow API; consult the Google Compute Engine
+ # documentation for more information about determining the set of
+ # available disk types for a particular project and zone.
+ #
+ # Google Compute Engine Disk types are local to a particular
+ # project in a particular zone, and so the resource name will
+ # typically look something like this:
+ #
+ # compute.googleapis.com/projects/project-id/zones/zone/diskTypes/pd-standard
+ "mountPoint": "A String", # Directory in a VM where disk is mounted.
+ },
+ ],
+ "packages": [ # Packages to be installed on workers.
+ { # The packages that must be installed in order for a worker to run the
+ # steps of the Cloud Dataflow job that will be assigned to its worker
+ # pool.
#
- # The supported resource type is:
- #
- # Google Cloud Storage:
- #
- # storage.googleapis.com/{bucket}/{object}
- # bucket.storage.googleapis.com/{object}
- "reportingEnabled": True or False, # Whether to send work progress updates to the service.
- "baseUrl": "A String", # The base URL for accessing Google Cloud APIs.
+ # This is the mechanism by which the Cloud Dataflow SDK causes code to
+ # be loaded onto the workers. For example, the Cloud Dataflow Java SDK
+ # might use this to install jars containing the user's code and all of the
+ # various dependencies (libraries, data files, etc.) required in order
+ # for that code to run.
+ "name": "A String", # The name of the package.
+ "location": "A String", # The resource to read the package from. The supported resource type is:
+ #
+ # Google Cloud Storage:
+ #
+ # storage.googleapis.com/{bucket}
+ # bucket.storage.googleapis.com/
+ },
+ ],
+ "teardownPolicy": "A String", # Sets the policy for determining when to tear down the worker pool.
+ # Allowed values are: `TEARDOWN_ALWAYS`, `TEARDOWN_ON_SUCCESS`, and
+ # `TEARDOWN_NEVER`.
+ # `TEARDOWN_ALWAYS` means workers are always torn down regardless of whether
+ # the job succeeds. `TEARDOWN_ON_SUCCESS` means workers are torn down
+ # if the job succeeds. `TEARDOWN_NEVER` means the workers are never torn
+ # down.
+ #
+ # If the workers are not torn down by the service, they will
+ # continue to run and use Google Compute Engine VM resources in the
+ # user's project until they are explicitly terminated by the user.
+ # Because of this, Google recommends using the `TEARDOWN_ALWAYS`
+ # policy except for small, manually supervised test jobs.
+ #
+ # If unknown or unspecified, the service will attempt to choose a reasonable
+ # default.
+ "network": "A String", # Network to which VMs will be assigned. If empty or unspecified,
+ # the service will use the network "default".
+ "ipConfiguration": "A String", # Configuration for VM IPs.
+ "diskSizeGb": 42, # Size of root disk for VMs, in GB. If zero or unspecified, the service will
+ # attempt to choose a reasonable default.
+ "autoscalingSettings": { # Settings for WorkerPool autoscaling. # Settings for autoscaling of this WorkerPool.
+ "maxNumWorkers": 42, # The maximum number of workers to cap scaling at.
+ "algorithm": "A String", # The algorithm to use for autoscaling.
+ },
+ "poolArgs": { # Extra arguments for this worker pool.
+ "a_key": "", # Properties of the object. Contains field @type with type URL.
+ },
+ "subnetwork": "A String", # Subnetwork to which VMs will be assigned, if desired. Expected to be of
+ # the form "regions/REGION/subnetworks/SUBNETWORK".
+ "numWorkers": 42, # Number of Google Compute Engine workers in this pool needed to
+ # execute the job. If zero or unspecified, the service will
+ # attempt to choose a reasonable default.
+ "numThreadsPerWorker": 42, # The number of threads per worker harness. If empty or unspecified, the
+ # service will choose a number of threads (according to the number of cores
+ # on the selected machine type for batch, or 1 by convention for streaming).
+ "workerHarnessContainerImage": "A String", # Required. Docker container image that executes the Cloud Dataflow worker
+ # harness, residing in Google Container Registry.
+ #
+ # Deprecated for the Fn API path. Use sdk_harness_container_images instead.
+ "taskrunnerSettings": { # Taskrunner configuration settings. # Settings passed through to Google Compute Engine workers when
+ # using the standard Dataflow task runner. Users should ignore
+ # this field.
+ "dataflowApiVersion": "A String", # The API version of endpoint, e.g. "v1b3"
+ "oauthScopes": [ # The OAuth2 scopes to be requested by the taskrunner in order to
+ # access the Cloud Dataflow API.
+ "A String",
+ ],
+ "baseUrl": "A String", # The base URL for the taskrunner to use when accessing Google Cloud APIs.
#
# When workers access Google Cloud APIs, they logically do so via
# relative URLs. If this field is specified, it supplies the base
@@ -2216,339 +2258,297 @@
# Locators".
#
# If not specified, the default value is "http://www.googleapis.com/"
- "servicePath": "A String", # The Cloud Dataflow service path relative to the root URL, for example,
- # "dataflow/v1b3/projects".
- "shuffleServicePath": "A String", # The Shuffle service path relative to the root URL, for example,
- # "shuffle/v1beta1".
- "workerId": "A String", # The ID of the worker running this pipeline.
+ "workflowFileName": "A String", # The file to store the workflow in.
+ "logToSerialconsole": True or False, # Whether to send taskrunner log info to Google Compute Engine VM serial
+ # console.
+ "baseTaskDir": "A String", # The location on the worker for task-specific subdirectories.
+ "taskUser": "A String", # The UNIX user ID on the worker VM to use for tasks launched by
+ # taskrunner; e.g. "root".
+ "vmId": "A String", # The ID string of the VM.
+ "alsologtostderr": True or False, # Whether to also send taskrunner log info to stderr.
+ "parallelWorkerSettings": { # Provides data to pass through to the worker harness. # The settings to pass to the parallel worker harness.
+ "shuffleServicePath": "A String", # The Shuffle service path relative to the root URL, for example,
+ # "shuffle/v1beta1".
+ "tempStoragePrefix": "A String", # The prefix of the resources the system should use for temporary
+ # storage.
+ #
+ # The supported resource type is:
+ #
+ # Google Cloud Storage:
+ #
+ # storage.googleapis.com/{bucket}/{object}
+ # bucket.storage.googleapis.com/{object}
+ "reportingEnabled": True or False, # Whether to send work progress updates to the service.
+ "servicePath": "A String", # The Cloud Dataflow service path relative to the root URL, for example,
+ # "dataflow/v1b3/projects".
+ "baseUrl": "A String", # The base URL for accessing Google Cloud APIs.
+ #
+ # When workers access Google Cloud APIs, they logically do so via
+ # relative URLs. If this field is specified, it supplies the base
+ # URL to use for resolving these relative URLs. The normative
+ # algorithm used is defined by RFC 1808, "Relative Uniform Resource
+ # Locators".
+ #
+ # If not specified, the default value is "http://www.googleapis.com/"
+ "workerId": "A String", # The ID of the worker running this pipeline.
+ },
+ "harnessCommand": "A String", # The command to launch the worker harness.
+ "logDir": "A String", # The directory on the VM to store logs.
+ "streamingWorkerMainClass": "A String", # The streaming worker main class name.
+ "languageHint": "A String", # The suggested backend language.
+ "taskGroup": "A String", # The UNIX group ID on the worker VM to use for tasks launched by
+ # taskrunner; e.g. "wheel".
+ "logUploadLocation": "A String", # Indicates where to put logs. If this is not specified, the logs
+ # will not be uploaded.
+ #
+ # The supported resource type is:
+ #
+ # Google Cloud Storage:
+ # storage.googleapis.com/{bucket}/{object}
+ # bucket.storage.googleapis.com/{object}
+ "commandlinesFileName": "A String", # The file to store preprocessing commands in.
+ "continueOnException": True or False, # Whether to continue taskrunner if an exception is hit.
+ "tempStoragePrefix": "A String", # The prefix of the resources the taskrunner should use for
+ # temporary storage.
+ #
+ # The supported resource type is:
+ #
+ # Google Cloud Storage:
+ # storage.googleapis.com/{bucket}/{object}
+ # bucket.storage.googleapis.com/{object}
},
- "taskUser": "A String", # The UNIX user ID on the worker VM to use for tasks launched by
- # taskrunner; e.g. "root".
- "vmId": "A String", # The ID string of the VM.
+ "diskType": "A String", # Type of root disk for VMs. If empty or unspecified, the service will
+ # attempt to choose a reasonable default.
+ "defaultPackageSet": "A String", # The default package set to install. This allows the service to
+ # select a default set of packages which are useful to worker
+ # harnesses written in a particular language.
+ "machineType": "A String", # Machine type (e.g. "n1-standard-1"). If empty or unspecified, the
+ # service will attempt to choose a reasonable default.
},
- "autoscalingSettings": { # Settings for WorkerPool autoscaling. # Settings for autoscaling of this WorkerPool.
- "algorithm": "A String", # The algorithm to use for autoscaling.
- "maxNumWorkers": 42, # The maximum number of workers to cap scaling at.
- },
- "metadata": { # Metadata to set on the Google Compute Engine VMs.
- "a_key": "A String",
- },
- "defaultPackageSet": "A String", # The default package set to install. This allows the service to
- # select a default set of packages which are useful to worker
- # harnesses written in a particular language.
- "network": "A String", # Network to which VMs will be assigned. If empty or unspecified,
- # the service will use the network "default".
+ ],
+ "tempStoragePrefix": "A String", # The prefix of the resources the system should use for temporary
+ # storage. The system will append the suffix "/temp-{JOBNAME}" to
+ # this resource prefix, where {JOBNAME} is the value of the
+ # job_name field. The resulting bucket and object prefix is used
+ # as the prefix of the resources used to store temporary data
+ # needed during the job execution. NOTE: This will override the
+ # value in taskrunner_settings.
+ # The supported resource type is:
+ #
+ # Google Cloud Storage:
+ #
+ # storage.googleapis.com/{bucket}/{object}
+ # bucket.storage.googleapis.com/{object}
+ "internalExperiments": { # Experimental settings.
+ "a_key": "", # Properties of the object. Contains field @type with type URL.
},
- ],
- "dataset": "A String", # The dataset for the current project where various workflow
- # related tables are stored.
- #
- # The supported resource type is:
- #
- # Google BigQuery:
- # bigquery.googleapis.com/{dataset}
- },
- "stageStates": [ # This field may be mutated by the Cloud Dataflow service;
- # callers cannot mutate it.
- { # A message describing the state of a particular execution stage.
- "currentStateTime": "A String", # The time at which the stage transitioned to this state.
- "executionStageState": "A String", # Executions stage states allow the same set of values as JobState.
- "executionStageName": "A String", # The name of the execution stage.
- },
- ],
- "jobMetadata": { # Metadata available primarily for filtering jobs. Will be included in the # This field is populated by the Dataflow service to support filtering jobs
- # by the metadata values provided here. Populated for ListJobs and all GetJob
- # views SUMMARY and higher.
- # ListJob response and Job SUMMARY view.
- "datastoreDetails": [ # Identification of a Datastore source used in the Dataflow job.
- { # Metadata for a Datastore connector used by the job.
- "namespace": "A String", # Namespace used in the connection.
- "projectId": "A String", # ProjectId accessed in the connection.
- },
- ],
- "sdkVersion": { # The version of the SDK used to run the job. # The SDK version used to run the job.
- "version": "A String", # The version of the SDK used to run the job.
- "sdkSupportStatus": "A String", # The support status for this SDK version.
- "versionDisplayName": "A String", # A readable string describing the version of the SDK.
- },
- "bigqueryDetails": [ # Identification of a BigQuery source used in the Dataflow job.
- { # Metadata for a BigQuery connector used by the job.
- "table": "A String", # Table accessed in the connection.
- "dataset": "A String", # Dataset accessed in the connection.
- "query": "A String", # Query used to access data in the connection.
- "projectId": "A String", # Project accessed in the connection.
- },
- ],
- "fileDetails": [ # Identification of a File source used in the Dataflow job.
- { # Metadata for a File connector used by the job.
- "filePattern": "A String", # File Pattern used to access files by the connector.
- },
- ],
- "pubsubDetails": [ # Identification of a PubSub source used in the Dataflow job.
- { # Metadata for a PubSub connector used by the job.
- "topic": "A String", # Topic accessed in the connection.
- "subscription": "A String", # Subscription used in the connection.
- },
- ],
- "bigTableDetails": [ # Identification of a BigTable source used in the Dataflow job.
- { # Metadata for a BigTable connector used by the job.
- "projectId": "A String", # ProjectId accessed in the connection.
- "instanceId": "A String", # InstanceId accessed in the connection.
- "tableId": "A String", # TableId accessed in the connection.
- },
- ],
- "spannerDetails": [ # Identification of a Spanner source used in the Dataflow job.
- { # Metadata for a Spanner connector used by the job.
- "instanceId": "A String", # InstanceId accessed in the connection.
- "projectId": "A String", # ProjectId accessed in the connection.
- "databaseId": "A String", # DatabaseId accessed in the connection.
- },
- ],
- },
- "type": "A String", # The type of Cloud Dataflow job.
- "projectId": "A String", # The ID of the Cloud Platform project that the job belongs to.
- "createdFromSnapshotId": "A String", # If this is specified, the job's initial state is populated from the given
- # snapshot.
- "pipelineDescription": { # A descriptive representation of submitted pipeline as well as the executed
- # form. This data is provided by the Dataflow service for ease of visualizing
- # the pipeline and interpreting Dataflow provided metrics. # Preliminary field: The format of this data may change at any time.
- # A description of the user pipeline and stages through which it is executed.
- # Created by Cloud Dataflow service. Only retrieved with
- # JOB_VIEW_DESCRIPTION or JOB_VIEW_ALL.
- "executionPipelineStage": [ # Description of each stage of execution of the pipeline.
- { # Description of the composing transforms, names/ids, and input/outputs of a
- # stage of execution. Some composing transforms and sources may have been
- # generated by the Dataflow service during execution planning.
- "outputSource": [ # Output sources for this stage.
- { # Description of an input or output of an execution stage.
- "sizeBytes": "A String", # Size of the source, if measurable.
- "name": "A String", # Dataflow service generated name for this source.
- "userName": "A String", # Human-readable name for this source; may be user or system generated.
- "originalTransformOrCollection": "A String", # User name for the original user transform or collection with which this
- # source is most closely associated.
- },
- ],
- "name": "A String", # Dataflow service generated name for this stage.
- "inputSource": [ # Input sources for this stage.
- { # Description of an input or output of an execution stage.
- "sizeBytes": "A String", # Size of the source, if measurable.
- "name": "A String", # Dataflow service generated name for this source.
- "userName": "A String", # Human-readable name for this source; may be user or system generated.
- "originalTransformOrCollection": "A String", # User name for the original user transform or collection with which this
- # source is most closely associated.
- },
- ],
- "id": "A String", # Dataflow service generated id for this stage.
- "componentTransform": [ # Transforms that comprise this execution stage.
- { # Description of a transform executed as part of an execution stage.
- "originalTransform": "A String", # User name for the original user transform with which this transform is
- # most closely associated.
- "name": "A String", # Dataflow service generated name for this source.
- "userName": "A String", # Human-readable name for this transform; may be user or system generated.
- },
- ],
- "componentSource": [ # Collections produced and consumed by component transforms of this stage.
- { # Description of an interstitial value between transforms in an execution
- # stage.
- "name": "A String", # Dataflow service generated name for this source.
- "userName": "A String", # Human-readable name for this transform; may be user or system generated.
- "originalTransformOrCollection": "A String", # User name for the original user transform or collection with which this
- # source is most closely associated.
- },
- ],
- "kind": "A String", # Type of transform this stage is executing.
- },
- ],
- "originalPipelineTransform": [ # Description of each transform in the pipeline and collections between them.
- { # Description of the type, names/ids, and input/outputs for a transform.
- "kind": "A String", # Type of transform.
- "inputCollectionName": [ # User names for all collection inputs to this transform.
- "A String",
- ],
- "name": "A String", # User provided name for this transform instance.
- "id": "A String", # SDK generated id of this transform instance.
- "displayData": [ # Transform-specific display data.
- { # Data provided with a pipeline or transform to provide descriptive info.
- "durationValue": "A String", # Contains value if the data is of duration type.
- "int64Value": "A String", # Contains value if the data is of int64 type.
- "namespace": "A String", # The namespace for the key. This is usually a class name or programming
- # language namespace (e.g. Python module) which defines the display data.
- # This allows a dax monitoring system to specially handle the data
- # and perform custom rendering.
- "floatValue": 3.14, # Contains value if the data is of float type.
- "key": "A String", # The key identifying the display data.
- # This is intended to be used as a label for the display data
- # when viewed in a dax monitoring system.
- "shortStrValue": "A String", # A possible additional shorter value to display.
- # For example a java_class_name_value of com.mypackage.MyDoFn
- # will be stored with MyDoFn as the short_str_value and
- # com.mypackage.MyDoFn as the java_class_name value.
- # short_str_value can be displayed and java_class_name_value
- # will be displayed as a tooltip.
- "url": "A String", # An optional full URL.
- "label": "A String", # An optional label to display in a dax UI for the element.
- "timestampValue": "A String", # Contains value if the data is of timestamp type.
- "boolValue": True or False, # Contains value if the data is of a boolean type.
- "javaClassValue": "A String", # Contains value if the data is of java class type.
- "strValue": "A String", # Contains value if the data is of string type.
- },
- ],
- "outputCollectionName": [ # User names for all collection outputs to this transform.
- "A String",
- ],
- },
- ],
- "displayData": [ # Pipeline level display data.
- { # Data provided with a pipeline or transform to provide descriptive info.
- "durationValue": "A String", # Contains value if the data is of duration type.
- "int64Value": "A String", # Contains value if the data is of int64 type.
- "namespace": "A String", # The namespace for the key. This is usually a class name or programming
- # language namespace (e.g. Python module) which defines the display data.
- # This allows a dax monitoring system to specially handle the data
- # and perform custom rendering.
- "floatValue": 3.14, # Contains value if the data is of float type.
- "key": "A String", # The key identifying the display data.
- # This is intended to be used as a label for the display data
- # when viewed in a dax monitoring system.
- "shortStrValue": "A String", # A possible additional shorter value to display.
- # For example a java_class_name_value of com.mypackage.MyDoFn
- # will be stored with MyDoFn as the short_str_value and
- # com.mypackage.MyDoFn as the java_class_name value.
- # short_str_value can be displayed and java_class_name_value
- # will be displayed as a tooltip.
- "url": "A String", # An optional full URL.
- "label": "A String", # An optional label to display in a dax UI for the element.
- "timestampValue": "A String", # Contains value if the data is of timestamp type.
- "boolValue": True or False, # Contains value if the data is of a boolean type.
- "javaClassValue": "A String", # Contains value if the data is of java class type.
- "strValue": "A String", # Contains value if the data is of string type.
- },
- ],
- },
- "replaceJobId": "A String", # If this job is an update of an existing job, this field is the job ID
- # of the job it replaced.
- #
- # When sending a `CreateJobRequest`, you can update a job by specifying it
- # here. The job named here is stopped, and its intermediate state is
- # transferred to this job.
- "tempFiles": [ # A set of files the system should be aware of that are used
- # for temporary storage. These temporary files will be
- # removed on job completion.
- # No duplicates are allowed.
- # No file patterns are supported.
- #
- # The supported files are:
- #
- # Google Cloud Storage:
- #
- # storage.googleapis.com/{bucket}/{object}
- # bucket.storage.googleapis.com/{object}
- "A String",
- ],
- "name": "A String", # The user-specified Cloud Dataflow job name.
- #
- # Only one Job with a given name may exist in a project at any
- # given time. If a caller attempts to create a Job with the same
- # name as an already-existing Job, the attempt returns the
- # existing Job.
- #
- # The name must match the regular expression
- # `[a-z]([-a-z0-9]{0,38}[a-z0-9])?`
- "steps": [ # Exactly one of step or steps_location should be specified.
- #
- # The top-level steps that constitute the entire job.
- { # Defines a particular step within a Cloud Dataflow job.
- #
- # A job consists of multiple steps, each of which performs some
- # specific operation as part of the overall job. Data is typically
- # passed from one step to another as part of the job.
- #
- # Here's an example of a sequence of steps which together implement a
- # Map-Reduce job:
- #
- # * Read a collection of data from some source, parsing the
- # collection's elements.
- #
- # * Validate the elements.
- #
- # * Apply a user-defined function to map each element to some value
- # and extract an element-specific key value.
- #
- # * Group elements with the same key into a single element with
- # that key, transforming a multiply-keyed collection into a
- # uniquely-keyed collection.
- #
- # * Write the elements out to some data sink.
- #
- # Note that the Cloud Dataflow service may be used to run many different
- # types of jobs, not just Map-Reduce.
- "name": "A String", # The name that identifies the step. This must be unique for each
- # step with respect to all other steps in the Cloud Dataflow job.
- "kind": "A String", # The kind of step in the Cloud Dataflow job.
- "properties": { # Named properties associated with the step. Each kind of
- # predefined step has its own required set of properties.
- # Must be provided on Create. Only retrieved with JOB_VIEW_ALL.
+ "sdkPipelineOptions": { # The Cloud Dataflow SDK pipeline options specified by the user. These
+ # options are passed through the service and are used to recreate the
+ # SDK pipeline options on the worker in a language agnostic and platform
+ # independent way.
"a_key": "", # Properties of the object.
},
+ "dataset": "A String", # The dataset for the current project where various workflow
+ # related tables are stored.
+ #
+ # The supported resource type is:
+ #
+ # Google BigQuery:
+ # bigquery.googleapis.com/{dataset}
+ "clusterManagerApiService": "A String", # The type of cluster manager API to use. If unknown or
+ # unspecified, the service will attempt to choose a reasonable
+ # default. This should be in the form of the API service name,
+ # e.g. "compute.googleapis.com".
},
- ],
- "replacedByJobId": "A String", # If another job is an update of this job (and thus, this job is in
- # `JOB_STATE_UPDATED`), this field contains the ID of that job.
- "executionInfo": { # Additional information about how a Cloud Dataflow job will be executed that
- # isn't contained in the submitted job. # Deprecated.
- "stages": { # A mapping from each stage to the information about that stage.
- "a_key": { # Contains information about how a particular
- # google.dataflow.v1beta3.Step will be executed.
- "stepName": [ # The steps associated with the execution stage.
- # Note that stages may have several steps, and that a given step
- # might be run by more than one stage.
- "A String",
- ],
+ "stepsLocation": "A String", # The GCS location where the steps are stored.
+ "steps": [ # Exactly one of step or steps_location should be specified.
+ #
+ # The top-level steps that constitute the entire job.
+ { # Defines a particular step within a Cloud Dataflow job.
+ #
+ # A job consists of multiple steps, each of which performs some
+ # specific operation as part of the overall job. Data is typically
+ # passed from one step to another as part of the job.
+ #
+ # Here's an example of a sequence of steps which together implement a
+ # Map-Reduce job:
+ #
+ # * Read a collection of data from some source, parsing the
+ # collection's elements.
+ #
+ # * Validate the elements.
+ #
+ # * Apply a user-defined function to map each element to some value
+ # and extract an element-specific key value.
+ #
+ # * Group elements with the same key into a single element with
+ # that key, transforming a multiply-keyed collection into a
+ # uniquely-keyed collection.
+ #
+ # * Write the elements out to some data sink.
+ #
+ # Note that the Cloud Dataflow service may be used to run many different
+ # types of jobs, not just Map-Reduce.
+ "kind": "A String", # The kind of step in the Cloud Dataflow job.
+ "properties": { # Named properties associated with the step. Each kind of
+ # predefined step has its own required set of properties.
+ # Must be provided on Create. Only retrieved with JOB_VIEW_ALL.
+ "a_key": "", # Properties of the object.
+ },
+ "name": "A String", # The name that identifies the step. This must be unique for each
+ # step with respect to all other steps in the Cloud Dataflow job.
+ },
+ ],
+ "stageStates": [ # This field may be mutated by the Cloud Dataflow service;
+ # callers cannot mutate it.
+ { # A message describing the state of a particular execution stage.
+ "executionStageState": "A String", # Execution stage states allow the same set of values as JobState.
+ "executionStageName": "A String", # The name of the execution stage.
+ "currentStateTime": "A String", # The time at which the stage transitioned to this state.
+ },
+ ],
+ "replacedByJobId": "A String", # If another job is an update of this job (and thus, this job is in
+ # `JOB_STATE_UPDATED`), this field contains the ID of that job.
+ "jobMetadata": { # Metadata available primarily for filtering jobs. Will be included in the
+ # ListJob response and Job SUMMARY view. # This field is populated by the Dataflow service to support filtering jobs
+ # by the metadata values provided here. Populated for ListJobs and all GetJob
+ # views SUMMARY and higher.
+ "sdkVersion": { # The version of the SDK used to run the job. # The SDK version used to run the job.
+ "sdkSupportStatus": "A String", # The support status for this SDK version.
+ "versionDisplayName": "A String", # A readable string describing the version of the SDK.
+ "version": "A String", # The version of the SDK used to run the job.
+ },
+ "bigTableDetails": [ # Identification of a BigTable source used in the Dataflow job.
+ { # Metadata for a BigTable connector used by the job.
+ "instanceId": "A String", # InstanceId accessed in the connection.
+ "tableId": "A String", # TableId accessed in the connection.
+ "projectId": "A String", # ProjectId accessed in the connection.
+ },
+ ],
+ "pubsubDetails": [ # Identification of a PubSub source used in the Dataflow job.
+ { # Metadata for a PubSub connector used by the job.
+ "subscription": "A String", # Subscription used in the connection.
+ "topic": "A String", # Topic accessed in the connection.
+ },
+ ],
+ "bigqueryDetails": [ # Identification of a BigQuery source used in the Dataflow job.
+ { # Metadata for a BigQuery connector used by the job.
+ "dataset": "A String", # Dataset accessed in the connection.
+ "projectId": "A String", # Project accessed in the connection.
+ "query": "A String", # Query used to access data in the connection.
+ "table": "A String", # Table accessed in the connection.
+ },
+ ],
+ "fileDetails": [ # Identification of a File source used in the Dataflow job.
+ { # Metadata for a File connector used by the job.
+ "filePattern": "A String", # File Pattern used to access files by the connector.
+ },
+ ],
+ "datastoreDetails": [ # Identification of a Datastore source used in the Dataflow job.
+ { # Metadata for a Datastore connector used by the job.
+ "namespace": "A String", # Namespace used in the connection.
+ "projectId": "A String", # ProjectId accessed in the connection.
+ },
+ ],
+ "spannerDetails": [ # Identification of a Spanner source used in the Dataflow job.
+ { # Metadata for a Spanner connector used by the job.
+ "instanceId": "A String", # InstanceId accessed in the connection.
+ "databaseId": "A String", # DatabaseId accessed in the connection.
+ "projectId": "A String", # ProjectId accessed in the connection.
+ },
+ ],
+ },
+ "location": "A String", # The [regional endpoint]
+ # (https://cloud.google.com/dataflow/docs/concepts/regional-endpoints) that
+ # contains this job.
+ "transformNameMapping": { # The map of transform name prefixes of the job to be replaced to the
+ # corresponding name prefixes of the new job.
+ "a_key": "A String",
+ },
+ "startTime": "A String", # The timestamp when the job was started (transitioned to JOB_STATE_PENDING).
+ # Flexible resource scheduling jobs are started with some delay after job
+ # creation, so start_time is unset before start and is updated when the
+ # job is started by the Cloud Dataflow service. For other jobs, start_time
+ # always equals create_time and is immutable and set by the Cloud Dataflow
+ # service.
+ "clientRequestId": "A String", # The client's unique identifier of the job, re-used across retried attempts.
+ # If this field is set, the service will ensure its uniqueness.
+ # The request to create a job will fail if the service has knowledge of a
+ # previously submitted job with the same client's ID and job name.
+ # The caller may use this field to ensure idempotence of job
+ # creation across retried attempts to create a job.
+ # By default, the field is empty and, in that case, the service ignores it.
+ "executionInfo": { # Additional information about how a Cloud Dataflow job will be executed that
+ # isn't contained in the submitted job. # Deprecated.
+ "stages": { # A mapping from each stage to the information about that stage.
+ "a_key": { # Contains information about how a particular
+ # google.dataflow.v1beta3.Step will be executed.
+ "stepName": [ # The steps associated with the execution stage.
+ # Note that stages may have several steps, and that a given step
+ # might be run by more than one stage.
+ "A String",
+ ],
+ },
},
},
- },
- "currentState": "A String", # The current state of the job.
- #
- # Jobs are created in the `JOB_STATE_STOPPED` state unless otherwise
- # specified.
- #
- # A job in the `JOB_STATE_RUNNING` state may asynchronously enter a
- # terminal state. After a job has reached a terminal state, no
- # further state updates may be made.
- #
- # This field may be mutated by the Cloud Dataflow service;
- # callers cannot mutate it.
- "location": "A String", # The [regional endpoint]
- # (https://cloud.google.com/dataflow/docs/concepts/regional-endpoints) that
- # contains this job.
- "startTime": "A String", # The timestamp when the job was started (transitioned to JOB_STATE_PENDING).
- # Flexible resource scheduling jobs are started with some delay after job
- # creation, so start_time is unset before start and is updated when the
- # job is started by the Cloud Dataflow service. For other jobs, start_time
- # always equals create_time and is immutable and set by the Cloud Dataflow
- # service.
- "stepsLocation": "A String", # The GCS location where the steps are stored.
- "labels": { # User-defined labels for this job.
- #
- # The labels map can contain no more than 64 entries. Entries of the labels
- # map are UTF8 strings that comply with the following restrictions:
- #
- # * Keys must conform to regexp: \p{Ll}\p{Lo}{0,62}
- # * Values must conform to regexp: [\p{Ll}\p{Lo}\p{N}_-]{0,63}
- # * Both keys and values are additionally constrained to be <= 128 bytes in
- # size.
- "a_key": "A String",
- },
- "createTime": "A String", # The timestamp when the job was initially created. Immutable and set by the
- # Cloud Dataflow service.
- "requestedState": "A String", # The job's requested state.
- #
- # `UpdateJob` may be used to switch between the `JOB_STATE_STOPPED` and
- # `JOB_STATE_RUNNING` states, by setting requested_state. `UpdateJob` may
- # also be used to directly set a job's requested state to
- # `JOB_STATE_CANCELLED` or `JOB_STATE_DONE`, irrevocably terminating the
- # job if it has not already reached a terminal state.
- }</pre>
+ "type": "A String", # The type of Cloud Dataflow job.
+ "createTime": "A String", # The timestamp when the job was initially created. Immutable and set by the
+ # Cloud Dataflow service.
+ "tempFiles": [ # A set of files the system should be aware of that are used
+ # for temporary storage. These temporary files will be
+ # removed on job completion.
+ # No duplicates are allowed.
+ # No file patterns are supported.
+ #
+ # The supported files are:
+ #
+ # Google Cloud Storage:
+ #
+ # storage.googleapis.com/{bucket}/{object}
+ # bucket.storage.googleapis.com/{object}
+ "A String",
+ ],
+ "id": "A String", # The unique ID of this job.
+ #
+ # This field is set by the Cloud Dataflow service when the Job is
+ # created, and is immutable for the life of the job.
+ "requestedState": "A String", # The job's requested state.
+ #
+ # `UpdateJob` may be used to switch between the `JOB_STATE_STOPPED` and
+ # `JOB_STATE_RUNNING` states, by setting requested_state. `UpdateJob` may
+ # also be used to directly set a job's requested state to
+ # `JOB_STATE_CANCELLED` or `JOB_STATE_DONE`, irrevocably terminating the
+ # job if it has not already reached a terminal state.
+ "replaceJobId": "A String", # If this job is an update of an existing job, this field is the job ID
+ # of the job it replaced.
+ #
+ # When sending a `CreateJobRequest`, you can update a job by specifying it
+ # here. The job named here is stopped, and its intermediate state is
+ # transferred to this job.
+ "createdFromSnapshotId": "A String", # If this is specified, the job's initial state is populated from the given
+ # snapshot.
+ "currentState": "A String", # The current state of the job.
+ #
+ # Jobs are created in the `JOB_STATE_STOPPED` state unless otherwise
+ # specified.
+ #
+ # A job in the `JOB_STATE_RUNNING` state may asynchronously enter a
+ # terminal state. After a job has reached a terminal state, no
+ # further state updates may be made.
+ #
+ # This field may be mutated by the Cloud Dataflow service;
+ # callers cannot mutate it.
+ "name": "A String", # The user-specified Cloud Dataflow job name.
+ #
+ # Only one Job with a given name may exist in a project at any
+ # given time. If a caller attempts to create a Job with the same
+ # name as an already-existing Job, the attempt returns the
+ # existing Job.
+ #
+ # The name must match the regular expression
+ # `[a-z]([-a-z0-9]{0,38}[a-z0-9])?`
+ "currentStateTime": "A String", # The timestamp associated with the current state.
+ }</pre>
</div>
<div class="method">
@@ -2584,45 +2584,30 @@
# This resource captures only the most recent values of each metric;
# time-series data can be queried for them (under the same metric names)
# from Cloud Monitoring.
+ "metricTime": "A String", # Timestamp as of which metric values are current.
"metrics": [ # All metrics for this job.
{ # Describes the state of a metric.
- "set": "", # Worker-computed aggregate value for the "Set" aggregation kind. The only
- # possible value type is a list of Values whose type can be Long, Double,
- # or String, according to the metric's type. All Values in the list must
- # be of the same type.
- "gauge": "", # A struct value describing properties of a Gauge.
- # Metrics of gauge type show the value of a metric across time, and are
- # aggregated based on the newest value.
- "cumulative": True or False, # True if this metric is reported as the total cumulative aggregate
- # value accumulated since the worker started working on this WorkItem.
- # By default this is false, indicating that this metric is reported
- # as a delta that is not associated with any WorkItem.
- "internal": "", # Worker-computed aggregate value for internal use by the Dataflow
- # service.
+ "distribution": "", # A struct value describing properties of a distribution of numeric values.
"kind": "A String", # Metric aggregation kind. The possible metric aggregation kinds are
# "Sum", "Max", "Min", "Mean", "Set", "And", "Or", and "Distribution".
# The specified aggregation kind is case-insensitive.
#
# If omitted, this is not an aggregated value but instead
# a single metric sample value.
- "scalar": "", # Worker-computed aggregate value for aggregation kinds "Sum", "Max", "Min",
- # "And", and "Or". The possible value types are Long, Double, and Boolean.
- "meanCount": "", # Worker-computed aggregate value for the "Mean" aggregation kind.
- # This holds the count of the aggregated values and is used in combination
- # with mean_sum above to obtain the actual mean aggregate value.
- # The only possible value type is Long.
- "meanSum": "", # Worker-computed aggregate value for the "Mean" aggregation kind.
- # This holds the sum of the aggregated values and is used in combination
- # with mean_count below to obtain the actual mean aggregate value.
- # The only possible value types are Long and Double.
+ "gauge": "", # A struct value describing properties of a Gauge.
+ # Metrics of gauge type show the value of a metric across time, and are
+ # aggregated based on the newest value.
"updateTime": "A String", # Timestamp associated with the metric value. Optional when workers are
# reporting work progress; it will be filled in responses from the
# metrics API.
+ "scalar": "", # Worker-computed aggregate value for aggregation kinds "Sum", "Max", "Min",
+ # "And", and "Or". The possible value types are Long, Double, and Boolean.
+ "cumulative": True or False, # True if this metric is reported as the total cumulative aggregate
+ # value accumulated since the worker started working on this WorkItem.
+ # By default this is false, indicating that this metric is reported
+ # as a delta that is not associated with any WorkItem.
"name": { # Identifies a metric, by describing the source which generated the # Name of the metric.
# metric.
- "name": "A String", # Worker-defined metric name.
- "origin": "A String", # Origin (namespace) of metric name. May be blank for user-defined metrics;
- # will be "dataflow" for metrics defined by the Dataflow service or SDK.
"context": { # Zero or more labeled fields which identify the part of the job this
# metric is associated with, such as the name of a step or collection.
#
@@ -2631,20 +2616,35 @@
# in the SDK will have context['pcollection'] = <pcollection-name>.
"a_key": "A String",
},
+ "name": "A String", # Worker-defined metric name.
+ "origin": "A String", # Origin (namespace) of metric name. May be blank for user-defined metrics;
+ # will be "dataflow" for metrics defined by the Dataflow service or SDK.
},
- "distribution": "", # A struct value describing properties of a distribution of numeric values.
+ "meanCount": "", # Worker-computed aggregate value for the "Mean" aggregation kind.
+ # This holds the count of the aggregated values and is used in combination
+ # with mean_sum above to obtain the actual mean aggregate value.
+ # The only possible value type is Long.
+ "meanSum": "", # Worker-computed aggregate value for the "Mean" aggregation kind.
+ # This holds the sum of the aggregated values and is used in combination
+ # with mean_count below to obtain the actual mean aggregate value.
+ # The only possible value types are Long and Double.
+ "set": "", # Worker-computed aggregate value for the "Set" aggregation kind. The only
+ # possible value type is a list of Values whose type can be Long, Double,
+ # or String, according to the metric's type. All Values in the list must
+ # be of the same type.
+ "internal": "", # Worker-computed aggregate value for internal use by the Dataflow
+ # service.
},
],
- "metricTime": "A String", # Timestamp as of which metric values are current.
}</pre>
</div>
<div class="method">
- <code class="details" id="list">list(projectId, filter=None, location=None, pageToken=None, pageSize=None, view=None, x__xgafv=None)</code>
+ <code class="details" id="list">list(projectId, filter=None, pageSize=None, location=None, view=None, pageToken=None, x__xgafv=None)</code>
<pre>List the jobs of a project.
To list the jobs of a project in a region, we recommend using
-`projects.locations.jobs.get` with a [regional endpoint]
+`projects.locations.jobs.list` with a [regional endpoint]
(https://cloud.google.com/dataflow/docs/concepts/regional-endpoints). To
list all the jobs across all regions, use `projects.jobs.aggregated`. Using
`projects.jobs.list` is not recommended, as you can only get the list of
@@ -2653,15 +2653,15 @@
Args:
projectId: string, The project which owns the jobs. (required)
filter: string, The kind of filter to use.
- location: string, The [regional endpoint]
-(https://cloud.google.com/dataflow/docs/concepts/regional-endpoints) that
-contains this job.
- pageToken: string, Set this to the 'next_page_token' field of a previous response
-to request additional results in a long list.
pageSize: integer, If there are many jobs, limit response to at most this many.
The actual number of jobs returned will be the lesser of max_responses
and an unspecified server-defined limit.
+ location: string, The [regional endpoint]
+(https://cloud.google.com/dataflow/docs/concepts/regional-endpoints) that
+contains this job.
view: string, Level of information requested in response. Default is `JOB_VIEW_SUMMARY`.
+ pageToken: string, Set this to the 'next_page_token' field of a previous response
+to request additional results in a long list.
x__xgafv: string, V1 error format.
Allowed values
1 - v1 error format
@@ -2677,243 +2677,285 @@
# body is empty {}.
"jobs": [ # A subset of the requested job information.
{ # Defines a job to be run by the Cloud Dataflow service.
- "clientRequestId": "A String", # The client's unique identifier of the job, re-used across retried attempts.
- # If this field is set, the service will ensure its uniqueness.
- # The request to create a job will fail if the service has knowledge of a
- # previously submitted job with the same client's ID and job name.
- # The caller may use this field to ensure idempotence of job
- # creation across retried attempts to create a job.
- # By default, the field is empty and, in that case, the service ignores it.
- "id": "A String", # The unique ID of this job.
- #
- # This field is set by the Cloud Dataflow service when the Job is
- # created, and is immutable for the life of the job.
- "currentStateTime": "A String", # The timestamp associated with the current state.
- "transformNameMapping": { # The map of transform name prefixes of the job to be replaced to the
- # corresponding name prefixes of the new job.
- "a_key": "A String",
- },
- "environment": { # Describes the environment in which a Dataflow Job runs. # The environment for the job.
- "internalExperiments": { # Experimental settings.
- "a_key": "", # Properties of the object. Contains field @type with type URL.
- },
- "workerRegion": "A String", # The Compute Engine region
- # (https://cloud.google.com/compute/docs/regions-zones/regions-zones) in
- # which worker processing should occur, e.g. "us-west1". Mutually exclusive
- # with worker_zone. If neither worker_region nor worker_zone is specified,
- # default to the control plane's region.
- "serviceKmsKeyName": "A String", # If set, contains the Cloud KMS key identifier used to encrypt data
- # at rest, AKA a Customer Managed Encryption Key (CMEK).
- #
- # Format:
- # projects/PROJECT_ID/locations/LOCATION/keyRings/KEY_RING/cryptoKeys/KEY
- "userAgent": { # A description of the process that generated the request.
- "a_key": "", # Properties of the object.
- },
- "workerZone": "A String", # The Compute Engine zone
- # (https://cloud.google.com/compute/docs/regions-zones/regions-zones) in
- # which worker processing should occur, e.g. "us-west1-a". Mutually exclusive
- # with worker_region. If neither worker_region nor worker_zone is specified,
- # a zone in the control plane's region is chosen based on available capacity.
- "clusterManagerApiService": "A String", # The type of cluster manager API to use. If unknown or
- # unspecified, the service will attempt to choose a reasonable
- # default. This should be in the form of the API service name,
- # e.g. "compute.googleapis.com".
- "tempStoragePrefix": "A String", # The prefix of the resources the system should use for temporary
- # storage. The system will append the suffix "/temp-{JOBNAME} to
- # this resource prefix, where {JOBNAME} is the value of the
- # job_name field. The resulting bucket and object prefix is used
- # as the prefix of the resources used to store temporary data
- # needed during the job execution. NOTE: This will override the
- # value in taskrunner_settings.
- # The supported resource type is:
- #
- # Google Cloud Storage:
- #
- # storage.googleapis.com/{bucket}/{object}
- # bucket.storage.googleapis.com/{object}
- "experiments": [ # The list of experiments to enable.
- "A String",
- ],
- "version": { # A structure describing which components and their versions of the service
- # are required in order to run the job.
- "a_key": "", # Properties of the object.
- },
- "serviceAccountEmail": "A String", # Identity to run virtual machines as. Defaults to the default account.
- "sdkPipelineOptions": { # The Cloud Dataflow SDK pipeline options specified by the user. These
- # options are passed through the service and are used to recreate the
- # SDK pipeline options on the worker in a language agnostic and platform
- # independent way.
- "a_key": "", # Properties of the object.
- },
- "flexResourceSchedulingGoal": "A String", # Which Flexible Resource Scheduling mode to run in.
- "workerPools": [ # The worker pools. At least one "harness" worker pool must be
- # specified in order for the job to have workers.
- { # Describes one particular pool of Cloud Dataflow workers to be
- # instantiated by the Cloud Dataflow service in order to perform the
- # computations required by a job. Note that a workflow job may use
- # multiple pools, in order to match the various computational
- # requirements of the various stages of the job.
- "numThreadsPerWorker": 42, # The number of threads per worker harness. If empty or unspecified, the
- # service will choose a number of threads (according to the number of cores
- # on the selected machine type for batch, or 1 by convention for streaming).
- "numWorkers": 42, # Number of Google Compute Engine workers in this pool needed to
- # execute the job. If zero or unspecified, the service will
- # attempt to choose a reasonable default.
- "zone": "A String", # Zone to run the worker pools in. If empty or unspecified, the service
- # will attempt to choose a reasonable default.
- "diskSourceImage": "A String", # Fully qualified source image for disks.
- "packages": [ # Packages to be installed on workers.
- { # The packages that must be installed in order for a worker to run the
- # steps of the Cloud Dataflow job that will be assigned to its worker
- # pool.
- #
- # This is the mechanism by which the Cloud Dataflow SDK causes code to
- # be loaded onto the workers. For example, the Cloud Dataflow Java SDK
- # might use this to install jars containing the user's code and all of the
- # various dependencies (libraries, data files, etc.) required in order
- # for that code to run.
- "name": "A String", # The name of the package.
- "location": "A String", # The resource to read the package from. The supported resource type is:
- #
- # Google Cloud Storage:
- #
- # storage.googleapis.com/{bucket}
- # bucket.storage.googleapis.com/
- },
- ],
- "teardownPolicy": "A String", # Sets the policy for determining when to turndown worker pool.
- # Allowed values are: `TEARDOWN_ALWAYS`, `TEARDOWN_ON_SUCCESS`, and
- # `TEARDOWN_NEVER`.
- # `TEARDOWN_ALWAYS` means workers are always torn down regardless of whether
- # the job succeeds. `TEARDOWN_ON_SUCCESS` means workers are torn down
- # if the job succeeds. `TEARDOWN_NEVER` means the workers are never torn
- # down.
- #
- # If the workers are not torn down by the service, they will
- # continue to run and use Google Compute Engine VM resources in the
- # user's project until they are explicitly terminated by the user.
- # Because of this, Google recommends using the `TEARDOWN_ALWAYS`
- # policy except for small, manually supervised test jobs.
- #
- # If unknown or unspecified, the service will attempt to choose a reasonable
- # default.
- "onHostMaintenance": "A String", # The action to take on host maintenance, as defined by the Google
- # Compute Engine API.
- "poolArgs": { # Extra arguments for this worker pool.
- "a_key": "", # Properties of the object. Contains field @type with type URL.
+ "pipelineDescription": { # A descriptive representation of submitted pipeline as well as the executed
+ # form. This data is provided by the Dataflow service for ease of visualizing
+ # the pipeline and interpreting Dataflow provided metrics.
+ # Preliminary field: The format of this data may change at any time.
+ # A description of the user pipeline and stages through which it is executed.
+ # Created by Cloud Dataflow service. Only retrieved with JOB_VIEW_DESCRIPTION or JOB_VIEW_ALL.
+ "displayData": [ # Pipeline level display data.
+ { # Data provided with a pipeline or transform to provide descriptive info.
+ "url": "A String", # An optional full URL.
+ "javaClassValue": "A String", # Contains value if the data is of java class type.
+ "timestampValue": "A String", # Contains value if the data is of timestamp type.
+ "durationValue": "A String", # Contains value if the data is of duration type.
+ "label": "A String", # An optional label to display in a dax UI for the element.
+ "key": "A String", # The key identifying the display data.
+ # This is intended to be used as a label for the display data
+ # when viewed in a dax monitoring system.
+ "namespace": "A String", # The namespace for the key. This is usually a class name or programming
+ # language namespace (e.g. a Python module) which defines the display data.
+ # This allows a dax monitoring system to specially handle the data
+ # and perform custom rendering.
+ "floatValue": 3.14, # Contains value if the data is of float type.
+ "strValue": "A String", # Contains value if the data is of string type.
+ "int64Value": "A String", # Contains value if the data is of int64 type.
+ "boolValue": True or False, # Contains value if the data is of a boolean type.
+ "shortStrValue": "A String", # A possible additional shorter value to display.
+ # For example a java_class_name_value of com.mypackage.MyDoFn
+ # will be stored with MyDoFn as the short_str_value and
+ # com.mypackage.MyDoFn as the java_class_name value.
+ # short_str_value can be displayed and java_class_name_value
+ # will be displayed as a tooltip.
},
- "diskSizeGb": 42, # Size of root disk for VMs, in GB. If zero or unspecified, the service will
- # attempt to choose a reasonable default.
- "workerHarnessContainerImage": "A String", # Required. Docker container image that executes the Cloud Dataflow worker
- # harness, residing in Google Container Registry.
- #
- # Deprecated for the Fn API path. Use sdk_harness_container_images instead.
- "diskType": "A String", # Type of root disk for VMs. If empty or unspecified, the service will
- # attempt to choose a reasonable default.
- "machineType": "A String", # Machine type (e.g. "n1-standard-1"). If empty or unspecified, the
- # service will attempt to choose a reasonable default.
- "kind": "A String", # The kind of the worker pool; currently only `harness` and `shuffle`
- # are supported.
- "sdkHarnessContainerImages": [ # Set of SDK harness containers needed to execute this pipeline. This will
- # only be set in the Fn API path. For non-cross-language pipelines this
- # should have only one entry. Cross-language pipelines will have two or more
- # entries.
- { # Defines a SDK harness container for executing Dataflow pipelines.
- "containerImage": "A String", # A docker container image that resides in Google Container Registry.
- "useSingleCorePerContainer": True or False, # If true, recommends the Dataflow service to use only one core per SDK
- # container instance with this image. If false (or unset) recommends using
- # more than one core per SDK container instance with this image for
- # efficiency. Note that Dataflow service may choose to override this property
- # if needed.
- },
- ],
- "dataDisks": [ # Data disks that are used by a VM in this workflow.
- { # Describes the data disk used by a workflow job.
- "diskType": "A String", # Disk storage type, as defined by Google Compute Engine. This
- # must be a disk type appropriate to the project and zone in which
- # the workers will run. If unknown or unspecified, the service
- # will attempt to choose a reasonable default.
- #
- # For example, the standard persistent disk type is a resource name
- # typically ending in "pd-standard". If SSD persistent disks are
- # available, the resource name typically ends with "pd-ssd". The
- # actual valid values are defined the Google Compute Engine API,
- # not by the Cloud Dataflow API; consult the Google Compute Engine
- # documentation for more information about determining the set of
- # available disk types for a particular project and zone.
- #
- # Google Compute Engine Disk types are local to a particular
- # project in a particular zone, and so the resource name will
- # typically look something like this:
- #
- # compute.googleapis.com/projects/project-id/zones/zone/diskTypes/pd-standard
- "sizeGb": 42, # Size of disk in GB. If zero or unspecified, the service will
- # attempt to choose a reasonable default.
- "mountPoint": "A String", # Directory in a VM where disk is mounted.
- },
- ],
- "subnetwork": "A String", # Subnetwork to which VMs will be assigned, if desired. Expected to be of
- # the form "regions/REGION/subnetworks/SUBNETWORK".
- "ipConfiguration": "A String", # Configuration for VM IPs.
- "taskrunnerSettings": { # Taskrunner configuration settings. # Settings passed through to Google Compute Engine workers when
- # using the standard Dataflow task runner. Users should ignore
- # this field.
- "alsologtostderr": True or False, # Whether to also send taskrunner log info to stderr.
- "taskGroup": "A String", # The UNIX group ID on the worker VM to use for tasks launched by
- # taskrunner; e.g. "wheel".
- "harnessCommand": "A String", # The command to launch the worker harness.
- "logDir": "A String", # The directory on the VM to store logs.
- "oauthScopes": [ # The OAuth2 scopes to be requested by the taskrunner in order to
- # access the Cloud Dataflow API.
+ ],
+ "originalPipelineTransform": [ # Description of each transform in the pipeline and collections between them.
+ { # Description of the type, names/ids, and input/outputs for a transform.
+ "outputCollectionName": [ # User names for all collection outputs to this transform.
"A String",
],
- "dataflowApiVersion": "A String", # The API version of endpoint, e.g. "v1b3"
- "logUploadLocation": "A String", # Indicates where to put logs. If this is not specified, the logs
- # will not be uploaded.
- #
- # The supported resource type is:
- #
- # Google Cloud Storage:
- # storage.googleapis.com/{bucket}/{object}
- # bucket.storage.googleapis.com/{object}
- "streamingWorkerMainClass": "A String", # The streaming worker main class name.
- "workflowFileName": "A String", # The file to store the workflow in.
- "languageHint": "A String", # The suggested backend language.
- "commandlinesFileName": "A String", # The file to store preprocessing commands in.
- "baseTaskDir": "A String", # The location on the worker for task-specific subdirectories.
- "tempStoragePrefix": "A String", # The prefix of the resources the taskrunner should use for
- # temporary storage.
- #
- # The supported resource type is:
- #
- # Google Cloud Storage:
- # storage.googleapis.com/{bucket}/{object}
- # bucket.storage.googleapis.com/{object}
- "baseUrl": "A String", # The base URL for the taskrunner to use when accessing Google Cloud APIs.
- #
- # When workers access Google Cloud APIs, they logically do so via
- # relative URLs. If this field is specified, it supplies the base
- # URL to use for resolving these relative URLs. The normative
- # algorithm used is defined by RFC 1808, "Relative Uniform Resource
- # Locators".
- #
- # If not specified, the default value is "http://www.googleapis.com/"
- "logToSerialconsole": True or False, # Whether to send taskrunner log info to Google Compute Engine VM serial
- # console.
- "continueOnException": True or False, # Whether to continue taskrunner if an exception is hit.
- "parallelWorkerSettings": { # Provides data to pass through to the worker harness. # The settings to pass to the parallel worker harness.
- "tempStoragePrefix": "A String", # The prefix of the resources the system should use for temporary
- # storage.
+ "displayData": [ # Transform-specific display data.
+ { # Data provided with a pipeline or transform to provide descriptive info.
+ "url": "A String", # An optional full URL.
+ "javaClassValue": "A String", # Contains value if the data is of java class type.
+ "timestampValue": "A String", # Contains value if the data is of timestamp type.
+ "durationValue": "A String", # Contains value if the data is of duration type.
+ "label": "A String", # An optional label to display in a dax UI for the element.
+ "key": "A String", # The key identifying the display data.
+ # This is intended to be used as a label for the display data
+ # when viewed in a dax monitoring system.
+ "namespace": "A String", # The namespace for the key. This is usually a class name or programming
+ # language namespace (e.g. a Python module) which defines the display data.
+ # This allows a dax monitoring system to specially handle the data
+ # and perform custom rendering.
+ "floatValue": 3.14, # Contains value if the data is of float type.
+ "strValue": "A String", # Contains value if the data is of string type.
+ "int64Value": "A String", # Contains value if the data is of int64 type.
+ "boolValue": True or False, # Contains value if the data is of a boolean type.
+ "shortStrValue": "A String", # A possible additional shorter value to display.
+ # For example a java_class_name_value of com.mypackage.MyDoFn
+ # will be stored with MyDoFn as the short_str_value and
+ # com.mypackage.MyDoFn as the java_class_name value.
+ # short_str_value can be displayed and java_class_name_value
+ # will be displayed as a tooltip.
+ },
+ ],
+ "id": "A String", # SDK generated id of this transform instance.
+ "inputCollectionName": [ # User names for all collection inputs to this transform.
+ "A String",
+ ],
+ "name": "A String", # User provided name for this transform instance.
+ "kind": "A String", # Type of transform.
+ },
+ ],
+ "executionPipelineStage": [ # Description of each stage of execution of the pipeline.
+ { # Description of the composing transforms, names/ids, and input/outputs of a
+ # stage of execution. Some composing transforms and sources may have been
+ # generated by the Dataflow service during execution planning.
+ "componentSource": [ # Collections produced and consumed by component transforms of this stage.
+ { # Description of an interstitial value between transforms in an execution
+ # stage.
+ "userName": "A String", # Human-readable name for this source; may be user or system generated.
+ "name": "A String", # Dataflow service generated name for this source.
+ "originalTransformOrCollection": "A String", # User name for the original user transform or collection with which this
+ # source is most closely associated.
+ },
+ ],
+ "inputSource": [ # Input sources for this stage.
+ { # Description of an input or output of an execution stage.
+ "userName": "A String", # Human-readable name for this source; may be user or system generated.
+ "originalTransformOrCollection": "A String", # User name for the original user transform or collection with which this
+ # source is most closely associated.
+ "sizeBytes": "A String", # Size of the source, if measurable.
+ "name": "A String", # Dataflow service generated name for this source.
+ },
+ ],
+ "name": "A String", # Dataflow service generated name for this stage.
+ "componentTransform": [ # Transforms that comprise this execution stage.
+ { # Description of a transform executed as part of an execution stage.
+ "name": "A String", # Dataflow service generated name for this transform.
+ "userName": "A String", # Human-readable name for this transform; may be user or system generated.
+ "originalTransform": "A String", # User name for the original user transform with which this transform is
+ # most closely associated.
+ },
+ ],
+ "id": "A String", # Dataflow service generated id for this stage.
+ "outputSource": [ # Output sources for this stage.
+ { # Description of an input or output of an execution stage.
+ "userName": "A String", # Human-readable name for this source; may be user or system generated.
+ "originalTransformOrCollection": "A String", # User name for the original user transform or collection with which this
+ # source is most closely associated.
+ "sizeBytes": "A String", # Size of the source, if measurable.
+ "name": "A String", # Dataflow service generated name for this source.
+ },
+ ],
+ "kind": "A String", # Type of transform this stage is executing.
+ },
+ ],
+ },
+ "labels": { # User-defined labels for this job.
+ #
+ # The labels map can contain no more than 64 entries. Entries of the labels
+ # map are UTF8 strings that comply with the following restrictions:
+ #
+ # * Keys must conform to regexp: \p{Ll}\p{Lo}{0,62}
+ # * Values must conform to regexp: [\p{Ll}\p{Lo}\p{N}_-]{0,63}
+ # * Both keys and values are additionally constrained to be <= 128 bytes in
+ # size.
+ "a_key": "A String",
+ },
+ "projectId": "A String", # The ID of the Cloud Platform project that the job belongs to.
+ "environment": { # Describes the environment in which a Dataflow Job runs. # The environment for the job.
+ "flexResourceSchedulingGoal": "A String", # Which Flexible Resource Scheduling mode to run in.
+ "workerRegion": "A String", # The Compute Engine region
+ # (https://cloud.google.com/compute/docs/regions-zones/regions-zones) in
+ # which worker processing should occur, e.g. "us-west1". Mutually exclusive
+ # with worker_zone. If neither worker_region nor worker_zone is specified,
+ # default to the control plane's region.
+ "userAgent": { # A description of the process that generated the request.
+ "a_key": "", # Properties of the object.
+ },
+ "serviceAccountEmail": "A String", # Identity to run virtual machines as. Defaults to the default account.
+ "version": { # A structure describing which components and their versions of the service
+ # are required in order to run the job.
+ "a_key": "", # Properties of the object.
+ },
+ "serviceKmsKeyName": "A String", # If set, contains the Cloud KMS key identifier used to encrypt data
+ # at rest, AKA a Customer Managed Encryption Key (CMEK).
+ #
+ # Format:
+ # projects/PROJECT_ID/locations/LOCATION/keyRings/KEY_RING/cryptoKeys/KEY
+ "experiments": [ # The list of experiments to enable.
+ "A String",
+ ],
+ "workerZone": "A String", # The Compute Engine zone
+ # (https://cloud.google.com/compute/docs/regions-zones/regions-zones) in
+ # which worker processing should occur, e.g. "us-west1-a". Mutually exclusive
+ # with worker_region. If neither worker_region nor worker_zone is specified,
+ # a zone in the control plane's region is chosen based on available capacity.
+ "workerPools": [ # The worker pools. At least one "harness" worker pool must be
+ # specified in order for the job to have workers.
+ { # Describes one particular pool of Cloud Dataflow workers to be
+ # instantiated by the Cloud Dataflow service in order to perform the
+ # computations required by a job. Note that a workflow job may use
+ # multiple pools, in order to match the various computational
+ # requirements of the various stages of the job.
+ "onHostMaintenance": "A String", # The action to take on host maintenance, as defined by the Google
+ # Compute Engine API.
+ "sdkHarnessContainerImages": [ # Set of SDK harness containers needed to execute this pipeline. This will
+ # only be set in the Fn API path. For non-cross-language pipelines this
+ # should have only one entry. Cross-language pipelines will have two or more
+ # entries.
+ { # Defines a SDK harness container for executing Dataflow pipelines.
+ "containerImage": "A String", # A docker container image that resides in Google Container Registry.
+ "useSingleCorePerContainer": True or False, # If true, recommends the Dataflow service to use only one core per SDK
+ # container instance with this image. If false (or unset) recommends using
+ # more than one core per SDK container instance with this image for
+ # efficiency. Note that Dataflow service may choose to override this property
+ # if needed.
+ },
+ ],
+ "zone": "A String", # Zone to run the worker pools in. If empty or unspecified, the service
+ # will attempt to choose a reasonable default.
+ "kind": "A String", # The kind of the worker pool; currently only `harness` and `shuffle`
+ # are supported.
+ "metadata": { # Metadata to set on the Google Compute Engine VMs.
+ "a_key": "A String",
+ },
+ "diskSourceImage": "A String", # Fully qualified source image for disks.
+ "dataDisks": [ # Data disks that are used by a VM in this workflow.
+ { # Describes the data disk used by a workflow job.
+ "sizeGb": 42, # Size of disk in GB. If zero or unspecified, the service will
+ # attempt to choose a reasonable default.
+ "diskType": "A String", # Disk storage type, as defined by Google Compute Engine. This
+ # must be a disk type appropriate to the project and zone in which
+ # the workers will run. If unknown or unspecified, the service
+ # will attempt to choose a reasonable default.
+ #
+ # For example, the standard persistent disk type is a resource name
+ # typically ending in "pd-standard". If SSD persistent disks are
+ # available, the resource name typically ends with "pd-ssd". The
+ # actual valid values are defined by the Google Compute Engine API,
+ # not by the Cloud Dataflow API; consult the Google Compute Engine
+ # documentation for more information about determining the set of
+ # available disk types for a particular project and zone.
+ #
+ # Google Compute Engine Disk types are local to a particular
+ # project in a particular zone, and so the resource name will
+ # typically look something like this:
+ #
+ # compute.googleapis.com/projects/project-id/zones/zone/diskTypes/pd-standard
+ "mountPoint": "A String", # Directory in a VM where disk is mounted.
+ },
+ ],
+ "packages": [ # Packages to be installed on workers.
+ { # The packages that must be installed in order for a worker to run the
+ # steps of the Cloud Dataflow job that will be assigned to its worker
+ # pool.
#
- # The supported resource type is:
- #
- # Google Cloud Storage:
- #
- # storage.googleapis.com/{bucket}/{object}
- # bucket.storage.googleapis.com/{object}
- "reportingEnabled": True or False, # Whether to send work progress updates to the service.
- "baseUrl": "A String", # The base URL for accessing Google Cloud APIs.
+ # This is the mechanism by which the Cloud Dataflow SDK causes code to
+ # be loaded onto the workers. For example, the Cloud Dataflow Java SDK
+ # might use this to install jars containing the user's code and all of the
+ # various dependencies (libraries, data files, etc.) required in order
+ # for that code to run.
+ "name": "A String", # The name of the package.
+ "location": "A String", # The resource to read the package from. The supported resource type is:
+ #
+ # Google Cloud Storage:
+ #
+ # storage.googleapis.com/{bucket}
+ # bucket.storage.googleapis.com/
+ },
+ ],
+ "teardownPolicy": "A String", # Sets the policy for determining when to tear down the worker pool.
+ # Allowed values are: `TEARDOWN_ALWAYS`, `TEARDOWN_ON_SUCCESS`, and
+ # `TEARDOWN_NEVER`.
+ # `TEARDOWN_ALWAYS` means workers are always torn down regardless of whether
+ # the job succeeds. `TEARDOWN_ON_SUCCESS` means workers are torn down
+ # if the job succeeds. `TEARDOWN_NEVER` means the workers are never torn
+ # down.
+ #
+ # If the workers are not torn down by the service, they will
+ # continue to run and use Google Compute Engine VM resources in the
+ # user's project until they are explicitly terminated by the user.
+ # Because of this, Google recommends using the `TEARDOWN_ALWAYS`
+ # policy except for small, manually supervised test jobs.
+ #
+ # If unknown or unspecified, the service will attempt to choose a reasonable
+ # default.
+ "network": "A String", # Network to which VMs will be assigned. If empty or unspecified,
+ # the service will use the network "default".
+ "ipConfiguration": "A String", # Configuration for VM IPs.
+ "diskSizeGb": 42, # Size of root disk for VMs, in GB. If zero or unspecified, the service will
+ # attempt to choose a reasonable default.
+ "autoscalingSettings": { # Settings for WorkerPool autoscaling. # Settings for autoscaling of this WorkerPool.
+ "maxNumWorkers": 42, # The maximum number of workers to cap scaling at.
+ "algorithm": "A String", # The algorithm to use for autoscaling.
+ },
+ "poolArgs": { # Extra arguments for this worker pool.
+ "a_key": "", # Properties of the object. Contains field @type with type URL.
+ },
+ "subnetwork": "A String", # Subnetwork to which VMs will be assigned, if desired. Expected to be of
+ # the form "regions/REGION/subnetworks/SUBNETWORK".
+ "numWorkers": 42, # Number of Google Compute Engine workers in this pool needed to
+ # execute the job. If zero or unspecified, the service will
+ # attempt to choose a reasonable default.
+ "numThreadsPerWorker": 42, # The number of threads per worker harness. If empty or unspecified, the
+ # service will choose a number of threads (according to the number of cores
+ # on the selected machine type for batch, or 1 by convention for streaming).
+ "workerHarnessContainerImage": "A String", # Required. Docker container image that executes the Cloud Dataflow worker
+ # harness, residing in Google Container Registry.
+ #
+ # Deprecated for the Fn API path. Use sdk_harness_container_images instead.
+ "taskrunnerSettings": { # Taskrunner configuration settings. # Settings passed through to Google Compute Engine workers when
+ # using the standard Dataflow task runner. Users should ignore
+ # this field.
+ "dataflowApiVersion": "A String", # The API version of endpoint, e.g. "v1b3"
+ "oauthScopes": [ # The OAuth2 scopes to be requested by the taskrunner in order to
+ # access the Cloud Dataflow API.
+ "A String",
+ ],
+ "baseUrl": "A String", # The base URL for the taskrunner to use when accessing Google Cloud APIs.
#
# When workers access Google Cloud APIs, they logically do so via
# relative URLs. If this field is specified, it supplies the base
@@ -2922,340 +2964,299 @@
# Locators".
#
# If not specified, the default value is "http://www.googleapis.com/"
- "servicePath": "A String", # The Cloud Dataflow service path relative to the root URL, for example,
- # "dataflow/v1b3/projects".
- "shuffleServicePath": "A String", # The Shuffle service path relative to the root URL, for example,
- # "shuffle/v1beta1".
- "workerId": "A String", # The ID of the worker running this pipeline.
+ "workflowFileName": "A String", # The file to store the workflow in.
+ "logToSerialconsole": True or False, # Whether to send taskrunner log info to Google Compute Engine VM serial
+ # console.
+ "baseTaskDir": "A String", # The location on the worker for task-specific subdirectories.
+ "taskUser": "A String", # The UNIX user ID on the worker VM to use for tasks launched by
+ # taskrunner; e.g. "root".
+ "vmId": "A String", # The ID string of the VM.
+ "alsologtostderr": True or False, # Whether to also send taskrunner log info to stderr.
+ "parallelWorkerSettings": { # Provides data to pass through to the worker harness. # The settings to pass to the parallel worker harness.
+ "shuffleServicePath": "A String", # The Shuffle service path relative to the root URL, for example,
+ # "shuffle/v1beta1".
+ "tempStoragePrefix": "A String", # The prefix of the resources the system should use for temporary
+ # storage.
+ #
+ # The supported resource type is:
+ #
+ # Google Cloud Storage:
+ #
+ # storage.googleapis.com/{bucket}/{object}
+ # bucket.storage.googleapis.com/{object}
+ "reportingEnabled": True or False, # Whether to send work progress updates to the service.
+ "servicePath": "A String", # The Cloud Dataflow service path relative to the root URL, for example,
+ # "dataflow/v1b3/projects".
+ "baseUrl": "A String", # The base URL for accessing Google Cloud APIs.
+ #
+ # When workers access Google Cloud APIs, they logically do so via
+ # relative URLs. If this field is specified, it supplies the base
+ # URL to use for resolving these relative URLs. The normative
+ # algorithm used is defined by RFC 1808, "Relative Uniform Resource
+ # Locators".
+ #
+ # If not specified, the default value is "http://www.googleapis.com/"
+ "workerId": "A String", # The ID of the worker running this pipeline.
+ },
+ "harnessCommand": "A String", # The command to launch the worker harness.
+ "logDir": "A String", # The directory on the VM to store logs.
+ "streamingWorkerMainClass": "A String", # The streaming worker main class name.
+ "languageHint": "A String", # The suggested backend language.
+ "taskGroup": "A String", # The UNIX group ID on the worker VM to use for tasks launched by
+ # taskrunner; e.g. "wheel".
+ "logUploadLocation": "A String", # Indicates where to put logs. If this is not specified, the logs
+ # will not be uploaded.
+ #
+ # The supported resource type is:
+ #
+ # Google Cloud Storage:
+ # storage.googleapis.com/{bucket}/{object}
+ # bucket.storage.googleapis.com/{object}
+ "commandlinesFileName": "A String", # The file to store preprocessing commands in.
+ "continueOnException": True or False, # Whether to continue taskrunner if an exception is hit.
+ "tempStoragePrefix": "A String", # The prefix of the resources the taskrunner should use for
+ # temporary storage.
+ #
+ # The supported resource type is:
+ #
+ # Google Cloud Storage:
+ # storage.googleapis.com/{bucket}/{object}
+ # bucket.storage.googleapis.com/{object}
},
- "taskUser": "A String", # The UNIX user ID on the worker VM to use for tasks launched by
- # taskrunner; e.g. "root".
- "vmId": "A String", # The ID string of the VM.
+ "diskType": "A String", # Type of root disk for VMs. If empty or unspecified, the service will
+ # attempt to choose a reasonable default.
+ "defaultPackageSet": "A String", # The default package set to install. This allows the service to
+ # select a default set of packages which are useful to worker
+ # harnesses written in a particular language.
+ "machineType": "A String", # Machine type (e.g. "n1-standard-1"). If empty or unspecified, the
+ # service will attempt to choose a reasonable default.
},
- "autoscalingSettings": { # Settings for WorkerPool autoscaling. # Settings for autoscaling of this WorkerPool.
- "algorithm": "A String", # The algorithm to use for autoscaling.
- "maxNumWorkers": 42, # The maximum number of workers to cap scaling at.
- },
- "metadata": { # Metadata to set on the Google Compute Engine VMs.
- "a_key": "A String",
- },
- "defaultPackageSet": "A String", # The default package set to install. This allows the service to
- # select a default set of packages which are useful to worker
- # harnesses written in a particular language.
- "network": "A String", # Network to which VMs will be assigned. If empty or unspecified,
- # the service will use the network "default".
+ ],
+ "tempStoragePrefix": "A String", # The prefix of the resources the system should use for temporary
+ # storage. The system will append the suffix "/temp-{JOBNAME}" to
+ # this resource prefix, where {JOBNAME} is the value of the
+ # job_name field. The resulting bucket and object prefix is used
+ # as the prefix of the resources used to store temporary data
+ # needed during the job execution. NOTE: This will override the
+ # value in taskrunner_settings.
+ # The supported resource type is:
+ #
+ # Google Cloud Storage:
+ #
+ # storage.googleapis.com/{bucket}/{object}
+ # bucket.storage.googleapis.com/{object}
+ "internalExperiments": { # Experimental settings.
+ "a_key": "", # Properties of the object. Contains field @type with type URL.
},
- ],
- "dataset": "A String", # The dataset for the current project where various workflow
- # related tables are stored.
- #
- # The supported resource type is:
- #
- # Google BigQuery:
- # bigquery.googleapis.com/{dataset}
- },
- "stageStates": [ # This field may be mutated by the Cloud Dataflow service;
- # callers cannot mutate it.
- { # A message describing the state of a particular execution stage.
- "currentStateTime": "A String", # The time at which the stage transitioned to this state.
- "executionStageState": "A String", # Executions stage states allow the same set of values as JobState.
- "executionStageName": "A String", # The name of the execution stage.
- },
- ],
- "jobMetadata": { # Metadata available primarily for filtering jobs. Will be included in the # This field is populated by the Dataflow service to support filtering jobs
- # by the metadata values provided here. Populated for ListJobs and all GetJob
- # views SUMMARY and higher.
- # ListJob response and Job SUMMARY view.
- "datastoreDetails": [ # Identification of a Datastore source used in the Dataflow job.
- { # Metadata for a Datastore connector used by the job.
- "namespace": "A String", # Namespace used in the connection.
- "projectId": "A String", # ProjectId accessed in the connection.
- },
- ],
- "sdkVersion": { # The version of the SDK used to run the job. # The SDK version used to run the job.
- "version": "A String", # The version of the SDK used to run the job.
- "sdkSupportStatus": "A String", # The support status for this SDK version.
- "versionDisplayName": "A String", # A readable string describing the version of the SDK.
- },
- "bigqueryDetails": [ # Identification of a BigQuery source used in the Dataflow job.
- { # Metadata for a BigQuery connector used by the job.
- "table": "A String", # Table accessed in the connection.
- "dataset": "A String", # Dataset accessed in the connection.
- "query": "A String", # Query used to access data in the connection.
- "projectId": "A String", # Project accessed in the connection.
- },
- ],
- "fileDetails": [ # Identification of a File source used in the Dataflow job.
- { # Metadata for a File connector used by the job.
- "filePattern": "A String", # File Pattern used to access files by the connector.
- },
- ],
- "pubsubDetails": [ # Identification of a PubSub source used in the Dataflow job.
- { # Metadata for a PubSub connector used by the job.
- "topic": "A String", # Topic accessed in the connection.
- "subscription": "A String", # Subscription used in the connection.
- },
- ],
- "bigTableDetails": [ # Identification of a BigTable source used in the Dataflow job.
- { # Metadata for a BigTable connector used by the job.
- "projectId": "A String", # ProjectId accessed in the connection.
- "instanceId": "A String", # InstanceId accessed in the connection.
- "tableId": "A String", # TableId accessed in the connection.
- },
- ],
- "spannerDetails": [ # Identification of a Spanner source used in the Dataflow job.
- { # Metadata for a Spanner connector used by the job.
- "instanceId": "A String", # InstanceId accessed in the connection.
- "projectId": "A String", # ProjectId accessed in the connection.
- "databaseId": "A String", # DatabaseId accessed in the connection.
- },
- ],
- },
- "type": "A String", # The type of Cloud Dataflow job.
- "projectId": "A String", # The ID of the Cloud Platform project that the job belongs to.
- "createdFromSnapshotId": "A String", # If this is specified, the job's initial state is populated from the given
- # snapshot.
- "pipelineDescription": { # A descriptive representation of submitted pipeline as well as the executed # Preliminary field: The format of this data may change at any time.
- # A description of the user pipeline and stages through which it is executed.
- # Created by Cloud Dataflow service. Only retrieved with
- # JOB_VIEW_DESCRIPTION or JOB_VIEW_ALL.
- # form. This data is provided by the Dataflow service for ease of visualizing
- # the pipeline and interpreting Dataflow provided metrics.
- "executionPipelineStage": [ # Description of each stage of execution of the pipeline.
- { # Description of the composing transforms, names/ids, and input/outputs of a
- # stage of execution. Some composing transforms and sources may have been
- # generated by the Dataflow service during execution planning.
- "outputSource": [ # Output sources for this stage.
- { # Description of an input or output of an execution stage.
- "sizeBytes": "A String", # Size of the source, if measurable.
- "name": "A String", # Dataflow service generated name for this source.
- "userName": "A String", # Human-readable name for this source; may be user or system generated.
- "originalTransformOrCollection": "A String", # User name for the original user transform or collection with which this
- # source is most closely associated.
- },
- ],
- "name": "A String", # Dataflow service generated name for this stage.
- "inputSource": [ # Input sources for this stage.
- { # Description of an input or output of an execution stage.
- "sizeBytes": "A String", # Size of the source, if measurable.
- "name": "A String", # Dataflow service generated name for this source.
- "userName": "A String", # Human-readable name for this source; may be user or system generated.
- "originalTransformOrCollection": "A String", # User name for the original user transform or collection with which this
- # source is most closely associated.
- },
- ],
- "id": "A String", # Dataflow service generated id for this stage.
- "componentTransform": [ # Transforms that comprise this execution stage.
- { # Description of a transform executed as part of an execution stage.
- "originalTransform": "A String", # User name for the original user transform with which this transform is
- # most closely associated.
- "name": "A String", # Dataflow service generated name for this source.
- "userName": "A String", # Human-readable name for this transform; may be user or system generated.
- },
- ],
- "componentSource": [ # Collections produced and consumed by component transforms of this stage.
- { # Description of an interstitial value between transforms in an execution
- # stage.
- "name": "A String", # Dataflow service generated name for this source.
- "userName": "A String", # Human-readable name for this transform; may be user or system generated.
- "originalTransformOrCollection": "A String", # User name for the original user transform or collection with which this
- # source is most closely associated.
- },
- ],
- "kind": "A String", # Type of tranform this stage is executing.
- },
- ],
- "originalPipelineTransform": [ # Description of each transform in the pipeline and collections between them.
- { # Description of the type, names/ids, and input/outputs for a transform.
- "kind": "A String", # Type of transform.
- "inputCollectionName": [ # User names for all collection inputs to this transform.
- "A String",
- ],
- "name": "A String", # User provided name for this transform instance.
- "id": "A String", # SDK generated id of this transform instance.
- "displayData": [ # Transform-specific display data.
- { # Data provided with a pipeline or transform to provide descriptive info.
- "durationValue": "A String", # Contains value if the data is of duration type.
- "int64Value": "A String", # Contains value if the data is of int64 type.
- "namespace": "A String", # The namespace for the key. This is usually a class name or programming
- # language namespace (i.e. python module) which defines the display data.
- # This allows a dax monitoring system to specially handle the data
- # and perform custom rendering.
- "floatValue": 3.14, # Contains value if the data is of float type.
- "key": "A String", # The key identifying the display data.
- # This is intended to be used as a label for the display data
- # when viewed in a dax monitoring system.
- "shortStrValue": "A String", # A possible additional shorter value to display.
- # For example a java_class_name_value of com.mypackage.MyDoFn
- # will be stored with MyDoFn as the short_str_value and
- # com.mypackage.MyDoFn as the java_class_name value.
- # short_str_value can be displayed and java_class_name_value
- # will be displayed as a tooltip.
- "url": "A String", # An optional full URL.
- "label": "A String", # An optional label to display in a dax UI for the element.
- "timestampValue": "A String", # Contains value if the data is of timestamp type.
- "boolValue": True or False, # Contains value if the data is of a boolean type.
- "javaClassValue": "A String", # Contains value if the data is of java class type.
- "strValue": "A String", # Contains value if the data is of string type.
- },
- ],
- "outputCollectionName": [ # User names for all collection outputs to this transform.
- "A String",
- ],
- },
- ],
- "displayData": [ # Pipeline level display data.
- { # Data provided with a pipeline or transform to provide descriptive info.
- "durationValue": "A String", # Contains value if the data is of duration type.
- "int64Value": "A String", # Contains value if the data is of int64 type.
- "namespace": "A String", # The namespace for the key. This is usually a class name or programming
- # language namespace (i.e. python module) which defines the display data.
- # This allows a dax monitoring system to specially handle the data
- # and perform custom rendering.
- "floatValue": 3.14, # Contains value if the data is of float type.
- "key": "A String", # The key identifying the display data.
- # This is intended to be used as a label for the display data
- # when viewed in a dax monitoring system.
- "shortStrValue": "A String", # A possible additional shorter value to display.
- # For example a java_class_name_value of com.mypackage.MyDoFn
- # will be stored with MyDoFn as the short_str_value and
- # com.mypackage.MyDoFn as the java_class_name value.
- # short_str_value can be displayed and java_class_name_value
- # will be displayed as a tooltip.
- "url": "A String", # An optional full URL.
- "label": "A String", # An optional label to display in a dax UI for the element.
- "timestampValue": "A String", # Contains value if the data is of timestamp type.
- "boolValue": True or False, # Contains value if the data is of a boolean type.
- "javaClassValue": "A String", # Contains value if the data is of java class type.
- "strValue": "A String", # Contains value if the data is of string type.
- },
- ],
- },
- "replaceJobId": "A String", # If this job is an update of an existing job, this field is the job ID
- # of the job it replaced.
- #
- # When sending a `CreateJobRequest`, you can update a job by specifying it
- # here. The job named here is stopped, and its intermediate state is
- # transferred to this job.
- "tempFiles": [ # A set of files the system should be aware of that are used
- # for temporary storage. These temporary files will be
- # removed on job completion.
- # No duplicates are allowed.
- # No file patterns are supported.
- #
- # The supported files are:
- #
- # Google Cloud Storage:
- #
- # storage.googleapis.com/{bucket}/{object}
- # bucket.storage.googleapis.com/{object}
- "A String",
- ],
- "name": "A String", # The user-specified Cloud Dataflow job name.
- #
- # Only one Job with a given name may exist in a project at any
- # given time. If a caller attempts to create a Job with the same
- # name as an already-existing Job, the attempt returns the
- # existing Job.
- #
- # The name must match the regular expression
- # `[a-z]([-a-z0-9]{0,38}[a-z0-9])?`
- "steps": [ # Exactly one of step or steps_location should be specified.
- #
- # The top-level steps that constitute the entire job.
- { # Defines a particular step within a Cloud Dataflow job.
- #
- # A job consists of multiple steps, each of which performs some
- # specific operation as part of the overall job. Data is typically
- # passed from one step to another as part of the job.
- #
- # Here's an example of a sequence of steps which together implement a
- # Map-Reduce job:
- #
- # * Read a collection of data from some source, parsing the
- # collection's elements.
- #
- # * Validate the elements.
- #
- # * Apply a user-defined function to map each element to some value
- # and extract an element-specific key value.
- #
- # * Group elements with the same key into a single element with
- # that key, transforming a multiply-keyed collection into a
- # uniquely-keyed collection.
- #
- # * Write the elements out to some data sink.
- #
- # Note that the Cloud Dataflow service may be used to run many different
- # types of jobs, not just Map-Reduce.
- "name": "A String", # The name that identifies the step. This must be unique for each
- # step with respect to all other steps in the Cloud Dataflow job.
- "kind": "A String", # The kind of step in the Cloud Dataflow job.
- "properties": { # Named properties associated with the step. Each kind of
- # predefined step has its own required set of properties.
- # Must be provided on Create. Only retrieved with JOB_VIEW_ALL.
+ "sdkPipelineOptions": { # The Cloud Dataflow SDK pipeline options specified by the user. These
+ # options are passed through the service and are used to recreate the
+ # SDK pipeline options on the worker in a language agnostic and platform
+ # independent way.
"a_key": "", # Properties of the object.
},
+ "dataset": "A String", # The dataset for the current project where various workflow
+ # related tables are stored.
+ #
+ # The supported resource type is:
+ #
+ # Google BigQuery:
+ # bigquery.googleapis.com/{dataset}
+ "clusterManagerApiService": "A String", # The type of cluster manager API to use. If unknown or
+ # unspecified, the service will attempt to choose a reasonable
+ # default. This should be in the form of the API service name,
+ # e.g. "compute.googleapis.com".
},
- ],
- "replacedByJobId": "A String", # If another job is an update of this job (and thus, this job is in
- # `JOB_STATE_UPDATED`), this field contains the ID of that job.
- "executionInfo": { # Additional information about how a Cloud Dataflow job will be executed that # Deprecated.
- # isn't contained in the submitted job.
- "stages": { # A mapping from each stage to the information about that stage.
- "a_key": { # Contains information about how a particular
- # google.dataflow.v1beta3.Step will be executed.
- "stepName": [ # The steps associated with the execution stage.
- # Note that stages may have several steps, and that a given step
- # might be run by more than one stage.
- "A String",
- ],
+ "stepsLocation": "A String", # The GCS location where the steps are stored.
+ "steps": [ # Exactly one of step or steps_location should be specified.
+ #
+ # The top-level steps that constitute the entire job.
+ { # Defines a particular step within a Cloud Dataflow job.
+ #
+ # A job consists of multiple steps, each of which performs some
+ # specific operation as part of the overall job. Data is typically
+ # passed from one step to another as part of the job.
+ #
+ # Here's an example of a sequence of steps which together implement a
+ # Map-Reduce job:
+ #
+ # * Read a collection of data from some source, parsing the
+ # collection's elements.
+ #
+ # * Validate the elements.
+ #
+ # * Apply a user-defined function to map each element to some value
+ # and extract an element-specific key value.
+ #
+ # * Group elements with the same key into a single element with
+ # that key, transforming a multiply-keyed collection into a
+ # uniquely-keyed collection.
+ #
+ # * Write the elements out to some data sink.
+ #
+ # Note that the Cloud Dataflow service may be used to run many different
+ # types of jobs, not just Map-Reduce.
+ "kind": "A String", # The kind of step in the Cloud Dataflow job.
+ "properties": { # Named properties associated with the step. Each kind of
+ # predefined step has its own required set of properties.
+ # Must be provided on Create. Only retrieved with JOB_VIEW_ALL.
+ "a_key": "", # Properties of the object.
+ },
+ "name": "A String", # The name that identifies the step. This must be unique for each
+ # step with respect to all other steps in the Cloud Dataflow job.
+ },
+ ],
+ "stageStates": [ # This field may be mutated by the Cloud Dataflow service;
+ # callers cannot mutate it.
+ { # A message describing the state of a particular execution stage.
+ "executionStageState": "A String", # Execution stage states allow the same set of values as JobState.
+ "executionStageName": "A String", # The name of the execution stage.
+ "currentStateTime": "A String", # The time at which the stage transitioned to this state.
+ },
+ ],
+ "replacedByJobId": "A String", # If another job is an update of this job (and thus, this job is in
+ # `JOB_STATE_UPDATED`), this field contains the ID of that job.
+ "jobMetadata": { # Metadata available primarily for filtering jobs. Will be included in the
+ # ListJob response and Job SUMMARY view. This field is populated by the Dataflow service to support
+ # filtering jobs by the metadata values provided here. Populated for ListJobs and all GetJob
+ # views SUMMARY and higher.
+ "sdkVersion": { # The version of the SDK used to run the job. # The SDK version used to run the job.
+ "sdkSupportStatus": "A String", # The support status for this SDK version.
+ "versionDisplayName": "A String", # A readable string describing the version of the SDK.
+ "version": "A String", # The version of the SDK used to run the job.
+ },
+ "bigTableDetails": [ # Identification of a BigTable source used in the Dataflow job.
+ { # Metadata for a BigTable connector used by the job.
+ "instanceId": "A String", # InstanceId accessed in the connection.
+ "tableId": "A String", # TableId accessed in the connection.
+ "projectId": "A String", # ProjectId accessed in the connection.
+ },
+ ],
+ "pubsubDetails": [ # Identification of a PubSub source used in the Dataflow job.
+ { # Metadata for a PubSub connector used by the job.
+ "subscription": "A String", # Subscription used in the connection.
+ "topic": "A String", # Topic accessed in the connection.
+ },
+ ],
+ "bigqueryDetails": [ # Identification of a BigQuery source used in the Dataflow job.
+ { # Metadata for a BigQuery connector used by the job.
+ "dataset": "A String", # Dataset accessed in the connection.
+ "projectId": "A String", # Project accessed in the connection.
+ "query": "A String", # Query used to access data in the connection.
+ "table": "A String", # Table accessed in the connection.
+ },
+ ],
+ "fileDetails": [ # Identification of a File source used in the Dataflow job.
+ { # Metadata for a File connector used by the job.
+ "filePattern": "A String", # File Pattern used to access files by the connector.
+ },
+ ],
+ "datastoreDetails": [ # Identification of a Datastore source used in the Dataflow job.
+ { # Metadata for a Datastore connector used by the job.
+ "namespace": "A String", # Namespace used in the connection.
+ "projectId": "A String", # ProjectId accessed in the connection.
+ },
+ ],
+ "spannerDetails": [ # Identification of a Spanner source used in the Dataflow job.
+ { # Metadata for a Spanner connector used by the job.
+ "instanceId": "A String", # InstanceId accessed in the connection.
+ "databaseId": "A String", # DatabaseId accessed in the connection.
+ "projectId": "A String", # ProjectId accessed in the connection.
+ },
+ ],
+ },
+ "location": "A String", # The [regional endpoint]
+ # (https://cloud.google.com/dataflow/docs/concepts/regional-endpoints) that
+ # contains this job.
+ "transformNameMapping": { # The map of transform name prefixes of the job to be replaced to the
+ # corresponding name prefixes of the new job.
+ "a_key": "A String",
+ },
+ "startTime": "A String", # The timestamp when the job was started (transitioned to JOB_STATE_PENDING).
+ # Flexible resource scheduling jobs are started with some delay after job
+ # creation, so start_time is unset before start and is updated when the
+ # job is started by the Cloud Dataflow service. For other jobs, start_time
+ # always equals create_time and is immutable and set by the Cloud Dataflow
+ # service.
+ "clientRequestId": "A String", # The client's unique identifier of the job, re-used across retried attempts.
+ # If this field is set, the service will ensure its uniqueness.
+ # The request to create a job will fail if the service has knowledge of a
+ # previously submitted job with the same client's ID and job name.
+ # The caller may use this field to ensure idempotence of job
+ # creation across retried attempts to create a job.
+ # By default, the field is empty and, in that case, the service ignores it.
+ "executionInfo": { # Deprecated. Additional information about how a Cloud Dataflow job will be executed that
+ # isn't contained in the submitted job.
+ "stages": { # A mapping from each stage to the information about that stage.
+ "a_key": { # Contains information about how a particular
+ # google.dataflow.v1beta3.Step will be executed.
+ "stepName": [ # The steps associated with the execution stage.
+ # Note that stages may have several steps, and that a given step
+ # might be run by more than one stage.
+ "A String",
+ ],
+ },
},
},
+ "type": "A String", # The type of Cloud Dataflow job.
+ "createTime": "A String", # The timestamp when the job was initially created. Immutable and set by the
+ # Cloud Dataflow service.
+ "tempFiles": [ # A set of files the system should be aware of that are used
+ # for temporary storage. These temporary files will be
+ # removed on job completion.
+ # No duplicates are allowed.
+ # No file patterns are supported.
+ #
+ # The supported files are:
+ #
+ # Google Cloud Storage:
+ #
+ # storage.googleapis.com/{bucket}/{object}
+ # bucket.storage.googleapis.com/{object}
+ "A String",
+ ],
+ "id": "A String", # The unique ID of this job.
+ #
+ # This field is set by the Cloud Dataflow service when the Job is
+ # created, and is immutable for the life of the job.
+ "requestedState": "A String", # The job's requested state.
+ #
+ # `UpdateJob` may be used to switch between the `JOB_STATE_STOPPED` and
+ # `JOB_STATE_RUNNING` states, by setting requested_state. `UpdateJob` may
+ # also be used to directly set a job's requested state to
+ # `JOB_STATE_CANCELLED` or `JOB_STATE_DONE`, irrevocably terminating the
+ # job if it has not already reached a terminal state.
+ "replaceJobId": "A String", # If this job is an update of an existing job, this field is the job ID
+ # of the job it replaced.
+ #
+ # When sending a `CreateJobRequest`, you can update a job by specifying it
+ # here. The job named here is stopped, and its intermediate state is
+ # transferred to this job.
+ "createdFromSnapshotId": "A String", # If this is specified, the job's initial state is populated from the given
+ # snapshot.
+ "currentState": "A String", # The current state of the job.
+ #
+ # Jobs are created in the `JOB_STATE_STOPPED` state unless otherwise
+ # specified.
+ #
+ # A job in the `JOB_STATE_RUNNING` state may asynchronously enter a
+ # terminal state. After a job has reached a terminal state, no
+ # further state updates may be made.
+ #
+ # This field may be mutated by the Cloud Dataflow service;
+ # callers cannot mutate it.
+ "name": "A String", # The user-specified Cloud Dataflow job name.
+ #
+ # Only one Job with a given name may exist in a project at any
+ # given time. If a caller attempts to create a Job with the same
+ # name as an already-existing Job, the attempt returns the
+ # existing Job.
+ #
+ # The name must match the regular expression
+ # `[a-z]([-a-z0-9]{0,38}[a-z0-9])?`
+ "currentStateTime": "A String", # The timestamp associated with the current state.
},
- "currentState": "A String", # The current state of the job.
- #
- # Jobs are created in the `JOB_STATE_STOPPED` state unless otherwise
- # specified.
- #
- # A job in the `JOB_STATE_RUNNING` state may asynchronously enter a
- # terminal state. After a job has reached a terminal state, no
- # further state updates may be made.
- #
- # This field may be mutated by the Cloud Dataflow service;
- # callers cannot mutate it.
- "location": "A String", # The [regional endpoint]
- # (https://cloud.google.com/dataflow/docs/concepts/regional-endpoints) that
- # contains this job.
- "startTime": "A String", # The timestamp when the job was started (transitioned to JOB_STATE_PENDING).
- # Flexible resource scheduling jobs are started with some delay after job
- # creation, so start_time is unset before start and is updated when the
- # job is started by the Cloud Dataflow service. For other jobs, start_time
- # always equals to create_time and is immutable and set by the Cloud Dataflow
- # service.
- "stepsLocation": "A String", # The GCS location where the steps are stored.
- "labels": { # User-defined labels for this job.
- #
- # The labels map can contain no more than 64 entries. Entries of the labels
- # map are UTF8 strings that comply with the following restrictions:
- #
- # * Keys must conform to regexp: \p{Ll}\p{Lo}{0,62}
- # * Values must conform to regexp: [\p{Ll}\p{Lo}\p{N}_-]{0,63}
- # * Both keys and values are additionally constrained to be <= 128 bytes in
- # size.
- "a_key": "A String",
- },
- "createTime": "A String", # The timestamp when the job was initially created. Immutable and set by the
- # Cloud Dataflow service.
- "requestedState": "A String", # The job's requested state.
- #
- # `UpdateJob` may be used to switch between the `JOB_STATE_STOPPED` and
- # `JOB_STATE_RUNNING` states, by setting requested_state. `UpdateJob` may
- # also be used to directly set a job's requested state to
- # `JOB_STATE_CANCELLED` or `JOB_STATE_DONE`, irrevocably terminating the
- # job if it has not already reached a terminal state.
- },
],
+ "nextPageToken": "A String", # Set if there may be more results than fit in this response.
"failedLocation": [ # Zero or more messages describing the [regional endpoints]
# (https://cloud.google.com/dataflow/docs/concepts/regional-endpoints) that
# failed to respond.
@@ -3267,7 +3268,6 @@
# failed to respond.
},
],
- "nextPageToken": "A String", # Set if there may be more results than fit in this response.
}</pre>
</div>
@@ -3296,10 +3296,10 @@
The object takes the form of:
{ # Request to create a snapshot of a job.
- "description": "A String", # User specified description of the snapshot. Maybe empty.
"snapshotSources": True or False, # If true, perform snapshots for sources which support this.
- "ttl": "A String", # TTL for the snapshot.
"location": "A String", # The location that contains this job.
+ "description": "A String", # User specified description of the snapshot. May be empty.
+ "ttl": "A String", # TTL for the snapshot.
}
x__xgafv: string, V1 error format.
@@ -3311,20 +3311,20 @@
An object of the form:
{ # Represents a snapshot of a job.
+ "ttl": "A String", # The time after which this snapshot will be automatically deleted.
+ "state": "A String", # State of the snapshot.
+ "id": "A String", # The unique ID of this snapshot.
+ "sourceJobId": "A String", # The job this snapshot was created from.
+ "creationTime": "A String", # The time this snapshot was created.
+ "description": "A String", # User specified description of the snapshot. May be empty.
"pubsubMetadata": [ # PubSub snapshot metadata.
{ # Represents a Pubsub snapshot.
"snapshotName": "A String", # The name of the Pubsub snapshot.
- "topicName": "A String", # The name of the Pubsub topic.
"expireTime": "A String", # The expire time of the Pubsub snapshot.
+ "topicName": "A String", # The name of the Pubsub topic.
},
],
- "creationTime": "A String", # The time this snapshot was created.
- "sourceJobId": "A String", # The job this snapshot was created from.
- "state": "A String", # State of the snapshot.
"projectId": "A String", # The project this snapshot belongs to.
- "ttl": "A String", # The time after which this snapshot will be automatically deleted.
- "id": "A String", # The unique ID of this snapshot.
- "description": "A String", # User specified description of the snapshot. Maybe empty.
"diskSizeBytes": "A String", # The disk byte size of the snapshot. Only available for snapshots in READY
# state.
}</pre>
@@ -3347,667 +3347,162 @@
The object takes the form of:
{ # Defines a job to be run by the Cloud Dataflow service.
- "clientRequestId": "A String", # The client's unique identifier of the job, re-used across retried attempts.
- # If this field is set, the service will ensure its uniqueness.
- # The request to create a job will fail if the service has knowledge of a
- # previously submitted job with the same client's ID and job name.
- # The caller may use this field to ensure idempotence of job
- # creation across retried attempts to create a job.
- # By default, the field is empty and, in that case, the service ignores it.
- "id": "A String", # The unique ID of this job.
- #
- # This field is set by the Cloud Dataflow service when the Job is
- # created, and is immutable for the life of the job.
- "currentStateTime": "A String", # The timestamp associated with the current state.
- "transformNameMapping": { # The map of transform name prefixes of the job to be replaced to the
- # corresponding name prefixes of the new job.
- "a_key": "A String",
- },
- "environment": { # Describes the environment in which a Dataflow Job runs. # The environment for the job.
- "internalExperiments": { # Experimental settings.
- "a_key": "", # Properties of the object. Contains field @type with type URL.
- },
- "workerRegion": "A String", # The Compute Engine region
- # (https://cloud.google.com/compute/docs/regions-zones/regions-zones) in
- # which worker processing should occur, e.g. "us-west1". Mutually exclusive
- # with worker_zone. If neither worker_region nor worker_zone is specified,
- # default to the control plane's region.
- "serviceKmsKeyName": "A String", # If set, contains the Cloud KMS key identifier used to encrypt data
- # at rest, AKA a Customer Managed Encryption Key (CMEK).
- #
- # Format:
- # projects/PROJECT_ID/locations/LOCATION/keyRings/KEY_RING/cryptoKeys/KEY
- "userAgent": { # A description of the process that generated the request.
- "a_key": "", # Properties of the object.
- },
- "workerZone": "A String", # The Compute Engine zone
- # (https://cloud.google.com/compute/docs/regions-zones/regions-zones) in
- # which worker processing should occur, e.g. "us-west1-a". Mutually exclusive
- # with worker_region. If neither worker_region nor worker_zone is specified,
- # a zone in the control plane's region is chosen based on available capacity.
- "clusterManagerApiService": "A String", # The type of cluster manager API to use. If unknown or
- # unspecified, the service will attempt to choose a reasonable
- # default. This should be in the form of the API service name,
- # e.g. "compute.googleapis.com".
- "tempStoragePrefix": "A String", # The prefix of the resources the system should use for temporary
- # storage. The system will append the suffix "/temp-{JOBNAME} to
- # this resource prefix, where {JOBNAME} is the value of the
- # job_name field. The resulting bucket and object prefix is used
- # as the prefix of the resources used to store temporary data
- # needed during the job execution. NOTE: This will override the
- # value in taskrunner_settings.
- # The supported resource type is:
- #
- # Google Cloud Storage:
- #
- # storage.googleapis.com/{bucket}/{object}
- # bucket.storage.googleapis.com/{object}
- "experiments": [ # The list of experiments to enable.
- "A String",
- ],
- "version": { # A structure describing which components and their versions of the service
- # are required in order to run the job.
- "a_key": "", # Properties of the object.
- },
- "serviceAccountEmail": "A String", # Identity to run virtual machines as. Defaults to the default account.
- "sdkPipelineOptions": { # The Cloud Dataflow SDK pipeline options specified by the user. These
- # options are passed through the service and are used to recreate the
- # SDK pipeline options on the worker in a language agnostic and platform
- # independent way.
- "a_key": "", # Properties of the object.
- },
- "flexResourceSchedulingGoal": "A String", # Which Flexible Resource Scheduling mode to run in.
- "workerPools": [ # The worker pools. At least one "harness" worker pool must be
- # specified in order for the job to have workers.
- { # Describes one particular pool of Cloud Dataflow workers to be
- # instantiated by the Cloud Dataflow service in order to perform the
- # computations required by a job. Note that a workflow job may use
- # multiple pools, in order to match the various computational
- # requirements of the various stages of the job.
- "numThreadsPerWorker": 42, # The number of threads per worker harness. If empty or unspecified, the
- # service will choose a number of threads (according to the number of cores
- # on the selected machine type for batch, or 1 by convention for streaming).
- "numWorkers": 42, # Number of Google Compute Engine workers in this pool needed to
- # execute the job. If zero or unspecified, the service will
- # attempt to choose a reasonable default.
- "zone": "A String", # Zone to run the worker pools in. If empty or unspecified, the service
- # will attempt to choose a reasonable default.
- "diskSourceImage": "A String", # Fully qualified source image for disks.
- "packages": [ # Packages to be installed on workers.
- { # The packages that must be installed in order for a worker to run the
- # steps of the Cloud Dataflow job that will be assigned to its worker
- # pool.
- #
- # This is the mechanism by which the Cloud Dataflow SDK causes code to
- # be loaded onto the workers. For example, the Cloud Dataflow Java SDK
- # might use this to install jars containing the user's code and all of the
- # various dependencies (libraries, data files, etc.) required in order
- # for that code to run.
- "name": "A String", # The name of the package.
- "location": "A String", # The resource to read the package from. The supported resource type is:
- #
- # Google Cloud Storage:
- #
- # storage.googleapis.com/{bucket}
- # bucket.storage.googleapis.com/
- },
- ],
- "teardownPolicy": "A String", # Sets the policy for determining when to turndown worker pool.
- # Allowed values are: `TEARDOWN_ALWAYS`, `TEARDOWN_ON_SUCCESS`, and
- # `TEARDOWN_NEVER`.
- # `TEARDOWN_ALWAYS` means workers are always torn down regardless of whether
- # the job succeeds. `TEARDOWN_ON_SUCCESS` means workers are torn down
- # if the job succeeds. `TEARDOWN_NEVER` means the workers are never torn
- # down.
- #
- # If the workers are not torn down by the service, they will
- # continue to run and use Google Compute Engine VM resources in the
- # user's project until they are explicitly terminated by the user.
- # Because of this, Google recommends using the `TEARDOWN_ALWAYS`
- # policy except for small, manually supervised test jobs.
- #
- # If unknown or unspecified, the service will attempt to choose a reasonable
- # default.
- "onHostMaintenance": "A String", # The action to take on host maintenance, as defined by the Google
- # Compute Engine API.
- "poolArgs": { # Extra arguments for this worker pool.
- "a_key": "", # Properties of the object. Contains field @type with type URL.
+ "pipelineDescription": { # A descriptive representation of the submitted pipeline as well as its executed
+ # form. This data is provided by the Dataflow service for ease of visualizing
+ # the pipeline and interpreting Dataflow provided metrics.
+ # Preliminary field: The format of this data may change at any time.
+ # A description of the user pipeline and stages through which it is executed.
+ # Created by the Cloud Dataflow service. Only retrieved with JOB_VIEW_DESCRIPTION or JOB_VIEW_ALL.
+ "displayData": [ # Pipeline level display data.
+ { # Data provided with a pipeline or transform to provide descriptive info.
+ "url": "A String", # An optional full URL.
+ "javaClassValue": "A String", # Contains value if the data is of java class type.
+ "timestampValue": "A String", # Contains value if the data is of timestamp type.
+ "durationValue": "A String", # Contains value if the data is of duration type.
+ "label": "A String", # An optional label to display in a dax UI for the element.
+ "key": "A String", # The key identifying the display data.
+ # This is intended to be used as a label for the display data
+ # when viewed in a dax monitoring system.
+ "namespace": "A String", # The namespace for the key. This is usually a class name or programming
+ # language namespace (e.g. a Python module) which defines the display data.
+ # This allows a dax monitoring system to specially handle the data
+ # and perform custom rendering.
+ "floatValue": 3.14, # Contains value if the data is of float type.
+ "strValue": "A String", # Contains value if the data is of string type.
+ "int64Value": "A String", # Contains value if the data is of int64 type.
+ "boolValue": True or False, # Contains value if the data is of a boolean type.
+ "shortStrValue": "A String", # A possible additional shorter value to display.
+ # For example, a java_class_name_value of com.mypackage.MyDoFn
+ # will be stored with MyDoFn as the short_str_value and
+ # com.mypackage.MyDoFn as the java_class_name_value.
+ # The short_str_value can be displayed directly, with the
+ # java_class_name_value shown as a tooltip.
},
- "diskSizeGb": 42, # Size of root disk for VMs, in GB. If zero or unspecified, the service will
- # attempt to choose a reasonable default.
- "workerHarnessContainerImage": "A String", # Required. Docker container image that executes the Cloud Dataflow worker
- # harness, residing in Google Container Registry.
- #
- # Deprecated for the Fn API path. Use sdk_harness_container_images instead.
- "diskType": "A String", # Type of root disk for VMs. If empty or unspecified, the service will
- # attempt to choose a reasonable default.
- "machineType": "A String", # Machine type (e.g. "n1-standard-1"). If empty or unspecified, the
- # service will attempt to choose a reasonable default.
- "kind": "A String", # The kind of the worker pool; currently only `harness` and `shuffle`
- # are supported.
- "sdkHarnessContainerImages": [ # Set of SDK harness containers needed to execute this pipeline. This will
- # only be set in the Fn API path. For non-cross-language pipelines this
- # should have only one entry. Cross-language pipelines will have two or more
- # entries.
- { # Defines a SDK harness container for executing Dataflow pipelines.
- "containerImage": "A String", # A docker container image that resides in Google Container Registry.
- "useSingleCorePerContainer": True or False, # If true, recommends the Dataflow service to use only one core per SDK
- # container instance with this image. If false (or unset) recommends using
- # more than one core per SDK container instance with this image for
- # efficiency. Note that Dataflow service may choose to override this property
- # if needed.
- },
- ],
- "dataDisks": [ # Data disks that are used by a VM in this workflow.
- { # Describes the data disk used by a workflow job.
- "diskType": "A String", # Disk storage type, as defined by Google Compute Engine. This
- # must be a disk type appropriate to the project and zone in which
- # the workers will run. If unknown or unspecified, the service
- # will attempt to choose a reasonable default.
- #
- # For example, the standard persistent disk type is a resource name
- # typically ending in "pd-standard". If SSD persistent disks are
- # available, the resource name typically ends with "pd-ssd". The
- # actual valid values are defined the Google Compute Engine API,
- # not by the Cloud Dataflow API; consult the Google Compute Engine
- # documentation for more information about determining the set of
- # available disk types for a particular project and zone.
- #
- # Google Compute Engine Disk types are local to a particular
- # project in a particular zone, and so the resource name will
- # typically look something like this:
- #
- # compute.googleapis.com/projects/project-id/zones/zone/diskTypes/pd-standard
- "sizeGb": 42, # Size of disk in GB. If zero or unspecified, the service will
- # attempt to choose a reasonable default.
- "mountPoint": "A String", # Directory in a VM where disk is mounted.
- },
- ],
- "subnetwork": "A String", # Subnetwork to which VMs will be assigned, if desired. Expected to be of
- # the form "regions/REGION/subnetworks/SUBNETWORK".
- "ipConfiguration": "A String", # Configuration for VM IPs.
- "taskrunnerSettings": { # Taskrunner configuration settings. # Settings passed through to Google Compute Engine workers when
- # using the standard Dataflow task runner. Users should ignore
- # this field.
- "alsologtostderr": True or False, # Whether to also send taskrunner log info to stderr.
- "taskGroup": "A String", # The UNIX group ID on the worker VM to use for tasks launched by
- # taskrunner; e.g. "wheel".
- "harnessCommand": "A String", # The command to launch the worker harness.
- "logDir": "A String", # The directory on the VM to store logs.
- "oauthScopes": [ # The OAuth2 scopes to be requested by the taskrunner in order to
- # access the Cloud Dataflow API.
+ ],
+ "originalPipelineTransform": [ # Description of each transform in the pipeline and collections between them.
+ { # Description of the type, names/ids, and input/outputs for a transform.
+ "outputCollectionName": [ # User names for all collection outputs to this transform.
"A String",
],
- "dataflowApiVersion": "A String", # The API version of endpoint, e.g. "v1b3"
- "logUploadLocation": "A String", # Indicates where to put logs. If this is not specified, the logs
- # will not be uploaded.
- #
- # The supported resource type is:
- #
- # Google Cloud Storage:
- # storage.googleapis.com/{bucket}/{object}
- # bucket.storage.googleapis.com/{object}
- "streamingWorkerMainClass": "A String", # The streaming worker main class name.
- "workflowFileName": "A String", # The file to store the workflow in.
- "languageHint": "A String", # The suggested backend language.
- "commandlinesFileName": "A String", # The file to store preprocessing commands in.
- "baseTaskDir": "A String", # The location on the worker for task-specific subdirectories.
- "tempStoragePrefix": "A String", # The prefix of the resources the taskrunner should use for
- # temporary storage.
- #
- # The supported resource type is:
- #
- # Google Cloud Storage:
- # storage.googleapis.com/{bucket}/{object}
- # bucket.storage.googleapis.com/{object}
- "baseUrl": "A String", # The base URL for the taskrunner to use when accessing Google Cloud APIs.
- #
- # When workers access Google Cloud APIs, they logically do so via
- # relative URLs. If this field is specified, it supplies the base
- # URL to use for resolving these relative URLs. The normative
- # algorithm used is defined by RFC 1808, "Relative Uniform Resource
- # Locators".
- #
- # If not specified, the default value is "http://www.googleapis.com/"
- "logToSerialconsole": True or False, # Whether to send taskrunner log info to Google Compute Engine VM serial
- # console.
- "continueOnException": True or False, # Whether to continue taskrunner if an exception is hit.
- "parallelWorkerSettings": { # Provides data to pass through to the worker harness. # The settings to pass to the parallel worker harness.
- "tempStoragePrefix": "A String", # The prefix of the resources the system should use for temporary
- # storage.
- #
- # The supported resource type is:
- #
- # Google Cloud Storage:
- #
- # storage.googleapis.com/{bucket}/{object}
- # bucket.storage.googleapis.com/{object}
- "reportingEnabled": True or False, # Whether to send work progress updates to the service.
- "baseUrl": "A String", # The base URL for accessing Google Cloud APIs.
- #
- # When workers access Google Cloud APIs, they logically do so via
- # relative URLs. If this field is specified, it supplies the base
- # URL to use for resolving these relative URLs. The normative
- # algorithm used is defined by RFC 1808, "Relative Uniform Resource
- # Locators".
- #
- # If not specified, the default value is "http://www.googleapis.com/"
- "servicePath": "A String", # The Cloud Dataflow service path relative to the root URL, for example,
- # "dataflow/v1b3/projects".
- "shuffleServicePath": "A String", # The Shuffle service path relative to the root URL, for example,
- # "shuffle/v1beta1".
- "workerId": "A String", # The ID of the worker running this pipeline.
- },
- "taskUser": "A String", # The UNIX user ID on the worker VM to use for tasks launched by
- # taskrunner; e.g. "root".
- "vmId": "A String", # The ID string of the VM.
+ "displayData": [ # Transform-specific display data.
+ { # Data provided with a pipeline or transform to provide descriptive info.
+ "url": "A String", # An optional full URL.
+ "javaClassValue": "A String", # Contains value if the data is of java class type.
+ "timestampValue": "A String", # Contains value if the data is of timestamp type.
+ "durationValue": "A String", # Contains value if the data is of duration type.
+ "label": "A String", # An optional label to display in a dax UI for the element.
+ "key": "A String", # The key identifying the display data.
+ # This is intended to be used as a label for the display data
+ # when viewed in a dax monitoring system.
+ "namespace": "A String", # The namespace for the key. This is usually a class name or programming
+ # language namespace (e.g. a Python module) which defines the display data.
+ # This allows a dax monitoring system to specially handle the data
+ # and perform custom rendering.
+ "floatValue": 3.14, # Contains value if the data is of float type.
+ "strValue": "A String", # Contains value if the data is of string type.
+ "int64Value": "A String", # Contains value if the data is of int64 type.
+ "boolValue": True or False, # Contains value if the data is of a boolean type.
+ "shortStrValue": "A String", # A possible additional shorter value to display.
+ # For example, a java_class_name_value of com.mypackage.MyDoFn
+ # will be stored with MyDoFn as the short_str_value and
+ # com.mypackage.MyDoFn as the java_class_name_value.
+ # The short_str_value can be displayed directly, with the
+ # java_class_name_value shown as a tooltip.
+ },
+ ],
+ "id": "A String", # SDK generated id of this transform instance.
+ "inputCollectionName": [ # User names for all collection inputs to this transform.
+ "A String",
+ ],
+ "name": "A String", # User provided name for this transform instance.
+ "kind": "A String", # Type of transform.
},
- "autoscalingSettings": { # Settings for WorkerPool autoscaling. # Settings for autoscaling of this WorkerPool.
- "algorithm": "A String", # The algorithm to use for autoscaling.
- "maxNumWorkers": 42, # The maximum number of workers to cap scaling at.
+ ],
+ "executionPipelineStage": [ # Description of each stage of execution of the pipeline.
+ { # Description of the composing transforms, names/ids, and input/outputs of a
+ # stage of execution. Some composing transforms and sources may have been
+ # generated by the Dataflow service during execution planning.
+ "componentSource": [ # Collections produced and consumed by component transforms of this stage.
+ { # Description of an interstitial value between transforms in an execution
+ # stage.
+ "userName": "A String", # Human-readable name for this source; may be user or system generated.
+ "name": "A String", # Dataflow service generated name for this source.
+ "originalTransformOrCollection": "A String", # User name for the original user transform or collection with which this
+ # source is most closely associated.
+ },
+ ],
+ "inputSource": [ # Input sources for this stage.
+ { # Description of an input or output of an execution stage.
+ "userName": "A String", # Human-readable name for this source; may be user or system generated.
+ "originalTransformOrCollection": "A String", # User name for the original user transform or collection with which this
+ # source is most closely associated.
+ "sizeBytes": "A String", # Size of the source, if measurable.
+ "name": "A String", # Dataflow service generated name for this source.
+ },
+ ],
+ "name": "A String", # Dataflow service generated name for this stage.
+ "componentTransform": [ # Transforms that comprise this execution stage.
+ { # Description of a transform executed as part of an execution stage.
+ "name": "A String", # Dataflow service generated name for this transform.
+ "userName": "A String", # Human-readable name for this transform; may be user or system generated.
+ "originalTransform": "A String", # User name for the original user transform with which this transform is
+ # most closely associated.
+ },
+ ],
+ "id": "A String", # Dataflow service generated id for this stage.
+ "outputSource": [ # Output sources for this stage.
+ { # Description of an input or output of an execution stage.
+ "userName": "A String", # Human-readable name for this source; may be user or system generated.
+ "originalTransformOrCollection": "A String", # User name for the original user transform or collection with which this
+ # source is most closely associated.
+ "sizeBytes": "A String", # Size of the source, if measurable.
+ "name": "A String", # Dataflow service generated name for this source.
+ },
+ ],
+ "kind": "A String", # Type of transform this stage is executing.
},
- "metadata": { # Metadata to set on the Google Compute Engine VMs.
- "a_key": "A String",
- },
- "defaultPackageSet": "A String", # The default package set to install. This allows the service to
- # select a default set of packages which are useful to worker
- # harnesses written in a particular language.
- "network": "A String", # Network to which VMs will be assigned. If empty or unspecified,
- # the service will use the network "default".
- },
- ],
- "dataset": "A String", # The dataset for the current project where various workflow
- # related tables are stored.
- #
- # The supported resource type is:
- #
- # Google BigQuery:
- # bigquery.googleapis.com/{dataset}
- },
- "stageStates": [ # This field may be mutated by the Cloud Dataflow service;
- # callers cannot mutate it.
- { # A message describing the state of a particular execution stage.
- "currentStateTime": "A String", # The time at which the stage transitioned to this state.
- "executionStageState": "A String", # Executions stage states allow the same set of values as JobState.
- "executionStageName": "A String", # The name of the execution stage.
+ ],
},
- ],
- "jobMetadata": { # Metadata available primarily for filtering jobs. Will be included in the # This field is populated by the Dataflow service to support filtering jobs
- # by the metadata values provided here. Populated for ListJobs and all GetJob
- # views SUMMARY and higher.
- # ListJob response and Job SUMMARY view.
- "datastoreDetails": [ # Identification of a Datastore source used in the Dataflow job.
- { # Metadata for a Datastore connector used by the job.
- "namespace": "A String", # Namespace used in the connection.
- "projectId": "A String", # ProjectId accessed in the connection.
- },
- ],
- "sdkVersion": { # The version of the SDK used to run the job. # The SDK version used to run the job.
- "version": "A String", # The version of the SDK used to run the job.
- "sdkSupportStatus": "A String", # The support status for this SDK version.
- "versionDisplayName": "A String", # A readable string describing the version of the SDK.
- },
- "bigqueryDetails": [ # Identification of a BigQuery source used in the Dataflow job.
- { # Metadata for a BigQuery connector used by the job.
- "table": "A String", # Table accessed in the connection.
- "dataset": "A String", # Dataset accessed in the connection.
- "query": "A String", # Query used to access data in the connection.
- "projectId": "A String", # Project accessed in the connection.
- },
- ],
- "fileDetails": [ # Identification of a File source used in the Dataflow job.
- { # Metadata for a File connector used by the job.
- "filePattern": "A String", # File Pattern used to access files by the connector.
- },
- ],
- "pubsubDetails": [ # Identification of a PubSub source used in the Dataflow job.
- { # Metadata for a PubSub connector used by the job.
- "topic": "A String", # Topic accessed in the connection.
- "subscription": "A String", # Subscription used in the connection.
- },
- ],
- "bigTableDetails": [ # Identification of a BigTable source used in the Dataflow job.
- { # Metadata for a BigTable connector used by the job.
- "projectId": "A String", # ProjectId accessed in the connection.
- "instanceId": "A String", # InstanceId accessed in the connection.
- "tableId": "A String", # TableId accessed in the connection.
- },
- ],
- "spannerDetails": [ # Identification of a Spanner source used in the Dataflow job.
- { # Metadata for a Spanner connector used by the job.
- "instanceId": "A String", # InstanceId accessed in the connection.
- "projectId": "A String", # ProjectId accessed in the connection.
- "databaseId": "A String", # DatabaseId accessed in the connection.
- },
- ],
- },
- "type": "A String", # The type of Cloud Dataflow job.
- "projectId": "A String", # The ID of the Cloud Platform project that the job belongs to.
- "createdFromSnapshotId": "A String", # If this is specified, the job's initial state is populated from the given
- # snapshot.
- "pipelineDescription": { # A descriptive representation of submitted pipeline as well as the executed # Preliminary field: The format of this data may change at any time.
- # A description of the user pipeline and stages through which it is executed.
- # Created by Cloud Dataflow service. Only retrieved with
- # JOB_VIEW_DESCRIPTION or JOB_VIEW_ALL.
- # form. This data is provided by the Dataflow service for ease of visualizing
- # the pipeline and interpreting Dataflow provided metrics.
- "executionPipelineStage": [ # Description of each stage of execution of the pipeline.
- { # Description of the composing transforms, names/ids, and input/outputs of a
- # stage of execution. Some composing transforms and sources may have been
- # generated by the Dataflow service during execution planning.
- "outputSource": [ # Output sources for this stage.
- { # Description of an input or output of an execution stage.
- "sizeBytes": "A String", # Size of the source, if measurable.
- "name": "A String", # Dataflow service generated name for this source.
- "userName": "A String", # Human-readable name for this source; may be user or system generated.
- "originalTransformOrCollection": "A String", # User name for the original user transform or collection with which this
- # source is most closely associated.
- },
- ],
- "name": "A String", # Dataflow service generated name for this stage.
- "inputSource": [ # Input sources for this stage.
- { # Description of an input or output of an execution stage.
- "sizeBytes": "A String", # Size of the source, if measurable.
- "name": "A String", # Dataflow service generated name for this source.
- "userName": "A String", # Human-readable name for this source; may be user or system generated.
- "originalTransformOrCollection": "A String", # User name for the original user transform or collection with which this
- # source is most closely associated.
- },
- ],
- "id": "A String", # Dataflow service generated id for this stage.
- "componentTransform": [ # Transforms that comprise this execution stage.
- { # Description of a transform executed as part of an execution stage.
- "originalTransform": "A String", # User name for the original user transform with which this transform is
- # most closely associated.
- "name": "A String", # Dataflow service generated name for this source.
- "userName": "A String", # Human-readable name for this transform; may be user or system generated.
- },
- ],
- "componentSource": [ # Collections produced and consumed by component transforms of this stage.
- { # Description of an interstitial value between transforms in an execution
- # stage.
- "name": "A String", # Dataflow service generated name for this source.
- "userName": "A String", # Human-readable name for this transform; may be user or system generated.
- "originalTransformOrCollection": "A String", # User name for the original user transform or collection with which this
- # source is most closely associated.
- },
- ],
- "kind": "A String", # Type of tranform this stage is executing.
- },
- ],
- "originalPipelineTransform": [ # Description of each transform in the pipeline and collections between them.
- { # Description of the type, names/ids, and input/outputs for a transform.
- "kind": "A String", # Type of transform.
- "inputCollectionName": [ # User names for all collection inputs to this transform.
- "A String",
- ],
- "name": "A String", # User provided name for this transform instance.
- "id": "A String", # SDK generated id of this transform instance.
- "displayData": [ # Transform-specific display data.
- { # Data provided with a pipeline or transform to provide descriptive info.
- "durationValue": "A String", # Contains value if the data is of duration type.
- "int64Value": "A String", # Contains value if the data is of int64 type.
- "namespace": "A String", # The namespace for the key. This is usually a class name or programming
- # language namespace (i.e. python module) which defines the display data.
- # This allows a dax monitoring system to specially handle the data
- # and perform custom rendering.
- "floatValue": 3.14, # Contains value if the data is of float type.
- "key": "A String", # The key identifying the display data.
- # This is intended to be used as a label for the display data
- # when viewed in a dax monitoring system.
- "shortStrValue": "A String", # A possible additional shorter value to display.
- # For example a java_class_name_value of com.mypackage.MyDoFn
- # will be stored with MyDoFn as the short_str_value and
- # com.mypackage.MyDoFn as the java_class_name value.
- # short_str_value can be displayed and java_class_name_value
- # will be displayed as a tooltip.
- "url": "A String", # An optional full URL.
- "label": "A String", # An optional label to display in a dax UI for the element.
- "timestampValue": "A String", # Contains value if the data is of timestamp type.
- "boolValue": True or False, # Contains value if the data is of a boolean type.
- "javaClassValue": "A String", # Contains value if the data is of java class type.
- "strValue": "A String", # Contains value if the data is of string type.
- },
- ],
- "outputCollectionName": [ # User names for all collection outputs to this transform.
- "A String",
- ],
- },
- ],
- "displayData": [ # Pipeline level display data.
- { # Data provided with a pipeline or transform to provide descriptive info.
- "durationValue": "A String", # Contains value if the data is of duration type.
- "int64Value": "A String", # Contains value if the data is of int64 type.
- "namespace": "A String", # The namespace for the key. This is usually a class name or programming
- # language namespace (i.e. python module) which defines the display data.
- # This allows a dax monitoring system to specially handle the data
- # and perform custom rendering.
- "floatValue": 3.14, # Contains value if the data is of float type.
- "key": "A String", # The key identifying the display data.
- # This is intended to be used as a label for the display data
- # when viewed in a dax monitoring system.
- "shortStrValue": "A String", # A possible additional shorter value to display.
- # For example a java_class_name_value of com.mypackage.MyDoFn
- # will be stored with MyDoFn as the short_str_value and
- # com.mypackage.MyDoFn as the java_class_name value.
- # short_str_value can be displayed and java_class_name_value
- # will be displayed as a tooltip.
- "url": "A String", # An optional full URL.
- "label": "A String", # An optional label to display in a dax UI for the element.
- "timestampValue": "A String", # Contains value if the data is of timestamp type.
- "boolValue": True or False, # Contains value if the data is of a boolean type.
- "javaClassValue": "A String", # Contains value if the data is of java class type.
- "strValue": "A String", # Contains value if the data is of string type.
- },
- ],
- },
- "replaceJobId": "A String", # If this job is an update of an existing job, this field is the job ID
- # of the job it replaced.
- #
- # When sending a `CreateJobRequest`, you can update a job by specifying it
- # here. The job named here is stopped, and its intermediate state is
- # transferred to this job.
- "tempFiles": [ # A set of files the system should be aware of that are used
- # for temporary storage. These temporary files will be
- # removed on job completion.
- # No duplicates are allowed.
- # No file patterns are supported.
- #
- # The supported files are:
- #
- # Google Cloud Storage:
- #
- # storage.googleapis.com/{bucket}/{object}
- # bucket.storage.googleapis.com/{object}
- "A String",
- ],
- "name": "A String", # The user-specified Cloud Dataflow job name.
- #
- # Only one Job with a given name may exist in a project at any
- # given time. If a caller attempts to create a Job with the same
- # name as an already-existing Job, the attempt returns the
- # existing Job.
- #
- # The name must match the regular expression
- # `[a-z]([-a-z0-9]{0,38}[a-z0-9])?`
- "steps": [ # Exactly one of step or steps_location should be specified.
- #
- # The top-level steps that constitute the entire job.
- { # Defines a particular step within a Cloud Dataflow job.
- #
- # A job consists of multiple steps, each of which performs some
- # specific operation as part of the overall job. Data is typically
- # passed from one step to another as part of the job.
- #
- # Here's an example of a sequence of steps which together implement a
- # Map-Reduce job:
- #
- # * Read a collection of data from some source, parsing the
- # collection's elements.
- #
- # * Validate the elements.
- #
- # * Apply a user-defined function to map each element to some value
- # and extract an element-specific key value.
- #
- # * Group elements with the same key into a single element with
- # that key, transforming a multiply-keyed collection into a
- # uniquely-keyed collection.
- #
- # * Write the elements out to some data sink.
- #
- # Note that the Cloud Dataflow service may be used to run many different
- # types of jobs, not just Map-Reduce.
- "name": "A String", # The name that identifies the step. This must be unique for each
- # step with respect to all other steps in the Cloud Dataflow job.
- "kind": "A String", # The kind of step in the Cloud Dataflow job.
- "properties": { # Named properties associated with the step. Each kind of
- # predefined step has its own required set of properties.
- # Must be provided on Create. Only retrieved with JOB_VIEW_ALL.
- "a_key": "", # Properties of the object.
- },
- },
- ],
- "replacedByJobId": "A String", # If another job is an update of this job (and thus, this job is in
- # `JOB_STATE_UPDATED`), this field contains the ID of that job.
- "executionInfo": { # Additional information about how a Cloud Dataflow job will be executed that # Deprecated.
- # isn't contained in the submitted job.
- "stages": { # A mapping from each stage to the information about that stage.
- "a_key": { # Contains information about how a particular
- # google.dataflow.v1beta3.Step will be executed.
- "stepName": [ # The steps associated with the execution stage.
- # Note that stages may have several steps, and that a given step
- # might be run by more than one stage.
- "A String",
- ],
- },
- },
- },
- "currentState": "A String", # The current state of the job.
- #
- # Jobs are created in the `JOB_STATE_STOPPED` state unless otherwise
- # specified.
- #
- # A job in the `JOB_STATE_RUNNING` state may asynchronously enter a
- # terminal state. After a job has reached a terminal state, no
- # further state updates may be made.
- #
- # This field may be mutated by the Cloud Dataflow service;
- # callers cannot mutate it.
- "location": "A String", # The [regional endpoint]
- # (https://cloud.google.com/dataflow/docs/concepts/regional-endpoints) that
- # contains this job.
- "startTime": "A String", # The timestamp when the job was started (transitioned to JOB_STATE_PENDING).
- # Flexible resource scheduling jobs are started with some delay after job
- # creation, so start_time is unset before start and is updated when the
- # job is started by the Cloud Dataflow service. For other jobs, start_time
- # always equals to create_time and is immutable and set by the Cloud Dataflow
- # service.
- "stepsLocation": "A String", # The GCS location where the steps are stored.
- "labels": { # User-defined labels for this job.
- #
- # The labels map can contain no more than 64 entries. Entries of the labels
- # map are UTF8 strings that comply with the following restrictions:
- #
- # * Keys must conform to regexp: \p{Ll}\p{Lo}{0,62}
- # * Values must conform to regexp: [\p{Ll}\p{Lo}\p{N}_-]{0,63}
- # * Both keys and values are additionally constrained to be <= 128 bytes in
- # size.
- "a_key": "A String",
- },
- "createTime": "A String", # The timestamp when the job was initially created. Immutable and set by the
- # Cloud Dataflow service.
- "requestedState": "A String", # The job's requested state.
- #
- # `UpdateJob` may be used to switch between the `JOB_STATE_STOPPED` and
- # `JOB_STATE_RUNNING` states, by setting requested_state. `UpdateJob` may
- # also be used to directly set a job's requested state to
- # `JOB_STATE_CANCELLED` or `JOB_STATE_DONE`, irrevocably terminating the
- # job if it has not already reached a terminal state.
-}
-
- location: string, The [regional endpoint]
-(https://cloud.google.com/dataflow/docs/concepts/regional-endpoints) that
-contains this job.
- x__xgafv: string, V1 error format.
- Allowed values
- 1 - v1 error format
- 2 - v2 error format
-
-Returns:
- An object of the form:
-
- { # Defines a job to be run by the Cloud Dataflow service.
- "clientRequestId": "A String", # The client's unique identifier of the job, re-used across retried attempts.
- # If this field is set, the service will ensure its uniqueness.
- # The request to create a job will fail if the service has knowledge of a
- # previously submitted job with the same client's ID and job name.
- # The caller may use this field to ensure idempotence of job
- # creation across retried attempts to create a job.
- # By default, the field is empty and, in that case, the service ignores it.
- "id": "A String", # The unique ID of this job.
- #
- # This field is set by the Cloud Dataflow service when the Job is
- # created, and is immutable for the life of the job.
- "currentStateTime": "A String", # The timestamp associated with the current state.
- "transformNameMapping": { # The map of transform name prefixes of the job to be replaced to the
- # corresponding name prefixes of the new job.
+ "labels": { # User-defined labels for this job.
+ #
+ # The labels map can contain no more than 64 entries. Entries of the labels
+ # map are UTF8 strings that comply with the following restrictions:
+ #
+ # * Keys must conform to regexp: \p{Ll}\p{Lo}{0,62}
+ # * Values must conform to regexp: [\p{Ll}\p{Lo}\p{N}_-]{0,63}
+ # * Both keys and values are additionally constrained to be <= 128 bytes in
+ # size.
"a_key": "A String",
},
+ "projectId": "A String", # The ID of the Cloud Platform project that the job belongs to.
"environment": { # Describes the environment in which a Dataflow Job runs. # The environment for the job.
- "internalExperiments": { # Experimental settings.
- "a_key": "", # Properties of the object. Contains field @type with type URL.
- },
+ "flexResourceSchedulingGoal": "A String", # Which Flexible Resource Scheduling mode to run in.
"workerRegion": "A String", # The Compute Engine region
# (https://cloud.google.com/compute/docs/regions-zones/regions-zones) in
# which worker processing should occur, e.g. "us-west1". Mutually exclusive
# with worker_zone. If neither worker_region nor worker_zone is specified,
# default to the control plane's region.
+ "userAgent": { # A description of the process that generated the request.
+ "a_key": "", # Properties of the object.
+ },
+ "serviceAccountEmail": "A String", # Identity to run virtual machines as. Defaults to the default account.
+ "version": { # A structure describing which components and their versions of the service
+ # are required in order to run the job.
+ "a_key": "", # Properties of the object.
+ },
"serviceKmsKeyName": "A String", # If set, contains the Cloud KMS key identifier used to encrypt data
# at rest, AKA a Customer Managed Encryption Key (CMEK).
#
# Format:
# projects/PROJECT_ID/locations/LOCATION/keyRings/KEY_RING/cryptoKeys/KEY
- "userAgent": { # A description of the process that generated the request.
- "a_key": "", # Properties of the object.
- },
+ "experiments": [ # The list of experiments to enable.
+ "A String",
+ ],
"workerZone": "A String", # The Compute Engine zone
# (https://cloud.google.com/compute/docs/regions-zones/regions-zones) in
# which worker processing should occur, e.g. "us-west1-a". Mutually exclusive
# with worker_region. If neither worker_region nor worker_zone is specified,
# a zone in the control plane's region is chosen based on available capacity.
- "clusterManagerApiService": "A String", # The type of cluster manager API to use. If unknown or
- # unspecified, the service will attempt to choose a reasonable
- # default. This should be in the form of the API service name,
- # e.g. "compute.googleapis.com".
- "tempStoragePrefix": "A String", # The prefix of the resources the system should use for temporary
- # storage. The system will append the suffix "/temp-{JOBNAME} to
- # this resource prefix, where {JOBNAME} is the value of the
- # job_name field. The resulting bucket and object prefix is used
- # as the prefix of the resources used to store temporary data
- # needed during the job execution. NOTE: This will override the
- # value in taskrunner_settings.
- # The supported resource type is:
- #
- # Google Cloud Storage:
- #
- # storage.googleapis.com/{bucket}/{object}
- # bucket.storage.googleapis.com/{object}
- "experiments": [ # The list of experiments to enable.
- "A String",
- ],
- "version": { # A structure describing which components and their versions of the service
- # are required in order to run the job.
- "a_key": "", # Properties of the object.
- },
- "serviceAccountEmail": "A String", # Identity to run virtual machines as. Defaults to the default account.
- "sdkPipelineOptions": { # The Cloud Dataflow SDK pipeline options specified by the user. These
- # options are passed through the service and are used to recreate the
- # SDK pipeline options on the worker in a language agnostic and platform
- # independent way.
- "a_key": "", # Properties of the object.
- },
- "flexResourceSchedulingGoal": "A String", # Which Flexible Resource Scheduling mode to run in.
"workerPools": [ # The worker pools. At least one "harness" worker pool must be
# specified in order for the job to have workers.
{ # Describes one particular pool of Cloud Dataflow workers to be
@@ -4015,15 +3510,54 @@
# computations required by a job. Note that a workflow job may use
# multiple pools, in order to match the various computational
# requirements of the various stages of the job.
- "numThreadsPerWorker": 42, # The number of threads per worker harness. If empty or unspecified, the
- # service will choose a number of threads (according to the number of cores
- # on the selected machine type for batch, or 1 by convention for streaming).
- "numWorkers": 42, # Number of Google Compute Engine workers in this pool needed to
- # execute the job. If zero or unspecified, the service will
- # attempt to choose a reasonable default.
+ "onHostMaintenance": "A String", # The action to take on host maintenance, as defined by the Google
+ # Compute Engine API.
+ "sdkHarnessContainerImages": [ # Set of SDK harness containers needed to execute this pipeline. This will
+ # only be set in the Fn API path. For non-cross-language pipelines this
+ # should have only one entry. Cross-language pipelines will have two or more
+ # entries.
+ { # Defines an SDK harness container for executing Dataflow pipelines.
+ "containerImage": "A String", # A docker container image that resides in Google Container Registry.
+ "useSingleCorePerContainer": True or False, # If true, recommends the Dataflow service to use only one core per SDK
+ # container instance with this image. If false (or unset) recommends using
+ # more than one core per SDK container instance with this image for
+ # efficiency. Note that Dataflow service may choose to override this property
+ # if needed.
+ },
+ ],
"zone": "A String", # Zone to run the worker pools in. If empty or unspecified, the service
# will attempt to choose a reasonable default.
+ "kind": "A String", # The kind of the worker pool; currently only `harness` and `shuffle`
+ # are supported.
+ "metadata": { # Metadata to set on the Google Compute Engine VMs.
+ "a_key": "A String",
+ },
"diskSourceImage": "A String", # Fully qualified source image for disks.
+ "dataDisks": [ # Data disks that are used by a VM in this workflow.
+ { # Describes the data disk used by a workflow job.
+ "sizeGb": 42, # Size of disk in GB. If zero or unspecified, the service will
+ # attempt to choose a reasonable default.
+ "diskType": "A String", # Disk storage type, as defined by Google Compute Engine. This
+ # must be a disk type appropriate to the project and zone in which
+ # the workers will run. If unknown or unspecified, the service
+ # will attempt to choose a reasonable default.
+ #
+ # For example, the standard persistent disk type is a resource name
+ # typically ending in "pd-standard". If SSD persistent disks are
+ # available, the resource name typically ends with "pd-ssd". The
+ # actual valid values are defined by the Google Compute Engine API,
+ # not by the Cloud Dataflow API; consult the Google Compute Engine
+ # documentation for more information about determining the set of
+ # available disk types for a particular project and zone.
+ #
+ # Google Compute Engine Disk types are local to a particular
+ # project in a particular zone, and so the resource name will
+ # typically look something like this:
+ #
+ # compute.googleapis.com/projects/project-id/zones/zone/diskTypes/pd-standard
+ "mountPoint": "A String", # Directory in a VM where disk is mounted.
+ },
+ ],
"packages": [ # Packages to be installed on workers.
{ # The packages that must be installed in order for a worker to run the
# steps of the Cloud Dataflow job that will be assigned to its worker
@@ -4059,98 +3593,38 @@
#
# If unknown or unspecified, the service will attempt to choose a reasonable
# default.
- "onHostMaintenance": "A String", # The action to take on host maintenance, as defined by the Google
- # Compute Engine API.
+ "network": "A String", # Network to which VMs will be assigned. If empty or unspecified,
+ # the service will use the network "default".
+ "ipConfiguration": "A String", # Configuration for VM IPs.
+ "diskSizeGb": 42, # Size of root disk for VMs, in GB. If zero or unspecified, the service will
+ # attempt to choose a reasonable default.
+ "autoscalingSettings": { # Settings for WorkerPool autoscaling. # Settings for autoscaling of this WorkerPool.
+ "maxNumWorkers": 42, # The maximum number of workers to cap scaling at.
+ "algorithm": "A String", # The algorithm to use for autoscaling.
+ },
"poolArgs": { # Extra arguments for this worker pool.
"a_key": "", # Properties of the object. Contains field @type with type URL.
},
- "diskSizeGb": 42, # Size of root disk for VMs, in GB. If zero or unspecified, the service will
+ "subnetwork": "A String", # Subnetwork to which VMs will be assigned, if desired. Expected to be of
+ # the form "regions/REGION/subnetworks/SUBNETWORK".
+ "numWorkers": 42, # Number of Google Compute Engine workers in this pool needed to
+ # execute the job. If zero or unspecified, the service will
# attempt to choose a reasonable default.
+ "numThreadsPerWorker": 42, # The number of threads per worker harness. If empty or unspecified, the
+ # service will choose a number of threads (according to the number of cores
+ # on the selected machine type for batch, or 1 by convention for streaming).
"workerHarnessContainerImage": "A String", # Required. Docker container image that executes the Cloud Dataflow worker
# harness, residing in Google Container Registry.
#
# Deprecated for the Fn API path. Use sdk_harness_container_images instead.
- "diskType": "A String", # Type of root disk for VMs. If empty or unspecified, the service will
- # attempt to choose a reasonable default.
- "machineType": "A String", # Machine type (e.g. "n1-standard-1"). If empty or unspecified, the
- # service will attempt to choose a reasonable default.
- "kind": "A String", # The kind of the worker pool; currently only `harness` and `shuffle`
- # are supported.
- "sdkHarnessContainerImages": [ # Set of SDK harness containers needed to execute this pipeline. This will
- # only be set in the Fn API path. For non-cross-language pipelines this
- # should have only one entry. Cross-language pipelines will have two or more
- # entries.
- { # Defines a SDK harness container for executing Dataflow pipelines.
- "containerImage": "A String", # A docker container image that resides in Google Container Registry.
- "useSingleCorePerContainer": True or False, # If true, recommends the Dataflow service to use only one core per SDK
- # container instance with this image. If false (or unset) recommends using
- # more than one core per SDK container instance with this image for
- # efficiency. Note that Dataflow service may choose to override this property
- # if needed.
- },
- ],
- "dataDisks": [ # Data disks that are used by a VM in this workflow.
- { # Describes the data disk used by a workflow job.
- "diskType": "A String", # Disk storage type, as defined by Google Compute Engine. This
- # must be a disk type appropriate to the project and zone in which
- # the workers will run. If unknown or unspecified, the service
- # will attempt to choose a reasonable default.
- #
- # For example, the standard persistent disk type is a resource name
- # typically ending in "pd-standard". If SSD persistent disks are
- # available, the resource name typically ends with "pd-ssd". The
- # actual valid values are defined the Google Compute Engine API,
- # not by the Cloud Dataflow API; consult the Google Compute Engine
- # documentation for more information about determining the set of
- # available disk types for a particular project and zone.
- #
- # Google Compute Engine Disk types are local to a particular
- # project in a particular zone, and so the resource name will
- # typically look something like this:
- #
- # compute.googleapis.com/projects/project-id/zones/zone/diskTypes/pd-standard
- "sizeGb": 42, # Size of disk in GB. If zero or unspecified, the service will
- # attempt to choose a reasonable default.
- "mountPoint": "A String", # Directory in a VM where disk is mounted.
- },
- ],
- "subnetwork": "A String", # Subnetwork to which VMs will be assigned, if desired. Expected to be of
- # the form "regions/REGION/subnetworks/SUBNETWORK".
- "ipConfiguration": "A String", # Configuration for VM IPs.
"taskrunnerSettings": { # Taskrunner configuration settings. # Settings passed through to Google Compute Engine workers when
# using the standard Dataflow task runner. Users should ignore
# this field.
- "alsologtostderr": True or False, # Whether to also send taskrunner log info to stderr.
- "taskGroup": "A String", # The UNIX group ID on the worker VM to use for tasks launched by
- # taskrunner; e.g. "wheel".
- "harnessCommand": "A String", # The command to launch the worker harness.
- "logDir": "A String", # The directory on the VM to store logs.
+ "dataflowApiVersion": "A String", # The API version of endpoint, e.g. "v1b3"
"oauthScopes": [ # The OAuth2 scopes to be requested by the taskrunner in order to
# access the Cloud Dataflow API.
"A String",
],
- "dataflowApiVersion": "A String", # The API version of endpoint, e.g. "v1b3"
- "logUploadLocation": "A String", # Indicates where to put logs. If this is not specified, the logs
- # will not be uploaded.
- #
- # The supported resource type is:
- #
- # Google Cloud Storage:
- # storage.googleapis.com/{bucket}/{object}
- # bucket.storage.googleapis.com/{object}
- "streamingWorkerMainClass": "A String", # The streaming worker main class name.
- "workflowFileName": "A String", # The file to store the workflow in.
- "languageHint": "A String", # The suggested backend language.
- "commandlinesFileName": "A String", # The file to store preprocessing commands in.
- "baseTaskDir": "A String", # The location on the worker for task-specific subdirectories.
- "tempStoragePrefix": "A String", # The prefix of the resources the taskrunner should use for
- # temporary storage.
- #
- # The supported resource type is:
- #
- # Google Cloud Storage:
- # storage.googleapis.com/{bucket}/{object}
- # bucket.storage.googleapis.com/{object}
"baseUrl": "A String", # The base URL for the taskrunner to use when accessing Google Cloud APIs.
#
# When workers access Google Cloud APIs, they logically do so via
@@ -4160,10 +3634,17 @@
# Locators".
#
# If not specified, the default value is "http://www.googleapis.com/"
+ "workflowFileName": "A String", # The file to store the workflow in.
"logToSerialconsole": True or False, # Whether to send taskrunner log info to Google Compute Engine VM serial
# console.
- "continueOnException": True or False, # Whether to continue taskrunner if an exception is hit.
+ "baseTaskDir": "A String", # The location on the worker for task-specific subdirectories.
+ "taskUser": "A String", # The UNIX user ID on the worker VM to use for tasks launched by
+ # taskrunner; e.g. "root".
+ "vmId": "A String", # The ID string of the VM.
+ "alsologtostderr": True or False, # Whether to also send taskrunner log info to stderr.
"parallelWorkerSettings": { # Provides data to pass through to the worker harness. # The settings to pass to the parallel worker harness.
+ "shuffleServicePath": "A String", # The Shuffle service path relative to the root URL, for example,
+ # "shuffle/v1beta1".
"tempStoragePrefix": "A String", # The prefix of the resources the system should use for temporary
# storage.
#
@@ -4174,6 +3655,8 @@
# storage.googleapis.com/{bucket}/{object}
# bucket.storage.googleapis.com/{object}
"reportingEnabled": True or False, # Whether to send work progress updates to the service.
+ "servicePath": "A String", # The Cloud Dataflow service path relative to the root URL, for example,
+ # "dataflow/v1b3/projects".
"baseUrl": "A String", # The base URL for accessing Google Cloud APIs.
#
# When workers access Google Cloud APIs, they logically do so via
@@ -4183,30 +3666,64 @@
# Locators".
#
# If not specified, the default value is "http://www.googleapis.com/"
- "servicePath": "A String", # The Cloud Dataflow service path relative to the root URL, for example,
- # "dataflow/v1b3/projects".
- "shuffleServicePath": "A String", # The Shuffle service path relative to the root URL, for example,
- # "shuffle/v1beta1".
"workerId": "A String", # The ID of the worker running this pipeline.
},
- "taskUser": "A String", # The UNIX user ID on the worker VM to use for tasks launched by
- # taskrunner; e.g. "root".
- "vmId": "A String", # The ID string of the VM.
+ "harnessCommand": "A String", # The command to launch the worker harness.
+ "logDir": "A String", # The directory on the VM to store logs.
+ "streamingWorkerMainClass": "A String", # The streaming worker main class name.
+ "languageHint": "A String", # The suggested backend language.
+ "taskGroup": "A String", # The UNIX group ID on the worker VM to use for tasks launched by
+ # taskrunner; e.g. "wheel".
+ "logUploadLocation": "A String", # Indicates where to put logs. If this is not specified, the logs
+ # will not be uploaded.
+ #
+ # The supported resource type is:
+ #
+ # Google Cloud Storage:
+ # storage.googleapis.com/{bucket}/{object}
+ # bucket.storage.googleapis.com/{object}
+ "commandlinesFileName": "A String", # The file to store preprocessing commands in.
+ "continueOnException": True or False, # Whether to continue taskrunner if an exception is hit.
+ "tempStoragePrefix": "A String", # The prefix of the resources the taskrunner should use for
+ # temporary storage.
+ #
+ # The supported resource type is:
+ #
+ # Google Cloud Storage:
+ # storage.googleapis.com/{bucket}/{object}
+ # bucket.storage.googleapis.com/{object}
},
- "autoscalingSettings": { # Settings for WorkerPool autoscaling. # Settings for autoscaling of this WorkerPool.
- "algorithm": "A String", # The algorithm to use for autoscaling.
- "maxNumWorkers": 42, # The maximum number of workers to cap scaling at.
- },
- "metadata": { # Metadata to set on the Google Compute Engine VMs.
- "a_key": "A String",
- },
+ "diskType": "A String", # Type of root disk for VMs. If empty or unspecified, the service will
+ # attempt to choose a reasonable default.
"defaultPackageSet": "A String", # The default package set to install. This allows the service to
# select a default set of packages which are useful to worker
# harnesses written in a particular language.
- "network": "A String", # Network to which VMs will be assigned. If empty or unspecified,
- # the service will use the network "default".
+ "machineType": "A String", # Machine type (e.g. "n1-standard-1"). If empty or unspecified, the
+ # service will attempt to choose a reasonable default.
},
],
+ "tempStoragePrefix": "A String", # The prefix of the resources the system should use for temporary
+ # storage. The system will append the suffix "/temp-{JOBNAME}" to
+ # this resource prefix, where {JOBNAME} is the value of the
+ # job_name field. The resulting bucket and object prefix is used
+ # as the prefix of the resources used to store temporary data
+ # needed during the job execution. NOTE: This will override the
+ # value in taskrunner_settings.
+ # The supported resource type is:
+ #
+ # Google Cloud Storage:
+ #
+ # storage.googleapis.com/{bucket}/{object}
+ # bucket.storage.googleapis.com/{object}
+ "internalExperiments": { # Experimental settings.
+ "a_key": "", # Properties of the object. Contains field @type with type URL.
+ },
+ "sdkPipelineOptions": { # The Cloud Dataflow SDK pipeline options specified by the user. These
+ # options are passed through the service and are used to recreate the
+ # SDK pipeline options on the worker in a language agnostic and platform
+ # independent way.
+ "a_key": "", # Properties of the object.
+ },
"dataset": "A String", # The dataset for the current project where various workflow
# related tables are stored.
#
@@ -4214,215 +3731,14 @@
#
# Google BigQuery:
# bigquery.googleapis.com/{dataset}
+ "clusterManagerApiService": "A String", # The type of cluster manager API to use. If unknown or
+ # unspecified, the service will attempt to choose a reasonable
+ # default. This should be in the form of the API service name,
+ # e.g. "compute.googleapis.com".
},
- "stageStates": [ # This field may be mutated by the Cloud Dataflow service;
- # callers cannot mutate it.
- { # A message describing the state of a particular execution stage.
- "currentStateTime": "A String", # The time at which the stage transitioned to this state.
- "executionStageState": "A String", # Executions stage states allow the same set of values as JobState.
- "executionStageName": "A String", # The name of the execution stage.
- },
- ],
- "jobMetadata": { # Metadata available primarily for filtering jobs. Will be included in the # This field is populated by the Dataflow service to support filtering jobs
- # by the metadata values provided here. Populated for ListJobs and all GetJob
- # views SUMMARY and higher.
- # ListJob response and Job SUMMARY view.
- "datastoreDetails": [ # Identification of a Datastore source used in the Dataflow job.
- { # Metadata for a Datastore connector used by the job.
- "namespace": "A String", # Namespace used in the connection.
- "projectId": "A String", # ProjectId accessed in the connection.
- },
- ],
- "sdkVersion": { # The version of the SDK used to run the job. # The SDK version used to run the job.
- "version": "A String", # The version of the SDK used to run the job.
- "sdkSupportStatus": "A String", # The support status for this SDK version.
- "versionDisplayName": "A String", # A readable string describing the version of the SDK.
- },
- "bigqueryDetails": [ # Identification of a BigQuery source used in the Dataflow job.
- { # Metadata for a BigQuery connector used by the job.
- "table": "A String", # Table accessed in the connection.
- "dataset": "A String", # Dataset accessed in the connection.
- "query": "A String", # Query used to access data in the connection.
- "projectId": "A String", # Project accessed in the connection.
- },
- ],
- "fileDetails": [ # Identification of a File source used in the Dataflow job.
- { # Metadata for a File connector used by the job.
- "filePattern": "A String", # File Pattern used to access files by the connector.
- },
- ],
- "pubsubDetails": [ # Identification of a PubSub source used in the Dataflow job.
- { # Metadata for a PubSub connector used by the job.
- "topic": "A String", # Topic accessed in the connection.
- "subscription": "A String", # Subscription used in the connection.
- },
- ],
- "bigTableDetails": [ # Identification of a BigTable source used in the Dataflow job.
- { # Metadata for a BigTable connector used by the job.
- "projectId": "A String", # ProjectId accessed in the connection.
- "instanceId": "A String", # InstanceId accessed in the connection.
- "tableId": "A String", # TableId accessed in the connection.
- },
- ],
- "spannerDetails": [ # Identification of a Spanner source used in the Dataflow job.
- { # Metadata for a Spanner connector used by the job.
- "instanceId": "A String", # InstanceId accessed in the connection.
- "projectId": "A String", # ProjectId accessed in the connection.
- "databaseId": "A String", # DatabaseId accessed in the connection.
- },
- ],
- },
- "type": "A String", # The type of Cloud Dataflow job.
- "projectId": "A String", # The ID of the Cloud Platform project that the job belongs to.
- "createdFromSnapshotId": "A String", # If this is specified, the job's initial state is populated from the given
- # snapshot.
- "pipelineDescription": { # A descriptive representation of submitted pipeline as well as the executed # Preliminary field: The format of this data may change at any time.
- # A description of the user pipeline and stages through which it is executed.
- # Created by Cloud Dataflow service. Only retrieved with
- # JOB_VIEW_DESCRIPTION or JOB_VIEW_ALL.
- # form. This data is provided by the Dataflow service for ease of visualizing
- # the pipeline and interpreting Dataflow provided metrics.
- "executionPipelineStage": [ # Description of each stage of execution of the pipeline.
- { # Description of the composing transforms, names/ids, and input/outputs of a
- # stage of execution. Some composing transforms and sources may have been
- # generated by the Dataflow service during execution planning.
- "outputSource": [ # Output sources for this stage.
- { # Description of an input or output of an execution stage.
- "sizeBytes": "A String", # Size of the source, if measurable.
- "name": "A String", # Dataflow service generated name for this source.
- "userName": "A String", # Human-readable name for this source; may be user or system generated.
- "originalTransformOrCollection": "A String", # User name for the original user transform or collection with which this
- # source is most closely associated.
- },
- ],
- "name": "A String", # Dataflow service generated name for this stage.
- "inputSource": [ # Input sources for this stage.
- { # Description of an input or output of an execution stage.
- "sizeBytes": "A String", # Size of the source, if measurable.
- "name": "A String", # Dataflow service generated name for this source.
- "userName": "A String", # Human-readable name for this source; may be user or system generated.
- "originalTransformOrCollection": "A String", # User name for the original user transform or collection with which this
- # source is most closely associated.
- },
- ],
- "id": "A String", # Dataflow service generated id for this stage.
- "componentTransform": [ # Transforms that comprise this execution stage.
- { # Description of a transform executed as part of an execution stage.
- "originalTransform": "A String", # User name for the original user transform with which this transform is
- # most closely associated.
- "name": "A String", # Dataflow service generated name for this source.
- "userName": "A String", # Human-readable name for this transform; may be user or system generated.
- },
- ],
- "componentSource": [ # Collections produced and consumed by component transforms of this stage.
- { # Description of an interstitial value between transforms in an execution
- # stage.
- "name": "A String", # Dataflow service generated name for this source.
- "userName": "A String", # Human-readable name for this transform; may be user or system generated.
- "originalTransformOrCollection": "A String", # User name for the original user transform or collection with which this
- # source is most closely associated.
- },
- ],
- "kind": "A String", # Type of tranform this stage is executing.
- },
- ],
- "originalPipelineTransform": [ # Description of each transform in the pipeline and collections between them.
- { # Description of the type, names/ids, and input/outputs for a transform.
- "kind": "A String", # Type of transform.
- "inputCollectionName": [ # User names for all collection inputs to this transform.
- "A String",
- ],
- "name": "A String", # User provided name for this transform instance.
- "id": "A String", # SDK generated id of this transform instance.
- "displayData": [ # Transform-specific display data.
- { # Data provided with a pipeline or transform to provide descriptive info.
- "durationValue": "A String", # Contains value if the data is of duration type.
- "int64Value": "A String", # Contains value if the data is of int64 type.
- "namespace": "A String", # The namespace for the key. This is usually a class name or programming
- # language namespace (i.e. python module) which defines the display data.
- # This allows a dax monitoring system to specially handle the data
- # and perform custom rendering.
- "floatValue": 3.14, # Contains value if the data is of float type.
- "key": "A String", # The key identifying the display data.
- # This is intended to be used as a label for the display data
- # when viewed in a dax monitoring system.
- "shortStrValue": "A String", # A possible additional shorter value to display.
- # For example a java_class_name_value of com.mypackage.MyDoFn
- # will be stored with MyDoFn as the short_str_value and
- # com.mypackage.MyDoFn as the java_class_name value.
- # short_str_value can be displayed and java_class_name_value
- # will be displayed as a tooltip.
- "url": "A String", # An optional full URL.
- "label": "A String", # An optional label to display in a dax UI for the element.
- "timestampValue": "A String", # Contains value if the data is of timestamp type.
- "boolValue": True or False, # Contains value if the data is of a boolean type.
- "javaClassValue": "A String", # Contains value if the data is of java class type.
- "strValue": "A String", # Contains value if the data is of string type.
- },
- ],
- "outputCollectionName": [ # User names for all collection outputs to this transform.
- "A String",
- ],
- },
- ],
- "displayData": [ # Pipeline level display data.
- { # Data provided with a pipeline or transform to provide descriptive info.
- "durationValue": "A String", # Contains value if the data is of duration type.
- "int64Value": "A String", # Contains value if the data is of int64 type.
- "namespace": "A String", # The namespace for the key. This is usually a class name or programming
- # language namespace (i.e. python module) which defines the display data.
- # This allows a dax monitoring system to specially handle the data
- # and perform custom rendering.
- "floatValue": 3.14, # Contains value if the data is of float type.
- "key": "A String", # The key identifying the display data.
- # This is intended to be used as a label for the display data
- # when viewed in a dax monitoring system.
- "shortStrValue": "A String", # A possible additional shorter value to display.
- # For example a java_class_name_value of com.mypackage.MyDoFn
- # will be stored with MyDoFn as the short_str_value and
- # com.mypackage.MyDoFn as the java_class_name value.
- # short_str_value can be displayed and java_class_name_value
- # will be displayed as a tooltip.
- "url": "A String", # An optional full URL.
- "label": "A String", # An optional label to display in a dax UI for the element.
- "timestampValue": "A String", # Contains value if the data is of timestamp type.
- "boolValue": True or False, # Contains value if the data is of a boolean type.
- "javaClassValue": "A String", # Contains value if the data is of java class type.
- "strValue": "A String", # Contains value if the data is of string type.
- },
- ],
- },
- "replaceJobId": "A String", # If this job is an update of an existing job, this field is the job ID
- # of the job it replaced.
- #
- # When sending a `CreateJobRequest`, you can update a job by specifying it
- # here. The job named here is stopped, and its intermediate state is
- # transferred to this job.
- "tempFiles": [ # A set of files the system should be aware of that are used
- # for temporary storage. These temporary files will be
- # removed on job completion.
- # No duplicates are allowed.
- # No file patterns are supported.
- #
- # The supported files are:
- #
- # Google Cloud Storage:
- #
- # storage.googleapis.com/{bucket}/{object}
- # bucket.storage.googleapis.com/{object}
- "A String",
- ],
- "name": "A String", # The user-specified Cloud Dataflow job name.
- #
- # Only one Job with a given name may exist in a project at any
- # given time. If a caller attempts to create a Job with the same
- # name as an already-existing Job, the attempt returns the
- # existing Job.
- #
- # The name must match the regular expression
- # `[a-z]([-a-z0-9]{0,38}[a-z0-9])?`
+ "stepsLocation": "A String", # The GCS location where the steps are stored.
"steps": [ # Exactly one of step or steps_location should be specified.
- #
+ #
# The top-level steps that constitute the entire job.
{ # Defines a particular step within a Cloud Dataflow job.
#
@@ -4449,18 +3765,95 @@
#
# Note that the Cloud Dataflow service may be used to run many different
# types of jobs, not just Map-Reduce.
- "name": "A String", # The name that identifies the step. This must be unique for each
- # step with respect to all other steps in the Cloud Dataflow job.
"kind": "A String", # The kind of step in the Cloud Dataflow job.
"properties": { # Named properties associated with the step. Each kind of
# predefined step has its own required set of properties.
# Must be provided on Create. Only retrieved with JOB_VIEW_ALL.
"a_key": "", # Properties of the object.
},
+ "name": "A String", # The name that identifies the step. This must be unique for each
+ # step with respect to all other steps in the Cloud Dataflow job.
+ },
+ ],
+ "stageStates": [ # This field may be mutated by the Cloud Dataflow service;
+ # callers cannot mutate it.
+ { # A message describing the state of a particular execution stage.
+ "executionStageState": "A String", # Execution stage states allow the same set of values as JobState.
+ "executionStageName": "A String", # The name of the execution stage.
+ "currentStateTime": "A String", # The time at which the stage transitioned to this state.
},
],
"replacedByJobId": "A String", # If another job is an update of this job (and thus, this job is in
# `JOB_STATE_UPDATED`), this field contains the ID of that job.
+ "jobMetadata": { # Metadata available primarily for filtering jobs. Will be included in the ListJob response and Job SUMMARY view.
+ # This field is populated by the Dataflow service to support filtering jobs
+ # by the metadata values provided here. Populated for ListJobs and all GetJob
+ # views SUMMARY and higher.
+ "sdkVersion": { # The version of the SDK used to run the job. # The SDK version used to run the job.
+ "sdkSupportStatus": "A String", # The support status for this SDK version.
+ "versionDisplayName": "A String", # A readable string describing the version of the SDK.
+ "version": "A String", # The version of the SDK used to run the job.
+ },
+ "bigTableDetails": [ # Identification of a BigTable source used in the Dataflow job.
+ { # Metadata for a BigTable connector used by the job.
+ "instanceId": "A String", # InstanceId accessed in the connection.
+ "tableId": "A String", # TableId accessed in the connection.
+ "projectId": "A String", # ProjectId accessed in the connection.
+ },
+ ],
+ "pubsubDetails": [ # Identification of a PubSub source used in the Dataflow job.
+ { # Metadata for a PubSub connector used by the job.
+ "subscription": "A String", # Subscription used in the connection.
+ "topic": "A String", # Topic accessed in the connection.
+ },
+ ],
+ "bigqueryDetails": [ # Identification of a BigQuery source used in the Dataflow job.
+ { # Metadata for a BigQuery connector used by the job.
+ "dataset": "A String", # Dataset accessed in the connection.
+ "projectId": "A String", # Project accessed in the connection.
+ "query": "A String", # Query used to access data in the connection.
+ "table": "A String", # Table accessed in the connection.
+ },
+ ],
+ "fileDetails": [ # Identification of a File source used in the Dataflow job.
+ { # Metadata for a File connector used by the job.
+ "filePattern": "A String", # File Pattern used to access files by the connector.
+ },
+ ],
+ "datastoreDetails": [ # Identification of a Datastore source used in the Dataflow job.
+ { # Metadata for a Datastore connector used by the job.
+ "namespace": "A String", # Namespace used in the connection.
+ "projectId": "A String", # ProjectId accessed in the connection.
+ },
+ ],
+ "spannerDetails": [ # Identification of a Spanner source used in the Dataflow job.
+ { # Metadata for a Spanner connector used by the job.
+ "instanceId": "A String", # InstanceId accessed in the connection.
+ "databaseId": "A String", # DatabaseId accessed in the connection.
+ "projectId": "A String", # ProjectId accessed in the connection.
+ },
+ ],
+ },
+ "location": "A String", # The [regional endpoint]
+ # (https://cloud.google.com/dataflow/docs/concepts/regional-endpoints) that
+ # contains this job.
+ "transformNameMapping": { # The map of transform name prefixes of the job to be replaced to the
+ # corresponding name prefixes of the new job.
+ "a_key": "A String",
+ },
+ "startTime": "A String", # The timestamp when the job was started (transitioned to JOB_STATE_PENDING).
+ # Flexible resource scheduling jobs are started with some delay after job
+ # creation, so start_time is unset before start and is updated when the
+ # job is started by the Cloud Dataflow service. For other jobs, start_time
+ # always equals create_time and is immutable and set by the Cloud Dataflow
+ # service.
+ "clientRequestId": "A String", # The client's unique identifier of the job, re-used across retried attempts.
+ # If this field is set, the service will ensure its uniqueness.
+ # The request to create a job will fail if the service has knowledge of a
+ # previously submitted job with the same client's ID and job name.
+ # The caller may use this field to ensure idempotence of job
+ # creation across retried attempts to create a job.
+ # By default, the field is empty and, in that case, the service ignores it.
"executionInfo": { # Additional information about how a Cloud Dataflow job will be executed that # Deprecated.
# isn't contained in the submitted job.
"stages": { # A mapping from each stage to the information about that stage.
@@ -4474,48 +3867,655 @@
},
},
},
- "currentState": "A String", # The current state of the job.
- #
- # Jobs are created in the `JOB_STATE_STOPPED` state unless otherwise
- # specified.
- #
- # A job in the `JOB_STATE_RUNNING` state may asynchronously enter a
- # terminal state. After a job has reached a terminal state, no
- # further state updates may be made.
- #
- # This field may be mutated by the Cloud Dataflow service;
- # callers cannot mutate it.
- "location": "A String", # The [regional endpoint]
- # (https://cloud.google.com/dataflow/docs/concepts/regional-endpoints) that
- # contains this job.
- "startTime": "A String", # The timestamp when the job was started (transitioned to JOB_STATE_PENDING).
- # Flexible resource scheduling jobs are started with some delay after job
- # creation, so start_time is unset before start and is updated when the
- # job is started by the Cloud Dataflow service. For other jobs, start_time
- # always equals to create_time and is immutable and set by the Cloud Dataflow
- # service.
- "stepsLocation": "A String", # The GCS location where the steps are stored.
- "labels": { # User-defined labels for this job.
- #
- # The labels map can contain no more than 64 entries. Entries of the labels
- # map are UTF8 strings that comply with the following restrictions:
- #
- # * Keys must conform to regexp: \p{Ll}\p{Lo}{0,62}
- # * Values must conform to regexp: [\p{Ll}\p{Lo}\p{N}_-]{0,63}
- # * Both keys and values are additionally constrained to be <= 128 bytes in
- # size.
- "a_key": "A String",
- },
+ "type": "A String", # The type of Cloud Dataflow job.
"createTime": "A String", # The timestamp when the job was initially created. Immutable and set by the
# Cloud Dataflow service.
+ "tempFiles": [ # A set of files the system should be aware of that are used
+ # for temporary storage. These temporary files will be
+ # removed on job completion.
+ # No duplicates are allowed.
+ # No file patterns are supported.
+ #
+ # The supported files are:
+ #
+ # Google Cloud Storage:
+ #
+ # storage.googleapis.com/{bucket}/{object}
+ # bucket.storage.googleapis.com/{object}
+ "A String",
+ ],
+ "id": "A String", # The unique ID of this job.
+ #
+ # This field is set by the Cloud Dataflow service when the Job is
+ # created, and is immutable for the life of the job.
"requestedState": "A String", # The job's requested state.
- #
+ #
# `UpdateJob` may be used to switch between the `JOB_STATE_STOPPED` and
# `JOB_STATE_RUNNING` states, by setting requested_state. `UpdateJob` may
# also be used to directly set a job's requested state to
# `JOB_STATE_CANCELLED` or `JOB_STATE_DONE`, irrevocably terminating the
# job if it has not already reached a terminal state.
- }</pre>
+ "replaceJobId": "A String", # If this job is an update of an existing job, this field is the job ID
+ # of the job it replaced.
+ #
+ # When sending a `CreateJobRequest`, you can update a job by specifying it
+ # here. The job named here is stopped, and its intermediate state is
+ # transferred to this job.
+ "createdFromSnapshotId": "A String", # If this is specified, the job's initial state is populated from the given
+ # snapshot.
+ "currentState": "A String", # The current state of the job.
+ #
+ # Jobs are created in the `JOB_STATE_STOPPED` state unless otherwise
+ # specified.
+ #
+ # A job in the `JOB_STATE_RUNNING` state may asynchronously enter a
+ # terminal state. After a job has reached a terminal state, no
+ # further state updates may be made.
+ #
+ # This field may be mutated by the Cloud Dataflow service;
+ # callers cannot mutate it.
+ "name": "A String", # The user-specified Cloud Dataflow job name.
+ #
+ # Only one Job with a given name may exist in a project at any
+ # given time. If a caller attempts to create a Job with the same
+ # name as an already-existing Job, the attempt returns the
+ # existing Job.
+ #
+ # The name must match the regular expression
+ # `[a-z]([-a-z0-9]{0,38}[a-z0-9])?`
+ "currentStateTime": "A String", # The timestamp associated with the current state.
+ }
+
+ location: string, The [regional endpoint]
+(https://cloud.google.com/dataflow/docs/concepts/regional-endpoints) that
+contains this job.
+ x__xgafv: string, V1 error format.
+ Allowed values
+ 1 - v1 error format
+ 2 - v2 error format
+
+Returns:
+ An object of the form:
+
+ { # Defines a job to be run by the Cloud Dataflow service.
+ "pipelineDescription": { # A descriptive representation of submitted pipeline as well as the executed
+ # form. This data is provided by the Dataflow service for ease of visualizing
+ # the pipeline and interpreting Dataflow provided metrics.
+ # Preliminary field: The format of this data may change at any time.
+ # A description of the user pipeline and stages through which it is executed.
+ # Created by Cloud Dataflow service. Only retrieved with JOB_VIEW_DESCRIPTION or JOB_VIEW_ALL.
+ "displayData": [ # Pipeline level display data.
+ { # Data provided with a pipeline or transform to provide descriptive info.
+ "url": "A String", # An optional full URL.
+ "javaClassValue": "A String", # Contains value if the data is of java class type.
+ "timestampValue": "A String", # Contains value if the data is of timestamp type.
+ "durationValue": "A String", # Contains value if the data is of duration type.
+ "label": "A String", # An optional label to display in a dax UI for the element.
+ "key": "A String", # The key identifying the display data.
+ # This is intended to be used as a label for the display data
+ # when viewed in a dax monitoring system.
+ "namespace": "A String", # The namespace for the key. This is usually a class name or programming
+ # language namespace (e.g. a Python module) which defines the display data.
+ # This allows a dax monitoring system to specially handle the data
+ # and perform custom rendering.
+ "floatValue": 3.14, # Contains value if the data is of float type.
+ "strValue": "A String", # Contains value if the data is of string type.
+ "int64Value": "A String", # Contains value if the data is of int64 type.
+ "boolValue": True or False, # Contains value if the data is of a boolean type.
+ "shortStrValue": "A String", # A possible additional shorter value to display.
+ # For example a java_class_name_value of com.mypackage.MyDoFn
+ # will be stored with MyDoFn as the short_str_value and
+ # com.mypackage.MyDoFn as the java_class_name value.
+ # short_str_value can be displayed and java_class_name_value
+ # will be displayed as a tooltip.
+ },
+ ],
+ "originalPipelineTransform": [ # Description of each transform in the pipeline and collections between them.
+ { # Description of the type, names/ids, and input/outputs for a transform.
+ "outputCollectionName": [ # User names for all collection outputs to this transform.
+ "A String",
+ ],
+ "displayData": [ # Transform-specific display data.
+ { # Data provided with a pipeline or transform to provide descriptive info.
+ "url": "A String", # An optional full URL.
+ "javaClassValue": "A String", # Contains value if the data is of java class type.
+ "timestampValue": "A String", # Contains value if the data is of timestamp type.
+ "durationValue": "A String", # Contains value if the data is of duration type.
+ "label": "A String", # An optional label to display in a dax UI for the element.
+ "key": "A String", # The key identifying the display data.
+ # This is intended to be used as a label for the display data
+ # when viewed in a dax monitoring system.
+ "namespace": "A String", # The namespace for the key. This is usually a class name or programming
+ # language namespace (i.e. python module) which defines the display data.
+ # This allows a dax monitoring system to specially handle the data
+ # and perform custom rendering.
+ "floatValue": 3.14, # Contains value if the data is of float type.
+ "strValue": "A String", # Contains value if the data is of string type.
+ "int64Value": "A String", # Contains value if the data is of int64 type.
+ "boolValue": True or False, # Contains value if the data is of a boolean type.
+ "shortStrValue": "A String", # A possible additional shorter value to display.
+ # For example a java_class_name_value of com.mypackage.MyDoFn
+ # will be stored with MyDoFn as the short_str_value and
+ # com.mypackage.MyDoFn as the java_class_name value.
+ # short_str_value can be displayed and java_class_name_value
+ # will be displayed as a tooltip.
+ },
+ ],
+ "id": "A String", # SDK generated id of this transform instance.
+ "inputCollectionName": [ # User names for all collection inputs to this transform.
+ "A String",
+ ],
+ "name": "A String", # User provided name for this transform instance.
+ "kind": "A String", # Type of transform.
+ },
+ ],
+ "executionPipelineStage": [ # Description of each stage of execution of the pipeline.
+ { # Description of the composing transforms, names/ids, and input/outputs of a
+ # stage of execution. Some composing transforms and sources may have been
+ # generated by the Dataflow service during execution planning.
+ "componentSource": [ # Collections produced and consumed by component transforms of this stage.
+ { # Description of an interstitial value between transforms in an execution
+ # stage.
+ "userName": "A String", # Human-readable name for this transform; may be user or system generated.
+ "name": "A String", # Dataflow service generated name for this source.
+ "originalTransformOrCollection": "A String", # User name for the original user transform or collection with which this
+ # source is most closely associated.
+ },
+ ],
+ "inputSource": [ # Input sources for this stage.
+ { # Description of an input or output of an execution stage.
+ "userName": "A String", # Human-readable name for this source; may be user or system generated.
+ "originalTransformOrCollection": "A String", # User name for the original user transform or collection with which this
+ # source is most closely associated.
+ "sizeBytes": "A String", # Size of the source, if measurable.
+ "name": "A String", # Dataflow service generated name for this source.
+ },
+ ],
+ "name": "A String", # Dataflow service generated name for this stage.
+ "componentTransform": [ # Transforms that comprise this execution stage.
+ { # Description of a transform executed as part of an execution stage.
+ "name": "A String", # Dataflow service generated name for this source.
+ "userName": "A String", # Human-readable name for this transform; may be user or system generated.
+ "originalTransform": "A String", # User name for the original user transform with which this transform is
+ # most closely associated.
+ },
+ ],
+ "id": "A String", # Dataflow service generated id for this stage.
+ "outputSource": [ # Output sources for this stage.
+ { # Description of an input or output of an execution stage.
+ "userName": "A String", # Human-readable name for this source; may be user or system generated.
+ "originalTransformOrCollection": "A String", # User name for the original user transform or collection with which this
+ # source is most closely associated.
+ "sizeBytes": "A String", # Size of the source, if measurable.
+ "name": "A String", # Dataflow service generated name for this source.
+ },
+ ],
+ "kind": "A String", # Type of transform this stage is executing.
+ },
+ ],
+ },
+ "labels": { # User-defined labels for this job.
+ #
+ # The labels map can contain no more than 64 entries. Entries of the labels
+ # map are UTF8 strings that comply with the following restrictions:
+ #
+ # * Keys must conform to regexp: \p{Ll}\p{Lo}{0,62}
+ # * Values must conform to regexp: [\p{Ll}\p{Lo}\p{N}_-]{0,63}
+ # * Both keys and values are additionally constrained to be <= 128 bytes in
+ # size.
+ "a_key": "A String",
+ },
+ "projectId": "A String", # The ID of the Cloud Platform project that the job belongs to.
+ "environment": { # Describes the environment in which a Dataflow Job runs. # The environment for the job.
+ "flexResourceSchedulingGoal": "A String", # Which Flexible Resource Scheduling mode to run in.
+ "workerRegion": "A String", # The Compute Engine region
+ # (https://cloud.google.com/compute/docs/regions-zones/regions-zones) in
+ # which worker processing should occur, e.g. "us-west1". Mutually exclusive
+ # with worker_zone. If neither worker_region nor worker_zone is specified,
+ # default to the control plane's region.
+ "userAgent": { # A description of the process that generated the request.
+ "a_key": "", # Properties of the object.
+ },
+ "serviceAccountEmail": "A String", # Identity to run virtual machines as. Defaults to the default account.
+ "version": { # A structure describing which components and their versions of the service
+ # are required in order to run the job.
+ "a_key": "", # Properties of the object.
+ },
+ "serviceKmsKeyName": "A String", # If set, contains the Cloud KMS key identifier used to encrypt data
+ # at rest, AKA a Customer Managed Encryption Key (CMEK).
+ #
+ # Format:
+ # projects/PROJECT_ID/locations/LOCATION/keyRings/KEY_RING/cryptoKeys/KEY
+ "experiments": [ # The list of experiments to enable.
+ "A String",
+ ],
+ "workerZone": "A String", # The Compute Engine zone
+ # (https://cloud.google.com/compute/docs/regions-zones/regions-zones) in
+ # which worker processing should occur, e.g. "us-west1-a". Mutually exclusive
+ # with worker_region. If neither worker_region nor worker_zone is specified,
+ # a zone in the control plane's region is chosen based on available capacity.
+ "workerPools": [ # The worker pools. At least one "harness" worker pool must be
+ # specified in order for the job to have workers.
+ { # Describes one particular pool of Cloud Dataflow workers to be
+ # instantiated by the Cloud Dataflow service in order to perform the
+ # computations required by a job. Note that a workflow job may use
+ # multiple pools, in order to match the various computational
+ # requirements of the various stages of the job.
+ "onHostMaintenance": "A String", # The action to take on host maintenance, as defined by the Google
+ # Compute Engine API.
+ "sdkHarnessContainerImages": [ # Set of SDK harness containers needed to execute this pipeline. This will
+ # only be set in the Fn API path. For non-cross-language pipelines this
+ # should have only one entry. Cross-language pipelines will have two or more
+ # entries.
+ { # Defines a SDK harness container for executing Dataflow pipelines.
+ "containerImage": "A String", # A docker container image that resides in Google Container Registry.
+ "useSingleCorePerContainer": True or False, # If true, recommends the Dataflow service to use only one core per SDK
+ # container instance with this image. If false (or unset) recommends using
+ # more than one core per SDK container instance with this image for
+ # efficiency. Note that Dataflow service may choose to override this property
+ # if needed.
+ },
+ ],
+ "zone": "A String", # Zone to run the worker pools in. If empty or unspecified, the service
+ # will attempt to choose a reasonable default.
+ "kind": "A String", # The kind of the worker pool; currently only `harness` and `shuffle`
+ # are supported.
+ "metadata": { # Metadata to set on the Google Compute Engine VMs.
+ "a_key": "A String",
+ },
+ "diskSourceImage": "A String", # Fully qualified source image for disks.
+ "dataDisks": [ # Data disks that are used by a VM in this workflow.
+ { # Describes the data disk used by a workflow job.
+ "sizeGb": 42, # Size of disk in GB. If zero or unspecified, the service will
+ # attempt to choose a reasonable default.
+ "diskType": "A String", # Disk storage type, as defined by Google Compute Engine. This
+ # must be a disk type appropriate to the project and zone in which
+ # the workers will run. If unknown or unspecified, the service
+ # will attempt to choose a reasonable default.
+ #
+ # For example, the standard persistent disk type is a resource name
+ # typically ending in "pd-standard". If SSD persistent disks are
+ # available, the resource name typically ends with "pd-ssd". The
+ # actual valid values are defined by the Google Compute Engine API,
+ # not by the Cloud Dataflow API; consult the Google Compute Engine
+ # documentation for more information about determining the set of
+ # available disk types for a particular project and zone.
+ #
+ # Google Compute Engine Disk types are local to a particular
+ # project in a particular zone, and so the resource name will
+ # typically look something like this:
+ #
+ # compute.googleapis.com/projects/project-id/zones/zone/diskTypes/pd-standard
+ "mountPoint": "A String", # Directory in a VM where disk is mounted.
+ },
+ ],
+ "packages": [ # Packages to be installed on workers.
+ { # The packages that must be installed in order for a worker to run the
+ # steps of the Cloud Dataflow job that will be assigned to its worker
+ # pool.
+ #
+ # This is the mechanism by which the Cloud Dataflow SDK causes code to
+ # be loaded onto the workers. For example, the Cloud Dataflow Java SDK
+ # might use this to install jars containing the user's code and all of the
+ # various dependencies (libraries, data files, etc.) required in order
+ # for that code to run.
+ "name": "A String", # The name of the package.
+ "location": "A String", # The resource to read the package from. The supported resource type is:
+ #
+ # Google Cloud Storage:
+ #
+ # storage.googleapis.com/{bucket}
+ # bucket.storage.googleapis.com/
+ },
+ ],
+ "teardownPolicy": "A String", # Sets the policy for determining when to tear down the worker pool.
+ # Allowed values are: `TEARDOWN_ALWAYS`, `TEARDOWN_ON_SUCCESS`, and
+ # `TEARDOWN_NEVER`.
+ # `TEARDOWN_ALWAYS` means workers are always torn down regardless of whether
+ # the job succeeds. `TEARDOWN_ON_SUCCESS` means workers are torn down
+ # if the job succeeds. `TEARDOWN_NEVER` means the workers are never torn
+ # down.
+ #
+ # If the workers are not torn down by the service, they will
+ # continue to run and use Google Compute Engine VM resources in the
+ # user's project until they are explicitly terminated by the user.
+ # Because of this, Google recommends using the `TEARDOWN_ALWAYS`
+ # policy except for small, manually supervised test jobs.
+ #
+ # If unknown or unspecified, the service will attempt to choose a reasonable
+ # default.
+ "network": "A String", # Network to which VMs will be assigned. If empty or unspecified,
+ # the service will use the network "default".
+ "ipConfiguration": "A String", # Configuration for VM IPs.
+ "diskSizeGb": 42, # Size of root disk for VMs, in GB. If zero or unspecified, the service will
+ # attempt to choose a reasonable default.
+ "autoscalingSettings": { # Settings for WorkerPool autoscaling. # Settings for autoscaling of this WorkerPool.
+ "maxNumWorkers": 42, # The maximum number of workers to cap scaling at.
+ "algorithm": "A String", # The algorithm to use for autoscaling.
+ },
+ "poolArgs": { # Extra arguments for this worker pool.
+ "a_key": "", # Properties of the object. Contains field @type with type URL.
+ },
+ "subnetwork": "A String", # Subnetwork to which VMs will be assigned, if desired. Expected to be of
+ # the form "regions/REGION/subnetworks/SUBNETWORK".
+ "numWorkers": 42, # Number of Google Compute Engine workers in this pool needed to
+ # execute the job. If zero or unspecified, the service will
+ # attempt to choose a reasonable default.
+ "numThreadsPerWorker": 42, # The number of threads per worker harness. If empty or unspecified, the
+ # service will choose a number of threads (according to the number of cores
+ # on the selected machine type for batch, or 1 by convention for streaming).
+ "workerHarnessContainerImage": "A String", # Required. Docker container image that executes the Cloud Dataflow worker
+ # harness, residing in Google Container Registry.
+ #
+ # Deprecated for the Fn API path. Use sdk_harness_container_images instead.
+ "taskrunnerSettings": { # Taskrunner configuration settings. # Settings passed through to Google Compute Engine workers when
+ # using the standard Dataflow task runner. Users should ignore
+ # this field.
+ "dataflowApiVersion": "A String", # The API version of endpoint, e.g. "v1b3"
+ "oauthScopes": [ # The OAuth2 scopes to be requested by the taskrunner in order to
+ # access the Cloud Dataflow API.
+ "A String",
+ ],
+ "baseUrl": "A String", # The base URL for the taskrunner to use when accessing Google Cloud APIs.
+ #
+ # When workers access Google Cloud APIs, they logically do so via
+ # relative URLs. If this field is specified, it supplies the base
+ # URL to use for resolving these relative URLs. The normative
+ # algorithm used is defined by RFC 1808, "Relative Uniform Resource
+ # Locators".
+ #
+ # If not specified, the default value is "http://www.googleapis.com/"
+ "workflowFileName": "A String", # The file to store the workflow in.
+ "logToSerialconsole": True or False, # Whether to send taskrunner log info to Google Compute Engine VM serial
+ # console.
+ "baseTaskDir": "A String", # The location on the worker for task-specific subdirectories.
+ "taskUser": "A String", # The UNIX user ID on the worker VM to use for tasks launched by
+ # taskrunner; e.g. "root".
+ "vmId": "A String", # The ID string of the VM.
+ "alsologtostderr": True or False, # Whether to also send taskrunner log info to stderr.
+ "parallelWorkerSettings": { # Provides data to pass through to the worker harness. # The settings to pass to the parallel worker harness.
+ "shuffleServicePath": "A String", # The Shuffle service path relative to the root URL, for example,
+ # "shuffle/v1beta1".
+ "tempStoragePrefix": "A String", # The prefix of the resources the system should use for temporary
+ # storage.
+ #
+ # The supported resource type is:
+ #
+ # Google Cloud Storage:
+ #
+ # storage.googleapis.com/{bucket}/{object}
+ # bucket.storage.googleapis.com/{object}
+ "reportingEnabled": True or False, # Whether to send work progress updates to the service.
+ "servicePath": "A String", # The Cloud Dataflow service path relative to the root URL, for example,
+ # "dataflow/v1b3/projects".
+ "baseUrl": "A String", # The base URL for accessing Google Cloud APIs.
+ #
+ # When workers access Google Cloud APIs, they logically do so via
+ # relative URLs. If this field is specified, it supplies the base
+ # URL to use for resolving these relative URLs. The normative
+ # algorithm used is defined by RFC 1808, "Relative Uniform Resource
+ # Locators".
+ #
+ # If not specified, the default value is "http://www.googleapis.com/"
+ "workerId": "A String", # The ID of the worker running this pipeline.
+ },
+ "harnessCommand": "A String", # The command to launch the worker harness.
+ "logDir": "A String", # The directory on the VM to store logs.
+ "streamingWorkerMainClass": "A String", # The streaming worker main class name.
+ "languageHint": "A String", # The suggested backend language.
+ "taskGroup": "A String", # The UNIX group ID on the worker VM to use for tasks launched by
+ # taskrunner; e.g. "wheel".
+ "logUploadLocation": "A String", # Indicates where to put logs. If this is not specified, the logs
+ # will not be uploaded.
+ #
+ # The supported resource type is:
+ #
+ # Google Cloud Storage:
+ # storage.googleapis.com/{bucket}/{object}
+ # bucket.storage.googleapis.com/{object}
+ "commandlinesFileName": "A String", # The file to store preprocessing commands in.
+ "continueOnException": True or False, # Whether to continue taskrunner if an exception is hit.
+ "tempStoragePrefix": "A String", # The prefix of the resources the taskrunner should use for
+ # temporary storage.
+ #
+ # The supported resource type is:
+ #
+ # Google Cloud Storage:
+ # storage.googleapis.com/{bucket}/{object}
+ # bucket.storage.googleapis.com/{object}
+ },
+ "diskType": "A String", # Type of root disk for VMs. If empty or unspecified, the service will
+ # attempt to choose a reasonable default.
+ "defaultPackageSet": "A String", # The default package set to install. This allows the service to
+ # select a default set of packages which are useful to worker
+ # harnesses written in a particular language.
+ "machineType": "A String", # Machine type (e.g. "n1-standard-1"). If empty or unspecified, the
+ # service will attempt to choose a reasonable default.
+ },
+ ],
+ "tempStoragePrefix": "A String", # The prefix of the resources the system should use for temporary
+ # storage. The system will append the suffix "/temp-{JOBNAME}" to
+ # this resource prefix, where {JOBNAME} is the value of the
+ # job_name field. The resulting bucket and object prefix is used
+ # as the prefix of the resources used to store temporary data
+ # needed during the job execution. NOTE: This will override the
+ # value in taskrunner_settings.
+ # The supported resource type is:
+ #
+ # Google Cloud Storage:
+ #
+ # storage.googleapis.com/{bucket}/{object}
+ # bucket.storage.googleapis.com/{object}
+ "internalExperiments": { # Experimental settings.
+ "a_key": "", # Properties of the object. Contains field @type with type URL.
+ },
+ "sdkPipelineOptions": { # The Cloud Dataflow SDK pipeline options specified by the user. These
+ # options are passed through the service and are used to recreate the
+ # SDK pipeline options on the worker in a language agnostic and platform
+ # independent way.
+ "a_key": "", # Properties of the object.
+ },
+ "dataset": "A String", # The dataset for the current project where various workflow
+ # related tables are stored.
+ #
+ # The supported resource type is:
+ #
+ # Google BigQuery:
+ # bigquery.googleapis.com/{dataset}
+ "clusterManagerApiService": "A String", # The type of cluster manager API to use. If unknown or
+ # unspecified, the service will attempt to choose a reasonable
+ # default. This should be in the form of the API service name,
+ # e.g. "compute.googleapis.com".
+ },
+ "stepsLocation": "A String", # The GCS location where the steps are stored.
+ "steps": [ # Exactly one of steps or steps_location should be specified.
+ #
+ # The top-level steps that constitute the entire job.
+ { # Defines a particular step within a Cloud Dataflow job.
+ #
+ # A job consists of multiple steps, each of which performs some
+ # specific operation as part of the overall job. Data is typically
+ # passed from one step to another as part of the job.
+ #
+ # Here's an example of a sequence of steps which together implement a
+ # Map-Reduce job:
+ #
+ # * Read a collection of data from some source, parsing the
+ # collection's elements.
+ #
+ # * Validate the elements.
+ #
+ # * Apply a user-defined function to map each element to some value
+ # and extract an element-specific key value.
+ #
+ # * Group elements with the same key into a single element with
+ # that key, transforming a multiply-keyed collection into a
+ # uniquely-keyed collection.
+ #
+ # * Write the elements out to some data sink.
+ #
+ # Note that the Cloud Dataflow service may be used to run many different
+ # types of jobs, not just Map-Reduce.
+ "kind": "A String", # The kind of step in the Cloud Dataflow job.
+ "properties": { # Named properties associated with the step. Each kind of
+ # predefined step has its own required set of properties.
+ # Must be provided on Create. Only retrieved with JOB_VIEW_ALL.
+ "a_key": "", # Properties of the object.
+ },
+ "name": "A String", # The name that identifies the step. This must be unique for each
+ # step with respect to all other steps in the Cloud Dataflow job.
+ },
+ ],
+ "stageStates": [ # This field may be mutated by the Cloud Dataflow service;
+ # callers cannot mutate it.
+ { # A message describing the state of a particular execution stage.
+ "executionStageState": "A String", # Execution stage states allow the same set of values as JobState.
+ "executionStageName": "A String", # The name of the execution stage.
+ "currentStateTime": "A String", # The time at which the stage transitioned to this state.
+ },
+ ],
+ "replacedByJobId": "A String", # If another job is an update of this job (and thus, this job is in
+ # `JOB_STATE_UPDATED`), this field contains the ID of that job.
+ "jobMetadata": { # Metadata available primarily for filtering jobs. Will be included in the
+ # ListJob response and Job SUMMARY view.
+ #
+ # This field is populated by the Dataflow service to support filtering jobs
+ # by the metadata values provided here. Populated for ListJobs and all GetJob
+ # views SUMMARY and higher.
+ "sdkVersion": { # The version of the SDK used to run the job. # The SDK version used to run the job.
+ "sdkSupportStatus": "A String", # The support status for this SDK version.
+ "versionDisplayName": "A String", # A readable string describing the version of the SDK.
+ "version": "A String", # The version of the SDK used to run the job.
+ },
+ "bigTableDetails": [ # Identification of a BigTable source used in the Dataflow job.
+ { # Metadata for a BigTable connector used by the job.
+ "instanceId": "A String", # InstanceId accessed in the connection.
+ "tableId": "A String", # TableId accessed in the connection.
+ "projectId": "A String", # ProjectId accessed in the connection.
+ },
+ ],
+ "pubsubDetails": [ # Identification of a PubSub source used in the Dataflow job.
+ { # Metadata for a PubSub connector used by the job.
+ "subscription": "A String", # Subscription used in the connection.
+ "topic": "A String", # Topic accessed in the connection.
+ },
+ ],
+ "bigqueryDetails": [ # Identification of a BigQuery source used in the Dataflow job.
+ { # Metadata for a BigQuery connector used by the job.
+ "dataset": "A String", # Dataset accessed in the connection.
+ "projectId": "A String", # Project accessed in the connection.
+ "query": "A String", # Query used to access data in the connection.
+ "table": "A String", # Table accessed in the connection.
+ },
+ ],
+ "fileDetails": [ # Identification of a File source used in the Dataflow job.
+ { # Metadata for a File connector used by the job.
+ "filePattern": "A String", # File Pattern used to access files by the connector.
+ },
+ ],
+ "datastoreDetails": [ # Identification of a Datastore source used in the Dataflow job.
+ { # Metadata for a Datastore connector used by the job.
+ "namespace": "A String", # Namespace used in the connection.
+ "projectId": "A String", # ProjectId accessed in the connection.
+ },
+ ],
+ "spannerDetails": [ # Identification of a Spanner source used in the Dataflow job.
+ { # Metadata for a Spanner connector used by the job.
+ "instanceId": "A String", # InstanceId accessed in the connection.
+ "databaseId": "A String", # DatabaseId accessed in the connection.
+ "projectId": "A String", # ProjectId accessed in the connection.
+ },
+ ],
+ },
+ "location": "A String", # The [regional endpoint]
+ # (https://cloud.google.com/dataflow/docs/concepts/regional-endpoints) that
+ # contains this job.
+ "transformNameMapping": { # The map of transform name prefixes of the job to be replaced to the
+ # corresponding name prefixes of the new job.
+ "a_key": "A String",
+ },
+ "startTime": "A String", # The timestamp when the job was started (transitioned to JOB_STATE_PENDING).
+ # Flexible resource scheduling jobs are started with some delay after job
+ # creation, so start_time is unset before start and is updated when the
+ # job is started by the Cloud Dataflow service. For other jobs, start_time
+ # is always equal to create_time and is immutable and set by the Cloud Dataflow
+ # service.
+ "clientRequestId": "A String", # The client's unique identifier of the job, re-used across retried attempts.
+ # If this field is set, the service will ensure its uniqueness.
+ # The request to create a job will fail if the service has knowledge of a
+ # previously submitted job with the same client's ID and job name.
+ # The caller may use this field to ensure idempotence of job
+ # creation across retried attempts to create a job.
+ # By default, the field is empty and, in that case, the service ignores it.
+ "executionInfo": { # Deprecated. Additional information about how a Cloud Dataflow job will be executed that
+ # isn't contained in the submitted job.
+ "stages": { # A mapping from each stage to the information about that stage.
+ "a_key": { # Contains information about how a particular
+ # google.dataflow.v1beta3.Step will be executed.
+ "stepName": [ # The steps associated with the execution stage.
+ # Note that stages may have several steps, and that a given step
+ # might be run by more than one stage.
+ "A String",
+ ],
+ },
+ },
+ },
+ "type": "A String", # The type of Cloud Dataflow job.
+ "createTime": "A String", # The timestamp when the job was initially created. Immutable and set by the
+ # Cloud Dataflow service.
+ "tempFiles": [ # A set of files the system should be aware of that are used
+ # for temporary storage. These temporary files will be
+ # removed on job completion.
+ # No duplicates are allowed.
+ # No file patterns are supported.
+ #
+ # The supported files are:
+ #
+ # Google Cloud Storage:
+ #
+ # storage.googleapis.com/{bucket}/{object}
+ # bucket.storage.googleapis.com/{object}
+ "A String",
+ ],
+ "id": "A String", # The unique ID of this job.
+ #
+ # This field is set by the Cloud Dataflow service when the Job is
+ # created, and is immutable for the life of the job.
+ "requestedState": "A String", # The job's requested state.
+ #
+ # `UpdateJob` may be used to switch between the `JOB_STATE_STOPPED` and
+ # `JOB_STATE_RUNNING` states, by setting requested_state. `UpdateJob` may
+ # also be used to directly set a job's requested state to
+ # `JOB_STATE_CANCELLED` or `JOB_STATE_DONE`, irrevocably terminating the
+ # job if it has not already reached a terminal state.
+ "replaceJobId": "A String", # If this job is an update of an existing job, this field is the job ID
+ # of the job it replaced.
+ #
+ # When sending a `CreateJobRequest`, you can update a job by specifying it
+ # here. The job named here is stopped, and its intermediate state is
+ # transferred to this job.
+ "createdFromSnapshotId": "A String", # If this is specified, the job's initial state is populated from the given
+ # snapshot.
+ "currentState": "A String", # The current state of the job.
+ #
+ # Jobs are created in the `JOB_STATE_STOPPED` state unless otherwise
+ # specified.
+ #
+ # A job in the `JOB_STATE_RUNNING` state may asynchronously enter a
+ # terminal state. After a job has reached a terminal state, no
+ # further state updates may be made.
+ #
+ # This field may be mutated by the Cloud Dataflow service;
+ # callers cannot mutate it.
+ "name": "A String", # The user-specified Cloud Dataflow job name.
+ #
+ # Only one Job with a given name may exist in a project at any
+ # given time. If a caller attempts to create a Job with the same
+ # name as an already-existing Job, the attempt returns the
+ # existing Job.
+ #
+ # The name must match the regular expression
+ # `[a-z]([-a-z0-9]{0,38}[a-z0-9])?`
+ "currentStateTime": "A String", # The timestamp associated with the current state.
+ }</pre>
</div>
</body></html>
\ No newline at end of file