<html><body>
<style>

body, h1, h2, h3, div, span, p, pre, a {
  margin: 0;
  padding: 0;
  border: 0;
  font-weight: inherit;
  font-style: inherit;
  font-size: 100%;
  font-family: inherit;
  vertical-align: baseline;
}

body {
  font-size: 13px;
  padding: 1em;
}

h1 {
  font-size: 26px;
  margin-bottom: 1em;
}

h2 {
  font-size: 24px;
  margin-bottom: 1em;
}

h3 {
  font-size: 20px;
  margin-bottom: 1em;
  margin-top: 1em;
}

pre, code {
  line-height: 1.5;
  font-family: Monaco, 'DejaVu Sans Mono', 'Bitstream Vera Sans Mono', 'Lucida Console', monospace;
}

pre {
  margin-top: 0.5em;
}

h1, h2, h3, p {
  font-family: Arial, sans-serif;
}

h1, h2, h3 {
  border-bottom: solid #CCC 1px;
}

.toc_element {
  margin-top: 0.5em;
}

.firstline {
  margin-left: 2em;
}

.method {
  margin-top: 1em;
  border: solid 1px #CCC;
  padding: 1em;
  background: #EEE;
}

.details {
  font-weight: bold;
  font-size: 14px;
}

</style>

<h1><a href="dataflow_v1b3.html">Dataflow API</a> . <a href="dataflow_v1b3.projects.html">projects</a> . <a href="dataflow_v1b3.projects.jobs.html">jobs</a></h1>
<h2>Instance Methods</h2>
<p class="toc_element">
  <code><a href="dataflow_v1b3.projects.jobs.debug.html">debug()</a></code>
</p>
<p class="firstline">Returns the debug Resource.</p>

<p class="toc_element">
  <code><a href="dataflow_v1b3.projects.jobs.messages.html">messages()</a></code>
</p>
<p class="firstline">Returns the messages Resource.</p>

<p class="toc_element">
  <code><a href="dataflow_v1b3.projects.jobs.workItems.html">workItems()</a></code>
</p>
<p class="firstline">Returns the workItems Resource.</p>

<p class="toc_element">
  <code><a href="#aggregated">aggregated(projectId, pageToken=None, pageSize=None, view=None, filter=None, location=None, x__xgafv=None)</a></code></p>
<p class="firstline">List the jobs of a project across all regions.</p>
<p class="toc_element">
  <code><a href="#aggregated_next">aggregated_next(previous_request, previous_response)</a></code></p>
<p class="firstline">Retrieves the next page of results.</p>
<p class="toc_element">
  <code><a href="#create">create(projectId, body=None, location=None, replaceJobId=None, view=None, x__xgafv=None)</a></code></p>
<p class="firstline">Creates a Cloud Dataflow job.</p>
<p class="toc_element">
  <code><a href="#get">get(projectId, jobId, view=None, location=None, x__xgafv=None)</a></code></p>
<p class="firstline">Gets the state of the specified Cloud Dataflow job.</p>
<p class="toc_element">
  <code><a href="#getMetrics">getMetrics(projectId, jobId, location=None, startTime=None, x__xgafv=None)</a></code></p>
<p class="firstline">Request the job status.</p>
<p class="toc_element">
  <code><a href="#list">list(projectId, filter=None, location=None, pageToken=None, pageSize=None, view=None, x__xgafv=None)</a></code></p>
<p class="firstline">List the jobs of a project.</p>
<p class="toc_element">
  <code><a href="#list_next">list_next(previous_request, previous_response)</a></code></p>
<p class="firstline">Retrieves the next page of results.</p>
<p class="toc_element">
  <code><a href="#snapshot">snapshot(projectId, jobId, body=None, x__xgafv=None)</a></code></p>
<p class="firstline">Snapshot the state of a streaming job.</p>
<p class="toc_element">
  <code><a href="#update">update(projectId, jobId, body=None, location=None, x__xgafv=None)</a></code></p>
<p class="firstline">Updates the state of an existing Cloud Dataflow job.</p>
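<p class="firstline">The paired methods <code>aggregated()</code> and <code>aggregated_next()</code> are meant to be used together to walk a multi-page result. The following is a minimal sketch of that loop, not an official sample. The helper name <code>iter_aggregated_jobs</code> and its <code>failed_locations</code> parameter are hypothetical; it assumes <code>jobs_resource</code> is the object documented on this page and that <code>aggregated_next()</code> returns <code>None</code> once there are no further pages.</p>

```python
from typing import Any, Dict, Iterator, List, Optional


def iter_aggregated_jobs(
    jobs_resource: Any,
    project_id: str,
    page_size: Optional[int] = None,
    failed_locations: Optional[List[str]] = None,
) -> Iterator[Dict[str, Any]]:
    """Yield every job dict returned by aggregated(), following all pages.

    Hypothetical helper: `jobs_resource` is assumed to expose the
    aggregated() and aggregated_next() methods described on this page.
    """
    kwargs: Dict[str, Any] = {"projectId": project_id}
    if page_size is not None:
        kwargs["pageSize"] = page_size

    request = jobs_resource.aggregated(**kwargs)
    while request is not None:
        response = request.execute()

        # Record regional endpoints that failed to respond, if the caller
        # supplied a list to collect them in.
        if failed_locations is not None:
            for location in response.get("failedLocation", []):
                failed_locations.append(location.get("name", ""))

        # A project without jobs produces an empty body ({}), so "jobs"
        # may be missing entirely.
        for job in response.get("jobs", []):
            yield job

        # Assumed to return None when there is no next page.
        request = jobs_resource.aggregated_next(
            previous_request=request, previous_response=response
        )
```

<p class="firstline">With the google-api-python-client library, the resource would typically be obtained with something like <code>googleapiclient.discovery.build("dataflow", "v1b3").projects().jobs()</code>; credentials and project setup are outside the scope of this sketch.</p>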
<h3>Method Details</h3>
<div class="method">
    <code class="details" id="aggregated">aggregated(projectId, pageToken=None, pageSize=None, view=None, filter=None, location=None, x__xgafv=None)</code>
  <pre>List the jobs of a project across all regions.

Args:
  projectId: string, The project which owns the jobs. (required)
  pageToken: string, Set this to the &#x27;nextPageToken&#x27; field of a previous response
to request additional results in a long list.
  pageSize: integer, If there are many jobs, limit response to at most this many.
The actual number of jobs returned will be the lesser of pageSize
and an unspecified server-defined limit.
  view: string, Level of information requested in response. Default is `JOB_VIEW_SUMMARY`.
  filter: string, The kind of filter to use.
  location: string, The [regional endpoint]
(https://cloud.google.com/dataflow/docs/concepts/regional-endpoints) that
contains this job.
  x__xgafv: string, V1 error format.
    Allowed values
      1 - v1 error format
      2 - v2 error format

Returns:
  An object of the form:

    { # Response to a request to list Cloud Dataflow jobs in a project. This might
      # be a partial response, depending on the page size in the ListJobsRequest.
      # However, if the project does not have any jobs, an instance of
      # ListJobsResponse is not returned and the request&#x27;s response
      # body is empty {}.
Bu Sun Kim65020912020-05-20 12:08:20 -0700149 &quot;nextPageToken&quot;: &quot;A String&quot;, # Set if there may be more results than fit in this response.
150 &quot;failedLocation&quot;: [ # Zero or more messages describing the [regional endpoints]
Bu Sun Kim715bd7f2019-06-14 16:50:42 -0700151 # (https://cloud.google.com/dataflow/docs/concepts/regional-endpoints) that
152 # failed to respond.
153 { # Indicates which [regional endpoint]
154 # (https://cloud.google.com/dataflow/docs/concepts/regional-endpoints) failed
155 # to respond to a request for data.
Bu Sun Kim65020912020-05-20 12:08:20 -0700156 &quot;name&quot;: &quot;A String&quot;, # The name of the [regional endpoint]
Bu Sun Kim715bd7f2019-06-14 16:50:42 -0700157 # (https://cloud.google.com/dataflow/docs/concepts/regional-endpoints) that
158 # failed to respond.
159 },
160 ],
Bu Sun Kim65020912020-05-20 12:08:20 -0700161 &quot;jobs&quot;: [ # A subset of the requested job information.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -0700162 { # Defines a job to be run by the Cloud Dataflow service.
Bu Sun Kim65020912020-05-20 12:08:20 -0700163 &quot;clientRequestId&quot;: &quot;A String&quot;, # The client&#x27;s unique identifier of the job, re-used across retried attempts.
164 # If this field is set, the service will ensure its uniqueness.
165 # The request to create a job will fail if the service has knowledge of a
166 # previously submitted job with the same client&#x27;s ID and job name.
167 # The caller may use this field to ensure idempotence of job
168 # creation across retried attempts to create a job.
169 # By default, the field is empty and, in that case, the service ignores it.
170 &quot;id&quot;: &quot;A String&quot;, # The unique ID of this job.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -0700171 #
172 # This field is set by the Cloud Dataflow service when the Job is
173 # created, and is immutable for the life of the job.
Bu Sun Kim65020912020-05-20 12:08:20 -0700174 &quot;currentStateTime&quot;: &quot;A String&quot;, # The timestamp associated with the current state.
175 &quot;transformNameMapping&quot;: { # The map of transform name prefixes of the job to be replaced to the
Bu Sun Kim715bd7f2019-06-14 16:50:42 -0700176 # corresponding name prefixes of the new job.
Bu Sun Kim65020912020-05-20 12:08:20 -0700177 &quot;a_key&quot;: &quot;A String&quot;,
Bu Sun Kim715bd7f2019-06-14 16:50:42 -0700178 },
Bu Sun Kim65020912020-05-20 12:08:20 -0700179 &quot;environment&quot;: { # Describes the environment in which a Dataflow Job runs. # The environment for the job.
180 &quot;sdkPipelineOptions&quot;: { # The Cloud Dataflow SDK pipeline options specified by the user. These
Bu Sun Kim715bd7f2019-06-14 16:50:42 -0700181 # options are passed through the service and are used to recreate the
182 # SDK pipeline options on the worker in a language agnostic and platform
183 # independent way.
Bu Sun Kim65020912020-05-20 12:08:20 -0700184 &quot;a_key&quot;: &quot;&quot;, # Properties of the object.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -0700185 },
Bu Sun Kim65020912020-05-20 12:08:20 -0700186 &quot;flexResourceSchedulingGoal&quot;: &quot;A String&quot;, # Which Flexible Resource Scheduling mode to run in.
187 &quot;workerPools&quot;: [ # The worker pools. At least one &quot;harness&quot; worker pool must be
Bu Sun Kim715bd7f2019-06-14 16:50:42 -0700188 # specified in order for the job to have workers.
189 { # Describes one particular pool of Cloud Dataflow workers to be
190 # instantiated by the Cloud Dataflow service in order to perform the
191 # computations required by a job. Note that a workflow job may use
192 # multiple pools, in order to match the various computational
193 # requirements of the various stages of the job.
Bu Sun Kim65020912020-05-20 12:08:20 -0700194 &quot;defaultPackageSet&quot;: &quot;A String&quot;, # The default package set to install. This allows the service to
195 # select a default set of packages which are useful to worker
196 # harnesses written in a particular language.
197 &quot;network&quot;: &quot;A String&quot;, # Network to which VMs will be assigned. If empty or unspecified,
198 # the service will use the network &quot;default&quot;.
199 &quot;zone&quot;: &quot;A String&quot;, # Zone to run the worker pools in. If empty or unspecified, the service
Dan O'Mearadd494642020-05-01 07:42:23 -0700200 # will attempt to choose a reasonable default.
Bu Sun Kim65020912020-05-20 12:08:20 -0700201 &quot;numWorkers&quot;: 42, # Number of Google Compute Engine workers in this pool needed to
202 # execute the job. If zero or unspecified, the service will
203 # attempt to choose a reasonable default.
204 &quot;numThreadsPerWorker&quot;: 42, # The number of threads per worker harness. If empty or unspecified, the
Dan O'Mearadd494642020-05-01 07:42:23 -0700205 # service will choose a number of threads (according to the number of cores
206 # on the selected machine type for batch, or 1 by convention for streaming).
Bu Sun Kim65020912020-05-20 12:08:20 -0700207 &quot;diskSourceImage&quot;: &quot;A String&quot;, # Fully qualified source image for disks.
208 &quot;packages&quot;: [ # Packages to be installed on workers.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -0700209 { # The packages that must be installed in order for a worker to run the
210 # steps of the Cloud Dataflow job that will be assigned to its worker
211 # pool.
212 #
213 # This is the mechanism by which the Cloud Dataflow SDK causes code to
214 # be loaded onto the workers. For example, the Cloud Dataflow Java SDK
Bu Sun Kim65020912020-05-20 12:08:20 -0700215 # might use this to install jars containing the user&#x27;s code and all of the
Bu Sun Kim715bd7f2019-06-14 16:50:42 -0700216 # various dependencies (libraries, data files, etc.) required in order
217 # for that code to run.
Bu Sun Kim65020912020-05-20 12:08:20 -0700218 &quot;location&quot;: &quot;A String&quot;, # The resource to read the package from. The supported resource type is:
Bu Sun Kim715bd7f2019-06-14 16:50:42 -0700219 #
220 # Google Cloud Storage:
221 #
222 # storage.googleapis.com/{bucket}
223 # bucket.storage.googleapis.com/
Bu Sun Kim65020912020-05-20 12:08:20 -0700224 &quot;name&quot;: &quot;A String&quot;, # The name of the package.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -0700225 },
226 ],
 &quot;teardownPolicy&quot;: &quot;A String&quot;, # Sets the policy for determining when to tear down the worker pool.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -0700228 # Allowed values are: `TEARDOWN_ALWAYS`, `TEARDOWN_ON_SUCCESS`, and
229 # `TEARDOWN_NEVER`.
230 # `TEARDOWN_ALWAYS` means workers are always torn down regardless of whether
231 # the job succeeds. `TEARDOWN_ON_SUCCESS` means workers are torn down
232 # if the job succeeds. `TEARDOWN_NEVER` means the workers are never torn
233 # down.
234 #
235 # If the workers are not torn down by the service, they will
236 # continue to run and use Google Compute Engine VM resources in the
Bu Sun Kim65020912020-05-20 12:08:20 -0700237 # user&#x27;s project until they are explicitly terminated by the user.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -0700238 # Because of this, Google recommends using the `TEARDOWN_ALWAYS`
239 # policy except for small, manually supervised test jobs.
240 #
241 # If unknown or unspecified, the service will attempt to choose a reasonable
242 # default.
Bu Sun Kim65020912020-05-20 12:08:20 -0700243 &quot;onHostMaintenance&quot;: &quot;A String&quot;, # The action to take on host maintenance, as defined by the Google
244 # Compute Engine API.
245 &quot;poolArgs&quot;: { # Extra arguments for this worker pool.
246 &quot;a_key&quot;: &quot;&quot;, # Properties of the object. Contains field @type with type URL.
247 },
248 &quot;diskSizeGb&quot;: 42, # Size of root disk for VMs, in GB. If zero or unspecified, the service will
Dan O'Mearadd494642020-05-01 07:42:23 -0700249 # attempt to choose a reasonable default.
Bu Sun Kim65020912020-05-20 12:08:20 -0700250 &quot;workerHarnessContainerImage&quot;: &quot;A String&quot;, # Required. Docker container image that executes the Cloud Dataflow worker
251 # harness, residing in Google Container Registry.
252 #
253 # Deprecated for the Fn API path. Use sdk_harness_container_images instead.
254 &quot;diskType&quot;: &quot;A String&quot;, # Type of root disk for VMs. If empty or unspecified, the service will
Bu Sun Kim715bd7f2019-06-14 16:50:42 -0700255 # attempt to choose a reasonable default.
Bu Sun Kim65020912020-05-20 12:08:20 -0700256 &quot;machineType&quot;: &quot;A String&quot;, # Machine type (e.g. &quot;n1-standard-1&quot;). If empty or unspecified, the
257 # service will attempt to choose a reasonable default.
258 &quot;kind&quot;: &quot;A String&quot;, # The kind of the worker pool; currently only `harness` and `shuffle`
259 # are supported.
260 &quot;dataDisks&quot;: [ # Data disks that are used by a VM in this workflow.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -0700261 { # Describes the data disk used by a workflow job.
Bu Sun Kim65020912020-05-20 12:08:20 -0700262 &quot;sizeGb&quot;: 42, # Size of disk in GB. If zero or unspecified, the service will
Bu Sun Kim715bd7f2019-06-14 16:50:42 -0700263 # attempt to choose a reasonable default.
Bu Sun Kim65020912020-05-20 12:08:20 -0700264 &quot;diskType&quot;: &quot;A String&quot;, # Disk storage type, as defined by Google Compute Engine. This
Bu Sun Kim715bd7f2019-06-14 16:50:42 -0700265 # must be a disk type appropriate to the project and zone in which
266 # the workers will run. If unknown or unspecified, the service
267 # will attempt to choose a reasonable default.
268 #
269 # For example, the standard persistent disk type is a resource name
Bu Sun Kim65020912020-05-20 12:08:20 -0700270 # typically ending in &quot;pd-standard&quot;. If SSD persistent disks are
271 # available, the resource name typically ends with &quot;pd-ssd&quot;. The
 # actual valid values are defined by the Google Compute Engine API,
273 # not by the Cloud Dataflow API; consult the Google Compute Engine
274 # documentation for more information about determining the set of
275 # available disk types for a particular project and zone.
276 #
277 # Google Compute Engine Disk types are local to a particular
278 # project in a particular zone, and so the resource name will
279 # typically look something like this:
280 #
281 # compute.googleapis.com/projects/project-id/zones/zone/diskTypes/pd-standard
Bu Sun Kim65020912020-05-20 12:08:20 -0700282 &quot;mountPoint&quot;: &quot;A String&quot;, # Directory in a VM where disk is mounted.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -0700283 },
284 ],
Bu Sun Kim65020912020-05-20 12:08:20 -0700285 &quot;sdkHarnessContainerImages&quot;: [ # Set of SDK harness containers needed to execute this pipeline. This will
Dan O'Mearadd494642020-05-01 07:42:23 -0700286 # only be set in the Fn API path. For non-cross-language pipelines this
287 # should have only one entry. Cross-language pipelines will have two or more
288 # entries.
289 { # Defines a SDK harness container for executing Dataflow pipelines.
Bu Sun Kim65020912020-05-20 12:08:20 -0700290 &quot;containerImage&quot;: &quot;A String&quot;, # A docker container image that resides in Google Container Registry.
291 &quot;useSingleCorePerContainer&quot;: True or False, # If true, recommends the Dataflow service to use only one core per SDK
Dan O'Mearadd494642020-05-01 07:42:23 -0700292 # container instance with this image. If false (or unset) recommends using
293 # more than one core per SDK container instance with this image for
294 # efficiency. Note that Dataflow service may choose to override this property
295 # if needed.
296 },
297 ],
Bu Sun Kim65020912020-05-20 12:08:20 -0700298 &quot;subnetwork&quot;: &quot;A String&quot;, # Subnetwork to which VMs will be assigned, if desired. Expected to be of
299 # the form &quot;regions/REGION/subnetworks/SUBNETWORK&quot;.
300 &quot;ipConfiguration&quot;: &quot;A String&quot;, # Configuration for VM IPs.
301 &quot;taskrunnerSettings&quot;: { # Taskrunner configuration settings. # Settings passed through to Google Compute Engine workers when
302 # using the standard Dataflow task runner. Users should ignore
303 # this field.
304 &quot;alsologtostderr&quot;: True or False, # Whether to also send taskrunner log info to stderr.
305 &quot;taskGroup&quot;: &quot;A String&quot;, # The UNIX group ID on the worker VM to use for tasks launched by
306 # taskrunner; e.g. &quot;wheel&quot;.
307 &quot;harnessCommand&quot;: &quot;A String&quot;, # The command to launch the worker harness.
308 &quot;logDir&quot;: &quot;A String&quot;, # The directory on the VM to store logs.
309 &quot;oauthScopes&quot;: [ # The OAuth2 scopes to be requested by the taskrunner in order to
310 # access the Cloud Dataflow API.
311 &quot;A String&quot;,
312 ],
313 &quot;dataflowApiVersion&quot;: &quot;A String&quot;, # The API version of endpoint, e.g. &quot;v1b3&quot;
314 &quot;logUploadLocation&quot;: &quot;A String&quot;, # Indicates where to put logs. If this is not specified, the logs
315 # will not be uploaded.
316 #
317 # The supported resource type is:
318 #
319 # Google Cloud Storage:
320 # storage.googleapis.com/{bucket}/{object}
321 # bucket.storage.googleapis.com/{object}
322 &quot;streamingWorkerMainClass&quot;: &quot;A String&quot;, # The streaming worker main class name.
323 &quot;workflowFileName&quot;: &quot;A String&quot;, # The file to store the workflow in.
324 &quot;baseTaskDir&quot;: &quot;A String&quot;, # The location on the worker for task-specific subdirectories.
325 &quot;tempStoragePrefix&quot;: &quot;A String&quot;, # The prefix of the resources the taskrunner should use for
326 # temporary storage.
327 #
328 # The supported resource type is:
329 #
330 # Google Cloud Storage:
331 # storage.googleapis.com/{bucket}/{object}
332 # bucket.storage.googleapis.com/{object}
333 &quot;commandlinesFileName&quot;: &quot;A String&quot;, # The file to store preprocessing commands in.
334 &quot;languageHint&quot;: &quot;A String&quot;, # The suggested backend language.
335 &quot;baseUrl&quot;: &quot;A String&quot;, # The base URL for the taskrunner to use when accessing Google Cloud APIs.
336 #
337 # When workers access Google Cloud APIs, they logically do so via
338 # relative URLs. If this field is specified, it supplies the base
339 # URL to use for resolving these relative URLs. The normative
340 # algorithm used is defined by RFC 1808, &quot;Relative Uniform Resource
341 # Locators&quot;.
342 #
343 # If not specified, the default value is &quot;http://www.googleapis.com/&quot;
344 &quot;logToSerialconsole&quot;: True or False, # Whether to send taskrunner log info to Google Compute Engine VM serial
345 # console.
346 &quot;continueOnException&quot;: True or False, # Whether to continue taskrunner if an exception is hit.
347 &quot;parallelWorkerSettings&quot;: { # Provides data to pass through to the worker harness. # The settings to pass to the parallel worker harness.
348 &quot;baseUrl&quot;: &quot;A String&quot;, # The base URL for accessing Google Cloud APIs.
349 #
350 # When workers access Google Cloud APIs, they logically do so via
351 # relative URLs. If this field is specified, it supplies the base
352 # URL to use for resolving these relative URLs. The normative
353 # algorithm used is defined by RFC 1808, &quot;Relative Uniform Resource
354 # Locators&quot;.
355 #
356 # If not specified, the default value is &quot;http://www.googleapis.com/&quot;
357 &quot;reportingEnabled&quot;: True or False, # Whether to send work progress updates to the service.
358 &quot;servicePath&quot;: &quot;A String&quot;, # The Cloud Dataflow service path relative to the root URL, for example,
359 # &quot;dataflow/v1b3/projects&quot;.
360 &quot;shuffleServicePath&quot;: &quot;A String&quot;, # The Shuffle service path relative to the root URL, for example,
361 # &quot;shuffle/v1beta1&quot;.
362 &quot;workerId&quot;: &quot;A String&quot;, # The ID of the worker running this pipeline.
363 &quot;tempStoragePrefix&quot;: &quot;A String&quot;, # The prefix of the resources the system should use for temporary
364 # storage.
365 #
366 # The supported resource type is:
367 #
368 # Google Cloud Storage:
369 #
370 # storage.googleapis.com/{bucket}/{object}
371 # bucket.storage.googleapis.com/{object}
372 },
373 &quot;vmId&quot;: &quot;A String&quot;, # The ID string of the VM.
374 &quot;taskUser&quot;: &quot;A String&quot;, # The UNIX user ID on the worker VM to use for tasks launched by
375 # taskrunner; e.g. &quot;root&quot;.
376 },
377 &quot;autoscalingSettings&quot;: { # Settings for WorkerPool autoscaling. # Settings for autoscaling of this WorkerPool.
378 &quot;maxNumWorkers&quot;: 42, # The maximum number of workers to cap scaling at.
379 &quot;algorithm&quot;: &quot;A String&quot;, # The algorithm to use for autoscaling.
380 },
381 &quot;metadata&quot;: { # Metadata to set on the Google Compute Engine VMs.
382 &quot;a_key&quot;: &quot;A String&quot;,
383 },
Bu Sun Kim715bd7f2019-06-14 16:50:42 -0700384 },
385 ],
Bu Sun Kim65020912020-05-20 12:08:20 -0700386 &quot;dataset&quot;: &quot;A String&quot;, # The dataset for the current project where various workflow
387 # related tables are stored.
388 #
389 # The supported resource type is:
390 #
391 # Google BigQuery:
392 # bigquery.googleapis.com/{dataset}
393 &quot;internalExperiments&quot;: { # Experimental settings.
394 &quot;a_key&quot;: &quot;&quot;, # Properties of the object. Contains field @type with type URL.
395 },
396 &quot;workerRegion&quot;: &quot;A String&quot;, # The Compute Engine region
397 # (https://cloud.google.com/compute/docs/regions-zones/regions-zones) in
398 # which worker processing should occur, e.g. &quot;us-west1&quot;. Mutually exclusive
399 # with worker_zone. If neither worker_region nor worker_zone is specified,
400 # default to the control plane&#x27;s region.
401 &quot;serviceKmsKeyName&quot;: &quot;A String&quot;, # If set, contains the Cloud KMS key identifier used to encrypt data
402 # at rest, AKA a Customer Managed Encryption Key (CMEK).
403 #
404 # Format:
405 # projects/PROJECT_ID/locations/LOCATION/keyRings/KEY_RING/cryptoKeys/KEY
406 &quot;userAgent&quot;: { # A description of the process that generated the request.
407 &quot;a_key&quot;: &quot;&quot;, # Properties of the object.
408 },
409 &quot;workerZone&quot;: &quot;A String&quot;, # The Compute Engine zone
410 # (https://cloud.google.com/compute/docs/regions-zones/regions-zones) in
411 # which worker processing should occur, e.g. &quot;us-west1-a&quot;. Mutually exclusive
412 # with worker_region. If neither worker_region nor worker_zone is specified,
413 # a zone in the control plane&#x27;s region is chosen based on available capacity.
414 &quot;clusterManagerApiService&quot;: &quot;A String&quot;, # The type of cluster manager API to use. If unknown or
Dan O'Mearadd494642020-05-01 07:42:23 -0700415 # unspecified, the service will attempt to choose a reasonable
416 # default. This should be in the form of the API service name,
Bu Sun Kim65020912020-05-20 12:08:20 -0700417 # e.g. &quot;compute.googleapis.com&quot;.
418 &quot;tempStoragePrefix&quot;: &quot;A String&quot;, # The prefix of the resources the system should use for temporary
 # storage. The system will append the suffix &quot;/temp-{JOBNAME}&quot; to
 # this resource prefix, where {JOBNAME} is the value of the
421 # job_name field. The resulting bucket and object prefix is used
422 # as the prefix of the resources used to store temporary data
423 # needed during the job execution. NOTE: This will override the
424 # value in taskrunner_settings.
425 # The supported resource type is:
426 #
427 # Google Cloud Storage:
428 #
429 # storage.googleapis.com/{bucket}/{object}
430 # bucket.storage.googleapis.com/{object}
Bu Sun Kim65020912020-05-20 12:08:20 -0700431 &quot;experiments&quot;: [ # The list of experiments to enable.
432 &quot;A String&quot;,
433 ],
434 &quot;version&quot;: { # A structure describing which components and their versions of the service
435 # are required in order to run the job.
436 &quot;a_key&quot;: &quot;&quot;, # Properties of the object.
437 },
438 &quot;serviceAccountEmail&quot;: &quot;A String&quot;, # Identity to run virtual machines as. Defaults to the default account.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -0700439 },
Bu Sun Kim65020912020-05-20 12:08:20 -0700440 &quot;stageStates&quot;: [ # This field may be mutated by the Cloud Dataflow service;
441 # callers cannot mutate it.
442 { # A message describing the state of a particular execution stage.
443 &quot;executionStageName&quot;: &quot;A String&quot;, # The name of the execution stage.
444 &quot;currentStateTime&quot;: &quot;A String&quot;, # The time at which the stage transitioned to this state.
 &quot;executionStageState&quot;: &quot;A String&quot;, # Execution stage states allow the same set of values as JobState.
446 },
447 ],
 &quot;jobMetadata&quot;: { # Metadata available primarily for filtering jobs. Will be included in the
 # ListJob response and Job SUMMARY view.
 #
 # This field is populated by the Dataflow service to support filtering jobs
 # by the metadata values provided here. Populated for ListJobs and all GetJob
 # views SUMMARY and higher.
452 &quot;bigTableDetails&quot;: [ # Identification of a BigTable source used in the Dataflow job.
453 { # Metadata for a BigTable connector used by the job.
454 &quot;tableId&quot;: &quot;A String&quot;, # TableId accessed in the connection.
455 &quot;projectId&quot;: &quot;A String&quot;, # ProjectId accessed in the connection.
456 &quot;instanceId&quot;: &quot;A String&quot;, # InstanceId accessed in the connection.
457 },
458 ],
459 &quot;spannerDetails&quot;: [ # Identification of a Spanner source used in the Dataflow job.
460 { # Metadata for a Spanner connector used by the job.
461 &quot;databaseId&quot;: &quot;A String&quot;, # DatabaseId accessed in the connection.
462 &quot;instanceId&quot;: &quot;A String&quot;, # InstanceId accessed in the connection.
463 &quot;projectId&quot;: &quot;A String&quot;, # ProjectId accessed in the connection.
464 },
465 ],
466 &quot;datastoreDetails&quot;: [ # Identification of a Datastore source used in the Dataflow job.
467 { # Metadata for a Datastore connector used by the job.
468 &quot;projectId&quot;: &quot;A String&quot;, # ProjectId accessed in the connection.
469 &quot;namespace&quot;: &quot;A String&quot;, # Namespace used in the connection.
470 },
471 ],
472 &quot;sdkVersion&quot;: { # The version of the SDK used to run the job. # The SDK version used to run the job.
473 &quot;versionDisplayName&quot;: &quot;A String&quot;, # A readable string describing the version of the SDK.
474 &quot;sdkSupportStatus&quot;: &quot;A String&quot;, # The support status for this SDK version.
475 &quot;version&quot;: &quot;A String&quot;, # The version of the SDK used to run the job.
476 },
477 &quot;bigqueryDetails&quot;: [ # Identification of a BigQuery source used in the Dataflow job.
478 { # Metadata for a BigQuery connector used by the job.
479 &quot;table&quot;: &quot;A String&quot;, # Table accessed in the connection.
480 &quot;dataset&quot;: &quot;A String&quot;, # Dataset accessed in the connection.
481 &quot;projectId&quot;: &quot;A String&quot;, # Project accessed in the connection.
482 &quot;query&quot;: &quot;A String&quot;, # Query used to access data in the connection.
483 },
484 ],
485 &quot;fileDetails&quot;: [ # Identification of a File source used in the Dataflow job.
486 { # Metadata for a File connector used by the job.
487 &quot;filePattern&quot;: &quot;A String&quot;, # File Pattern used to access files by the connector.
488 },
489 ],
490 &quot;pubsubDetails&quot;: [ # Identification of a PubSub source used in the Dataflow job.
491 { # Metadata for a PubSub connector used by the job.
492 &quot;subscription&quot;: &quot;A String&quot;, # Subscription used in the connection.
493 &quot;topic&quot;: &quot;A String&quot;, # Topic accessed in the connection.
494 },
495 ],
496 },
497 &quot;createdFromSnapshotId&quot;: &quot;A String&quot;, # If this is specified, the job&#x27;s initial state is populated from the given
498 # snapshot.
499 &quot;projectId&quot;: &quot;A String&quot;, # The ID of the Cloud Platform project that the job belongs to.
500 &quot;type&quot;: &quot;A String&quot;, # The type of Cloud Dataflow job.
 &quot;pipelineDescription&quot;: { # A descriptive representation of submitted pipeline as well as the executed
 # form. This data is provided by the Dataflow service for ease of visualizing
 # the pipeline and interpreting Dataflow provided metrics.
 #
 # Preliminary field: The format of this data may change at any time.
 # A description of the user pipeline and stages through which it is executed.
 # Created by Cloud Dataflow service. Only retrieved with
 # JOB_VIEW_DESCRIPTION or JOB_VIEW_ALL.
507 &quot;executionPipelineStage&quot;: [ # Description of each stage of execution of the pipeline.
508 { # Description of the composing transforms, names/ids, and input/outputs of a
509 # stage of execution. Some composing transforms and sources may have been
510 # generated by the Dataflow service during execution planning.
511 &quot;id&quot;: &quot;A String&quot;, # Dataflow service generated id for this stage.
512 &quot;componentTransform&quot;: [ # Transforms that comprise this execution stage.
513 { # Description of a transform executed as part of an execution stage.
514 &quot;originalTransform&quot;: &quot;A String&quot;, # User name for the original user transform with which this transform is
515 # most closely associated.
516 &quot;name&quot;: &quot;A String&quot;, # Dataflow service generated name for this transform.
517 &quot;userName&quot;: &quot;A String&quot;, # Human-readable name for this transform; may be user or system generated.
518 },
519 ],
520 &quot;componentSource&quot;: [ # Collections produced and consumed by component transforms of this stage.
521 { # Description of an interstitial value between transforms in an execution
522 # stage.
523 &quot;name&quot;: &quot;A String&quot;, # Dataflow service generated name for this source.
524 &quot;userName&quot;: &quot;A String&quot;, # Human-readable name for this source; may be user or system generated.
525 &quot;originalTransformOrCollection&quot;: &quot;A String&quot;, # User name for the original user transform or collection with which this
526 # source is most closely associated.
527 },
528 ],
529 &quot;kind&quot;: &quot;A String&quot;, # Type of transform this stage is executing.
530 &quot;outputSource&quot;: [ # Output sources for this stage.
531 { # Description of an input or output of an execution stage.
532 &quot;originalTransformOrCollection&quot;: &quot;A String&quot;, # User name for the original user transform or collection with which this
533 # source is most closely associated.
534 &quot;name&quot;: &quot;A String&quot;, # Dataflow service generated name for this source.
535 &quot;sizeBytes&quot;: &quot;A String&quot;, # Size of the source, if measurable.
536 &quot;userName&quot;: &quot;A String&quot;, # Human-readable name for this source; may be user or system generated.
537 },
538 ],
539 &quot;name&quot;: &quot;A String&quot;, # Dataflow service generated name for this stage.
540 &quot;inputSource&quot;: [ # Input sources for this stage.
541 { # Description of an input or output of an execution stage.
542 &quot;originalTransformOrCollection&quot;: &quot;A String&quot;, # User name for the original user transform or collection with which this
543 # source is most closely associated.
544 &quot;name&quot;: &quot;A String&quot;, # Dataflow service generated name for this source.
545 &quot;sizeBytes&quot;: &quot;A String&quot;, # Size of the source, if measurable.
546 &quot;userName&quot;: &quot;A String&quot;, # Human-readable name for this source; may be user or system generated.
547 },
548 ],
549 },
550 ],
551 &quot;originalPipelineTransform&quot;: [ # Description of each transform in the pipeline and collections between them.
552 { # Description of the type, names/ids, and input/outputs for a transform.
553 &quot;kind&quot;: &quot;A String&quot;, # Type of transform.
554 &quot;inputCollectionName&quot;: [ # User names for all collection inputs to this transform.
555 &quot;A String&quot;,
556 ],
557 &quot;name&quot;: &quot;A String&quot;, # User provided name for this transform instance.
558 &quot;id&quot;: &quot;A String&quot;, # SDK generated id of this transform instance.
559 &quot;displayData&quot;: [ # Transform-specific display data.
560 { # Data provided with a pipeline or transform to provide descriptive info.
561 &quot;timestampValue&quot;: &quot;A String&quot;, # Contains value if the data is of timestamp type.
562 &quot;boolValue&quot;: True or False, # Contains value if the data is of a boolean type.
563 &quot;javaClassValue&quot;: &quot;A String&quot;, # Contains value if the data is of java class type.
564 &quot;strValue&quot;: &quot;A String&quot;, # Contains value if the data is of string type.
565 &quot;int64Value&quot;: &quot;A String&quot;, # Contains value if the data is of int64 type.
566 &quot;durationValue&quot;: &quot;A String&quot;, # Contains value if the data is of duration type.
567 &quot;namespace&quot;: &quot;A String&quot;, # The namespace for the key. This is usually a class name or programming
568 # language namespace (e.g. a Python module) which defines the display data.
569 # This allows a dax monitoring system to specially handle the data
570 # and perform custom rendering.
571 &quot;floatValue&quot;: 3.14, # Contains value if the data is of float type.
572 &quot;key&quot;: &quot;A String&quot;, # The key identifying the display data.
573 # This is intended to be used as a label for the display data
574 # when viewed in a dax monitoring system.
575 &quot;shortStrValue&quot;: &quot;A String&quot;, # A possible additional shorter value to display.
576 # For example a java_class_name_value of com.mypackage.MyDoFn
577 # will be stored with MyDoFn as the short_str_value and
578 # com.mypackage.MyDoFn as the java_class_name value.
579 # short_str_value can be displayed and java_class_name_value
580 # will be displayed as a tooltip.
581 &quot;url&quot;: &quot;A String&quot;, # An optional full URL.
582 &quot;label&quot;: &quot;A String&quot;, # An optional label to display in a dax UI for the element.
583 },
584 ],
585 &quot;outputCollectionName&quot;: [ # User names for all collection outputs to this transform.
586 &quot;A String&quot;,
587 ],
588 },
589 ],
590 &quot;displayData&quot;: [ # Pipeline level display data.
591 { # Data provided with a pipeline or transform to provide descriptive info.
592 &quot;timestampValue&quot;: &quot;A String&quot;, # Contains value if the data is of timestamp type.
593 &quot;boolValue&quot;: True or False, # Contains value if the data is of a boolean type.
594 &quot;javaClassValue&quot;: &quot;A String&quot;, # Contains value if the data is of java class type.
595 &quot;strValue&quot;: &quot;A String&quot;, # Contains value if the data is of string type.
596 &quot;int64Value&quot;: &quot;A String&quot;, # Contains value if the data is of int64 type.
597 &quot;durationValue&quot;: &quot;A String&quot;, # Contains value if the data is of duration type.
598 &quot;namespace&quot;: &quot;A String&quot;, # The namespace for the key. This is usually a class name or programming
599 # language namespace (e.g. a Python module) which defines the display data.
600 # This allows a dax monitoring system to specially handle the data
601 # and perform custom rendering.
602 &quot;floatValue&quot;: 3.14, # Contains value if the data is of float type.
603 &quot;key&quot;: &quot;A String&quot;, # The key identifying the display data.
604 # This is intended to be used as a label for the display data
605 # when viewed in a dax monitoring system.
606 &quot;shortStrValue&quot;: &quot;A String&quot;, # A possible additional shorter value to display.
607 # For example a java_class_name_value of com.mypackage.MyDoFn
608 # will be stored with MyDoFn as the short_str_value and
609 # com.mypackage.MyDoFn as the java_class_name value.
610 # short_str_value can be displayed and java_class_name_value
611 # will be displayed as a tooltip.
612 &quot;url&quot;: &quot;A String&quot;, # An optional full URL.
613 &quot;label&quot;: &quot;A String&quot;, # An optional label to display in a dax UI for the element.
614 },
615 ],
616 },
617 &quot;replaceJobId&quot;: &quot;A String&quot;, # If this job is an update of an existing job, this field is the job ID
618 # of the job it replaced.
619 #
620 # When sending a `CreateJobRequest`, you can update a job by specifying it
621 # here. The job named here is stopped, and its intermediate state is
622 # transferred to this job.
623 &quot;tempFiles&quot;: [ # A set of files the system should be aware of that are used
624 # for temporary storage. These temporary files will be
625 # removed on job completion.
626 # No duplicates are allowed.
627 # No file patterns are supported.
628 #
629 # The supported files are:
630 #
631 # Google Cloud Storage:
632 #
633 # storage.googleapis.com/{bucket}/{object}
634 # bucket.storage.googleapis.com/{object}
635 &quot;A String&quot;,
636 ],
637 &quot;name&quot;: &quot;A String&quot;, # The user-specified Cloud Dataflow job name.
638 #
639 # Only one Job with a given name may exist in a project at any
640 # given time. If a caller attempts to create a Job with the same
641 # name as an already-existing Job, the attempt returns the
642 # existing Job.
643 #
644 # The name must match the regular expression
645 # `[a-z]([-a-z0-9]{0,38}[a-z0-9])?`
646 &quot;steps&quot;: [ # Exactly one of steps or steps_location should be specified.
647 #
648 # The top-level steps that constitute the entire job.
649 { # Defines a particular step within a Cloud Dataflow job.
650 #
651 # A job consists of multiple steps, each of which performs some
652 # specific operation as part of the overall job. Data is typically
653 # passed from one step to another as part of the job.
654 #
655 # Here&#x27;s an example of a sequence of steps which together implement a
656 # Map-Reduce job:
657 #
658 # * Read a collection of data from some source, parsing the
659 # collection&#x27;s elements.
660 #
661 # * Validate the elements.
662 #
663 # * Apply a user-defined function to map each element to some value
664 # and extract an element-specific key value.
665 #
666 # * Group elements with the same key into a single element with
667 # that key, transforming a multiply-keyed collection into a
668 # uniquely-keyed collection.
669 #
670 # * Write the elements out to some data sink.
671 #
672 # Note that the Cloud Dataflow service may be used to run many different
673 # types of jobs, not just Map-Reduce.
674 &quot;name&quot;: &quot;A String&quot;, # The name that identifies the step. This must be unique for each
675 # step with respect to all other steps in the Cloud Dataflow job.
676 &quot;kind&quot;: &quot;A String&quot;, # The kind of step in the Cloud Dataflow job.
677 &quot;properties&quot;: { # Named properties associated with the step. Each kind of
678 # predefined step has its own required set of properties.
679 # Must be provided on Create. Only retrieved with JOB_VIEW_ALL.
680 &quot;a_key&quot;: &quot;&quot;, # Properties of the object.
681 },
682 },
683 ],
684 &quot;replacedByJobId&quot;: &quot;A String&quot;, # If another job is an update of this job (and thus, this job is in
685 # `JOB_STATE_UPDATED`), this field contains the ID of that job.
686 &quot;executionInfo&quot;: { # Additional information about how a Cloud Dataflow job will be executed that # Deprecated.
687 # isn&#x27;t contained in the submitted job.
688 &quot;stages&quot;: { # A mapping from each stage to the information about that stage.
689 &quot;a_key&quot;: { # Contains information about how a particular
690 # google.dataflow.v1beta3.Step will be executed.
691 &quot;stepName&quot;: [ # The steps associated with the execution stage.
692 # Note that stages may have several steps, and that a given step
693 # might be run by more than one stage.
694 &quot;A String&quot;,
695 ],
696 },
697 },
698 },
699 &quot;currentState&quot;: &quot;A String&quot;, # The current state of the job.
700 #
701 # Jobs are created in the `JOB_STATE_STOPPED` state unless otherwise
702 # specified.
703 #
704 # A job in the `JOB_STATE_RUNNING` state may asynchronously enter a
705 # terminal state. After a job has reached a terminal state, no
706 # further state updates may be made.
707 #
708 # This field may be mutated by the Cloud Dataflow service;
709 # callers cannot mutate it.
710 &quot;location&quot;: &quot;A String&quot;, # The [regional endpoint]
711 # (https://cloud.google.com/dataflow/docs/concepts/regional-endpoints) that
712 # contains this job.
713 &quot;startTime&quot;: &quot;A String&quot;, # The timestamp when the job was started (transitioned to JOB_STATE_PENDING).
714 # Flexible resource scheduling jobs are started with some delay after job
715 # creation, so start_time is unset before start and is updated when the
716 # job is started by the Cloud Dataflow service. For other jobs, start_time
717 # always equals create_time and is immutable and set by the Cloud Dataflow
718 # service.
719 &quot;stepsLocation&quot;: &quot;A String&quot;, # The GCS location where the steps are stored.
720 &quot;labels&quot;: { # User-defined labels for this job.
721 #
722 # The labels map can contain no more than 64 entries. Entries of the labels
723 # map are UTF8 strings that comply with the following restrictions:
724 #
725 # * Keys must conform to regexp: \p{Ll}\p{Lo}{0,62}
726 # * Values must conform to regexp: [\p{Ll}\p{Lo}\p{N}_-]{0,63}
727 # * Both keys and values are additionally constrained to be &lt;= 128 bytes in
728 # size.
729 &quot;a_key&quot;: &quot;A String&quot;,
730 },
731 &quot;createTime&quot;: &quot;A String&quot;, # The timestamp when the job was initially created. Immutable and set by the
732 # Cloud Dataflow service.
733 &quot;requestedState&quot;: &quot;A String&quot;, # The job&#x27;s requested state.
734 #
735 # `UpdateJob` may be used to switch between the `JOB_STATE_STOPPED` and
736 # `JOB_STATE_RUNNING` states, by setting requested_state. `UpdateJob` may
737 # also be used to directly set a job&#x27;s requested state to
738 # `JOB_STATE_CANCELLED` or `JOB_STATE_DONE`, irrevocably terminating the
739 # job if it has not already reached a terminal state.
740 },
741 ],
742 }</pre>
743</div>
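<p>The <code>name</code> field described above must match the regular expression <code>[a-z]([-a-z0-9]{0,38}[a-z0-9])?</code>. The following is a minimal sketch of a client-side check against that pattern before a job is submitted. The helper name <code>is_valid_job_name</code> is hypothetical and is not part of the client library; the service remains the authority on which names it accepts.</p>

```python
import re

# Pattern quoted from the documented description of the job "name" field.
JOB_NAME_PATTERN = re.compile(r"[a-z]([-a-z0-9]{0,38}[a-z0-9])?")


def is_valid_job_name(name):
    """Return True if `name` fully matches the documented job-name pattern.

    Hypothetical helper for illustration only.
    """
    return JOB_NAME_PATTERN.fullmatch(name) is not None
```

<p>Under this pattern a name is at most 40 characters long, starts with a lowercase letter, and may not end with a hyphen.</p>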
744
745<div class="method">
746 <code class="details" id="aggregated_next">aggregated_next(previous_request, previous_response)</code>
747 <pre>Retrieves the next page of results.
748
749Args:
750 previous_request: The request for the previous page. (required)
751 previous_response: The response from the request for the previous page. (required)
752
753Returns:
754 A request object that you can call &#x27;execute()&#x27; on to request the next
755 page. Returns None if there are no more items in the collection.
756 </pre>
757</div>
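<p>Because <code>aggregated_next</code> returns <code>None</code> once the collection is exhausted, paging can be written as a simple loop. The sketch below assumes that <code>jobs_resource</code> is the object returned by <code>service.projects().jobs()</code>, that <code>aggregated</code> accepts a <code>projectId</code> keyword, and that each response page holds its jobs under a <code>jobs</code> key. The function name <code>iter_jobs</code> is hypothetical.</p>

```python
def iter_jobs(jobs_resource, project_id):
    """Yield every job dict across all pages of an aggregated listing.

    Hypothetical helper: `jobs_resource` is assumed to expose
    aggregated() and aggregated_next() as described in this reference.
    """
    request = jobs_resource.aggregated(projectId=project_id)
    while request is not None:
        response = request.execute()
        # Assumption: each page stores its jobs under the "jobs" key.
        for job in response.get("jobs", []):
            yield job
        # aggregated_next() returns None when there are no more pages.
        request = jobs_resource.aggregated_next(
            previous_request=request, previous_response=response
        )
```

<p>With the Google API Python client, the resource would typically be obtained from <code>googleapiclient.discovery.build("dataflow", "v1b3")</code> followed by <code>.projects().jobs()</code>; credentials and error handling are omitted here.</p>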
758
759<div class="method">
760 <code class="details" id="create">create(projectId, body=None, location=None, replaceJobId=None, view=None, x__xgafv=None)</code>
761 <pre>Creates a Cloud Dataflow job.
762
763To create a job, we recommend using `projects.locations.jobs.create` with a
764[regional endpoint]
765(https://cloud.google.com/dataflow/docs/concepts/regional-endpoints). Using
766`projects.jobs.create` is not recommended, as your job will always start
767in `us-central1`.
768
769Args:
770 projectId: string, The ID of the Cloud Platform project that the job belongs to. (required)
771 body: object, The request body.
772 The object takes the form of:
773
774{ # Defines a job to be run by the Cloud Dataflow service.
775 &quot;clientRequestId&quot;: &quot;A String&quot;, # The client&#x27;s unique identifier of the job, re-used across retried attempts.
776 # If this field is set, the service will ensure its uniqueness.
777 # The request to create a job will fail if the service has knowledge of a
778 # previously submitted job with the same client&#x27;s ID and job name.
779 # The caller may use this field to ensure idempotence of job
780 # creation across retried attempts to create a job.
781 # By default, the field is empty and, in that case, the service ignores it.
782 &quot;id&quot;: &quot;A String&quot;, # The unique ID of this job.
783 #
784 # This field is set by the Cloud Dataflow service when the Job is
785 # created, and is immutable for the life of the job.
786 &quot;currentStateTime&quot;: &quot;A String&quot;, # The timestamp associated with the current state.
787 &quot;transformNameMapping&quot;: { # The map of transform name prefixes of the job to be replaced to the
788 # corresponding name prefixes of the new job.
789 &quot;a_key&quot;: &quot;A String&quot;,
790 },
791 &quot;environment&quot;: { # Describes the environment in which a Dataflow Job runs. # The environment for the job.
792 &quot;sdkPipelineOptions&quot;: { # The Cloud Dataflow SDK pipeline options specified by the user. These
793 # options are passed through the service and are used to recreate the
794 # SDK pipeline options on the worker in a language agnostic and platform
795 # independent way.
796 &quot;a_key&quot;: &quot;&quot;, # Properties of the object.
797 },
798 &quot;flexResourceSchedulingGoal&quot;: &quot;A String&quot;, # Which Flexible Resource Scheduling mode to run in.
799 &quot;workerPools&quot;: [ # The worker pools. At least one &quot;harness&quot; worker pool must be
800 # specified in order for the job to have workers.
801 { # Describes one particular pool of Cloud Dataflow workers to be
802 # instantiated by the Cloud Dataflow service in order to perform the
803 # computations required by a job. Note that a workflow job may use
804 # multiple pools, in order to match the various computational
805 # requirements of the various stages of the job.
806 &quot;defaultPackageSet&quot;: &quot;A String&quot;, # The default package set to install. This allows the service to
807 # select a default set of packages which are useful to worker
808 # harnesses written in a particular language.
809 &quot;network&quot;: &quot;A String&quot;, # Network to which VMs will be assigned. If empty or unspecified,
810 # the service will use the network &quot;default&quot;.
811 &quot;zone&quot;: &quot;A String&quot;, # Zone to run the worker pools in. If empty or unspecified, the service
812 # will attempt to choose a reasonable default.
813 &quot;numWorkers&quot;: 42, # Number of Google Compute Engine workers in this pool needed to
814 # execute the job. If zero or unspecified, the service will
815 # attempt to choose a reasonable default.
816 &quot;numThreadsPerWorker&quot;: 42, # The number of threads per worker harness. If empty or unspecified, the
817 # service will choose a number of threads (according to the number of cores
818 # on the selected machine type for batch, or 1 by convention for streaming).
819 &quot;diskSourceImage&quot;: &quot;A String&quot;, # Fully qualified source image for disks.
820 &quot;packages&quot;: [ # Packages to be installed on workers.
821 { # The packages that must be installed in order for a worker to run the
822 # steps of the Cloud Dataflow job that will be assigned to its worker
823 # pool.
824 #
825 # This is the mechanism by which the Cloud Dataflow SDK causes code to
826 # be loaded onto the workers. For example, the Cloud Dataflow Java SDK
827 # might use this to install jars containing the user&#x27;s code and all of the
828 # various dependencies (libraries, data files, etc.) required in order
829 # for that code to run.
830 &quot;location&quot;: &quot;A String&quot;, # The resource to read the package from. The supported resource type is:
831 #
832 # Google Cloud Storage:
833 #
834 # storage.googleapis.com/{bucket}
835 # bucket.storage.googleapis.com/
836 &quot;name&quot;: &quot;A String&quot;, # The name of the package.
837 },
838 ],
839 &quot;teardownPolicy&quot;: &quot;A String&quot;, # Sets the policy for determining when to tear down the worker pool.
840 # Allowed values are: `TEARDOWN_ALWAYS`, `TEARDOWN_ON_SUCCESS`, and
841 # `TEARDOWN_NEVER`.
842 # `TEARDOWN_ALWAYS` means workers are always torn down regardless of whether
843 # the job succeeds. `TEARDOWN_ON_SUCCESS` means workers are torn down
844 # if the job succeeds. `TEARDOWN_NEVER` means the workers are never torn
845 # down.
846 #
847 # If the workers are not torn down by the service, they will
848 # continue to run and use Google Compute Engine VM resources in the
849 # user&#x27;s project until they are explicitly terminated by the user.
850 # Because of this, Google recommends using the `TEARDOWN_ALWAYS`
851 # policy except for small, manually supervised test jobs.
852 #
853 # If unknown or unspecified, the service will attempt to choose a reasonable
854 # default.
855 &quot;onHostMaintenance&quot;: &quot;A String&quot;, # The action to take on host maintenance, as defined by the Google
856 # Compute Engine API.
857 &quot;poolArgs&quot;: { # Extra arguments for this worker pool.
858 &quot;a_key&quot;: &quot;&quot;, # Properties of the object. Contains field @type with type URL.
859 },
860 &quot;diskSizeGb&quot;: 42, # Size of root disk for VMs, in GB. If zero or unspecified, the service will
861 # attempt to choose a reasonable default.
862 &quot;workerHarnessContainerImage&quot;: &quot;A String&quot;, # Required. Docker container image that executes the Cloud Dataflow worker
863 # harness, residing in Google Container Registry.
864 #
865 # Deprecated for the Fn API path. Use sdk_harness_container_images instead.
866 &quot;diskType&quot;: &quot;A String&quot;, # Type of root disk for VMs. If empty or unspecified, the service will
867 # attempt to choose a reasonable default.
868 &quot;machineType&quot;: &quot;A String&quot;, # Machine type (e.g. &quot;n1-standard-1&quot;). If empty or unspecified, the
869 # service will attempt to choose a reasonable default.
870 &quot;kind&quot;: &quot;A String&quot;, # The kind of the worker pool; currently only `harness` and `shuffle`
871 # are supported.
872 &quot;dataDisks&quot;: [ # Data disks that are used by a VM in this workflow.
873 { # Describes the data disk used by a workflow job.
874 &quot;sizeGb&quot;: 42, # Size of disk in GB. If zero or unspecified, the service will
875 # attempt to choose a reasonable default.
876 &quot;diskType&quot;: &quot;A String&quot;, # Disk storage type, as defined by Google Compute Engine. This
877 # must be a disk type appropriate to the project and zone in which
878 # the workers will run. If unknown or unspecified, the service
879 # will attempt to choose a reasonable default.
880 #
881 # For example, the standard persistent disk type is a resource name
882 # typically ending in &quot;pd-standard&quot;. If SSD persistent disks are
883 # available, the resource name typically ends with &quot;pd-ssd&quot;. The
884 # actual valid values are defined by the Google Compute Engine API,
885 # not by the Cloud Dataflow API; consult the Google Compute Engine
886 # documentation for more information about determining the set of
887 # available disk types for a particular project and zone.
888 #
889 # Google Compute Engine Disk types are local to a particular
890 # project in a particular zone, and so the resource name will
891 # typically look something like this:
892 #
893 # compute.googleapis.com/projects/project-id/zones/zone/diskTypes/pd-standard
894 &quot;mountPoint&quot;: &quot;A String&quot;, # Directory in a VM where disk is mounted.
895 },
896 ],
897 &quot;sdkHarnessContainerImages&quot;: [ # Set of SDK harness containers needed to execute this pipeline. This will
898 # only be set in the Fn API path. For non-cross-language pipelines this
899 # should have only one entry. Cross-language pipelines will have two or more
900 # entries.
901 { # Defines a SDK harness container for executing Dataflow pipelines.
902 &quot;containerImage&quot;: &quot;A String&quot;, # A docker container image that resides in Google Container Registry.
903 &quot;useSingleCorePerContainer&quot;: True or False, # If true, recommends the Dataflow service to use only one core per SDK
904 # container instance with this image. If false (or unset) recommends using
905 # more than one core per SDK container instance with this image for
906 # efficiency. Note that Dataflow service may choose to override this property
907 # if needed.
908 },
909 ],
910 &quot;subnetwork&quot;: &quot;A String&quot;, # Subnetwork to which VMs will be assigned, if desired. Expected to be of
911 # the form &quot;regions/REGION/subnetworks/SUBNETWORK&quot;.
912 &quot;ipConfiguration&quot;: &quot;A String&quot;, # Configuration for VM IPs.
913 &quot;taskrunnerSettings&quot;: { # Taskrunner configuration settings. # Settings passed through to Google Compute Engine workers when
914 # using the standard Dataflow task runner. Users should ignore
915 # this field.
916 &quot;alsologtostderr&quot;: True or False, # Whether to also send taskrunner log info to stderr.
917 &quot;taskGroup&quot;: &quot;A String&quot;, # The UNIX group ID on the worker VM to use for tasks launched by
918 # taskrunner; e.g. &quot;wheel&quot;.
919 &quot;harnessCommand&quot;: &quot;A String&quot;, # The command to launch the worker harness.
920 &quot;logDir&quot;: &quot;A String&quot;, # The directory on the VM to store logs.
921 &quot;oauthScopes&quot;: [ # The OAuth2 scopes to be requested by the taskrunner in order to
922 # access the Cloud Dataflow API.
923 &quot;A String&quot;,
924 ],
925 &quot;dataflowApiVersion&quot;: &quot;A String&quot;, # The API version of endpoint, e.g. &quot;v1b3&quot;
926 &quot;logUploadLocation&quot;: &quot;A String&quot;, # Indicates where to put logs. If this is not specified, the logs
927 # will not be uploaded.
928 #
929 # The supported resource type is:
930 #
931 # Google Cloud Storage:
932 # storage.googleapis.com/{bucket}/{object}
933 # bucket.storage.googleapis.com/{object}
934 &quot;streamingWorkerMainClass&quot;: &quot;A String&quot;, # The streaming worker main class name.
935 &quot;workflowFileName&quot;: &quot;A String&quot;, # The file to store the workflow in.
936 &quot;baseTaskDir&quot;: &quot;A String&quot;, # The location on the worker for task-specific subdirectories.
937 &quot;tempStoragePrefix&quot;: &quot;A String&quot;, # The prefix of the resources the taskrunner should use for
938 # temporary storage.
939 #
940 # The supported resource type is:
941 #
942 # Google Cloud Storage:
943 # storage.googleapis.com/{bucket}/{object}
944 # bucket.storage.googleapis.com/{object}
945 &quot;commandlinesFileName&quot;: &quot;A String&quot;, # The file to store preprocessing commands in.
946 &quot;languageHint&quot;: &quot;A String&quot;, # The suggested backend language.
947 &quot;baseUrl&quot;: &quot;A String&quot;, # The base URL for the taskrunner to use when accessing Google Cloud APIs.
948 #
949 # When workers access Google Cloud APIs, they logically do so via
950 # relative URLs. If this field is specified, it supplies the base
951 # URL to use for resolving these relative URLs. The normative
952 # algorithm used is defined by RFC 1808, &quot;Relative Uniform Resource
953 # Locators&quot;.
954 #
955 # If not specified, the default value is &quot;http://www.googleapis.com/&quot;
956 &quot;logToSerialconsole&quot;: True or False, # Whether to send taskrunner log info to Google Compute Engine VM serial
957 # console.
958 &quot;continueOnException&quot;: True or False, # Whether to continue taskrunner if an exception is hit.
959 &quot;parallelWorkerSettings&quot;: { # Provides data to pass through to the worker harness. # The settings to pass to the parallel worker harness.
960 &quot;baseUrl&quot;: &quot;A String&quot;, # The base URL for accessing Google Cloud APIs.
961 #
962 # When workers access Google Cloud APIs, they logically do so via
963 # relative URLs. If this field is specified, it supplies the base
964 # URL to use for resolving these relative URLs. The normative
965 # algorithm used is defined by RFC 1808, &quot;Relative Uniform Resource
966 # Locators&quot;.
967 #
968 # If not specified, the default value is &quot;http://www.googleapis.com/&quot;
969 &quot;reportingEnabled&quot;: True or False, # Whether to send work progress updates to the service.
970 &quot;servicePath&quot;: &quot;A String&quot;, # The Cloud Dataflow service path relative to the root URL, for example,
971 # &quot;dataflow/v1b3/projects&quot;.
972 &quot;shuffleServicePath&quot;: &quot;A String&quot;, # The Shuffle service path relative to the root URL, for example,
973 # &quot;shuffle/v1beta1&quot;.
974 &quot;workerId&quot;: &quot;A String&quot;, # The ID of the worker running this pipeline.
975 &quot;tempStoragePrefix&quot;: &quot;A String&quot;, # The prefix of the resources the system should use for temporary
976 # storage.
977 #
978 # The supported resource type is:
979 #
980 # Google Cloud Storage:
981 #
982 # storage.googleapis.com/{bucket}/{object}
983 # bucket.storage.googleapis.com/{object}
984 },
985 &quot;vmId&quot;: &quot;A String&quot;, # The ID string of the VM.
986 &quot;taskUser&quot;: &quot;A String&quot;, # The UNIX user ID on the worker VM to use for tasks launched by
987 # taskrunner; e.g. &quot;root&quot;.
988 },
989 &quot;autoscalingSettings&quot;: { # Settings for WorkerPool autoscaling. # Settings for autoscaling of this WorkerPool.
990 &quot;maxNumWorkers&quot;: 42, # The maximum number of workers to cap scaling at.
991 &quot;algorithm&quot;: &quot;A String&quot;, # The algorithm to use for autoscaling.
992 },
993 &quot;metadata&quot;: { # Metadata to set on the Google Compute Engine VMs.
994 &quot;a_key&quot;: &quot;A String&quot;,
995 },
Bu Sun Kim715bd7f2019-06-14 16:50:42 -0700996 },
997 ],
Bu Sun Kim65020912020-05-20 12:08:20 -0700998 &quot;dataset&quot;: &quot;A String&quot;, # The dataset for the current project where various workflow
999 # related tables are stored.
1000 #
1001 # The supported resource type is:
1002 #
1003 # Google BigQuery:
1004 # bigquery.googleapis.com/{dataset}
1005 &quot;internalExperiments&quot;: { # Experimental settings.
1006 &quot;a_key&quot;: &quot;&quot;, # Properties of the object. Contains field @type with type URL.
1007 },
1008 &quot;workerRegion&quot;: &quot;A String&quot;, # The Compute Engine region
1009 # (https://cloud.google.com/compute/docs/regions-zones/regions-zones) in
1010 # which worker processing should occur, e.g. &quot;us-west1&quot;. Mutually exclusive
1011 # with worker_zone. If neither worker_region nor worker_zone is specified,
1012 # default to the control plane&#x27;s region.
1013 &quot;serviceKmsKeyName&quot;: &quot;A String&quot;, # If set, contains the Cloud KMS key identifier used to encrypt data
1014 # at rest, AKA a Customer Managed Encryption Key (CMEK).
1015 #
1016 # Format:
1017 # projects/PROJECT_ID/locations/LOCATION/keyRings/KEY_RING/cryptoKeys/KEY
1018 &quot;userAgent&quot;: { # A description of the process that generated the request.
1019 &quot;a_key&quot;: &quot;&quot;, # Properties of the object.
1020 },
1021 &quot;workerZone&quot;: &quot;A String&quot;, # The Compute Engine zone
1022 # (https://cloud.google.com/compute/docs/regions-zones/regions-zones) in
1023 # which worker processing should occur, e.g. &quot;us-west1-a&quot;. Mutually exclusive
1024 # with worker_region. If neither worker_region nor worker_zone is specified,
1025 # a zone in the control plane&#x27;s region is chosen based on available capacity.
1026 &quot;clusterManagerApiService&quot;: &quot;A String&quot;, # The type of cluster manager API to use. If unknown or
Dan O'Mearadd494642020-05-01 07:42:23 -07001027 # unspecified, the service will attempt to choose a reasonable
1028 # default. This should be in the form of the API service name,
Bu Sun Kim65020912020-05-20 12:08:20 -07001029 # e.g. &quot;compute.googleapis.com&quot;.
1030 &quot;tempStoragePrefix&quot;: &quot;A String&quot;, # The prefix of the resources the system should use for temporary
          # storage. The system will append the suffix &quot;/temp-{JOBNAME}&quot; to
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07001032 # this resource prefix, where {JOBNAME} is the value of the
1033 # job_name field. The resulting bucket and object prefix is used
1034 # as the prefix of the resources used to store temporary data
1035 # needed during the job execution. NOTE: This will override the
1036 # value in taskrunner_settings.
1037 # The supported resource type is:
1038 #
1039 # Google Cloud Storage:
1040 #
1041 # storage.googleapis.com/{bucket}/{object}
1042 # bucket.storage.googleapis.com/{object}
Bu Sun Kim65020912020-05-20 12:08:20 -07001043 &quot;experiments&quot;: [ # The list of experiments to enable.
1044 &quot;A String&quot;,
1045 ],
1046 &quot;version&quot;: { # A structure describing which components and their versions of the service
1047 # are required in order to run the job.
1048 &quot;a_key&quot;: &quot;&quot;, # Properties of the object.
1049 },
1050 &quot;serviceAccountEmail&quot;: &quot;A String&quot;, # Identity to run virtual machines as. Defaults to the default account.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07001051 },
Bu Sun Kim65020912020-05-20 12:08:20 -07001052 &quot;stageStates&quot;: [ # This field may be mutated by the Cloud Dataflow service;
1053 # callers cannot mutate it.
1054 { # A message describing the state of a particular execution stage.
1055 &quot;executionStageName&quot;: &quot;A String&quot;, # The name of the execution stage.
1056 &quot;currentStateTime&quot;: &quot;A String&quot;, # The time at which the stage transitioned to this state.
          &quot;executionStageState&quot;: &quot;A String&quot;, # Execution stage states allow the same set of values as JobState.
1058 },
1059 ],
      &quot;jobMetadata&quot;: { # Metadata available primarily for filtering jobs. Will be included in the
          # ListJob response and Job SUMMARY view.
          # This field is populated by the Dataflow service to support filtering jobs
          # by the metadata values provided here. Populated for ListJobs and all GetJob
          # views SUMMARY and higher.
1064 &quot;bigTableDetails&quot;: [ # Identification of a BigTable source used in the Dataflow job.
1065 { # Metadata for a BigTable connector used by the job.
1066 &quot;tableId&quot;: &quot;A String&quot;, # TableId accessed in the connection.
1067 &quot;projectId&quot;: &quot;A String&quot;, # ProjectId accessed in the connection.
1068 &quot;instanceId&quot;: &quot;A String&quot;, # InstanceId accessed in the connection.
1069 },
1070 ],
1071 &quot;spannerDetails&quot;: [ # Identification of a Spanner source used in the Dataflow job.
1072 { # Metadata for a Spanner connector used by the job.
1073 &quot;databaseId&quot;: &quot;A String&quot;, # DatabaseId accessed in the connection.
1074 &quot;instanceId&quot;: &quot;A String&quot;, # InstanceId accessed in the connection.
1075 &quot;projectId&quot;: &quot;A String&quot;, # ProjectId accessed in the connection.
1076 },
1077 ],
1078 &quot;datastoreDetails&quot;: [ # Identification of a Datastore source used in the Dataflow job.
1079 { # Metadata for a Datastore connector used by the job.
1080 &quot;projectId&quot;: &quot;A String&quot;, # ProjectId accessed in the connection.
1081 &quot;namespace&quot;: &quot;A String&quot;, # Namespace used in the connection.
1082 },
1083 ],
1084 &quot;sdkVersion&quot;: { # The version of the SDK used to run the job. # The SDK version used to run the job.
1085 &quot;versionDisplayName&quot;: &quot;A String&quot;, # A readable string describing the version of the SDK.
1086 &quot;sdkSupportStatus&quot;: &quot;A String&quot;, # The support status for this SDK version.
1087 &quot;version&quot;: &quot;A String&quot;, # The version of the SDK used to run the job.
1088 },
1089 &quot;bigqueryDetails&quot;: [ # Identification of a BigQuery source used in the Dataflow job.
1090 { # Metadata for a BigQuery connector used by the job.
1091 &quot;table&quot;: &quot;A String&quot;, # Table accessed in the connection.
1092 &quot;dataset&quot;: &quot;A String&quot;, # Dataset accessed in the connection.
1093 &quot;projectId&quot;: &quot;A String&quot;, # Project accessed in the connection.
1094 &quot;query&quot;: &quot;A String&quot;, # Query used to access data in the connection.
1095 },
1096 ],
1097 &quot;fileDetails&quot;: [ # Identification of a File source used in the Dataflow job.
1098 { # Metadata for a File connector used by the job.
1099 &quot;filePattern&quot;: &quot;A String&quot;, # File Pattern used to access files by the connector.
1100 },
1101 ],
1102 &quot;pubsubDetails&quot;: [ # Identification of a PubSub source used in the Dataflow job.
1103 { # Metadata for a PubSub connector used by the job.
1104 &quot;subscription&quot;: &quot;A String&quot;, # Subscription used in the connection.
1105 &quot;topic&quot;: &quot;A String&quot;, # Topic accessed in the connection.
1106 },
1107 ],
1108 },
1109 &quot;createdFromSnapshotId&quot;: &quot;A String&quot;, # If this is specified, the job&#x27;s initial state is populated from the given
1110 # snapshot.
1111 &quot;projectId&quot;: &quot;A String&quot;, # The ID of the Cloud Platform project that the job belongs to.
1112 &quot;type&quot;: &quot;A String&quot;, # The type of Cloud Dataflow job.
      &quot;pipelineDescription&quot;: { # A descriptive representation of submitted pipeline as well as the executed
          # form. This data is provided by the Dataflow service for ease of visualizing
          # the pipeline and interpreting Dataflow provided metrics.
          # Preliminary field: The format of this data may change at any time.
          # A description of the user pipeline and stages through which it is executed.
          # Created by Cloud Dataflow service. Only retrieved with
          # JOB_VIEW_DESCRIPTION or JOB_VIEW_ALL.
1119 &quot;executionPipelineStage&quot;: [ # Description of each stage of execution of the pipeline.
1120 { # Description of the composing transforms, names/ids, and input/outputs of a
1121 # stage of execution. Some composing transforms and sources may have been
1122 # generated by the Dataflow service during execution planning.
1123 &quot;id&quot;: &quot;A String&quot;, # Dataflow service generated id for this stage.
1124 &quot;componentTransform&quot;: [ # Transforms that comprise this execution stage.
1125 { # Description of a transform executed as part of an execution stage.
1126 &quot;originalTransform&quot;: &quot;A String&quot;, # User name for the original user transform with which this transform is
1127 # most closely associated.
                  &quot;name&quot;: &quot;A String&quot;, # Dataflow service generated name for this transform.
1129 &quot;userName&quot;: &quot;A String&quot;, # Human-readable name for this transform; may be user or system generated.
1130 },
1131 ],
1132 &quot;componentSource&quot;: [ # Collections produced and consumed by component transforms of this stage.
1133 { # Description of an interstitial value between transforms in an execution
1134 # stage.
1135 &quot;name&quot;: &quot;A String&quot;, # Dataflow service generated name for this source.
                  &quot;userName&quot;: &quot;A String&quot;, # Human-readable name for this source; may be user or system generated.
1137 &quot;originalTransformOrCollection&quot;: &quot;A String&quot;, # User name for the original user transform or collection with which this
1138 # source is most closely associated.
1139 },
1140 ],
              &quot;kind&quot;: &quot;A String&quot;, # Type of transform this stage is executing.
1142 &quot;outputSource&quot;: [ # Output sources for this stage.
1143 { # Description of an input or output of an execution stage.
1144 &quot;originalTransformOrCollection&quot;: &quot;A String&quot;, # User name for the original user transform or collection with which this
1145 # source is most closely associated.
1146 &quot;name&quot;: &quot;A String&quot;, # Dataflow service generated name for this source.
1147 &quot;sizeBytes&quot;: &quot;A String&quot;, # Size of the source, if measurable.
1148 &quot;userName&quot;: &quot;A String&quot;, # Human-readable name for this source; may be user or system generated.
1149 },
1150 ],
1151 &quot;name&quot;: &quot;A String&quot;, # Dataflow service generated name for this stage.
1152 &quot;inputSource&quot;: [ # Input sources for this stage.
1153 { # Description of an input or output of an execution stage.
1154 &quot;originalTransformOrCollection&quot;: &quot;A String&quot;, # User name for the original user transform or collection with which this
1155 # source is most closely associated.
1156 &quot;name&quot;: &quot;A String&quot;, # Dataflow service generated name for this source.
1157 &quot;sizeBytes&quot;: &quot;A String&quot;, # Size of the source, if measurable.
1158 &quot;userName&quot;: &quot;A String&quot;, # Human-readable name for this source; may be user or system generated.
1159 },
1160 ],
1161 },
1162 ],
1163 &quot;originalPipelineTransform&quot;: [ # Description of each transform in the pipeline and collections between them.
1164 { # Description of the type, names/ids, and input/outputs for a transform.
1165 &quot;kind&quot;: &quot;A String&quot;, # Type of transform.
1166 &quot;inputCollectionName&quot;: [ # User names for all collection inputs to this transform.
1167 &quot;A String&quot;,
1168 ],
1169 &quot;name&quot;: &quot;A String&quot;, # User provided name for this transform instance.
1170 &quot;id&quot;: &quot;A String&quot;, # SDK generated id of this transform instance.
1171 &quot;displayData&quot;: [ # Transform-specific display data.
1172 { # Data provided with a pipeline or transform to provide descriptive info.
1173 &quot;timestampValue&quot;: &quot;A String&quot;, # Contains value if the data is of timestamp type.
1174 &quot;boolValue&quot;: True or False, # Contains value if the data is of a boolean type.
1175 &quot;javaClassValue&quot;: &quot;A String&quot;, # Contains value if the data is of java class type.
1176 &quot;strValue&quot;: &quot;A String&quot;, # Contains value if the data is of string type.
1177 &quot;int64Value&quot;: &quot;A String&quot;, # Contains value if the data is of int64 type.
1178 &quot;durationValue&quot;: &quot;A String&quot;, # Contains value if the data is of duration type.
1179 &quot;namespace&quot;: &quot;A String&quot;, # The namespace for the key. This is usually a class name or programming
                      # language namespace (e.g. a Python module) which defines the display data.
1181 # This allows a dax monitoring system to specially handle the data
1182 # and perform custom rendering.
1183 &quot;floatValue&quot;: 3.14, # Contains value if the data is of float type.
1184 &quot;key&quot;: &quot;A String&quot;, # The key identifying the display data.
1185 # This is intended to be used as a label for the display data
1186 # when viewed in a dax monitoring system.
1187 &quot;shortStrValue&quot;: &quot;A String&quot;, # A possible additional shorter value to display.
1188 # For example a java_class_name_value of com.mypackage.MyDoFn
1189 # will be stored with MyDoFn as the short_str_value and
                      # com.mypackage.MyDoFn as the java_class_name_value.
1191 # short_str_value can be displayed and java_class_name_value
1192 # will be displayed as a tooltip.
1193 &quot;url&quot;: &quot;A String&quot;, # An optional full URL.
1194 &quot;label&quot;: &quot;A String&quot;, # An optional label to display in a dax UI for the element.
1195 },
1196 ],
1197 &quot;outputCollectionName&quot;: [ # User names for all collection outputs to this transform.
1198 &quot;A String&quot;,
1199 ],
1200 },
1201 ],
1202 &quot;displayData&quot;: [ # Pipeline level display data.
1203 { # Data provided with a pipeline or transform to provide descriptive info.
1204 &quot;timestampValue&quot;: &quot;A String&quot;, # Contains value if the data is of timestamp type.
1205 &quot;boolValue&quot;: True or False, # Contains value if the data is of a boolean type.
1206 &quot;javaClassValue&quot;: &quot;A String&quot;, # Contains value if the data is of java class type.
1207 &quot;strValue&quot;: &quot;A String&quot;, # Contains value if the data is of string type.
1208 &quot;int64Value&quot;: &quot;A String&quot;, # Contains value if the data is of int64 type.
1209 &quot;durationValue&quot;: &quot;A String&quot;, # Contains value if the data is of duration type.
1210 &quot;namespace&quot;: &quot;A String&quot;, # The namespace for the key. This is usually a class name or programming
                  # language namespace (e.g. a Python module) which defines the display data.
1212 # This allows a dax monitoring system to specially handle the data
1213 # and perform custom rendering.
1214 &quot;floatValue&quot;: 3.14, # Contains value if the data is of float type.
1215 &quot;key&quot;: &quot;A String&quot;, # The key identifying the display data.
1216 # This is intended to be used as a label for the display data
1217 # when viewed in a dax monitoring system.
1218 &quot;shortStrValue&quot;: &quot;A String&quot;, # A possible additional shorter value to display.
1219 # For example a java_class_name_value of com.mypackage.MyDoFn
1220 # will be stored with MyDoFn as the short_str_value and
                  # com.mypackage.MyDoFn as the java_class_name_value.
1222 # short_str_value can be displayed and java_class_name_value
1223 # will be displayed as a tooltip.
1224 &quot;url&quot;: &quot;A String&quot;, # An optional full URL.
1225 &quot;label&quot;: &quot;A String&quot;, # An optional label to display in a dax UI for the element.
1226 },
1227 ],
1228 },
1229 &quot;replaceJobId&quot;: &quot;A String&quot;, # If this job is an update of an existing job, this field is the job ID
1230 # of the job it replaced.
1231 #
1232 # When sending a `CreateJobRequest`, you can update a job by specifying it
1233 # here. The job named here is stopped, and its intermediate state is
1234 # transferred to this job.
1235 &quot;tempFiles&quot;: [ # A set of files the system should be aware of that are used
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07001236 # for temporary storage. These temporary files will be
1237 # removed on job completion.
1238 # No duplicates are allowed.
1239 # No file patterns are supported.
1240 #
1241 # The supported files are:
1242 #
1243 # Google Cloud Storage:
1244 #
1245 # storage.googleapis.com/{bucket}/{object}
1246 # bucket.storage.googleapis.com/{object}
Bu Sun Kim65020912020-05-20 12:08:20 -07001247 &quot;A String&quot;,
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07001248 ],
Bu Sun Kim65020912020-05-20 12:08:20 -07001249 &quot;name&quot;: &quot;A String&quot;, # The user-specified Cloud Dataflow job name.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07001250 #
1251 # Only one Job with a given name may exist in a project at any
1252 # given time. If a caller attempts to create a Job with the same
1253 # name as an already-existing Job, the attempt returns the
1254 # existing Job.
1255 #
1256 # The name must match the regular expression
1257 # `[a-z]([-a-z0-9]{0,38}[a-z0-9])?`
      &quot;steps&quot;: [ # Exactly one of steps or steps_location should be specified.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07001259 #
1260 # The top-level steps that constitute the entire job.
1261 { # Defines a particular step within a Cloud Dataflow job.
1262 #
1263 # A job consists of multiple steps, each of which performs some
1264 # specific operation as part of the overall job. Data is typically
1265 # passed from one step to another as part of the job.
1266 #
Bu Sun Kim65020912020-05-20 12:08:20 -07001267 # Here&#x27;s an example of a sequence of steps which together implement a
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07001268 # Map-Reduce job:
1269 #
1270 # * Read a collection of data from some source, parsing the
Bu Sun Kim65020912020-05-20 12:08:20 -07001271 # collection&#x27;s elements.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07001272 #
1273 # * Validate the elements.
1274 #
1275 # * Apply a user-defined function to map each element to some value
1276 # and extract an element-specific key value.
1277 #
1278 # * Group elements with the same key into a single element with
1279 # that key, transforming a multiply-keyed collection into a
1280 # uniquely-keyed collection.
1281 #
1282 # * Write the elements out to some data sink.
1283 #
1284 # Note that the Cloud Dataflow service may be used to run many different
1285 # types of jobs, not just Map-Reduce.
Bu Sun Kim65020912020-05-20 12:08:20 -07001286 &quot;name&quot;: &quot;A String&quot;, # The name that identifies the step. This must be unique for each
Dan O'Mearadd494642020-05-01 07:42:23 -07001287 # step with respect to all other steps in the Cloud Dataflow job.
Bu Sun Kim65020912020-05-20 12:08:20 -07001288 &quot;kind&quot;: &quot;A String&quot;, # The kind of step in the Cloud Dataflow job.
1289 &quot;properties&quot;: { # Named properties associated with the step. Each kind of
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07001290 # predefined step has its own required set of properties.
1291 # Must be provided on Create. Only retrieved with JOB_VIEW_ALL.
Bu Sun Kim65020912020-05-20 12:08:20 -07001292 &quot;a_key&quot;: &quot;&quot;, # Properties of the object.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07001293 },
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07001294 },
1295 ],
Bu Sun Kim65020912020-05-20 12:08:20 -07001296 &quot;replacedByJobId&quot;: &quot;A String&quot;, # If another job is an update of this job (and thus, this job is in
1297 # `JOB_STATE_UPDATED`), this field contains the ID of that job.
      &quot;executionInfo&quot;: { # Additional information about how a Cloud Dataflow job will be executed that
          # isn&#x27;t contained in the submitted job.
          # Deprecated.
1300 &quot;stages&quot;: { # A mapping from each stage to the information about that stage.
1301 &quot;a_key&quot;: { # Contains information about how a particular
1302 # google.dataflow.v1beta3.Step will be executed.
1303 &quot;stepName&quot;: [ # The steps associated with the execution stage.
1304 # Note that stages may have several steps, and that a given step
1305 # might be run by more than one stage.
1306 &quot;A String&quot;,
1307 ],
1308 },
1309 },
1310 },
1311 &quot;currentState&quot;: &quot;A String&quot;, # The current state of the job.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07001312 #
1313 # Jobs are created in the `JOB_STATE_STOPPED` state unless otherwise
1314 # specified.
1315 #
1316 # A job in the `JOB_STATE_RUNNING` state may asynchronously enter a
1317 # terminal state. After a job has reached a terminal state, no
1318 # further state updates may be made.
1319 #
1320 # This field may be mutated by the Cloud Dataflow service;
1321 # callers cannot mutate it.
Bu Sun Kim65020912020-05-20 12:08:20 -07001322 &quot;location&quot;: &quot;A String&quot;, # The [regional endpoint]
1323 # (https://cloud.google.com/dataflow/docs/concepts/regional-endpoints) that
1324 # contains this job.
1325 &quot;startTime&quot;: &quot;A String&quot;, # The timestamp when the job was started (transitioned to JOB_STATE_PENDING).
1326 # Flexible resource scheduling jobs are started with some delay after job
1327 # creation, so start_time is unset before start and is updated when the
1328 # job is started by the Cloud Dataflow service. For other jobs, start_time
          # always equals create_time and is immutable and set by the Cloud Dataflow
1330 # service.
1331 &quot;stepsLocation&quot;: &quot;A String&quot;, # The GCS location where the steps are stored.
1332 &quot;labels&quot;: { # User-defined labels for this job.
1333 #
1334 # The labels map can contain no more than 64 entries. Entries of the labels
1335 # map are UTF8 strings that comply with the following restrictions:
1336 #
1337 # * Keys must conform to regexp: \p{Ll}\p{Lo}{0,62}
1338 # * Values must conform to regexp: [\p{Ll}\p{Lo}\p{N}_-]{0,63}
1339 # * Both keys and values are additionally constrained to be &lt;= 128 bytes in
1340 # size.
1341 &quot;a_key&quot;: &quot;A String&quot;,
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07001342 },
Bu Sun Kim65020912020-05-20 12:08:20 -07001343 &quot;createTime&quot;: &quot;A String&quot;, # The timestamp when the job was initially created. Immutable and set by the
1344 # Cloud Dataflow service.
1345 &quot;requestedState&quot;: &quot;A String&quot;, # The job&#x27;s requested state.
1346 #
1347 # `UpdateJob` may be used to switch between the `JOB_STATE_STOPPED` and
1348 # `JOB_STATE_RUNNING` states, by setting requested_state. `UpdateJob` may
1349 # also be used to directly set a job&#x27;s requested state to
1350 # `JOB_STATE_CANCELLED` or `JOB_STATE_DONE`, irrevocably terminating the
1351 # job if it has not already reached a terminal state.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07001352}
1353
1354 location: string, The [regional endpoint]
1355(https://cloud.google.com/dataflow/docs/concepts/regional-endpoints) that
1356contains this job.
Bu Sun Kim65020912020-05-20 12:08:20 -07001357 replaceJobId: string, Deprecated. This field is now in the Job message.
1358 view: string, The level of information requested in response.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07001359 x__xgafv: string, V1 error format.
1360 Allowed values
1361 1 - v1 error format
1362 2 - v2 error format
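
The "name" pattern documented in the request body above can be checked on the
client before a job is submitted. The following is a minimal sketch, not part
of the generated client: the helper name is_valid_job_name is hypothetical, it
uses only Python's standard re module, and it assumes the documented
expression must match the whole name.

```python
import re

# Pattern copied from the documented constraint on the job "name" field.
JOB_NAME_PATTERN = re.compile(r"[a-z]([-a-z0-9]{0,38}[a-z0-9])?")


def is_valid_job_name(name):
    """Return True if name matches the documented job-name pattern in full."""
    # fullmatch anchors the pattern at both ends of the string.
    return JOB_NAME_PATTERN.fullmatch(name) is not None


if __name__ == "__main__":
    # Hypothetical job names, used only to exercise the check.
    for candidate in ["wordcount-1", "Wordcount", "wordcount-"]:
        print(candidate, is_valid_job_name(candidate))
```

Under this reading, a name may be at most 40 characters long (one leading
letter, up to 38 middle characters, and one trailing letter or digit), and it
may not end with a hyphen.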
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07001363
1364Returns:
1365 An object of the form:
1366
1367 { # Defines a job to be run by the Cloud Dataflow service.
Bu Sun Kim65020912020-05-20 12:08:20 -07001368 &quot;clientRequestId&quot;: &quot;A String&quot;, # The client&#x27;s unique identifier of the job, re-used across retried attempts.
1369 # If this field is set, the service will ensure its uniqueness.
1370 # The request to create a job will fail if the service has knowledge of a
1371 # previously submitted job with the same client&#x27;s ID and job name.
1372 # The caller may use this field to ensure idempotence of job
1373 # creation across retried attempts to create a job.
1374 # By default, the field is empty and, in that case, the service ignores it.
1375 &quot;id&quot;: &quot;A String&quot;, # The unique ID of this job.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07001376 #
1377 # This field is set by the Cloud Dataflow service when the Job is
1378 # created, and is immutable for the life of the job.
Bu Sun Kim65020912020-05-20 12:08:20 -07001379 &quot;currentStateTime&quot;: &quot;A String&quot;, # The timestamp associated with the current state.
1380 &quot;transformNameMapping&quot;: { # The map of transform name prefixes of the job to be replaced to the
Sai Cheemalapatic30d2b52017-03-13 12:12:03 -04001381 # corresponding name prefixes of the new job.
Bu Sun Kim65020912020-05-20 12:08:20 -07001382 &quot;a_key&quot;: &quot;A String&quot;,
Nathaniel Manista4f877e52015-06-15 16:44:50 +00001383 },
Bu Sun Kim65020912020-05-20 12:08:20 -07001384 &quot;environment&quot;: { # Describes the environment in which a Dataflow Job runs. # The environment for the job.
1385 &quot;sdkPipelineOptions&quot;: { # The Cloud Dataflow SDK pipeline options specified by the user. These
Sai Cheemalapatic30d2b52017-03-13 12:12:03 -04001386 # options are passed through the service and are used to recreate the
1387 # SDK pipeline options on the worker in a language agnostic and platform
1388 # independent way.
Bu Sun Kim65020912020-05-20 12:08:20 -07001389 &quot;a_key&quot;: &quot;&quot;, # Properties of the object.
Takashi Matsuo06694102015-09-11 13:55:40 -07001390 },
Bu Sun Kim65020912020-05-20 12:08:20 -07001391 &quot;flexResourceSchedulingGoal&quot;: &quot;A String&quot;, # Which Flexible Resource Scheduling mode to run in.
1392 &quot;workerPools&quot;: [ # The worker pools. At least one &quot;harness&quot; worker pool must be
Sai Cheemalapatic30d2b52017-03-13 12:12:03 -04001393 # specified in order for the job to have workers.
1394 { # Describes one particular pool of Cloud Dataflow workers to be
1395 # instantiated by the Cloud Dataflow service in order to perform the
1396 # computations required by a job. Note that a workflow job may use
1397 # multiple pools, in order to match the various computational
1398 # requirements of the various stages of the job.
Bu Sun Kim65020912020-05-20 12:08:20 -07001399 &quot;defaultPackageSet&quot;: &quot;A String&quot;, # The default package set to install. This allows the service to
1400 # select a default set of packages which are useful to worker
1401 # harnesses written in a particular language.
1402 &quot;network&quot;: &quot;A String&quot;, # Network to which VMs will be assigned. If empty or unspecified,
1403 # the service will use the network &quot;default&quot;.
1404 &quot;zone&quot;: &quot;A String&quot;, # Zone to run the worker pools in. If empty or unspecified, the service
Dan O'Mearadd494642020-05-01 07:42:23 -07001405 # will attempt to choose a reasonable default.
Bu Sun Kim65020912020-05-20 12:08:20 -07001406 &quot;numWorkers&quot;: 42, # Number of Google Compute Engine workers in this pool needed to
1407 # execute the job. If zero or unspecified, the service will
1408 # attempt to choose a reasonable default.
1409 &quot;numThreadsPerWorker&quot;: 42, # The number of threads per worker harness. If empty or unspecified, the
Dan O'Mearadd494642020-05-01 07:42:23 -07001410 # service will choose a number of threads (according to the number of cores
1411 # on the selected machine type for batch, or 1 by convention for streaming).
Bu Sun Kim65020912020-05-20 12:08:20 -07001412 &quot;diskSourceImage&quot;: &quot;A String&quot;, # Fully qualified source image for disks.
1413 &quot;packages&quot;: [ # Packages to be installed on workers.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07001414 { # The packages that must be installed in order for a worker to run the
1415 # steps of the Cloud Dataflow job that will be assigned to its worker
1416 # pool.
1417 #
1418 # This is the mechanism by which the Cloud Dataflow SDK causes code to
1419 # be loaded onto the workers. For example, the Cloud Dataflow Java SDK
Bu Sun Kim65020912020-05-20 12:08:20 -07001420 # might use this to install jars containing the user&#x27;s code and all of the
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07001421 # various dependencies (libraries, data files, etc.) required in order
1422 # for that code to run.
Bu Sun Kim65020912020-05-20 12:08:20 -07001423 &quot;location&quot;: &quot;A String&quot;, # The resource to read the package from. The supported resource type is:
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07001424 #
1425 # Google Cloud Storage:
1426 #
1427 # storage.googleapis.com/{bucket}
1428 # bucket.storage.googleapis.com/
Bu Sun Kim65020912020-05-20 12:08:20 -07001429 &quot;name&quot;: &quot;A String&quot;, # The name of the package.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07001430 },
1431 ],
              &quot;teardownPolicy&quot;: &quot;A String&quot;, # Sets the policy for determining when to tear down the worker pool.
Sai Cheemalapati4ba8c232017-06-06 18:46:08 -04001433 # Allowed values are: `TEARDOWN_ALWAYS`, `TEARDOWN_ON_SUCCESS`, and
1434 # `TEARDOWN_NEVER`.
1435 # `TEARDOWN_ALWAYS` means workers are always torn down regardless of whether
1436 # the job succeeds. `TEARDOWN_ON_SUCCESS` means workers are torn down
1437 # if the job succeeds. `TEARDOWN_NEVER` means the workers are never torn
1438 # down.
1439 #
1440 # If the workers are not torn down by the service, they will
1441 # continue to run and use Google Compute Engine VM resources in the
Bu Sun Kim65020912020-05-20 12:08:20 -07001442 # user&#x27;s project until they are explicitly terminated by the user.
Sai Cheemalapati4ba8c232017-06-06 18:46:08 -04001443 # Because of this, Google recommends using the `TEARDOWN_ALWAYS`
1444 # policy except for small, manually supervised test jobs.
1445 #
1446 # If unknown or unspecified, the service will attempt to choose a reasonable
1447 # default.
Bu Sun Kim65020912020-05-20 12:08:20 -07001448 &quot;onHostMaintenance&quot;: &quot;A String&quot;, # The action to take on host maintenance, as defined by the Google
1449 # Compute Engine API.
1450 &quot;poolArgs&quot;: { # Extra arguments for this worker pool.
1451 &quot;a_key&quot;: &quot;&quot;, # Properties of the object. Contains field @type with type URL.
1452 },
1453 &quot;diskSizeGb&quot;: 42, # Size of root disk for VMs, in GB. If zero or unspecified, the service will
Dan O'Mearadd494642020-05-01 07:42:23 -07001454 # attempt to choose a reasonable default.
Bu Sun Kim65020912020-05-20 12:08:20 -07001455 &quot;workerHarnessContainerImage&quot;: &quot;A String&quot;, # Required. Docker container image that executes the Cloud Dataflow worker
1456 # harness, residing in Google Container Registry.
1457 #
1458 # Deprecated for the Fn API path. Use sdk_harness_container_images instead.
1459 &quot;diskType&quot;: &quot;A String&quot;, # Type of root disk for VMs. If empty or unspecified, the service will
Sai Cheemalapatic30d2b52017-03-13 12:12:03 -04001460 # attempt to choose a reasonable default.
Bu Sun Kim65020912020-05-20 12:08:20 -07001461 &quot;machineType&quot;: &quot;A String&quot;, # Machine type (e.g. &quot;n1-standard-1&quot;). If empty or unspecified, the
1462 # service will attempt to choose a reasonable default.
1463 &quot;kind&quot;: &quot;A String&quot;, # The kind of the worker pool; currently only `harness` and `shuffle`
1464 # are supported.
1465 &quot;dataDisks&quot;: [ # Data disks that are used by a VM in this workflow.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07001466 { # Describes the data disk used by a workflow job.
Bu Sun Kim65020912020-05-20 12:08:20 -07001467 &quot;sizeGb&quot;: 42, # Size of disk in GB. If zero or unspecified, the service will
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07001468 # attempt to choose a reasonable default.
Bu Sun Kim65020912020-05-20 12:08:20 -07001469 &quot;diskType&quot;: &quot;A String&quot;, # Disk storage type, as defined by Google Compute Engine. This
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07001470 # must be a disk type appropriate to the project and zone in which
1471 # the workers will run. If unknown or unspecified, the service
1472 # will attempt to choose a reasonable default.
Sai Cheemalapati4ba8c232017-06-06 18:46:08 -04001473 #
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07001474 # For example, the standard persistent disk type is a resource name
Bu Sun Kim65020912020-05-20 12:08:20 -07001475 # typically ending in &quot;pd-standard&quot;. If SSD persistent disks are
1476 # available, the resource name typically ends with &quot;pd-ssd&quot;. The
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07001477 # actual valid values are defined by the Google Compute Engine API,
1478 # not by the Cloud Dataflow API; consult the Google Compute Engine
1479 # documentation for more information about determining the set of
1480 # available disk types for a particular project and zone.
Sai Cheemalapati4ba8c232017-06-06 18:46:08 -04001481 #
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07001482 # Google Compute Engine Disk types are local to a particular
1483 # project in a particular zone, and so the resource name will
1484 # typically look something like this:
1485 #
1486 # compute.googleapis.com/projects/project-id/zones/zone/diskTypes/pd-standard
Bu Sun Kim65020912020-05-20 12:08:20 -07001487 &quot;mountPoint&quot;: &quot;A String&quot;, # Directory in a VM where disk is mounted.
Sai Cheemalapati4ba8c232017-06-06 18:46:08 -04001488 },
1489 ],
Bu Sun Kim65020912020-05-20 12:08:20 -07001490 &quot;sdkHarnessContainerImages&quot;: [ # Set of SDK harness containers needed to execute this pipeline. This will
Dan O'Mearadd494642020-05-01 07:42:23 -07001491 # only be set in the Fn API path. For non-cross-language pipelines this
1492 # should have only one entry. Cross-language pipelines will have two or more
1493 # entries.
1494 { # Defines an SDK harness container for executing Dataflow pipelines.
Bu Sun Kim65020912020-05-20 12:08:20 -07001495 &quot;containerImage&quot;: &quot;A String&quot;, # A docker container image that resides in Google Container Registry.
1496 &quot;useSingleCorePerContainer&quot;: True or False, # If true, recommends that the Dataflow service use only one core per SDK
Dan O'Mearadd494642020-05-01 07:42:23 -07001497 # container instance with this image. If false (or unset), recommends using
1498 # more than one core per SDK container instance with this image for
1499 # efficiency. Note that the Dataflow service may choose to override this property
1500 # if needed.
1501 },
1502 ],
Bu Sun Kim65020912020-05-20 12:08:20 -07001503 &quot;subnetwork&quot;: &quot;A String&quot;, # Subnetwork to which VMs will be assigned, if desired. Expected to be of
1504 # the form &quot;regions/REGION/subnetworks/SUBNETWORK&quot;.
1505 &quot;ipConfiguration&quot;: &quot;A String&quot;, # Configuration for VM IPs.
1506 &quot;taskrunnerSettings&quot;: { # Taskrunner configuration settings. # Settings passed through to Google Compute Engine workers when
1507 # using the standard Dataflow task runner. Users should ignore
1508 # this field.
1509 &quot;alsologtostderr&quot;: True or False, # Whether to also send taskrunner log info to stderr.
1510 &quot;taskGroup&quot;: &quot;A String&quot;, # The UNIX group ID on the worker VM to use for tasks launched by
1511 # taskrunner; e.g. &quot;wheel&quot;.
1512 &quot;harnessCommand&quot;: &quot;A String&quot;, # The command to launch the worker harness.
1513 &quot;logDir&quot;: &quot;A String&quot;, # The directory on the VM to store logs.
1514 &quot;oauthScopes&quot;: [ # The OAuth2 scopes to be requested by the taskrunner in order to
1515 # access the Cloud Dataflow API.
1516 &quot;A String&quot;,
1517 ],
1518 &quot;dataflowApiVersion&quot;: &quot;A String&quot;, # The API version of the endpoint, e.g. &quot;v1b3&quot;
1519 &quot;logUploadLocation&quot;: &quot;A String&quot;, # Indicates where to put logs. If this is not specified, the logs
1520 # will not be uploaded.
1521 #
1522 # The supported resource type is:
1523 #
1524 # Google Cloud Storage:
1525 # storage.googleapis.com/{bucket}/{object}
1526 # bucket.storage.googleapis.com/{object}
1527 &quot;streamingWorkerMainClass&quot;: &quot;A String&quot;, # The streaming worker main class name.
1528 &quot;workflowFileName&quot;: &quot;A String&quot;, # The file to store the workflow in.
1529 &quot;baseTaskDir&quot;: &quot;A String&quot;, # The location on the worker for task-specific subdirectories.
1530 &quot;tempStoragePrefix&quot;: &quot;A String&quot;, # The prefix of the resources the taskrunner should use for
1531 # temporary storage.
1532 #
1533 # The supported resource type is:
1534 #
1535 # Google Cloud Storage:
1536 # storage.googleapis.com/{bucket}/{object}
1537 # bucket.storage.googleapis.com/{object}
1538 &quot;commandlinesFileName&quot;: &quot;A String&quot;, # The file to store preprocessing commands in.
1539 &quot;languageHint&quot;: &quot;A String&quot;, # The suggested backend language.
1540 &quot;baseUrl&quot;: &quot;A String&quot;, # The base URL for the taskrunner to use when accessing Google Cloud APIs.
1541 #
1542 # When workers access Google Cloud APIs, they logically do so via
1543 # relative URLs. If this field is specified, it supplies the base
1544 # URL to use for resolving these relative URLs. The normative
1545 # algorithm used is defined by RFC 1808, &quot;Relative Uniform Resource
1546 # Locators&quot;.
1547 #
1548 # If not specified, the default value is &quot;http://www.googleapis.com/&quot;
1549 &quot;logToSerialconsole&quot;: True or False, # Whether to send taskrunner log info to Google Compute Engine VM serial
1550 # console.
1551 &quot;continueOnException&quot;: True or False, # Whether to continue taskrunner if an exception is hit.
1552 &quot;parallelWorkerSettings&quot;: { # Provides data to pass through to the worker harness. # The settings to pass to the parallel worker harness.
1553 &quot;baseUrl&quot;: &quot;A String&quot;, # The base URL for accessing Google Cloud APIs.
1554 #
1555 # When workers access Google Cloud APIs, they logically do so via
1556 # relative URLs. If this field is specified, it supplies the base
1557 # URL to use for resolving these relative URLs. The normative
1558 # algorithm used is defined by RFC 1808, &quot;Relative Uniform Resource
1559 # Locators&quot;.
1560 #
1561 # If not specified, the default value is &quot;http://www.googleapis.com/&quot;
1562 &quot;reportingEnabled&quot;: True or False, # Whether to send work progress updates to the service.
1563 &quot;servicePath&quot;: &quot;A String&quot;, # The Cloud Dataflow service path relative to the root URL, for example,
1564 # &quot;dataflow/v1b3/projects&quot;.
1565 &quot;shuffleServicePath&quot;: &quot;A String&quot;, # The Shuffle service path relative to the root URL, for example,
1566 # &quot;shuffle/v1beta1&quot;.
1567 &quot;workerId&quot;: &quot;A String&quot;, # The ID of the worker running this pipeline.
1568 &quot;tempStoragePrefix&quot;: &quot;A String&quot;, # The prefix of the resources the system should use for temporary
1569 # storage.
1570 #
1571 # The supported resource type is:
1572 #
1573 # Google Cloud Storage:
1574 #
1575 # storage.googleapis.com/{bucket}/{object}
1576 # bucket.storage.googleapis.com/{object}
1577 },
1578 &quot;vmId&quot;: &quot;A String&quot;, # The ID string of the VM.
1579 &quot;taskUser&quot;: &quot;A String&quot;, # The UNIX user ID on the worker VM to use for tasks launched by
1580 # taskrunner; e.g. &quot;root&quot;.
1581 },
1582 &quot;autoscalingSettings&quot;: { # Settings for WorkerPool autoscaling. # Settings for autoscaling of this WorkerPool.
1583 &quot;maxNumWorkers&quot;: 42, # The maximum number of workers to cap scaling at.
1584 &quot;algorithm&quot;: &quot;A String&quot;, # The algorithm to use for autoscaling.
1585 },
1586 &quot;metadata&quot;: { # Metadata to set on the Google Compute Engine VMs.
1587 &quot;a_key&quot;: &quot;A String&quot;,
1588 },
Takashi Matsuo06694102015-09-11 13:55:40 -07001589 },
1590 ],
Bu Sun Kim65020912020-05-20 12:08:20 -07001591 &quot;dataset&quot;: &quot;A String&quot;, # The dataset for the current project where various workflow
1592 # related tables are stored.
1593 #
1594 # The supported resource type is:
1595 #
1596 # Google BigQuery:
1597 # bigquery.googleapis.com/{dataset}
1598 &quot;internalExperiments&quot;: { # Experimental settings.
1599 &quot;a_key&quot;: &quot;&quot;, # Properties of the object. Contains field @type with type URL.
1600 },
1601 &quot;workerRegion&quot;: &quot;A String&quot;, # The Compute Engine region
1602 # (https://cloud.google.com/compute/docs/regions-zones/regions-zones) in
1603 # which worker processing should occur, e.g. &quot;us-west1&quot;. Mutually exclusive
1604 # with worker_zone. If neither worker_region nor worker_zone is specified,
1605 # default to the control plane&#x27;s region.
1606 &quot;serviceKmsKeyName&quot;: &quot;A String&quot;, # If set, contains the Cloud KMS key identifier used to encrypt data
1607 # at rest, AKA a Customer Managed Encryption Key (CMEK).
1608 #
1609 # Format:
1610 # projects/PROJECT_ID/locations/LOCATION/keyRings/KEY_RING/cryptoKeys/KEY
1611 &quot;userAgent&quot;: { # A description of the process that generated the request.
1612 &quot;a_key&quot;: &quot;&quot;, # Properties of the object.
1613 },
1614 &quot;workerZone&quot;: &quot;A String&quot;, # The Compute Engine zone
1615 # (https://cloud.google.com/compute/docs/regions-zones/regions-zones) in
1616 # which worker processing should occur, e.g. &quot;us-west1-a&quot;. Mutually exclusive
1617 # with worker_region. If neither worker_region nor worker_zone is specified,
1618 # a zone in the control plane&#x27;s region is chosen based on available capacity.
1619 &quot;clusterManagerApiService&quot;: &quot;A String&quot;, # The type of cluster manager API to use. If unknown or
Dan O'Mearadd494642020-05-01 07:42:23 -07001620 # unspecified, the service will attempt to choose a reasonable
1621 # default. This should be in the form of the API service name,
Bu Sun Kim65020912020-05-20 12:08:20 -07001622 # e.g. &quot;compute.googleapis.com&quot;.
1623 &quot;tempStoragePrefix&quot;: &quot;A String&quot;, # The prefix of the resources the system should use for temporary
1624 # storage. The system will append the suffix &quot;/temp-{JOBNAME}&quot; to
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07001625 # this resource prefix, where {JOBNAME} is the value of the
1626 # job_name field. The resulting bucket and object prefix is used
1627 # as the prefix of the resources used to store temporary data
1628 # needed during the job execution. NOTE: This will override the
1629 # value in taskrunner_settings.
1630 # The supported resource type is:
1631 #
1632 # Google Cloud Storage:
1633 #
1634 # storage.googleapis.com/{bucket}/{object}
1635 # bucket.storage.googleapis.com/{object}
Bu Sun Kim65020912020-05-20 12:08:20 -07001636 &quot;experiments&quot;: [ # The list of experiments to enable.
1637 &quot;A String&quot;,
1638 ],
1639 &quot;version&quot;: { # A structure describing which components and their versions of the service
1640 # are required in order to run the job.
1641 &quot;a_key&quot;: &quot;&quot;, # Properties of the object.
1642 },
1643 &quot;serviceAccountEmail&quot;: &quot;A String&quot;, # Identity to run virtual machines as. Defaults to the default account.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07001644 },
Bu Sun Kim65020912020-05-20 12:08:20 -07001645 &quot;stageStates&quot;: [ # This field may be mutated by the Cloud Dataflow service;
1646 # callers cannot mutate it.
1647 { # A message describing the state of a particular execution stage.
1648 &quot;executionStageName&quot;: &quot;A String&quot;, # The name of the execution stage.
1649 &quot;currentStateTime&quot;: &quot;A String&quot;, # The time at which the stage transitioned to this state.
1650 &quot;executionStageState&quot;: &quot;A String&quot;, # Execution stage states allow the same set of values as JobState.
1651 },
1652 ],
1653 &quot;jobMetadata&quot;: { # Metadata available primarily for filtering jobs. Will be included in the
1654 # ListJob response and Job SUMMARY view.
1655 # This field is populated by the Dataflow service to support filtering jobs
1656 # by the metadata values provided here. Populated for ListJobs and all GetJob views SUMMARY and higher.
1657 &quot;bigTableDetails&quot;: [ # Identification of a BigTable source used in the Dataflow job.
1658 { # Metadata for a BigTable connector used by the job.
1659 &quot;tableId&quot;: &quot;A String&quot;, # TableId accessed in the connection.
1660 &quot;projectId&quot;: &quot;A String&quot;, # ProjectId accessed in the connection.
1661 &quot;instanceId&quot;: &quot;A String&quot;, # InstanceId accessed in the connection.
1662 },
1663 ],
1664 &quot;spannerDetails&quot;: [ # Identification of a Spanner source used in the Dataflow job.
1665 { # Metadata for a Spanner connector used by the job.
1666 &quot;databaseId&quot;: &quot;A String&quot;, # DatabaseId accessed in the connection.
1667 &quot;instanceId&quot;: &quot;A String&quot;, # InstanceId accessed in the connection.
1668 &quot;projectId&quot;: &quot;A String&quot;, # ProjectId accessed in the connection.
1669 },
1670 ],
1671 &quot;datastoreDetails&quot;: [ # Identification of a Datastore source used in the Dataflow job.
1672 { # Metadata for a Datastore connector used by the job.
1673 &quot;projectId&quot;: &quot;A String&quot;, # ProjectId accessed in the connection.
1674 &quot;namespace&quot;: &quot;A String&quot;, # Namespace used in the connection.
1675 },
1676 ],
1677 &quot;sdkVersion&quot;: { # The version of the SDK used to run the job. # The SDK version used to run the job.
1678 &quot;versionDisplayName&quot;: &quot;A String&quot;, # A readable string describing the version of the SDK.
1679 &quot;sdkSupportStatus&quot;: &quot;A String&quot;, # The support status for this SDK version.
1680 &quot;version&quot;: &quot;A String&quot;, # The version of the SDK used to run the job.
1681 },
1682 &quot;bigqueryDetails&quot;: [ # Identification of a BigQuery source used in the Dataflow job.
1683 { # Metadata for a BigQuery connector used by the job.
1684 &quot;table&quot;: &quot;A String&quot;, # Table accessed in the connection.
1685 &quot;dataset&quot;: &quot;A String&quot;, # Dataset accessed in the connection.
1686 &quot;projectId&quot;: &quot;A String&quot;, # Project accessed in the connection.
1687 &quot;query&quot;: &quot;A String&quot;, # Query used to access data in the connection.
1688 },
1689 ],
1690 &quot;fileDetails&quot;: [ # Identification of a File source used in the Dataflow job.
1691 { # Metadata for a File connector used by the job.
1692 &quot;filePattern&quot;: &quot;A String&quot;, # File Pattern used to access files by the connector.
1693 },
1694 ],
1695 &quot;pubsubDetails&quot;: [ # Identification of a PubSub source used in the Dataflow job.
1696 { # Metadata for a PubSub connector used by the job.
1697 &quot;subscription&quot;: &quot;A String&quot;, # Subscription used in the connection.
1698 &quot;topic&quot;: &quot;A String&quot;, # Topic accessed in the connection.
1699 },
1700 ],
1701 },
1702 &quot;createdFromSnapshotId&quot;: &quot;A String&quot;, # If this is specified, the job&#x27;s initial state is populated from the given
1703 # snapshot.
1704 &quot;projectId&quot;: &quot;A String&quot;, # The ID of the Cloud Platform project that the job belongs to.
1705 &quot;type&quot;: &quot;A String&quot;, # The type of Cloud Dataflow job.
1706 &quot;pipelineDescription&quot;: { # A descriptive representation of submitted pipeline as well as the executed
1707 # form. This data is provided by the Dataflow service for ease of visualizing
1708 # the pipeline and interpreting Dataflow provided metrics.
1709 # Preliminary field: The format of this data may change at any time.
1710 # A description of the user pipeline and stages through which it is executed.
1711 # Created by Cloud Dataflow service. Only retrieved with JOB_VIEW_DESCRIPTION or JOB_VIEW_ALL.
1712 &quot;executionPipelineStage&quot;: [ # Description of each stage of execution of the pipeline.
1713 { # Description of the composing transforms, names/ids, and input/outputs of a
1714 # stage of execution. Some composing transforms and sources may have been
1715 # generated by the Dataflow service during execution planning.
1716 &quot;id&quot;: &quot;A String&quot;, # Dataflow service generated id for this stage.
1717 &quot;componentTransform&quot;: [ # Transforms that comprise this execution stage.
1718 { # Description of a transform executed as part of an execution stage.
1719 &quot;originalTransform&quot;: &quot;A String&quot;, # User name for the original user transform with which this transform is
1720 # most closely associated.
1721 &quot;name&quot;: &quot;A String&quot;, # Dataflow service generated name for this transform.
1722 &quot;userName&quot;: &quot;A String&quot;, # Human-readable name for this transform; may be user or system generated.
1723 },
1724 ],
1725 &quot;componentSource&quot;: [ # Collections produced and consumed by component transforms of this stage.
1726 { # Description of an interstitial value between transforms in an execution
1727 # stage.
1728 &quot;name&quot;: &quot;A String&quot;, # Dataflow service generated name for this source.
1729 &quot;userName&quot;: &quot;A String&quot;, # Human-readable name for this source; may be user or system generated.
1730 &quot;originalTransformOrCollection&quot;: &quot;A String&quot;, # User name for the original user transform or collection with which this
1731 # source is most closely associated.
1732 },
1733 ],
1734 &quot;kind&quot;: &quot;A String&quot;, # Type of transform this stage is executing.
1735 &quot;outputSource&quot;: [ # Output sources for this stage.
1736 { # Description of an input or output of an execution stage.
1737 &quot;originalTransformOrCollection&quot;: &quot;A String&quot;, # User name for the original user transform or collection with which this
1738 # source is most closely associated.
1739 &quot;name&quot;: &quot;A String&quot;, # Dataflow service generated name for this source.
1740 &quot;sizeBytes&quot;: &quot;A String&quot;, # Size of the source, if measurable.
1741 &quot;userName&quot;: &quot;A String&quot;, # Human-readable name for this source; may be user or system generated.
1742 },
1743 ],
1744 &quot;name&quot;: &quot;A String&quot;, # Dataflow service generated name for this stage.
1745 &quot;inputSource&quot;: [ # Input sources for this stage.
1746 { # Description of an input or output of an execution stage.
1747 &quot;originalTransformOrCollection&quot;: &quot;A String&quot;, # User name for the original user transform or collection with which this
1748 # source is most closely associated.
1749 &quot;name&quot;: &quot;A String&quot;, # Dataflow service generated name for this source.
1750 &quot;sizeBytes&quot;: &quot;A String&quot;, # Size of the source, if measurable.
1751 &quot;userName&quot;: &quot;A String&quot;, # Human-readable name for this source; may be user or system generated.
1752 },
1753 ],
1754 },
1755 ],
1756 &quot;originalPipelineTransform&quot;: [ # Description of each transform in the pipeline and collections between them.
1757 { # Description of the type, names/ids, and input/outputs for a transform.
1758 &quot;kind&quot;: &quot;A String&quot;, # Type of transform.
1759 &quot;inputCollectionName&quot;: [ # User names for all collection inputs to this transform.
1760 &quot;A String&quot;,
1761 ],
1762 &quot;name&quot;: &quot;A String&quot;, # User provided name for this transform instance.
1763 &quot;id&quot;: &quot;A String&quot;, # SDK generated id of this transform instance.
1764 &quot;displayData&quot;: [ # Transform-specific display data.
1765 { # Data provided with a pipeline or transform to provide descriptive info.
1766 &quot;timestampValue&quot;: &quot;A String&quot;, # Contains value if the data is of timestamp type.
1767 &quot;boolValue&quot;: True or False, # Contains value if the data is of a boolean type.
1768 &quot;javaClassValue&quot;: &quot;A String&quot;, # Contains value if the data is of java class type.
1769 &quot;strValue&quot;: &quot;A String&quot;, # Contains value if the data is of string type.
1770 &quot;int64Value&quot;: &quot;A String&quot;, # Contains value if the data is of int64 type.
1771 &quot;durationValue&quot;: &quot;A String&quot;, # Contains value if the data is of duration type.
1772 &quot;namespace&quot;: &quot;A String&quot;, # The namespace for the key. This is usually a class name or programming
1773 # language namespace (e.g. a Python module) that defines the display data.
1774 # This allows a dax monitoring system to specially handle the data
1775 # and perform custom rendering.
1776 &quot;floatValue&quot;: 3.14, # Contains value if the data is of float type.
1777 &quot;key&quot;: &quot;A String&quot;, # The key identifying the display data.
1778 # This is intended to be used as a label for the display data
1779 # when viewed in a dax monitoring system.
1780 &quot;shortStrValue&quot;: &quot;A String&quot;, # A possible additional shorter value to display.
1781 # For example a java_class_name_value of com.mypackage.MyDoFn
1782 # will be stored with MyDoFn as the short_str_value and
1783 # com.mypackage.MyDoFn as the java_class_name_value.
1784 # short_str_value can be displayed and java_class_name_value
1785 # will be displayed as a tooltip.
1786 &quot;url&quot;: &quot;A String&quot;, # An optional full URL.
1787 &quot;label&quot;: &quot;A String&quot;, # An optional label to display in a dax UI for the element.
1788 },
1789 ],
1790 &quot;outputCollectionName&quot;: [ # User names for all collection outputs to this transform.
1791 &quot;A String&quot;,
1792 ],
1793 },
1794 ],
1795 &quot;displayData&quot;: [ # Pipeline level display data.
1796 { # Data provided with a pipeline or transform to provide descriptive info.
1797 &quot;timestampValue&quot;: &quot;A String&quot;, # Contains value if the data is of timestamp type.
1798 &quot;boolValue&quot;: True or False, # Contains value if the data is of a boolean type.
1799 &quot;javaClassValue&quot;: &quot;A String&quot;, # Contains value if the data is of java class type.
1800 &quot;strValue&quot;: &quot;A String&quot;, # Contains value if the data is of string type.
1801 &quot;int64Value&quot;: &quot;A String&quot;, # Contains value if the data is of int64 type.
1802 &quot;durationValue&quot;: &quot;A String&quot;, # Contains value if the data is of duration type.
1803 &quot;namespace&quot;: &quot;A String&quot;, # The namespace for the key. This is usually a class name or programming
1804 # language namespace (e.g. a Python module) that defines the display data.
1805 # This allows a dax monitoring system to specially handle the data
1806 # and perform custom rendering.
1807 &quot;floatValue&quot;: 3.14, # Contains value if the data is of float type.
1808 &quot;key&quot;: &quot;A String&quot;, # The key identifying the display data.
1809 # This is intended to be used as a label for the display data
1810 # when viewed in a dax monitoring system.
1811 &quot;shortStrValue&quot;: &quot;A String&quot;, # A possible additional shorter value to display.
1812 # For example a java_class_name_value of com.mypackage.MyDoFn
1813 # will be stored with MyDoFn as the short_str_value and
1814 # com.mypackage.MyDoFn as the java_class_name_value.
1815 # short_str_value can be displayed and java_class_name_value
1816 # will be displayed as a tooltip.
1817 &quot;url&quot;: &quot;A String&quot;, # An optional full URL.
1818 &quot;label&quot;: &quot;A String&quot;, # An optional label to display in a dax UI for the element.
1819 },
1820 ],
1821 },
1822 &quot;replaceJobId&quot;: &quot;A String&quot;, # If this job is an update of an existing job, this field is the job ID
1823 # of the job it replaced.
1824 #
1825 # When sending a `CreateJobRequest`, you can update a job by specifying it
1826 # here. The job named here is stopped, and its intermediate state is
1827 # transferred to this job.
1828 &quot;tempFiles&quot;: [ # A set of files the system should be aware of that are used
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07001829 # for temporary storage. These temporary files will be
1830 # removed on job completion.
1831 # No duplicates are allowed.
1832 # No file patterns are supported.
1833 #
1834 # The supported files are:
1835 #
1836 # Google Cloud Storage:
1837 #
1838 # storage.googleapis.com/{bucket}/{object}
1839 # bucket.storage.googleapis.com/{object}
Bu Sun Kim65020912020-05-20 12:08:20 -07001840 &quot;A String&quot;,
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07001841 ],
Bu Sun Kim65020912020-05-20 12:08:20 -07001842 &quot;name&quot;: &quot;A String&quot;, # The user-specified Cloud Dataflow job name.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07001843 #
1844 # Only one Job with a given name may exist in a project at any
1845 # given time. If a caller attempts to create a Job with the same
1846 # name as an already-existing Job, the attempt returns the
1847 # existing Job.
1848 #
1849 # The name must match the regular expression
1850 # `[a-z]([-a-z0-9]{0,38}[a-z0-9])?`
Bu Sun Kim65020912020-05-20 12:08:20 -07001851 &quot;steps&quot;: [ # Exactly one of steps or steps_location should be specified.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07001852 #
1853 # The top-level steps that constitute the entire job.
1854 { # Defines a particular step within a Cloud Dataflow job.
1855 #
1856 # A job consists of multiple steps, each of which performs some
1857 # specific operation as part of the overall job. Data is typically
1858 # passed from one step to another as part of the job.
1859 #
Bu Sun Kim65020912020-05-20 12:08:20 -07001860 # Here&#x27;s an example of a sequence of steps which together implement a
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07001861 # Map-Reduce job:
1862 #
1863 # * Read a collection of data from some source, parsing the
Bu Sun Kim65020912020-05-20 12:08:20 -07001864 # collection&#x27;s elements.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07001865 #
1866 # * Validate the elements.
1867 #
1868 # * Apply a user-defined function to map each element to some value
1869 # and extract an element-specific key value.
1870 #
1871 # * Group elements with the same key into a single element with
1872 # that key, transforming a multiply-keyed collection into a
1873 # uniquely-keyed collection.
1874 #
1875 # * Write the elements out to some data sink.
1876 #
1877 # Note that the Cloud Dataflow service may be used to run many different
1878 # types of jobs, not just Map-Reduce.
Bu Sun Kim65020912020-05-20 12:08:20 -07001879 &quot;name&quot;: &quot;A String&quot;, # The name that identifies the step. This must be unique for each
Dan O'Mearadd494642020-05-01 07:42:23 -07001880 # step with respect to all other steps in the Cloud Dataflow job.
Bu Sun Kim65020912020-05-20 12:08:20 -07001881 &quot;kind&quot;: &quot;A String&quot;, # The kind of step in the Cloud Dataflow job.
1882 &quot;properties&quot;: { # Named properties associated with the step. Each kind of
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07001883 # predefined step has its own required set of properties.
1884 # Must be provided on Create. Only retrieved with JOB_VIEW_ALL.
Bu Sun Kim65020912020-05-20 12:08:20 -07001885 &quot;a_key&quot;: &quot;&quot;, # Properties of the object.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07001886 },
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07001887 },
1888 ],
Bu Sun Kim65020912020-05-20 12:08:20 -07001889 &quot;replacedByJobId&quot;: &quot;A String&quot;, # If another job is an update of this job (and thus, this job is in
1890 # `JOB_STATE_UPDATED`), this field contains the ID of that job.
1891 &quot;executionInfo&quot;: { # Deprecated. Additional information about how a Cloud Dataflow job will be executed that
1892 # isn&#x27;t contained in the submitted job.
1893 &quot;stages&quot;: { # A mapping from each stage to the information about that stage.
1894 &quot;a_key&quot;: { # Contains information about how a particular
1895 # google.dataflow.v1beta3.Step will be executed.
1896 &quot;stepName&quot;: [ # The steps associated with the execution stage.
1897 # Note that stages may have several steps, and that a given step
1898 # might be run by more than one stage.
1899 &quot;A String&quot;,
1900 ],
1901 },
1902 },
1903 },
1904 &quot;currentState&quot;: &quot;A String&quot;, # The current state of the job.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07001905 #
1906 # Jobs are created in the `JOB_STATE_STOPPED` state unless otherwise
1907 # specified.
1908 #
1909 # A job in the `JOB_STATE_RUNNING` state may asynchronously enter a
1910 # terminal state. After a job has reached a terminal state, no
1911 # further state updates may be made.
1912 #
1913 # This field may be mutated by the Cloud Dataflow service;
1914 # callers cannot mutate it.
Bu Sun Kim65020912020-05-20 12:08:20 -07001915 &quot;location&quot;: &quot;A String&quot;, # The [regional endpoint]
1916 # (https://cloud.google.com/dataflow/docs/concepts/regional-endpoints) that
1917 # contains this job.
1918 &quot;startTime&quot;: &quot;A String&quot;, # The timestamp when the job was started (transitioned to JOB_STATE_PENDING).
1919 # Flexible resource scheduling jobs are started with some delay after job
1920 # creation, so start_time is unset before start and is updated when the
1921 # job is started by the Cloud Dataflow service. For other jobs, start_time
1922 # always equals create_time, is immutable, and is set by the Cloud Dataflow
1923 # service.
1924 &quot;stepsLocation&quot;: &quot;A String&quot;, # The GCS location where the steps are stored.
1925 &quot;labels&quot;: { # User-defined labels for this job.
1926 #
1927 # The labels map can contain no more than 64 entries. Entries of the labels
1928 # map are UTF8 strings that comply with the following restrictions:
1929 #
1930 # * Keys must conform to regexp: \p{Ll}\p{Lo}{0,62}
1931 # * Values must conform to regexp: [\p{Ll}\p{Lo}\p{N}_-]{0,63}
1932 # * Both keys and values are additionally constrained to be &lt;= 128 bytes in
1933 # size.
1934 &quot;a_key&quot;: &quot;A String&quot;,
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07001935 },
Bu Sun Kim65020912020-05-20 12:08:20 -07001936 &quot;createTime&quot;: &quot;A String&quot;, # The timestamp when the job was initially created. Immutable and set by the
1937 # Cloud Dataflow service.
1938 &quot;requestedState&quot;: &quot;A String&quot;, # The job&#x27;s requested state.
1939 #
1940 # `UpdateJob` may be used to switch between the `JOB_STATE_STOPPED` and
1941 # `JOB_STATE_RUNNING` states, by setting requested_state. `UpdateJob` may
1942 # also be used to directly set a job&#x27;s requested state to
1943 # `JOB_STATE_CANCELLED` or `JOB_STATE_DONE`, irrevocably terminating the
1944 # job if it has not already reached a terminal state.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07001945 }</pre>
1946</div>
1947
1948<div class="method">
Bu Sun Kim65020912020-05-20 12:08:20 -07001949 <code class="details" id="get">get(projectId, jobId, view=None, location=None, x__xgafv=None)</code>
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07001950 <pre>Gets the state of the specified Cloud Dataflow job.
1951
1952To get the state of a job, we recommend using `projects.locations.jobs.get`
1953with a [regional endpoint]
1954(https://cloud.google.com/dataflow/docs/concepts/regional-endpoints). Using
1955`projects.jobs.get` is not recommended, as you can only get the state of
1956jobs that are running in `us-central1`.
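
The recommendation above can be sketched as follows. This is a minimal
sketch under stated assumptions, not part of the generated reference: the
helper names `get_job_kwargs` and `fetch_job` are hypothetical, and
`fetch_job` assumes that the `google-api-python-client` package is installed
and that Application Default Credentials are configured. It calls
`projects.locations.jobs.get`, passing the regional endpoint as `location`.

```python
def get_job_kwargs(project_id, job_id, location=None, view=None):
    """Collect keyword arguments for a jobs.get call, dropping unset optionals."""
    kwargs = {"projectId": project_id, "jobId": job_id}
    if location is not None:
        kwargs["location"] = location
    if view is not None:
        kwargs["view"] = view
    return kwargs


def fetch_job(project_id, job_id, location, view=None):
    """Fetch a job via projects.locations.jobs.get (needs credentials and network)."""
    # Imported lazily so get_job_kwargs stays usable without the client library.
    from googleapiclient.discovery import build

    service = build("dataflow", "v1b3")
    request = service.projects().locations().jobs().get(
        **get_job_kwargs(project_id, job_id, location=location, view=view)
    )
    return request.execute()
```

For example, `fetch_job("my-project", "my-job-id", "europe-west1")` would
request the job from the `europe-west1` regional endpoint; the project ID,
job ID, and region shown here are placeholders.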
1957
1958Args:
1959 projectId: string, The ID of the Cloud Platform project that the job belongs to. (required)
1960 jobId: string, The job ID. (required)
Bu Sun Kim65020912020-05-20 12:08:20 -07001961 view: string, The level of information requested in response.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07001962 location: string, The [regional endpoint]
1963(https://cloud.google.com/dataflow/docs/concepts/regional-endpoints) that
1964contains this job.
1965 x__xgafv: string, V1 error format.
1966 Allowed values
1967 1 - v1 error format
1968 2 - v2 error format
1969
1970Returns:
1971 An object of the form:
1972
1973 { # Defines a job to be run by the Cloud Dataflow service.
1974 &quot;clientRequestId&quot;: &quot;A String&quot;, # The client&#x27;s unique identifier of the job, re-used across retried attempts.
1975 # If this field is set, the service will ensure its uniqueness.
1976 # The request to create a job will fail if the service has knowledge of a
1977 # previously submitted job with the same client&#x27;s ID and job name.
1978 # The caller may use this field to ensure idempotence of job
1979 # creation across retried attempts to create a job.
1980 # By default, the field is empty and, in that case, the service ignores it.
1981 &quot;id&quot;: &quot;A String&quot;, # The unique ID of this job.
1982 #
1983 # This field is set by the Cloud Dataflow service when the Job is
1984 # created, and is immutable for the life of the job.
1985 &quot;currentStateTime&quot;: &quot;A String&quot;, # The timestamp associated with the current state.
1986 &quot;transformNameMapping&quot;: { # The map of transform name prefixes of the job to be replaced to the
1987 # corresponding name prefixes of the new job.
1988 &quot;a_key&quot;: &quot;A String&quot;,
1989 },
1990 &quot;environment&quot;: { # Describes the environment in which a Dataflow Job runs. # The environment for the job.
1991 &quot;sdkPipelineOptions&quot;: { # The Cloud Dataflow SDK pipeline options specified by the user. These
1992 # options are passed through the service and are used to recreate the
1993 # SDK pipeline options on the worker in a language agnostic and platform
1994 # independent way.
1995 &quot;a_key&quot;: &quot;&quot;, # Properties of the object.
1996 },
1997 &quot;flexResourceSchedulingGoal&quot;: &quot;A String&quot;, # Which Flexible Resource Scheduling mode to run in.
1998 &quot;workerPools&quot;: [ # The worker pools. At least one &quot;harness&quot; worker pool must be
1999 # specified in order for the job to have workers.
2000 { # Describes one particular pool of Cloud Dataflow workers to be
2001 # instantiated by the Cloud Dataflow service in order to perform the
2002 # computations required by a job. Note that a workflow job may use
2003 # multiple pools, in order to match the various computational
2004 # requirements of the various stages of the job.
2005 &quot;defaultPackageSet&quot;: &quot;A String&quot;, # The default package set to install. This allows the service to
2006 # select a default set of packages which are useful to worker
2007 # harnesses written in a particular language.
2008 &quot;network&quot;: &quot;A String&quot;, # Network to which VMs will be assigned. If empty or unspecified,
2009 # the service will use the network &quot;default&quot;.
2010 &quot;zone&quot;: &quot;A String&quot;, # Zone to run the worker pools in. If empty or unspecified, the service
2011 # will attempt to choose a reasonable default.
2012 &quot;numWorkers&quot;: 42, # Number of Google Compute Engine workers in this pool needed to
2013 # execute the job. If zero or unspecified, the service will
2014 # attempt to choose a reasonable default.
2015 &quot;numThreadsPerWorker&quot;: 42, # The number of threads per worker harness. If empty or unspecified, the
2016 # service will choose a number of threads (according to the number of cores
2017 # on the selected machine type for batch, or 1 by convention for streaming).
2018 &quot;diskSourceImage&quot;: &quot;A String&quot;, # Fully qualified source image for disks.
2019 &quot;packages&quot;: [ # Packages to be installed on workers.
2020 { # The packages that must be installed in order for a worker to run the
2021 # steps of the Cloud Dataflow job that will be assigned to its worker
2022 # pool.
2023 #
2024 # This is the mechanism by which the Cloud Dataflow SDK causes code to
2025 # be loaded onto the workers. For example, the Cloud Dataflow Java SDK
2026 # might use this to install jars containing the user&#x27;s code and all of the
2027 # various dependencies (libraries, data files, etc.) required in order
2028 # for that code to run.
2029 &quot;location&quot;: &quot;A String&quot;, # The resource to read the package from. The supported resource type is:
2030 #
2031 # Google Cloud Storage:
2032 #
2033 # storage.googleapis.com/{bucket}
2034 # bucket.storage.googleapis.com/
2035 &quot;name&quot;: &quot;A String&quot;, # The name of the package.
2036 },
2037 ],
2038 &quot;teardownPolicy&quot;: &quot;A String&quot;, # Sets the policy for determining when to tear down the worker pool.
2039 # Allowed values are: `TEARDOWN_ALWAYS`, `TEARDOWN_ON_SUCCESS`, and
2040 # `TEARDOWN_NEVER`.
2041 # `TEARDOWN_ALWAYS` means workers are always torn down regardless of whether
2042 # the job succeeds. `TEARDOWN_ON_SUCCESS` means workers are torn down
2043 # if the job succeeds. `TEARDOWN_NEVER` means the workers are never torn
2044 # down.
2045 #
2046 # If the workers are not torn down by the service, they will
2047 # continue to run and use Google Compute Engine VM resources in the
2048 # user&#x27;s project until they are explicitly terminated by the user.
2049 # Because of this, Google recommends using the `TEARDOWN_ALWAYS`
2050 # policy except for small, manually supervised test jobs.
2051 #
2052 # If unknown or unspecified, the service will attempt to choose a reasonable
2053 # default.
2054 &quot;onHostMaintenance&quot;: &quot;A String&quot;, # The action to take on host maintenance, as defined by the Google
2055 # Compute Engine API.
2056 &quot;poolArgs&quot;: { # Extra arguments for this worker pool.
2057 &quot;a_key&quot;: &quot;&quot;, # Properties of the object. Contains field @type with type URL.
2058 },
2059 &quot;diskSizeGb&quot;: 42, # Size of root disk for VMs, in GB. If zero or unspecified, the service will
2060 # attempt to choose a reasonable default.
2061 &quot;workerHarnessContainerImage&quot;: &quot;A String&quot;, # Required. Docker container image that executes the Cloud Dataflow worker
2062 # harness, residing in Google Container Registry.
2063 #
2064 # Deprecated for the Fn API path. Use sdk_harness_container_images instead.
2065 &quot;diskType&quot;: &quot;A String&quot;, # Type of root disk for VMs. If empty or unspecified, the service will
2066 # attempt to choose a reasonable default.
2067 &quot;machineType&quot;: &quot;A String&quot;, # Machine type (e.g. &quot;n1-standard-1&quot;). If empty or unspecified, the
2068 # service will attempt to choose a reasonable default.
2069 &quot;kind&quot;: &quot;A String&quot;, # The kind of the worker pool; currently only `harness` and `shuffle`
2070 # are supported.
2071 &quot;dataDisks&quot;: [ # Data disks that are used by a VM in this workflow.
2072 { # Describes the data disk used by a workflow job.
2073 &quot;sizeGb&quot;: 42, # Size of disk in GB. If zero or unspecified, the service will
2074 # attempt to choose a reasonable default.
2075 &quot;diskType&quot;: &quot;A String&quot;, # Disk storage type, as defined by Google Compute Engine. This
2076 # must be a disk type appropriate to the project and zone in which
2077 # the workers will run. If unknown or unspecified, the service
2078 # will attempt to choose a reasonable default.
2079 #
2080 # For example, the standard persistent disk type is a resource name
2081 # typically ending in &quot;pd-standard&quot;. If SSD persistent disks are
2082 # available, the resource name typically ends with &quot;pd-ssd&quot;. The
2083 # actual valid values are defined by the Google Compute Engine API,
2084 # not by the Cloud Dataflow API; consult the Google Compute Engine
2085 # documentation for more information about determining the set of
2086 # available disk types for a particular project and zone.
2087 #
2088 # Google Compute Engine Disk types are local to a particular
2089 # project in a particular zone, and so the resource name will
2090 # typically look something like this:
2091 #
2092 # compute.googleapis.com/projects/project-id/zones/zone/diskTypes/pd-standard
2093 &quot;mountPoint&quot;: &quot;A String&quot;, # Directory in a VM where disk is mounted.
2094 },
2095 ],
2096 &quot;sdkHarnessContainerImages&quot;: [ # Set of SDK harness containers needed to execute this pipeline. This will
2097 # only be set in the Fn API path. For non-cross-language pipelines this
2098 # should have only one entry. Cross-language pipelines will have two or more
2099 # entries.
2100 { # Defines an SDK harness container for executing Dataflow pipelines.
2101 &quot;containerImage&quot;: &quot;A String&quot;, # A docker container image that resides in Google Container Registry.
2102 &quot;useSingleCorePerContainer&quot;: True or False, # If true, recommends that the Dataflow service use only one core per SDK
2103 # container instance with this image. If false (or unset), recommends using
2104 # more than one core per SDK container instance with this image for
2105 # efficiency. Note that Dataflow service may choose to override this property
2106 # if needed.
2107 },
2108 ],
2109 &quot;subnetwork&quot;: &quot;A String&quot;, # Subnetwork to which VMs will be assigned, if desired. Expected to be of
2110 # the form &quot;regions/REGION/subnetworks/SUBNETWORK&quot;.
2111 &quot;ipConfiguration&quot;: &quot;A String&quot;, # Configuration for VM IPs.
2112 &quot;taskrunnerSettings&quot;: { # Taskrunner configuration settings. # Settings passed through to Google Compute Engine workers when
2113 # using the standard Dataflow task runner. Users should ignore
2114 # this field.
2115 &quot;alsologtostderr&quot;: True or False, # Whether to also send taskrunner log info to stderr.
2116 &quot;taskGroup&quot;: &quot;A String&quot;, # The UNIX group ID on the worker VM to use for tasks launched by
2117 # taskrunner; e.g. &quot;wheel&quot;.
2118 &quot;harnessCommand&quot;: &quot;A String&quot;, # The command to launch the worker harness.
2119 &quot;logDir&quot;: &quot;A String&quot;, # The directory on the VM to store logs.
2120 &quot;oauthScopes&quot;: [ # The OAuth2 scopes to be requested by the taskrunner in order to
2121 # access the Cloud Dataflow API.
2122 &quot;A String&quot;,
2123 ],
2124 &quot;dataflowApiVersion&quot;: &quot;A String&quot;, # The API version of endpoint, e.g. &quot;v1b3&quot;
2125 &quot;logUploadLocation&quot;: &quot;A String&quot;, # Indicates where to put logs. If this is not specified, the logs
2126 # will not be uploaded.
2127 #
2128 # The supported resource type is:
2129 #
2130 # Google Cloud Storage:
2131 # storage.googleapis.com/{bucket}/{object}
2132 # bucket.storage.googleapis.com/{object}
2133 &quot;streamingWorkerMainClass&quot;: &quot;A String&quot;, # The streaming worker main class name.
2134 &quot;workflowFileName&quot;: &quot;A String&quot;, # The file to store the workflow in.
2135 &quot;baseTaskDir&quot;: &quot;A String&quot;, # The location on the worker for task-specific subdirectories.
2136 &quot;tempStoragePrefix&quot;: &quot;A String&quot;, # The prefix of the resources the taskrunner should use for
2137 # temporary storage.
2138 #
2139 # The supported resource type is:
2140 #
2141 # Google Cloud Storage:
2142 # storage.googleapis.com/{bucket}/{object}
2143 # bucket.storage.googleapis.com/{object}
2144 &quot;commandlinesFileName&quot;: &quot;A String&quot;, # The file to store preprocessing commands in.
2145 &quot;languageHint&quot;: &quot;A String&quot;, # The suggested backend language.
2146 &quot;baseUrl&quot;: &quot;A String&quot;, # The base URL for the taskrunner to use when accessing Google Cloud APIs.
2147 #
2148 # When workers access Google Cloud APIs, they logically do so via
2149 # relative URLs. If this field is specified, it supplies the base
2150 # URL to use for resolving these relative URLs. The normative
2151 # algorithm used is defined by RFC 1808, &quot;Relative Uniform Resource
2152 # Locators&quot;.
2153 #
2154 # If not specified, the default value is &quot;http://www.googleapis.com/&quot;
2155 &quot;logToSerialconsole&quot;: True or False, # Whether to send taskrunner log info to Google Compute Engine VM serial
2156 # console.
2157 &quot;continueOnException&quot;: True or False, # Whether to continue taskrunner if an exception is hit.
2158 &quot;parallelWorkerSettings&quot;: { # Provides data to pass through to the worker harness. # The settings to pass to the parallel worker harness.
2159 &quot;baseUrl&quot;: &quot;A String&quot;, # The base URL for accessing Google Cloud APIs.
2160 #
2161 # When workers access Google Cloud APIs, they logically do so via
2162 # relative URLs. If this field is specified, it supplies the base
2163 # URL to use for resolving these relative URLs. The normative
2164 # algorithm used is defined by RFC 1808, &quot;Relative Uniform Resource
2165 # Locators&quot;.
2166 #
2167 # If not specified, the default value is &quot;http://www.googleapis.com/&quot;
2168 &quot;reportingEnabled&quot;: True or False, # Whether to send work progress updates to the service.
2169 &quot;servicePath&quot;: &quot;A String&quot;, # The Cloud Dataflow service path relative to the root URL, for example,
2170 # &quot;dataflow/v1b3/projects&quot;.
2171 &quot;shuffleServicePath&quot;: &quot;A String&quot;, # The Shuffle service path relative to the root URL, for example,
2172 # &quot;shuffle/v1beta1&quot;.
2173 &quot;workerId&quot;: &quot;A String&quot;, # The ID of the worker running this pipeline.
2174 &quot;tempStoragePrefix&quot;: &quot;A String&quot;, # The prefix of the resources the system should use for temporary
2175 # storage.
2176 #
2177 # The supported resource type is:
2178 #
2179 # Google Cloud Storage:
2180 #
2181 # storage.googleapis.com/{bucket}/{object}
2182 # bucket.storage.googleapis.com/{object}
2183 },
2184 &quot;vmId&quot;: &quot;A String&quot;, # The ID string of the VM.
2185 &quot;taskUser&quot;: &quot;A String&quot;, # The UNIX user ID on the worker VM to use for tasks launched by
2186 # taskrunner; e.g. &quot;root&quot;.
2187 },
2188 &quot;autoscalingSettings&quot;: { # Settings for WorkerPool autoscaling. # Settings for autoscaling of this WorkerPool.
2189 &quot;maxNumWorkers&quot;: 42, # The maximum number of workers to cap scaling at.
2190 &quot;algorithm&quot;: &quot;A String&quot;, # The algorithm to use for autoscaling.
2191 },
2192 &quot;metadata&quot;: { # Metadata to set on the Google Compute Engine VMs.
2193 &quot;a_key&quot;: &quot;A String&quot;,
2194 },
2195 },
2196 ],
2197 &quot;dataset&quot;: &quot;A String&quot;, # The dataset for the current project where various workflow
2198 # related tables are stored.
2199 #
2200 # The supported resource type is:
2201 #
2202 # Google BigQuery:
2203 # bigquery.googleapis.com/{dataset}
2204 &quot;internalExperiments&quot;: { # Experimental settings.
2205 &quot;a_key&quot;: &quot;&quot;, # Properties of the object. Contains field @type with type URL.
2206 },
2207 &quot;workerRegion&quot;: &quot;A String&quot;, # The Compute Engine region
2208 # (https://cloud.google.com/compute/docs/regions-zones/regions-zones) in
2209 # which worker processing should occur, e.g. &quot;us-west1&quot;. Mutually exclusive
2210 # with worker_zone. If neither worker_region nor worker_zone is specified,
2211 # default to the control plane&#x27;s region.
2212 &quot;serviceKmsKeyName&quot;: &quot;A String&quot;, # If set, contains the Cloud KMS key identifier used to encrypt data
2213 # at rest, AKA a Customer Managed Encryption Key (CMEK).
2214 #
2215 # Format:
2216 # projects/PROJECT_ID/locations/LOCATION/keyRings/KEY_RING/cryptoKeys/KEY
2217 &quot;userAgent&quot;: { # A description of the process that generated the request.
2218 &quot;a_key&quot;: &quot;&quot;, # Properties of the object.
2219 },
2220 &quot;workerZone&quot;: &quot;A String&quot;, # The Compute Engine zone
2221 # (https://cloud.google.com/compute/docs/regions-zones/regions-zones) in
2222 # which worker processing should occur, e.g. &quot;us-west1-a&quot;. Mutually exclusive
2223 # with worker_region. If neither worker_region nor worker_zone is specified,
2224 # a zone in the control plane&#x27;s region is chosen based on available capacity.
2225 &quot;clusterManagerApiService&quot;: &quot;A String&quot;, # The type of cluster manager API to use. If unknown or
2226 # unspecified, the service will attempt to choose a reasonable
2227 # default. This should be in the form of the API service name,
2228 # e.g. &quot;compute.googleapis.com&quot;.
2229 &quot;tempStoragePrefix&quot;: &quot;A String&quot;, # The prefix of the resources the system should use for temporary
2230 # storage. The system will append the suffix &quot;/temp-{JOBNAME}&quot; to
2231 # this resource prefix, where {JOBNAME} is the value of the
2232 # job_name field. The resulting bucket and object prefix is used
2233 # as the prefix of the resources used to store temporary data
2234 # needed during the job execution. NOTE: This will override the
2235 # value in taskrunner_settings.
2236 # The supported resource type is:
2237 #
2238 # Google Cloud Storage:
2239 #
2240 # storage.googleapis.com/{bucket}/{object}
2241 # bucket.storage.googleapis.com/{object}
2242 &quot;experiments&quot;: [ # The list of experiments to enable.
2243 &quot;A String&quot;,
2244 ],
2245 &quot;version&quot;: { # A structure describing which components and their versions of the service
2246 # are required in order to run the job.
2247 &quot;a_key&quot;: &quot;&quot;, # Properties of the object.
2248 },
2249 &quot;serviceAccountEmail&quot;: &quot;A String&quot;, # Identity to run virtual machines as. Defaults to the default account.
2250 },
2251 &quot;stageStates&quot;: [ # This field may be mutated by the Cloud Dataflow service;
2252 # callers cannot mutate it.
2253 { # A message describing the state of a particular execution stage.
2254 &quot;executionStageName&quot;: &quot;A String&quot;, # The name of the execution stage.
2255 &quot;currentStateTime&quot;: &quot;A String&quot;, # The time at which the stage transitioned to this state.
2256 &quot;executionStageState&quot;: &quot;A String&quot;, # Execution stage states allow the same set of values as JobState.
2257 },
2258 ],
2259 &quot;jobMetadata&quot;: { # Metadata available primarily for filtering jobs. Will be included in the
2260 # ListJob response and Job SUMMARY view.
2261 # This field is populated by the Dataflow service to support filtering jobs
2262 # by the metadata values provided here. Populated for ListJobs and all GetJob views SUMMARY and higher.
2263 &quot;bigTableDetails&quot;: [ # Identification of a BigTable source used in the Dataflow job.
2264 { # Metadata for a BigTable connector used by the job.
2265 &quot;tableId&quot;: &quot;A String&quot;, # TableId accessed in the connection.
2266 &quot;projectId&quot;: &quot;A String&quot;, # ProjectId accessed in the connection.
2267 &quot;instanceId&quot;: &quot;A String&quot;, # InstanceId accessed in the connection.
2268 },
2269 ],
2270 &quot;spannerDetails&quot;: [ # Identification of a Spanner source used in the Dataflow job.
2271 { # Metadata for a Spanner connector used by the job.
2272 &quot;databaseId&quot;: &quot;A String&quot;, # DatabaseId accessed in the connection.
2273 &quot;instanceId&quot;: &quot;A String&quot;, # InstanceId accessed in the connection.
2274 &quot;projectId&quot;: &quot;A String&quot;, # ProjectId accessed in the connection.
2275 },
2276 ],
2277 &quot;datastoreDetails&quot;: [ # Identification of a Datastore source used in the Dataflow job.
2278 { # Metadata for a Datastore connector used by the job.
2279 &quot;projectId&quot;: &quot;A String&quot;, # ProjectId accessed in the connection.
2280 &quot;namespace&quot;: &quot;A String&quot;, # Namespace used in the connection.
2281 },
2282 ],
2283 &quot;sdkVersion&quot;: { # The version of the SDK used to run the job. # The SDK version used to run the job.
2284 &quot;versionDisplayName&quot;: &quot;A String&quot;, # A readable string describing the version of the SDK.
2285 &quot;sdkSupportStatus&quot;: &quot;A String&quot;, # The support status for this SDK version.
2286 &quot;version&quot;: &quot;A String&quot;, # The version of the SDK used to run the job.
2287 },
2288 &quot;bigqueryDetails&quot;: [ # Identification of a BigQuery source used in the Dataflow job.
2289 { # Metadata for a BigQuery connector used by the job.
2290 &quot;table&quot;: &quot;A String&quot;, # Table accessed in the connection.
2291 &quot;dataset&quot;: &quot;A String&quot;, # Dataset accessed in the connection.
2292 &quot;projectId&quot;: &quot;A String&quot;, # Project accessed in the connection.
2293 &quot;query&quot;: &quot;A String&quot;, # Query used to access data in the connection.
2294 },
2295 ],
2296 &quot;fileDetails&quot;: [ # Identification of a File source used in the Dataflow job.
2297 { # Metadata for a File connector used by the job.
2298 &quot;filePattern&quot;: &quot;A String&quot;, # File Pattern used to access files by the connector.
2299 },
2300 ],
2301 &quot;pubsubDetails&quot;: [ # Identification of a PubSub source used in the Dataflow job.
2302 { # Metadata for a PubSub connector used by the job.
2303 &quot;subscription&quot;: &quot;A String&quot;, # Subscription used in the connection.
2304 &quot;topic&quot;: &quot;A String&quot;, # Topic accessed in the connection.
2305 },
2306 ],
2307 },
2308 &quot;createdFromSnapshotId&quot;: &quot;A String&quot;, # If this is specified, the job&#x27;s initial state is populated from the given
2309 # snapshot.
2310 &quot;projectId&quot;: &quot;A String&quot;, # The ID of the Cloud Platform project that the job belongs to.
2311 &quot;type&quot;: &quot;A String&quot;, # The type of Cloud Dataflow job.
2312 &quot;pipelineDescription&quot;: { # A descriptive representation of the submitted pipeline as well as the executed
2313 # form. This data is provided by the Dataflow service for ease of visualizing
2314 # the pipeline and interpreting Dataflow provided metrics.
2315 # Preliminary field: The format of this data may change at any time. A description of the user pipeline and stages through which it is executed.
2316 # Created by the Cloud Dataflow service. Only retrieved with JOB_VIEW_DESCRIPTION or JOB_VIEW_ALL.
2317 # the pipeline and interpreting Dataflow provided metrics.
2318 &quot;executionPipelineStage&quot;: [ # Description of each stage of execution of the pipeline.
2319 { # Description of the composing transforms, names/ids, and input/outputs of a
2320 # stage of execution. Some composing transforms and sources may have been
2321 # generated by the Dataflow service during execution planning.
2322 &quot;id&quot;: &quot;A String&quot;, # Dataflow service generated id for this stage.
2323 &quot;componentTransform&quot;: [ # Transforms that comprise this execution stage.
2324 { # Description of a transform executed as part of an execution stage.
2325 &quot;originalTransform&quot;: &quot;A String&quot;, # User name for the original user transform with which this transform is
2326 # most closely associated.
2327 &quot;name&quot;: &quot;A String&quot;, # Dataflow service generated name for this transform.
2328 &quot;userName&quot;: &quot;A String&quot;, # Human-readable name for this transform; may be user or system generated.
2329 },
2330 ],
2331 &quot;componentSource&quot;: [ # Collections produced and consumed by component transforms of this stage.
2332 { # Description of an interstitial value between transforms in an execution
2333 # stage.
2334 &quot;name&quot;: &quot;A String&quot;, # Dataflow service generated name for this source.
2335 &quot;userName&quot;: &quot;A String&quot;, # Human-readable name for this source; may be user or system generated.
2336 &quot;originalTransformOrCollection&quot;: &quot;A String&quot;, # User name for the original user transform or collection with which this
2337 # source is most closely associated.
2338 },
2339 ],
2340 &quot;kind&quot;: &quot;A String&quot;, # Type of transform this stage is executing.
2341 &quot;outputSource&quot;: [ # Output sources for this stage.
2342 { # Description of an input or output of an execution stage.
2343 &quot;originalTransformOrCollection&quot;: &quot;A String&quot;, # User name for the original user transform or collection with which this
2344 # source is most closely associated.
2345 &quot;name&quot;: &quot;A String&quot;, # Dataflow service generated name for this source.
2346 &quot;sizeBytes&quot;: &quot;A String&quot;, # Size of the source, if measurable.
2347 &quot;userName&quot;: &quot;A String&quot;, # Human-readable name for this source; may be user or system generated.
2348 },
2349 ],
2350 &quot;name&quot;: &quot;A String&quot;, # Dataflow service generated name for this stage.
2351 &quot;inputSource&quot;: [ # Input sources for this stage.
2352 { # Description of an input or output of an execution stage.
2353 &quot;originalTransformOrCollection&quot;: &quot;A String&quot;, # User name for the original user transform or collection with which this
2354 # source is most closely associated.
2355 &quot;name&quot;: &quot;A String&quot;, # Dataflow service generated name for this source.
2356 &quot;sizeBytes&quot;: &quot;A String&quot;, # Size of the source, if measurable.
2357 &quot;userName&quot;: &quot;A String&quot;, # Human-readable name for this source; may be user or system generated.
2358 },
2359 ],
2360 },
2361 ],
2362 &quot;originalPipelineTransform&quot;: [ # Description of each transform in the pipeline and collections between them.
2363 { # Description of the type, names/ids, and input/outputs for a transform.
2364 &quot;kind&quot;: &quot;A String&quot;, # Type of transform.
2365 &quot;inputCollectionName&quot;: [ # User names for all collection inputs to this transform.
2366 &quot;A String&quot;,
2367 ],
2368 &quot;name&quot;: &quot;A String&quot;, # User provided name for this transform instance.
2369 &quot;id&quot;: &quot;A String&quot;, # SDK generated id of this transform instance.
2370 &quot;displayData&quot;: [ # Transform-specific display data.
2371 { # Data provided with a pipeline or transform to provide descriptive info.
2372 &quot;timestampValue&quot;: &quot;A String&quot;, # Contains value if the data is of timestamp type.
2373 &quot;boolValue&quot;: True or False, # Contains value if the data is of a boolean type.
2374 &quot;javaClassValue&quot;: &quot;A String&quot;, # Contains value if the data is of java class type.
2375 &quot;strValue&quot;: &quot;A String&quot;, # Contains value if the data is of string type.
2376 &quot;int64Value&quot;: &quot;A String&quot;, # Contains value if the data is of int64 type.
2377 &quot;durationValue&quot;: &quot;A String&quot;, # Contains value if the data is of duration type.
2378 &quot;namespace&quot;: &quot;A String&quot;, # The namespace for the key. This is usually a class name or programming
2379 # language namespace (e.g. a Python module) which defines the display data.
2380 # This allows a dax monitoring system to specially handle the data
2381 # and perform custom rendering.
2382 &quot;floatValue&quot;: 3.14, # Contains value if the data is of float type.
2383 &quot;key&quot;: &quot;A String&quot;, # The key identifying the display data.
2384 # This is intended to be used as a label for the display data
2385 # when viewed in a dax monitoring system.
2386 &quot;shortStrValue&quot;: &quot;A String&quot;, # A possible additional shorter value to display.
2387 # For example a java_class_name_value of com.mypackage.MyDoFn
2388 # will be stored with MyDoFn as the short_str_value and
2389 # com.mypackage.MyDoFn as the java_class_name_value.
2390 # short_str_value can be displayed and java_class_name_value
2391 # will be displayed as a tooltip.
2392 &quot;url&quot;: &quot;A String&quot;, # An optional full URL.
2393 &quot;label&quot;: &quot;A String&quot;, # An optional label to display in a dax UI for the element.
2394 },
2395 ],
2396 &quot;outputCollectionName&quot;: [ # User names for all collection outputs to this transform.
2397 &quot;A String&quot;,
2398 ],
2399 },
2400 ],
2401 &quot;displayData&quot;: [ # Pipeline level display data.
2402 { # Data provided with a pipeline or transform to provide descriptive info.
2403 &quot;timestampValue&quot;: &quot;A String&quot;, # Contains value if the data is of timestamp type.
2404 &quot;boolValue&quot;: True or False, # Contains value if the data is of a boolean type.
2405 &quot;javaClassValue&quot;: &quot;A String&quot;, # Contains value if the data is of java class type.
2406 &quot;strValue&quot;: &quot;A String&quot;, # Contains value if the data is of string type.
2407 &quot;int64Value&quot;: &quot;A String&quot;, # Contains value if the data is of int64 type.
2408 &quot;durationValue&quot;: &quot;A String&quot;, # Contains value if the data is of duration type.
2409 &quot;namespace&quot;: &quot;A String&quot;, # The namespace for the key. This is usually a class name or programming
2410 # language namespace (e.g. a Python module) which defines the display data.
2411 # This allows a dax monitoring system to specially handle the data
2412 # and perform custom rendering.
2413 &quot;floatValue&quot;: 3.14, # Contains value if the data is of float type.
2414 &quot;key&quot;: &quot;A String&quot;, # The key identifying the display data.
2415 # This is intended to be used as a label for the display data
2416 # when viewed in a dax monitoring system.
2417 &quot;shortStrValue&quot;: &quot;A String&quot;, # A possible additional shorter value to display.
2418 # For example a java_class_name_value of com.mypackage.MyDoFn
2419 # will be stored with MyDoFn as the short_str_value and
2420 # com.mypackage.MyDoFn as the java_class_name_value.
2421 # short_str_value can be displayed and java_class_name_value
2422 # will be displayed as a tooltip.
2423 &quot;url&quot;: &quot;A String&quot;, # An optional full URL.
2424 &quot;label&quot;: &quot;A String&quot;, # An optional label to display in a dax UI for the element.
2425 },
2426 ],
2427 },
2428 &quot;replaceJobId&quot;: &quot;A String&quot;, # If this job is an update of an existing job, this field is the job ID
2429 # of the job it replaced.
2430 #
2431 # When sending a `CreateJobRequest`, you can update a job by specifying it
2432 # here. The job named here is stopped, and its intermediate state is
2433 # transferred to this job.
2434 &quot;tempFiles&quot;: [ # A set of files the system should be aware of that are used
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07002435 # for temporary storage. These temporary files will be
2436 # removed on job completion.
2437 # No duplicates are allowed.
2438 # No file patterns are supported.
2439 #
2440 # The supported files are:
2441 #
2442 # Google Cloud Storage:
2443 #
2444 # storage.googleapis.com/{bucket}/{object}
2445 # bucket.storage.googleapis.com/{object}
Bu Sun Kim65020912020-05-20 12:08:20 -07002446 &quot;A String&quot;,
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07002447 ],
Bu Sun Kim65020912020-05-20 12:08:20 -07002448 &quot;name&quot;: &quot;A String&quot;, # The user-specified Cloud Dataflow job name.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07002449 #
2450 # Only one Job with a given name may exist in a project at any
2451 # given time. If a caller attempts to create a Job with the same
2452 # name as an already-existing Job, the attempt returns the
2453 # existing Job.
2454 #
2455 # The name must match the regular expression
2456 # `[a-z]([-a-z0-9]{0,38}[a-z0-9])?`
Bu Sun Kim65020912020-05-20 12:08:20 -07002457 &quot;steps&quot;: [ # Exactly one of steps or steps_location should be specified.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07002458 #
2459 # The top-level steps that constitute the entire job.
Sai Cheemalapatic30d2b52017-03-13 12:12:03 -04002460 { # Defines a particular step within a Cloud Dataflow job.
2461 #
2462 # A job consists of multiple steps, each of which performs some
2463 # specific operation as part of the overall job. Data is typically
2464 # passed from one step to another as part of the job.
2465 #
Bu Sun Kim65020912020-05-20 12:08:20 -07002466 # Here&#x27;s an example of a sequence of steps which together implement a
Sai Cheemalapatic30d2b52017-03-13 12:12:03 -04002467 # Map-Reduce job:
2468 #
2469 # * Read a collection of data from some source, parsing the
Bu Sun Kim65020912020-05-20 12:08:20 -07002470 # collection&#x27;s elements.
Sai Cheemalapatic30d2b52017-03-13 12:12:03 -04002471 #
2472 # * Validate the elements.
2473 #
2474 # * Apply a user-defined function to map each element to some value
2475 # and extract an element-specific key value.
2476 #
2477 # * Group elements with the same key into a single element with
2478 # that key, transforming a multiply-keyed collection into a
2479 # uniquely-keyed collection.
2480 #
2481 # * Write the elements out to some data sink.
2482 #
2483 # Note that the Cloud Dataflow service may be used to run many different
2484 # types of jobs, not just Map-Reduce.
Bu Sun Kim65020912020-05-20 12:08:20 -07002485 &quot;name&quot;: &quot;A String&quot;, # The name that identifies the step. This must be unique for each
Dan O'Mearadd494642020-05-01 07:42:23 -07002486 # step with respect to all other steps in the Cloud Dataflow job.
Bu Sun Kim65020912020-05-20 12:08:20 -07002487 &quot;kind&quot;: &quot;A String&quot;, # The kind of step in the Cloud Dataflow job.
2488 &quot;properties&quot;: { # Named properties associated with the step. Each kind of
Sai Cheemalapatic30d2b52017-03-13 12:12:03 -04002489 # predefined step has its own required set of properties.
2490 # Must be provided on Create. Only retrieved with JOB_VIEW_ALL.
Bu Sun Kim65020912020-05-20 12:08:20 -07002491 &quot;a_key&quot;: &quot;&quot;, # Properties of the object.
Takashi Matsuo06694102015-09-11 13:55:40 -07002492 },
2493 },
2494 ],
Bu Sun Kim65020912020-05-20 12:08:20 -07002495 &quot;replacedByJobId&quot;: &quot;A String&quot;, # If another job is an update of this job (and thus, this job is in
2496 # `JOB_STATE_UPDATED`), this field contains the ID of that job.
2497 &quot;executionInfo&quot;: { # Additional information about how a Cloud Dataflow job will be executed that # Deprecated.
2498 # isn&#x27;t contained in the submitted job.
2499 &quot;stages&quot;: { # A mapping from each stage to the information about that stage.
2500 &quot;a_key&quot;: { # Contains information about how a particular
2501 # google.dataflow.v1beta3.Step will be executed.
2502 &quot;stepName&quot;: [ # The steps associated with the execution stage.
2503 # Note that stages may have several steps, and that a given step
2504 # might be run by more than one stage.
2505 &quot;A String&quot;,
2506 ],
2507 },
2508 },
2509 },
2510 &quot;currentState&quot;: &quot;A String&quot;, # The current state of the job.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07002511 #
2512 # Jobs are created in the `JOB_STATE_STOPPED` state unless otherwise
2513 # specified.
2514 #
2515 # A job in the `JOB_STATE_RUNNING` state may asynchronously enter a
2516 # terminal state. After a job has reached a terminal state, no
2517 # further state updates may be made.
2518 #
2519 # This field may be mutated by the Cloud Dataflow service;
2520 # callers cannot mutate it.
Bu Sun Kim65020912020-05-20 12:08:20 -07002521 &quot;location&quot;: &quot;A String&quot;, # The [regional endpoint]
2522 # (https://cloud.google.com/dataflow/docs/concepts/regional-endpoints) that
2523 # contains this job.
2524 &quot;startTime&quot;: &quot;A String&quot;, # The timestamp when the job was started (transitioned to JOB_STATE_PENDING).
2525 # Flexible resource scheduling jobs are started with some delay after job
2526 # creation, so start_time is unset before start and is updated when the
2527 # job is started by the Cloud Dataflow service. For other jobs, start_time
2528 # always equals create_time and is immutable and set by the Cloud Dataflow
2529 # service.
2530 &quot;stepsLocation&quot;: &quot;A String&quot;, # The GCS location where the steps are stored.
2531 &quot;labels&quot;: { # User-defined labels for this job.
2532 #
2533 # The labels map can contain no more than 64 entries. Entries of the labels
2534 # map are UTF8 strings that comply with the following restrictions:
2535 #
2536 # * Keys must conform to regexp: \p{Ll}\p{Lo}{0,62}
2537 # * Values must conform to regexp: [\p{Ll}\p{Lo}\p{N}_-]{0,63}
2538 # * Both keys and values are additionally constrained to be &lt;= 128 bytes in
2539 # size.
2540 &quot;a_key&quot;: &quot;A String&quot;,
Nathaniel Manista4f877e52015-06-15 16:44:50 +00002541 },
Bu Sun Kim65020912020-05-20 12:08:20 -07002542 &quot;createTime&quot;: &quot;A String&quot;, # The timestamp when the job was initially created. Immutable and set by the
2543 # Cloud Dataflow service.
2544 &quot;requestedState&quot;: &quot;A String&quot;, # The job&#x27;s requested state.
2545 #
2546 # `UpdateJob` may be used to switch between the `JOB_STATE_STOPPED` and
2547 # `JOB_STATE_RUNNING` states, by setting requested_state. `UpdateJob` may
2548 # also be used to directly set a job&#x27;s requested state to
2549 # `JOB_STATE_CANCELLED` or `JOB_STATE_DONE`, irrevocably terminating the
2550 # job if it has not already reached a terminal state.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07002551 }</pre>
Nathaniel Manista4f877e52015-06-15 16:44:50 +00002552</div>
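<p>The <code>name</code> and <code>labels</code> rules described above can be checked on the client before a job is submitted. The following is a minimal sketch, not part of the generated client: the helper names <code>is_valid_job_name</code> and <code>label_problems</code> are hypothetical, and the label check covers only the entry-count and byte-size limits (the Unicode character-class patterns for keys and values are not verified here).</p>

```python
import re

# Pattern quoted from the `name` field documentation above.
JOB_NAME_RE = re.compile(r"[a-z]([-a-z0-9]{0,38}[a-z0-9])?")

# Limits quoted from the `labels` field documentation above.
MAX_LABELS = 64
MAX_LABEL_BYTES = 128


def is_valid_job_name(name):
    """Return True if the whole string matches the documented name pattern."""
    return JOB_NAME_RE.fullmatch(name) is not None


def label_problems(labels):
    """Return a list of human-readable problems found in a labels dict.

    Only the entry count and the UTF-8 byte size of keys and values are
    checked; an empty list means no problem was found by these checks.
    """
    problems = []
    if len(labels) > MAX_LABELS:
        problems.append("too many labels: %d > %d" % (len(labels), MAX_LABELS))
    for key, value in labels.items():
        for kind, text in (("key", key), ("value", value)):
            if len(text.encode("utf-8")) > MAX_LABEL_BYTES:
                problems.append("label %s exceeds %d bytes: %r"
                                % (kind, MAX_LABEL_BYTES, text))
    return problems
```

<p>Running such checks locally only surfaces obvious mistakes early; the service remains the authority on what it accepts.</p>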
2553
2554<div class="method">
Bu Sun Kim65020912020-05-20 12:08:20 -07002555 <code class="details" id="getMetrics">getMetrics(projectId, jobId, location=None, startTime=None, x__xgafv=None)</code>
Nathaniel Manista4f877e52015-06-15 16:44:50 +00002556 <pre>Request the job status.
2557
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07002558To request the status of a job, we recommend using
2559`projects.locations.jobs.getMetrics` with a [regional endpoint]
2560(https://cloud.google.com/dataflow/docs/concepts/regional-endpoints). Using
2561`projects.jobs.getMetrics` is not recommended, as you can only request the
2562status of jobs that are running in `us-central1`.
2563
Nathaniel Manista4f877e52015-06-15 16:44:50 +00002564Args:
Takashi Matsuo06694102015-09-11 13:55:40 -07002565 projectId: string, A project id. (required)
2566 jobId: string, The job to get metrics for. (required)
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07002567 location: string, The [regional endpoint]
2568(https://cloud.google.com/dataflow/docs/concepts/regional-endpoints) that
2569contains the job specified by job_id.
Bu Sun Kim65020912020-05-20 12:08:20 -07002570 startTime: string, Return only metric data that has changed since this time.
2571Default is to return all information about all metrics for the job.
Takashi Matsuo06694102015-09-11 13:55:40 -07002572 x__xgafv: string, V1 error format.
Sai Cheemalapatic30d2b52017-03-13 12:12:03 -04002573 Allowed values
2574 1 - v1 error format
2575 2 - v2 error format
Nathaniel Manista4f877e52015-06-15 16:44:50 +00002576
2577Returns:
2578 An object of the form:
2579
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07002580 { # JobMetrics contains a collection of metrics describing the detailed progress
Sai Cheemalapatic30d2b52017-03-13 12:12:03 -04002581 # of a Dataflow job. Metrics correspond to user-defined and system-defined
2582 # metrics in the job.
2583 #
2584 # This resource captures only the most recent values of each metric;
2585 # time-series data can be queried for them (under the same metric names)
2586 # from Cloud Monitoring.
Bu Sun Kim65020912020-05-20 12:08:20 -07002587 &quot;metricTime&quot;: &quot;A String&quot;, # Timestamp as of which metric values are current.
2588 &quot;metrics&quot;: [ # All metrics for this job.
Takashi Matsuo06694102015-09-11 13:55:40 -07002589 { # Describes the state of a metric.
Bu Sun Kim65020912020-05-20 12:08:20 -07002590 &quot;set&quot;: &quot;&quot;, # Worker-computed aggregate value for the &quot;Set&quot; aggregation kind. The only
2591 # possible value type is a list of Values whose type can be Long, Double,
2592 # or String, according to the metric&#x27;s type. All Values in the list must
2593 # be of the same type.
2594 &quot;gauge&quot;: &quot;&quot;, # A struct value describing properties of a Gauge.
2595 # Metrics of gauge type show the value of a metric across time, and are
2596 # aggregated based on the newest value.
2597 &quot;cumulative&quot;: True or False, # True if this metric is reported as the total cumulative aggregate
2598 # value accumulated since the worker started working on this WorkItem.
2599 # By default this is false, indicating that this metric is reported
2600 # as a delta that is not associated with any WorkItem.
2601 &quot;internal&quot;: &quot;&quot;, # Worker-computed aggregate value for internal use by the Dataflow
2602 # service.
2603 &quot;kind&quot;: &quot;A String&quot;, # Metric aggregation kind. The possible metric aggregation kinds are
2604 # &quot;Sum&quot;, &quot;Max&quot;, &quot;Min&quot;, &quot;Mean&quot;, &quot;Set&quot;, &quot;And&quot;, &quot;Or&quot;, and &quot;Distribution&quot;.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07002605 # The specified aggregation kind is case-insensitive.
2606 #
2607 # If omitted, this is not an aggregated value but instead
2608 # a single metric sample value.
Bu Sun Kim65020912020-05-20 12:08:20 -07002609 &quot;scalar&quot;: &quot;&quot;, # Worker-computed aggregate value for aggregation kinds &quot;Sum&quot;, &quot;Max&quot;, &quot;Min&quot;,
2610 # &quot;And&quot;, and &quot;Or&quot;. The possible value types are Long, Double, and Boolean.
2611 &quot;meanCount&quot;: &quot;&quot;, # Worker-computed aggregate value for the &quot;Mean&quot; aggregation kind.
2612 # This holds the count of the aggregated values and is used in combination
2613 # with mean_sum to obtain the actual mean aggregate value.
2614 # The only possible value type is Long.
2615 &quot;meanSum&quot;: &quot;&quot;, # Worker-computed aggregate value for the &quot;Mean&quot; aggregation kind.
Sai Cheemalapati4ba8c232017-06-06 18:46:08 -04002616 # This holds the sum of the aggregated values and is used in combination
2617 # with mean_count to obtain the actual mean aggregate value.
2618 # The only possible value types are Long and Double.
Bu Sun Kim65020912020-05-20 12:08:20 -07002619 &quot;updateTime&quot;: &quot;A String&quot;, # Timestamp associated with the metric value. Optional when workers are
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07002620 # reporting work progress; it will be filled in responses from the
2621 # metrics API.
Bu Sun Kim65020912020-05-20 12:08:20 -07002622 &quot;name&quot;: { # Identifies a metric, by describing the source which generated the # Name of the metric.
2623 # metric.
2624 &quot;context&quot;: { # Zero or more labeled fields which identify the part of the job this
2625 # metric is associated with, such as the name of a step or collection.
2626 #
2627 # For example, built-in counters associated with steps will have
2628 # context[&#x27;step&#x27;] = &lt;step-name&gt;. Counters associated with PCollections
2629 # in the SDK will have context[&#x27;pcollection&#x27;] = &lt;pcollection-name&gt;.
2630 &quot;a_key&quot;: &quot;A String&quot;,
2631 },
2632 &quot;origin&quot;: &quot;A String&quot;, # Origin (namespace) of metric name. May be blank for user-defined metrics;
2633 # will be &quot;dataflow&quot; for metrics defined by the Dataflow service or SDK.
2634 &quot;name&quot;: &quot;A String&quot;, # Worker-defined metric name.
2635 },
2636 &quot;distribution&quot;: &quot;&quot;, # A struct value describing properties of a distribution of numeric values.
Nathaniel Manista4f877e52015-06-15 16:44:50 +00002637 },
2638 ],
Nathaniel Manista4f877e52015-06-15 16:44:50 +00002639 }</pre>
2640</div>
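<p>The <code>getMetrics</code> response described above can be post-processed with plain dictionary access. The following is a minimal sketch under stated assumptions: the helper names <code>fetch_job_metrics</code>, <code>mean_value</code> and <code>metrics_for_step</code> are hypothetical, the sample values in the usage note are invented for illustration, and the numeric fields are converted with <code>float()</code> because their JSON representation is not specified here. <code>fetch_job_metrics</code> needs the <code>google-api-python-client</code> package and application default credentials, so it is defined but not called.</p>

```python
def fetch_job_metrics(project_id, location, job_id):
    """Call the regional getMetrics method recommended in the text above.

    Requires google-api-python-client and valid credentials; the import is
    kept local so the helpers below work without the package installed.
    """
    from googleapiclient.discovery import build

    service = build("dataflow", "v1b3")
    request = service.projects().locations().jobs().getMetrics(
        projectId=project_id, location=location, jobId=job_id)
    return request.execute()


def mean_value(metric):
    """Combine meanSum and meanCount for a metric of kind "Mean".

    Returns None for other kinds or when the count is zero. The kind is
    compared case-insensitively, as the field documentation states.
    """
    if str(metric.get("kind", "")).lower() != "mean":
        return None
    count = float(metric.get("meanCount", 0))
    if count == 0:
        return None
    return float(metric.get("meanSum", 0)) / count


def metrics_for_step(response, step_name):
    """Return the metrics whose name.context['step'] equals step_name."""
    selected = []
    for metric in response.get("metrics", []):
        context = metric.get("name", {}).get("context", {})
        if context.get("step") == step_name:
            selected.append(metric)
    return selected
```

<p>For example, given a response whose <code>metrics</code> list contains <code>{"kind": "MEAN", "meanSum": 10, "meanCount": 4}</code>, <code>mean_value</code> divides the sum by the count, and <code>metrics_for_step(response, "s1")</code> keeps only the entries tagged with that step name.</p>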
2641
2642<div class="method">
Bu Sun Kim65020912020-05-20 12:08:20 -07002643 <code class="details" id="list">list(projectId, filter=None, location=None, pageToken=None, pageSize=None, view=None, x__xgafv=None)</code>
Sai Cheemalapatic30d2b52017-03-13 12:12:03 -04002644 <pre>List the jobs of a project.
Nathaniel Manista4f877e52015-06-15 16:44:50 +00002645
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07002646To list the jobs of a project in a region, we recommend using
2647`projects.locations.jobs.list` with a [regional endpoint]
2648(https://cloud.google.com/dataflow/docs/concepts/regional-endpoints). To
2649list all the jobs across all regions, use `projects.jobs.aggregated`. Using
2650`projects.jobs.list` is not recommended, as you can only get the list of
2651jobs that are running in `us-central1`.
2652
Nathaniel Manista4f877e52015-06-15 16:44:50 +00002653Args:
Takashi Matsuo06694102015-09-11 13:55:40 -07002654 projectId: string, The project which owns the jobs. (required)
Bu Sun Kim65020912020-05-20 12:08:20 -07002655 filter: string, The kind of filter to use.
2656 location: string, The [regional endpoint]
2657(https://cloud.google.com/dataflow/docs/concepts/regional-endpoints) that
2658contains this job.
2659 pageToken: string, Set this to the &#x27;next_page_token&#x27; field of a previous response
2660to request additional results in a long list.
Sai Cheemalapatic30d2b52017-03-13 12:12:03 -04002661 pageSize: integer, If there are many jobs, limit response to at most this many.
2662The actual number of jobs returned will be the lesser of pageSize
2663and an unspecified server-defined limit.
Bu Sun Kim65020912020-05-20 12:08:20 -07002664 view: string, Level of information requested in response. Default is `JOB_VIEW_SUMMARY`.
Takashi Matsuo06694102015-09-11 13:55:40 -07002665 x__xgafv: string, V1 error format.
Sai Cheemalapatic30d2b52017-03-13 12:12:03 -04002666 Allowed values
2667 1 - v1 error format
2668 2 - v2 error format
Nathaniel Manista4f877e52015-06-15 16:44:50 +00002669
2670Returns:
2671 An object of the form:
2672
Dan O'Mearadd494642020-05-01 07:42:23 -07002673 { # Response to a request to list Cloud Dataflow jobs in a project. This might
2674 # be a partial response, depending on the page size in the ListJobsRequest.
2675 # However, if the project does not have any jobs, an instance of
Bu Sun Kim65020912020-05-20 12:08:20 -07002676 # ListJobsResponse is not returned and the request&#x27;s response
Dan O'Mearadd494642020-05-01 07:42:23 -07002677 # body is empty {}.
Bu Sun Kim65020912020-05-20 12:08:20 -07002678 &quot;nextPageToken&quot;: &quot;A String&quot;, # Set if there may be more results than fit in this response.
2679 &quot;failedLocation&quot;: [ # Zero or more messages describing the [regional endpoints]
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07002680 # (https://cloud.google.com/dataflow/docs/concepts/regional-endpoints) that
2681 # failed to respond.
2682 { # Indicates which [regional endpoint]
2683 # (https://cloud.google.com/dataflow/docs/concepts/regional-endpoints) failed
2684 # to respond to a request for data.
Bu Sun Kim65020912020-05-20 12:08:20 -07002685 &quot;name&quot;: &quot;A String&quot;, # The name of the [regional endpoint]
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07002686 # (https://cloud.google.com/dataflow/docs/concepts/regional-endpoints) that
2687 # failed to respond.
Sai Cheemalapati4ba8c232017-06-06 18:46:08 -04002688 },
2689 ],
Bu Sun Kim65020912020-05-20 12:08:20 -07002690 &quot;jobs&quot;: [ # A subset of the requested job information.
Sai Cheemalapatic30d2b52017-03-13 12:12:03 -04002691 { # Defines a job to be run by the Cloud Dataflow service.
Bu Sun Kim65020912020-05-20 12:08:20 -07002692 &quot;clientRequestId&quot;: &quot;A String&quot;, # The client&#x27;s unique identifier of the job, re-used across retried attempts.
2693 # If this field is set, the service will ensure its uniqueness.
2694 # The request to create a job will fail if the service has knowledge of a
2695 # previously submitted job with the same client&#x27;s ID and job name.
2696 # The caller may use this field to ensure idempotence of job
2697 # creation across retried attempts to create a job.
2698 # By default, the field is empty and, in that case, the service ignores it.
2699 &quot;id&quot;: &quot;A String&quot;, # The unique ID of this job.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07002700 #
2701 # This field is set by the Cloud Dataflow service when the Job is
2702 # created, and is immutable for the life of the job.
Bu Sun Kim65020912020-05-20 12:08:20 -07002703 &quot;currentStateTime&quot;: &quot;A String&quot;, # The timestamp associated with the current state.
2704 &quot;transformNameMapping&quot;: { # The map of transform name prefixes of the job to be replaced to the
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07002705 # corresponding name prefixes of the new job.
Bu Sun Kim65020912020-05-20 12:08:20 -07002706 &quot;a_key&quot;: &quot;A String&quot;,
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07002707 },
Bu Sun Kim65020912020-05-20 12:08:20 -07002708 &quot;environment&quot;: { # Describes the environment in which a Dataflow Job runs. # The environment for the job.
2709 &quot;sdkPipelineOptions&quot;: { # The Cloud Dataflow SDK pipeline options specified by the user. These
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07002710 # options are passed through the service and are used to recreate the
2711 # SDK pipeline options on the worker in a language agnostic and platform
2712 # independent way.
Bu Sun Kim65020912020-05-20 12:08:20 -07002713 &quot;a_key&quot;: &quot;&quot;, # Properties of the object.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07002714 },
Bu Sun Kim65020912020-05-20 12:08:20 -07002715 &quot;flexResourceSchedulingGoal&quot;: &quot;A String&quot;, # Which Flexible Resource Scheduling mode to run in.
2716 &quot;workerPools&quot;: [ # The worker pools. At least one &quot;harness&quot; worker pool must be
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07002717 # specified in order for the job to have workers.
2718 { # Describes one particular pool of Cloud Dataflow workers to be
2719 # instantiated by the Cloud Dataflow service in order to perform the
2720 # computations required by a job. Note that a workflow job may use
2721 # multiple pools, in order to match the various computational
2722 # requirements of the various stages of the job.
Bu Sun Kim65020912020-05-20 12:08:20 -07002723 &quot;defaultPackageSet&quot;: &quot;A String&quot;, # The default package set to install. This allows the service to
2724 # select a default set of packages which are useful to worker
2725 # harnesses written in a particular language.
2726 &quot;network&quot;: &quot;A String&quot;, # Network to which VMs will be assigned. If empty or unspecified,
2727 # the service will use the network &quot;default&quot;.
2728 &quot;zone&quot;: &quot;A String&quot;, # Zone to run the worker pools in. If empty or unspecified, the service
Dan O'Mearadd494642020-05-01 07:42:23 -07002729 # will attempt to choose a reasonable default.
Bu Sun Kim65020912020-05-20 12:08:20 -07002730 &quot;numWorkers&quot;: 42, # Number of Google Compute Engine workers in this pool needed to
2731 # execute the job. If zero or unspecified, the service will
2732 # attempt to choose a reasonable default.
2733 &quot;numThreadsPerWorker&quot;: 42, # The number of threads per worker harness. If empty or unspecified, the
Dan O'Mearadd494642020-05-01 07:42:23 -07002734 # service will choose a number of threads (according to the number of cores
2735 # on the selected machine type for batch, or 1 by convention for streaming).
Bu Sun Kim65020912020-05-20 12:08:20 -07002736 &quot;diskSourceImage&quot;: &quot;A String&quot;, # Fully qualified source image for disks.
2737 &quot;packages&quot;: [ # Packages to be installed on workers.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07002738 { # The packages that must be installed in order for a worker to run the
2739 # steps of the Cloud Dataflow job that will be assigned to its worker
2740 # pool.
2741 #
2742 # This is the mechanism by which the Cloud Dataflow SDK causes code to
2743 # be loaded onto the workers. For example, the Cloud Dataflow Java SDK
Bu Sun Kim65020912020-05-20 12:08:20 -07002744 # might use this to install jars containing the user&#x27;s code and all of the
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07002745 # various dependencies (libraries, data files, etc.) required in order
2746 # for that code to run.
Bu Sun Kim65020912020-05-20 12:08:20 -07002747 &quot;location&quot;: &quot;A String&quot;, # The resource to read the package from. The supported resource type is:
Sai Cheemalapati4ba8c232017-06-06 18:46:08 -04002748 #
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07002749 # Google Cloud Storage:
2750 #
2751 # storage.googleapis.com/{bucket}
2752 # bucket.storage.googleapis.com/
Bu Sun Kim65020912020-05-20 12:08:20 -07002753 &quot;name&quot;: &quot;A String&quot;, # The name of the package.
Sai Cheemalapati4ba8c232017-06-06 18:46:08 -04002754 },
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07002755 ],
Bu Sun Kim65020912020-05-20 12:08:20 -07002756 &quot;teardownPolicy&quot;: &quot;A String&quot;, # Sets the policy for determining when to tear down the worker pool.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07002757 # Allowed values are: `TEARDOWN_ALWAYS`, `TEARDOWN_ON_SUCCESS`, and
2758 # `TEARDOWN_NEVER`.
2759 # `TEARDOWN_ALWAYS` means workers are always torn down regardless of whether
2760 # the job succeeds. `TEARDOWN_ON_SUCCESS` means workers are torn down
2761 # if the job succeeds. `TEARDOWN_NEVER` means the workers are never torn
2762 # down.
2763 #
2764 # If the workers are not torn down by the service, they will
2765 # continue to run and use Google Compute Engine VM resources in the
Bu Sun Kim65020912020-05-20 12:08:20 -07002766 # user&#x27;s project until they are explicitly terminated by the user.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07002767 # Because of this, Google recommends using the `TEARDOWN_ALWAYS`
2768 # policy except for small, manually supervised test jobs.
2769 #
2770 # If unknown or unspecified, the service will attempt to choose a reasonable
2771 # default.
Bu Sun Kim65020912020-05-20 12:08:20 -07002772 &quot;onHostMaintenance&quot;: &quot;A String&quot;, # The action to take on host maintenance, as defined by the Google
2773 # Compute Engine API.
2774 &quot;poolArgs&quot;: { # Extra arguments for this worker pool.
2775 &quot;a_key&quot;: &quot;&quot;, # Properties of the object. Contains field @type with type URL.
2776 },
2777 &quot;diskSizeGb&quot;: 42, # Size of root disk for VMs, in GB. If zero or unspecified, the service will
Dan O'Mearadd494642020-05-01 07:42:23 -07002778 # attempt to choose a reasonable default.
Bu Sun Kim65020912020-05-20 12:08:20 -07002779 &quot;workerHarnessContainerImage&quot;: &quot;A String&quot;, # Required. Docker container image that executes the Cloud Dataflow worker
2780 # harness, residing in Google Container Registry.
2781 #
2782 # Deprecated for the Fn API path. Use sdk_harness_container_images instead.
2783 &quot;diskType&quot;: &quot;A String&quot;, # Type of root disk for VMs. If empty or unspecified, the service will
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07002784 # attempt to choose a reasonable default.
Bu Sun Kim65020912020-05-20 12:08:20 -07002785 &quot;machineType&quot;: &quot;A String&quot;, # Machine type (e.g. &quot;n1-standard-1&quot;). If empty or unspecified, the
2786 # service will attempt to choose a reasonable default.
2787 &quot;kind&quot;: &quot;A String&quot;, # The kind of the worker pool; currently only `harness` and `shuffle`
2788 # are supported.
2789 &quot;dataDisks&quot;: [ # Data disks that are used by a VM in this workflow.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07002790 { # Describes the data disk used by a workflow job.
Bu Sun Kim65020912020-05-20 12:08:20 -07002791 &quot;sizeGb&quot;: 42, # Size of disk in GB. If zero or unspecified, the service will
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07002792 # attempt to choose a reasonable default.
Bu Sun Kim65020912020-05-20 12:08:20 -07002793 &quot;diskType&quot;: &quot;A String&quot;, # Disk storage type, as defined by Google Compute Engine. This
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07002794 # must be a disk type appropriate to the project and zone in which
2795 # the workers will run. If unknown or unspecified, the service
2796 # will attempt to choose a reasonable default.
2797 #
2798 # For example, the standard persistent disk type is a resource name
Bu Sun Kim65020912020-05-20 12:08:20 -07002799 # typically ending in &quot;pd-standard&quot;. If SSD persistent disks are
2800 # available, the resource name typically ends with &quot;pd-ssd&quot;. The
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07002801 # actual valid values are defined by the Google Compute Engine API,
2802 # not by the Cloud Dataflow API; consult the Google Compute Engine
2803 # documentation for more information about determining the set of
2804 # available disk types for a particular project and zone.
2805 #
2806 # Google Compute Engine Disk types are local to a particular
2807 # project in a particular zone, and so the resource name will
2808 # typically look something like this:
2809 #
2810 # compute.googleapis.com/projects/project-id/zones/zone/diskTypes/pd-standard
Bu Sun Kim65020912020-05-20 12:08:20 -07002811 &quot;mountPoint&quot;: &quot;A String&quot;, # Directory in a VM where disk is mounted.
Sai Cheemalapati4ba8c232017-06-06 18:46:08 -04002812 },
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07002813 ],
Bu Sun Kim65020912020-05-20 12:08:20 -07002814 &quot;sdkHarnessContainerImages&quot;: [ # Set of SDK harness containers needed to execute this pipeline. This will
Dan O'Mearadd494642020-05-01 07:42:23 -07002815 # only be set in the Fn API path. For non-cross-language pipelines this
2816 # should have only one entry. Cross-language pipelines will have two or more
2817 # entries.
2818 { # Defines an SDK harness container for executing Dataflow pipelines.
Bu Sun Kim65020912020-05-20 12:08:20 -07002819 &quot;containerImage&quot;: &quot;A String&quot;, # A docker container image that resides in Google Container Registry.
2820 &quot;useSingleCorePerContainer&quot;: True or False, # If true, recommends the Dataflow service to use only one core per SDK
Dan O'Mearadd494642020-05-01 07:42:23 -07002821 # container instance with this image. If false (or unset) recommends using
2822 # more than one core per SDK container instance with this image for
2823 # efficiency. Note that Dataflow service may choose to override this property
2824 # if needed.
2825 },
2826 ],
Bu Sun Kim65020912020-05-20 12:08:20 -07002827 &quot;subnetwork&quot;: &quot;A String&quot;, # Subnetwork to which VMs will be assigned, if desired. Expected to be of
2828 # the form &quot;regions/REGION/subnetworks/SUBNETWORK&quot;.
2829 &quot;ipConfiguration&quot;: &quot;A String&quot;, # Configuration for VM IPs.
2830 &quot;taskrunnerSettings&quot;: { # Taskrunner configuration settings. # Settings passed through to Google Compute Engine workers when
2831 # using the standard Dataflow task runner. Users should ignore
2832 # this field.
2833 &quot;alsologtostderr&quot;: True or False, # Whether to also send taskrunner log info to stderr.
2834 &quot;taskGroup&quot;: &quot;A String&quot;, # The UNIX group ID on the worker VM to use for tasks launched by
2835 # taskrunner; e.g. &quot;wheel&quot;.
2836 &quot;harnessCommand&quot;: &quot;A String&quot;, # The command to launch the worker harness.
2837 &quot;logDir&quot;: &quot;A String&quot;, # The directory on the VM to store logs.
2838 &quot;oauthScopes&quot;: [ # The OAuth2 scopes to be requested by the taskrunner in order to
2839 # access the Cloud Dataflow API.
2840 &quot;A String&quot;,
2841 ],
2842 &quot;dataflowApiVersion&quot;: &quot;A String&quot;, # The API version of endpoint, e.g. &quot;v1b3&quot;
2843 &quot;logUploadLocation&quot;: &quot;A String&quot;, # Indicates where to put logs. If this is not specified, the logs
2844 # will not be uploaded.
2845 #
2846 # The supported resource type is:
2847 #
2848 # Google Cloud Storage:
2849 # storage.googleapis.com/{bucket}/{object}
2850 # bucket.storage.googleapis.com/{object}
2851 &quot;streamingWorkerMainClass&quot;: &quot;A String&quot;, # The streaming worker main class name.
2852 &quot;workflowFileName&quot;: &quot;A String&quot;, # The file to store the workflow in.
2853 &quot;baseTaskDir&quot;: &quot;A String&quot;, # The location on the worker for task-specific subdirectories.
2854 &quot;tempStoragePrefix&quot;: &quot;A String&quot;, # The prefix of the resources the taskrunner should use for
2855 # temporary storage.
2856 #
2857 # The supported resource type is:
2858 #
2859 # Google Cloud Storage:
2860 # storage.googleapis.com/{bucket}/{object}
2861 # bucket.storage.googleapis.com/{object}
2862 &quot;commandlinesFileName&quot;: &quot;A String&quot;, # The file to store preprocessing commands in.
2863 &quot;languageHint&quot;: &quot;A String&quot;, # The suggested backend language.
2864 &quot;baseUrl&quot;: &quot;A String&quot;, # The base URL for the taskrunner to use when accessing Google Cloud APIs.
2865 #
2866 # When workers access Google Cloud APIs, they logically do so via
2867 # relative URLs. If this field is specified, it supplies the base
2868 # URL to use for resolving these relative URLs. The normative
2869 # algorithm used is defined by RFC 1808, &quot;Relative Uniform Resource
2870 # Locators&quot;.
2871 #
2872 # If not specified, the default value is &quot;http://www.googleapis.com/&quot;
2873 &quot;logToSerialconsole&quot;: True or False, # Whether to send taskrunner log info to Google Compute Engine VM serial
2874 # console.
2875 &quot;continueOnException&quot;: True or False, # Whether to continue taskrunner if an exception is hit.
2876 &quot;parallelWorkerSettings&quot;: { # Provides data to pass through to the worker harness. # The settings to pass to the parallel worker harness.
2877 &quot;baseUrl&quot;: &quot;A String&quot;, # The base URL for accessing Google Cloud APIs.
2878 #
2879 # When workers access Google Cloud APIs, they logically do so via
2880 # relative URLs. If this field is specified, it supplies the base
2881 # URL to use for resolving these relative URLs. The normative
2882 # algorithm used is defined by RFC 1808, &quot;Relative Uniform Resource
2883 # Locators&quot;.
2884 #
2885 # If not specified, the default value is &quot;http://www.googleapis.com/&quot;
2886 &quot;reportingEnabled&quot;: True or False, # Whether to send work progress updates to the service.
2887 &quot;servicePath&quot;: &quot;A String&quot;, # The Cloud Dataflow service path relative to the root URL, for example,
2888 # &quot;dataflow/v1b3/projects&quot;.
2889 &quot;shuffleServicePath&quot;: &quot;A String&quot;, # The Shuffle service path relative to the root URL, for example,
2890 # &quot;shuffle/v1beta1&quot;.
2891 &quot;workerId&quot;: &quot;A String&quot;, # The ID of the worker running this pipeline.
2892 &quot;tempStoragePrefix&quot;: &quot;A String&quot;, # The prefix of the resources the system should use for temporary
2893 # storage.
2894 #
2895 # The supported resource type is:
2896 #
2897 # Google Cloud Storage:
2898 #
2899 # storage.googleapis.com/{bucket}/{object}
2900 # bucket.storage.googleapis.com/{object}
2901 },
2902 &quot;vmId&quot;: &quot;A String&quot;, # The ID string of the VM.
2903 &quot;taskUser&quot;: &quot;A String&quot;, # The UNIX user ID on the worker VM to use for tasks launched by
2904 # taskrunner; e.g. &quot;root&quot;.
2905 },
2906 &quot;autoscalingSettings&quot;: { # Settings for WorkerPool autoscaling. # Settings for autoscaling of this WorkerPool.
2907 &quot;maxNumWorkers&quot;: 42, # The maximum number of workers to cap scaling at.
2908 &quot;algorithm&quot;: &quot;A String&quot;, # The algorithm to use for autoscaling.
2909 },
2910 &quot;metadata&quot;: { # Metadata to set on the Google Compute Engine VMs.
2911 &quot;a_key&quot;: &quot;A String&quot;,
2912 },
2913 },
2914 ],
2915 &quot;dataset&quot;: &quot;A String&quot;, # The dataset for the current project where various workflow
2916 # related tables are stored.
2917 #
2918 # The supported resource type is:
2919 #
2920 # Google BigQuery:
2921 # bigquery.googleapis.com/{dataset}
2922 &quot;internalExperiments&quot;: { # Experimental settings.
2923 &quot;a_key&quot;: &quot;&quot;, # Properties of the object. Contains field @type with type URL.
2924 },
2925 &quot;workerRegion&quot;: &quot;A String&quot;, # The Compute Engine region
2926 # (https://cloud.google.com/compute/docs/regions-zones/regions-zones) in
2927 # which worker processing should occur, e.g. &quot;us-west1&quot;. Mutually exclusive
2928 # with worker_zone. If neither worker_region nor worker_zone is specified,
2929 # default to the control plane&#x27;s region.
2930 &quot;serviceKmsKeyName&quot;: &quot;A String&quot;, # If set, contains the Cloud KMS key identifier used to encrypt data
2931 # at rest, AKA a Customer Managed Encryption Key (CMEK).
2932 #
2933 # Format:
2934 # projects/PROJECT_ID/locations/LOCATION/keyRings/KEY_RING/cryptoKeys/KEY
2935 &quot;userAgent&quot;: { # A description of the process that generated the request.
2936 &quot;a_key&quot;: &quot;&quot;, # Properties of the object.
2937 },
2938 &quot;workerZone&quot;: &quot;A String&quot;, # The Compute Engine zone
2939 # (https://cloud.google.com/compute/docs/regions-zones/regions-zones) in
2940 # which worker processing should occur, e.g. &quot;us-west1-a&quot;. Mutually exclusive
2941 # with worker_region. If neither worker_region nor worker_zone is specified,
2942 # a zone in the control plane&#x27;s region is chosen based on available capacity.
2943 &quot;clusterManagerApiService&quot;: &quot;A String&quot;, # The type of cluster manager API to use. If unknown or
2944 # unspecified, the service will attempt to choose a reasonable
2945 # default. This should be in the form of the API service name,
2946 # e.g. &quot;compute.googleapis.com&quot;.
2947 &quot;tempStoragePrefix&quot;: &quot;A String&quot;, # The prefix of the resources the system should use for temporary
2948 # storage. The system will append the suffix &quot;/temp-{JOBNAME}&quot; to
2949 # this resource prefix, where {JOBNAME} is the value of the
2950 # job_name field. The resulting bucket and object prefix is used
2951 # as the prefix of the resources used to store temporary data
2952 # needed during the job execution. NOTE: This will override the
2953 # value in taskrunner_settings.
2954 # The supported resource type is:
2955 #
2956 # Google Cloud Storage:
2957 #
2958 # storage.googleapis.com/{bucket}/{object}
2959 # bucket.storage.googleapis.com/{object}
2960 &quot;experiments&quot;: [ # The list of experiments to enable.
2961 &quot;A String&quot;,
2962 ],
2963 &quot;version&quot;: { # A structure describing which components and their versions of the service
2964 # are required in order to run the job.
2965 &quot;a_key&quot;: &quot;&quot;, # Properties of the object.
2966 },
2967 &quot;serviceAccountEmail&quot;: &quot;A String&quot;, # Identity to run virtual machines as. Defaults to the default account.
2968 },
2969 &quot;stageStates&quot;: [ # This field may be mutated by the Cloud Dataflow service;
2970 # callers cannot mutate it.
2971 { # A message describing the state of a particular execution stage.
2972 &quot;executionStageName&quot;: &quot;A String&quot;, # The name of the execution stage.
2973 &quot;currentStateTime&quot;: &quot;A String&quot;, # The time at which the stage transitioned to this state.
2974 &quot;executionStageState&quot;: &quot;A String&quot;, # Execution stage states allow the same set of values as JobState.
2975 },
2976 ],
2977 &quot;jobMetadata&quot;: { # Metadata available primarily for filtering jobs. Will be included in the ListJob response and Job SUMMARY view.
2978 # This field is populated by the Dataflow service to support filtering jobs
2979 # by the metadata values provided here. Populated for ListJobs and all GetJob
2980 # views SUMMARY and higher.
2981 &quot;bigTableDetails&quot;: [ # Identification of a BigTable source used in the Dataflow job.
2982 { # Metadata for a BigTable connector used by the job.
2983 &quot;tableId&quot;: &quot;A String&quot;, # TableId accessed in the connection.
2984 &quot;projectId&quot;: &quot;A String&quot;, # ProjectId accessed in the connection.
2985 &quot;instanceId&quot;: &quot;A String&quot;, # InstanceId accessed in the connection.
2986 },
2987 ],
2988 &quot;spannerDetails&quot;: [ # Identification of a Spanner source used in the Dataflow job.
2989 { # Metadata for a Spanner connector used by the job.
2990 &quot;databaseId&quot;: &quot;A String&quot;, # DatabaseId accessed in the connection.
2991 &quot;instanceId&quot;: &quot;A String&quot;, # InstanceId accessed in the connection.
2992 &quot;projectId&quot;: &quot;A String&quot;, # ProjectId accessed in the connection.
2993 },
2994 ],
2995 &quot;datastoreDetails&quot;: [ # Identification of a Datastore source used in the Dataflow job.
2996 { # Metadata for a Datastore connector used by the job.
2997 &quot;projectId&quot;: &quot;A String&quot;, # ProjectId accessed in the connection.
2998 &quot;namespace&quot;: &quot;A String&quot;, # Namespace used in the connection.
2999 },
3000 ],
3001 &quot;sdkVersion&quot;: { # The version of the SDK used to run the job. # The SDK version used to run the job.
3002 &quot;versionDisplayName&quot;: &quot;A String&quot;, # A readable string describing the version of the SDK.
3003 &quot;sdkSupportStatus&quot;: &quot;A String&quot;, # The support status for this SDK version.
3004 &quot;version&quot;: &quot;A String&quot;, # The version of the SDK used to run the job.
3005 },
3006 &quot;bigqueryDetails&quot;: [ # Identification of a BigQuery source used in the Dataflow job.
3007 { # Metadata for a BigQuery connector used by the job.
3008 &quot;table&quot;: &quot;A String&quot;, # Table accessed in the connection.
3009 &quot;dataset&quot;: &quot;A String&quot;, # Dataset accessed in the connection.
3010 &quot;projectId&quot;: &quot;A String&quot;, # Project accessed in the connection.
3011 &quot;query&quot;: &quot;A String&quot;, # Query used to access data in the connection.
3012 },
3013 ],
3014 &quot;fileDetails&quot;: [ # Identification of a File source used in the Dataflow job.
3015 { # Metadata for a File connector used by the job.
3016 &quot;filePattern&quot;: &quot;A String&quot;, # File Pattern used to access files by the connector.
3017 },
3018 ],
3019 &quot;pubsubDetails&quot;: [ # Identification of a PubSub source used in the Dataflow job.
3020 { # Metadata for a PubSub connector used by the job.
3021 &quot;subscription&quot;: &quot;A String&quot;, # Subscription used in the connection.
3022 &quot;topic&quot;: &quot;A String&quot;, # Topic accessed in the connection.
3023 },
3024 ],
3025 },
3026 &quot;createdFromSnapshotId&quot;: &quot;A String&quot;, # If this is specified, the job&#x27;s initial state is populated from the given
3027 # snapshot.
3028 &quot;projectId&quot;: &quot;A String&quot;, # The ID of the Cloud Platform project that the job belongs to.
3029 &quot;type&quot;: &quot;A String&quot;, # The type of Cloud Dataflow job.
3030 &quot;pipelineDescription&quot;: { # A descriptive representation of the submitted pipeline as well as its executed
3031 # form. This data is provided by the Dataflow service for ease of visualizing
3032 # the pipeline and interpreting Dataflow-provided metrics.
3033 # Preliminary field: The format of this data may change at any time.
3034 # A description of the user pipeline and stages through which it is executed.
3035 # Created by Cloud Dataflow service. Only retrieved with JOB_VIEW_DESCRIPTION or JOB_VIEW_ALL.
3036 &quot;executionPipelineStage&quot;: [ # Description of each stage of execution of the pipeline.
3037 { # Description of the composing transforms, names/ids, and input/outputs of a
3038 # stage of execution. Some composing transforms and sources may have been
3039 # generated by the Dataflow service during execution planning.
3040 &quot;id&quot;: &quot;A String&quot;, # Dataflow service generated id for this stage.
3041 &quot;componentTransform&quot;: [ # Transforms that comprise this execution stage.
3042 { # Description of a transform executed as part of an execution stage.
3043 &quot;originalTransform&quot;: &quot;A String&quot;, # User name for the original user transform with which this transform is
3044 # most closely associated.
3045 &quot;name&quot;: &quot;A String&quot;, # Dataflow service generated name for this source.
3046 &quot;userName&quot;: &quot;A String&quot;, # Human-readable name for this transform; may be user or system generated.
3047 },
3048 ],
3049 &quot;componentSource&quot;: [ # Collections produced and consumed by component transforms of this stage.
3050 { # Description of an interstitial value between transforms in an execution
3051 # stage.
3052 &quot;name&quot;: &quot;A String&quot;, # Dataflow service generated name for this source.
3053 &quot;userName&quot;: &quot;A String&quot;, # Human-readable name for this transform; may be user or system generated.
3054 &quot;originalTransformOrCollection&quot;: &quot;A String&quot;, # User name for the original user transform or collection with which this
3055 # source is most closely associated.
3056 },
3057 ],
3058 &quot;kind&quot;: &quot;A String&quot;, # Type of transform this stage is executing.
3059 &quot;outputSource&quot;: [ # Output sources for this stage.
3060 { # Description of an input or output of an execution stage.
3061 &quot;originalTransformOrCollection&quot;: &quot;A String&quot;, # User name for the original user transform or collection with which this
3062 # source is most closely associated.
3063 &quot;name&quot;: &quot;A String&quot;, # Dataflow service generated name for this source.
3064 &quot;sizeBytes&quot;: &quot;A String&quot;, # Size of the source, if measurable.
3065 &quot;userName&quot;: &quot;A String&quot;, # Human-readable name for this source; may be user or system generated.
3066 },
3067 ],
3068 &quot;name&quot;: &quot;A String&quot;, # Dataflow service generated name for this stage.
3069 &quot;inputSource&quot;: [ # Input sources for this stage.
3070 { # Description of an input or output of an execution stage.
3071 &quot;originalTransformOrCollection&quot;: &quot;A String&quot;, # User name for the original user transform or collection with which this
3072 # source is most closely associated.
3073 &quot;name&quot;: &quot;A String&quot;, # Dataflow service generated name for this source.
3074 &quot;sizeBytes&quot;: &quot;A String&quot;, # Size of the source, if measurable.
3075 &quot;userName&quot;: &quot;A String&quot;, # Human-readable name for this source; may be user or system generated.
3076 },
3077 ],
3078 },
3079 ],
3080 &quot;originalPipelineTransform&quot;: [ # Description of each transform in the pipeline and collections between them.
3081 { # Description of the type, names/ids, and input/outputs for a transform.
3082 &quot;kind&quot;: &quot;A String&quot;, # Type of transform.
3083 &quot;inputCollectionName&quot;: [ # User names for all collection inputs to this transform.
3084 &quot;A String&quot;,
3085 ],
3086 &quot;name&quot;: &quot;A String&quot;, # User provided name for this transform instance.
3087 &quot;id&quot;: &quot;A String&quot;, # SDK generated id of this transform instance.
3088 &quot;displayData&quot;: [ # Transform-specific display data.
3089 { # Data provided with a pipeline or transform to provide descriptive info.
3090 &quot;timestampValue&quot;: &quot;A String&quot;, # Contains value if the data is of timestamp type.
3091 &quot;boolValue&quot;: True or False, # Contains value if the data is of a boolean type.
3092 &quot;javaClassValue&quot;: &quot;A String&quot;, # Contains value if the data is of java class type.
3093 &quot;strValue&quot;: &quot;A String&quot;, # Contains value if the data is of string type.
3094 &quot;int64Value&quot;: &quot;A String&quot;, # Contains value if the data is of int64 type.
3095 &quot;durationValue&quot;: &quot;A String&quot;, # Contains value if the data is of duration type.
3096 &quot;namespace&quot;: &quot;A String&quot;, # The namespace for the key. This is usually a class name or programming
3097 # language namespace (e.g. a Python module) which defines the display data.
3098 # This allows a dax monitoring system to specially handle the data
3099 # and perform custom rendering.
3100 &quot;floatValue&quot;: 3.14, # Contains value if the data is of float type.
3101 &quot;key&quot;: &quot;A String&quot;, # The key identifying the display data.
3102 # This is intended to be used as a label for the display data
3103 # when viewed in a dax monitoring system.
3104 &quot;shortStrValue&quot;: &quot;A String&quot;, # A possible additional shorter value to display.
3105 # For example a java_class_name_value of com.mypackage.MyDoFn
3106 # will be stored with MyDoFn as the short_str_value and
3107 # com.mypackage.MyDoFn as the java_class_name value.
3108 # short_str_value can be displayed and java_class_name_value
3109 # will be displayed as a tooltip.
3110 &quot;url&quot;: &quot;A String&quot;, # An optional full URL.
3111 &quot;label&quot;: &quot;A String&quot;, # An optional label to display in a dax UI for the element.
3112 },
3113 ],
3114 &quot;outputCollectionName&quot;: [ # User names for all collection outputs to this transform.
3115 &quot;A String&quot;,
3116 ],
3117 },
3118 ],
3119 &quot;displayData&quot;: [ # Pipeline level display data.
3120 { # Data provided with a pipeline or transform to provide descriptive info.
3121 &quot;timestampValue&quot;: &quot;A String&quot;, # Contains value if the data is of timestamp type.
3122 &quot;boolValue&quot;: True or False, # Contains value if the data is of a boolean type.
3123 &quot;javaClassValue&quot;: &quot;A String&quot;, # Contains value if the data is of java class type.
3124 &quot;strValue&quot;: &quot;A String&quot;, # Contains value if the data is of string type.
3125 &quot;int64Value&quot;: &quot;A String&quot;, # Contains value if the data is of int64 type.
3126 &quot;durationValue&quot;: &quot;A String&quot;, # Contains value if the data is of duration type.
3127 &quot;namespace&quot;: &quot;A String&quot;, # The namespace for the key. This is usually a class name or programming
3128 # language namespace (e.g. a Python module) which defines the display data.
3129 # This allows a dax monitoring system to specially handle the data
3130 # and perform custom rendering.
3131 &quot;floatValue&quot;: 3.14, # Contains value if the data is of float type.
3132 &quot;key&quot;: &quot;A String&quot;, # The key identifying the display data.
3133 # This is intended to be used as a label for the display data
3134 # when viewed in a dax monitoring system.
3135 &quot;shortStrValue&quot;: &quot;A String&quot;, # A possible additional shorter value to display.
3136 # For example a java_class_name_value of com.mypackage.MyDoFn
3137 # will be stored with MyDoFn as the short_str_value and
3138 # com.mypackage.MyDoFn as the java_class_name value.
3139 # short_str_value can be displayed and java_class_name_value
3140 # will be displayed as a tooltip.
3141 &quot;url&quot;: &quot;A String&quot;, # An optional full URL.
3142 &quot;label&quot;: &quot;A String&quot;, # An optional label to display in a dax UI for the element.
3143 },
3144 ],
3145 },
3146 &quot;replaceJobId&quot;: &quot;A String&quot;, # If this job is an update of an existing job, this field is the job ID
3147 # of the job it replaced.
3148 #
3149 # When sending a `CreateJobRequest`, you can update a job by specifying it
3150 # here. The job named here is stopped, and its intermediate state is
3151 # transferred to this job.
3152 &quot;tempFiles&quot;: [ # A set of files the system should be aware of that are used
3153 # for temporary storage. These temporary files will be
3154 # removed on job completion.
3155 # No duplicates are allowed.
3156 # No file patterns are supported.
3157 #
3158 # The supported files are:
3159 #
3160 # Google Cloud Storage:
3161 #
3162 # storage.googleapis.com/{bucket}/{object}
3163 # bucket.storage.googleapis.com/{object}
3164 &quot;A String&quot;,
3165 ],
3166 &quot;name&quot;: &quot;A String&quot;, # The user-specified Cloud Dataflow job name.
3167 #
3168 # Only one Job with a given name may exist in a project at any
3169 # given time. If a caller attempts to create a Job with the same
3170 # name as an already-existing Job, the attempt returns the
3171 # existing Job.
3172 #
3173 # The name must match the regular expression
3174 # `[a-z]([-a-z0-9]{0,38}[a-z0-9])?`
3175 &quot;steps&quot;: [ # Exactly one of step or steps_location should be specified.
3176 #
3177 # The top-level steps that constitute the entire job.
3178 { # Defines a particular step within a Cloud Dataflow job.
3179 #
3180 # A job consists of multiple steps, each of which performs some
3181 # specific operation as part of the overall job. Data is typically
3182 # passed from one step to another as part of the job.
3183 #
3184 # Here&#x27;s an example of a sequence of steps which together implement a
3185 # Map-Reduce job:
3186 #
3187 # * Read a collection of data from some source, parsing the
3188 # collection&#x27;s elements.
3189 #
3190 # * Validate the elements.
3191 #
3192 # * Apply a user-defined function to map each element to some value
3193 # and extract an element-specific key value.
3194 #
3195 # * Group elements with the same key into a single element with
3196 # that key, transforming a multiply-keyed collection into a
3197 # uniquely-keyed collection.
3198 #
3199 # * Write the elements out to some data sink.
3200 #
3201 # Note that the Cloud Dataflow service may be used to run many different
3202 # types of jobs, not just Map-Reduce.
3203 &quot;name&quot;: &quot;A String&quot;, # The name that identifies the step. This must be unique for each
3204 # step with respect to all other steps in the Cloud Dataflow job.
3205 &quot;kind&quot;: &quot;A String&quot;, # The kind of step in the Cloud Dataflow job.
3206 &quot;properties&quot;: { # Named properties associated with the step. Each kind of
3207 # predefined step has its own required set of properties.
3208 # Must be provided on Create. Only retrieved with JOB_VIEW_ALL.
3209 &quot;a_key&quot;: &quot;&quot;, # Properties of the object.
3210 },
3211 },
3212 ],
3213 &quot;replacedByJobId&quot;: &quot;A String&quot;, # If another job is an update of this job (and thus, this job is in
3214 # `JOB_STATE_UPDATED`), this field contains the ID of that job.
3215 &quot;executionInfo&quot;: { # Additional information about how a Cloud Dataflow job will be executed that
3216 # isn&#x27;t contained in the submitted job. Deprecated.
3217 &quot;stages&quot;: { # A mapping from each stage to the information about that stage.
3218 &quot;a_key&quot;: { # Contains information about how a particular
3219 # google.dataflow.v1beta3.Step will be executed.
3220 &quot;stepName&quot;: [ # The steps associated with the execution stage.
3221 # Note that stages may have several steps, and that a given step
3222 # might be run by more than one stage.
3223 &quot;A String&quot;,
3224 ],
3225 },
3226 },
3227 },
3228 &quot;currentState&quot;: &quot;A String&quot;, # The current state of the job.
3229 #
3230 # Jobs are created in the `JOB_STATE_STOPPED` state unless otherwise
3231 # specified.
3232 #
3233 # A job in the `JOB_STATE_RUNNING` state may asynchronously enter a
3234 # terminal state. After a job has reached a terminal state, no
3235 # further state updates may be made.
3236 #
3237 # This field may be mutated by the Cloud Dataflow service;
3238 # callers cannot mutate it.
3239 &quot;location&quot;: &quot;A String&quot;, # The [regional endpoint]
3240 # (https://cloud.google.com/dataflow/docs/concepts/regional-endpoints) that
3241 # contains this job.
3242 &quot;startTime&quot;: &quot;A String&quot;, # The timestamp when the job was started (transitioned to JOB_STATE_PENDING).
3243 # Flexible resource scheduling jobs are started with some delay after job
3244 # creation, so start_time is unset before start and is updated when the
3245 # job is started by the Cloud Dataflow service. For other jobs, start_time
3246 # always equals create_time and is immutable and set by the Cloud Dataflow
3247 # service.
3248 &quot;stepsLocation&quot;: &quot;A String&quot;, # The GCS location where the steps are stored.
3249 &quot;labels&quot;: { # User-defined labels for this job.
3250 #
3251 # The labels map can contain no more than 64 entries. Entries of the labels
3252 # map are UTF8 strings that comply with the following restrictions:
3253 #
3254 # * Keys must conform to regexp: \p{Ll}\p{Lo}{0,62}
3255 # * Values must conform to regexp: [\p{Ll}\p{Lo}\p{N}_-]{0,63}
3256 # * Both keys and values are additionally constrained to be &lt;= 128 bytes in
3257 # size.
3258 &quot;a_key&quot;: &quot;A String&quot;,
3259 },
3260 &quot;createTime&quot;: &quot;A String&quot;, # The timestamp when the job was initially created. Immutable and set by the
3261 # Cloud Dataflow service.
3262 &quot;requestedState&quot;: &quot;A String&quot;, # The job&#x27;s requested state.
3263 #
3264 # `UpdateJob` may be used to switch between the `JOB_STATE_STOPPED` and
3265 # `JOB_STATE_RUNNING` states, by setting requested_state. `UpdateJob` may
3266 # also be used to directly set a job&#x27;s requested state to
3267 # `JOB_STATE_CANCELLED` or `JOB_STATE_DONE`, irrevocably terminating the
3268 # job if it has not already reached a terminal state.
3269 },
3270 ],
3271 }</pre>
3272</div>
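<p>The <code>name</code> and <code>labels</code> constraints described above can be checked on the client before a job is submitted. The following is a minimal sketch, not part of the generated client: the helper names <code>is_valid_job_name</code> and <code>label_problems</code> are hypothetical, and only the job-name pattern, the 64-entry limit, and the 128-byte limit are checked. The label character-class patterns are left unchecked because Python's standard <code>re</code> module does not support <code>\p{...}</code> classes.</p>

```python
import re

# Pattern quoted from the "name" field description.
JOB_NAME_RE = re.compile(r"[a-z]([-a-z0-9]{0,38}[a-z0-9])?")

# Limits quoted from the "labels" field description.
MAX_LABEL_ENTRIES = 64
MAX_LABEL_BYTES = 128


def is_valid_job_name(name):
    """Return True if the whole name matches the documented pattern."""
    return JOB_NAME_RE.fullmatch(name) is not None


def label_problems(labels):
    """Return a list of human-readable problems found in a labels map.

    Hypothetical helper: checks only the entry count and the UTF-8 byte
    size of each key and value, not the allowed character classes.
    """
    problems = []
    if len(labels) > MAX_LABEL_ENTRIES:
        problems.append(
            f"{len(labels)} entries exceeds the limit of {MAX_LABEL_ENTRIES}"
        )
    for key, value in labels.items():
        for role, text in (("key", key), ("value", value)):
            size = len(text.encode("utf-8"))
            if size > MAX_LABEL_BYTES:
                problems.append(f"{role} {text!r} is {size} bytes")
    return problems
```

<p>Passing checks like these does not guarantee that the service accepts the job; the service remains the authority on validation.</p>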
3273
3274<div class="method">
3275 <code class="details" id="list_next">list_next(previous_request, previous_response)</code>
3276 <pre>Retrieves the next page of results.
3277
3278Args:
3279 previous_request: The request for the previous page. (required)
3280 previous_response: The response from the request for the previous page. (required)
3281
3282Returns:
3283 A request object that you can call &#x27;execute()&#x27; on to request the next
3284 page. Returns None if there are no more items in the collection.
3285 </pre>
3286</div>
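<p>The paging contract of <code>list_next</code> (execute a request, pass the request and its response back in, stop when <code>None</code> is returned) can be wrapped in a small generator. This is a sketch under assumptions: <code>iterate_pages</code> is a hypothetical helper, and <code>collection</code> stands for any object exposing <code>list_next(previous_request, previous_response)</code>, such as the <code>jobs()</code> resource documented on this page.</p>

```python
def iterate_pages(collection, request):
    """Yield each response page until list_next() returns None.

    `collection` is assumed to expose list_next(previous_request,
    previous_response); `request` is the first request object, or None
    if there is nothing to fetch.
    """
    while request is not None:
        response = request.execute()
        yield response
        # Ask the collection for the request that fetches the next page.
        request = collection.list_next(
            previous_request=request, previous_response=response
        )


# Possible usage with the real client (requires network and credentials,
# so it is shown only as a comment; "my-project" is a placeholder):
#
#   from googleapiclient.discovery import build
#   jobs = build("dataflow", "v1b3").projects().jobs()
#   for page in iterate_pages(jobs, jobs.list(projectId="my-project")):
#       print(page)
```

<p>A generator keeps only one page in memory at a time, which matters when a project has many jobs.</p>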
3287
3288<div class="method">
3289 <code class="details" id="snapshot">snapshot(projectId, jobId, body=None, x__xgafv=None)</code>
3290 <pre>Snapshot the state of a streaming job.
3291
3292Args:
3293 projectId: string, The project which owns the job to be snapshotted. (required)
3294 jobId: string, The job to be snapshotted. (required)
3295 body: object, The request body.
3296 The object takes the form of:
3297
3298{ # Request to create a snapshot of a job.
3299 &quot;description&quot;: &quot;A String&quot;, # User specified description of the snapshot. May be empty.
3300 &quot;snapshotSources&quot;: True or False, # If true, perform snapshots for sources which support this.
3301 &quot;ttl&quot;: &quot;A String&quot;, # TTL for the snapshot.
3302 &quot;location&quot;: &quot;A String&quot;, # The location that contains this job.
3303 }
3304
3305 x__xgafv: string, V1 error format.
3306 Allowed values
3307 1 - v1 error format
3308 2 - v2 error format
3309
3310Returns:
3311 An object of the form:
3312
3313 { # Represents a snapshot of a job.
3314 &quot;state&quot;: &quot;A String&quot;, # State of the snapshot.
3315 &quot;sourceJobId&quot;: &quot;A String&quot;, # The job this snapshot was created from.
3316 &quot;projectId&quot;: &quot;A String&quot;, # The project this snapshot belongs to.
3317 &quot;id&quot;: &quot;A String&quot;, # The unique ID of this snapshot.
3318 &quot;ttl&quot;: &quot;A String&quot;, # The time after which this snapshot will be automatically deleted.
3319 &quot;description&quot;: &quot;A String&quot;, # User specified description of the snapshot. May be empty.
3320 &quot;diskSizeBytes&quot;: &quot;A String&quot;, # The disk byte size of the snapshot. Only available for snapshots in READY
3321 # state.
3322 &quot;pubsubMetadata&quot;: [ # PubSub snapshot metadata.
3323 { # Represents a Pubsub snapshot.
3324 &quot;expireTime&quot;: &quot;A String&quot;, # The expire time of the Pubsub snapshot.
3325 &quot;snapshotName&quot;: &quot;A String&quot;, # The name of the Pubsub snapshot.
3326 &quot;topicName&quot;: &quot;A String&quot;, # The name of the Pubsub topic.
3327 },
3328 ],
3329 &quot;creationTime&quot;: &quot;A String&quot;, # The time this snapshot was created.
3330 }</pre>
3331</div>
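<p>The request body for <code>snapshot</code> has four fields, so it is easy to assemble by hand. The sketch below uses a hypothetical helper, <code>build_snapshot_body</code>. It assumes that <code>ttl</code> is sent as a duration string in seconds such as <code>"3600s"</code>; the reference above only says the field is a string, so treat that format as an assumption to verify.</p>

```python
def build_snapshot_body(description="", ttl_seconds=None, location=None,
                        snapshot_sources=False):
    """Assemble a request body for the snapshot() method.

    Hypothetical helper. Field names come from the reference above; the
    "<seconds>s" encoding of ttl is an assumption.
    """
    body = {
        "description": description,
        "snapshotSources": bool(snapshot_sources),
    }
    if ttl_seconds is not None:
        # Assumed duration encoding: whole seconds followed by "s".
        body["ttl"] = f"{int(ttl_seconds)}s"
    if location is not None:
        body["location"] = location
    return body


# Possible usage with the real client (needs credentials, so shown only
# as a comment; the IDs are placeholders):
#
#   jobs.snapshot(projectId="my-project", jobId="my-job-id",
#                 body=build_snapshot_body("nightly", 3600, "us-central1"))
```

<p>Optional fields are omitted instead of being sent as empty values, so the service can apply its own defaults.</p>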
3332
3333<div class="method">
3334 <code class="details" id="update">update(projectId, jobId, body=None, location=None, x__xgafv=None)</code>
3335 <pre>Updates the state of an existing Cloud Dataflow job.
3336
3337To update the state of an existing job, we recommend using
3338`projects.locations.jobs.update` with a [regional endpoint]
3339(https://cloud.google.com/dataflow/docs/concepts/regional-endpoints). Using
3340`projects.jobs.update` is not recommended, as you can only update the state
3341of jobs that are running in `us-central1`.
3342
3343Args:
3344 projectId: string, The ID of the Cloud Platform project that the job belongs to. (required)
3345 jobId: string, The job ID. (required)
3346 body: object, The request body.
3347 The object takes the form of:
3348
3349{ # Defines a job to be run by the Cloud Dataflow service.
3350 &quot;clientRequestId&quot;: &quot;A String&quot;, # The client&#x27;s unique identifier of the job, re-used across retried attempts.
3351 # If this field is set, the service will ensure its uniqueness.
3352 # The request to create a job will fail if the service has knowledge of a
3353 # previously submitted job with the same client&#x27;s ID and job name.
3354 # The caller may use this field to ensure idempotence of job
3355 # creation across retried attempts to create a job.
3356 # By default, the field is empty and, in that case, the service ignores it.
3357 &quot;id&quot;: &quot;A String&quot;, # The unique ID of this job.
3358 #
3359 # This field is set by the Cloud Dataflow service when the Job is
3360 # created, and is immutable for the life of the job.
3361 &quot;currentStateTime&quot;: &quot;A String&quot;, # The timestamp associated with the current state.
3362 &quot;transformNameMapping&quot;: { # The map of transform name prefixes of the job to be replaced to the
3363 # corresponding name prefixes of the new job.
3364 &quot;a_key&quot;: &quot;A String&quot;,
3365 },
3366 &quot;environment&quot;: { # Describes the environment in which a Dataflow Job runs. # The environment for the job.
3367 &quot;sdkPipelineOptions&quot;: { # The Cloud Dataflow SDK pipeline options specified by the user. These
3368 # options are passed through the service and are used to recreate the
3369 # SDK pipeline options on the worker in a language agnostic and platform
3370 # independent way.
3371 &quot;a_key&quot;: &quot;&quot;, # Properties of the object.
3372 },
3373 &quot;flexResourceSchedulingGoal&quot;: &quot;A String&quot;, # Which Flexible Resource Scheduling mode to run in.
3374 &quot;workerPools&quot;: [ # The worker pools. At least one &quot;harness&quot; worker pool must be
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07003375 # specified in order for the job to have workers.
3376 { # Describes one particular pool of Cloud Dataflow workers to be
3377 # instantiated by the Cloud Dataflow service in order to perform the
3378 # computations required by a job. Note that a workflow job may use
3379 # multiple pools, in order to match the various computational
3380 # requirements of the various stages of the job.
Bu Sun Kim65020912020-05-20 12:08:20 -07003381 &quot;defaultPackageSet&quot;: &quot;A String&quot;, # The default package set to install. This allows the service to
3382 # select a default set of packages which are useful to worker
3383 # harnesses written in a particular language.
3384 &quot;network&quot;: &quot;A String&quot;, # Network to which VMs will be assigned. If empty or unspecified,
3385 # the service will use the network &quot;default&quot;.
3386 &quot;zone&quot;: &quot;A String&quot;, # Zone to run the worker pools in. If empty or unspecified, the service
Dan O'Mearadd494642020-05-01 07:42:23 -07003387 # will attempt to choose a reasonable default.
Bu Sun Kim65020912020-05-20 12:08:20 -07003388 &quot;numWorkers&quot;: 42, # Number of Google Compute Engine workers in this pool needed to
3389 # execute the job. If zero or unspecified, the service will
3390 # attempt to choose a reasonable default.
3391 &quot;numThreadsPerWorker&quot;: 42, # The number of threads per worker harness. If empty or unspecified, the
Dan O'Mearadd494642020-05-01 07:42:23 -07003392 # service will choose a number of threads (according to the number of cores
3393 # on the selected machine type for batch, or 1 by convention for streaming).
Bu Sun Kim65020912020-05-20 12:08:20 -07003394 &quot;diskSourceImage&quot;: &quot;A String&quot;, # Fully qualified source image for disks.
3395 &quot;packages&quot;: [ # Packages to be installed on workers.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07003396 { # The packages that must be installed in order for a worker to run the
3397 # steps of the Cloud Dataflow job that will be assigned to its worker
3398 # pool.
3399 #
3400 # This is the mechanism by which the Cloud Dataflow SDK causes code to
3401 # be loaded onto the workers. For example, the Cloud Dataflow Java SDK
Bu Sun Kim65020912020-05-20 12:08:20 -07003402 # might use this to install jars containing the user&#x27;s code and all of the
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07003403 # various dependencies (libraries, data files, etc.) required in order
3404 # for that code to run.
Bu Sun Kim65020912020-05-20 12:08:20 -07003405 &quot;location&quot;: &quot;A String&quot;, # The resource to read the package from. The supported resource type is:
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07003406 #
3407 # Google Cloud Storage:
3408 #
3409 # storage.googleapis.com/{bucket}
3410 # bucket.storage.googleapis.com/
Bu Sun Kim65020912020-05-20 12:08:20 -07003411 &quot;name&quot;: &quot;A String&quot;, # The name of the package.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07003412 },
3413 ],
Bu Sun Kim65020912020-05-20 12:08:20 -07003414 &quot;teardownPolicy&quot;: &quot;A String&quot;, # Sets the policy for determining when to turn down the worker pool.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07003415 # Allowed values are: `TEARDOWN_ALWAYS`, `TEARDOWN_ON_SUCCESS`, and
3416 # `TEARDOWN_NEVER`.
3417 # `TEARDOWN_ALWAYS` means workers are always torn down regardless of whether
3418 # the job succeeds. `TEARDOWN_ON_SUCCESS` means workers are torn down
3419 # if the job succeeds. `TEARDOWN_NEVER` means the workers are never torn
3420 # down.
3421 #
3422 # If the workers are not torn down by the service, they will
3423 # continue to run and use Google Compute Engine VM resources in the
Bu Sun Kim65020912020-05-20 12:08:20 -07003424 # user&#x27;s project until they are explicitly terminated by the user.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07003425 # Because of this, Google recommends using the `TEARDOWN_ALWAYS`
3426 # policy except for small, manually supervised test jobs.
3427 #
3428 # If unknown or unspecified, the service will attempt to choose a reasonable
3429 # default.
Bu Sun Kim65020912020-05-20 12:08:20 -07003430 &quot;onHostMaintenance&quot;: &quot;A String&quot;, # The action to take on host maintenance, as defined by the Google
3431 # Compute Engine API.
3432 &quot;poolArgs&quot;: { # Extra arguments for this worker pool.
3433 &quot;a_key&quot;: &quot;&quot;, # Properties of the object. Contains field @type with type URL.
3434 },
3435 &quot;diskSizeGb&quot;: 42, # Size of root disk for VMs, in GB. If zero or unspecified, the service will
Dan O'Mearadd494642020-05-01 07:42:23 -07003436 # attempt to choose a reasonable default.
Bu Sun Kim65020912020-05-20 12:08:20 -07003437 &quot;workerHarnessContainerImage&quot;: &quot;A String&quot;, # Required. Docker container image that executes the Cloud Dataflow worker
3438 # harness, residing in Google Container Registry.
3439 #
3440 # Deprecated for the Fn API path. Use sdk_harness_container_images instead.
3441 &quot;diskType&quot;: &quot;A String&quot;, # Type of root disk for VMs. If empty or unspecified, the service will
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07003442 # attempt to choose a reasonable default.
Bu Sun Kim65020912020-05-20 12:08:20 -07003443 &quot;machineType&quot;: &quot;A String&quot;, # Machine type (e.g. &quot;n1-standard-1&quot;). If empty or unspecified, the
3444 # service will attempt to choose a reasonable default.
3445 &quot;kind&quot;: &quot;A String&quot;, # The kind of the worker pool; currently only `harness` and `shuffle`
3446 # are supported.
3447 &quot;dataDisks&quot;: [ # Data disks that are used by a VM in this workflow.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07003448 { # Describes the data disk used by a workflow job.
Bu Sun Kim65020912020-05-20 12:08:20 -07003449 &quot;sizeGb&quot;: 42, # Size of disk in GB. If zero or unspecified, the service will
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07003450 # attempt to choose a reasonable default.
Bu Sun Kim65020912020-05-20 12:08:20 -07003451 &quot;diskType&quot;: &quot;A String&quot;, # Disk storage type, as defined by Google Compute Engine. This
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07003452 # must be a disk type appropriate to the project and zone in which
3453 # the workers will run. If unknown or unspecified, the service
3454 # will attempt to choose a reasonable default.
3455 #
3456 # For example, the standard persistent disk type is a resource name
Bu Sun Kim65020912020-05-20 12:08:20 -07003457 # typically ending in &quot;pd-standard&quot;. If SSD persistent disks are
3458 # available, the resource name typically ends with &quot;pd-ssd&quot;. The
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07003459 # actual valid values are defined by the Google Compute Engine API,
3460 # not by the Cloud Dataflow API; consult the Google Compute Engine
3461 # documentation for more information about determining the set of
3462 # available disk types for a particular project and zone.
3463 #
3464 # Google Compute Engine Disk types are local to a particular
3465 # project in a particular zone, and so the resource name will
3466 # typically look something like this:
3467 #
3468 # compute.googleapis.com/projects/project-id/zones/zone/diskTypes/pd-standard
Bu Sun Kim65020912020-05-20 12:08:20 -07003469 &quot;mountPoint&quot;: &quot;A String&quot;, # Directory in a VM where disk is mounted.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07003470 },
3471 ],
Bu Sun Kim65020912020-05-20 12:08:20 -07003472 &quot;sdkHarnessContainerImages&quot;: [ # Set of SDK harness containers needed to execute this pipeline. This will
Dan O'Mearadd494642020-05-01 07:42:23 -07003473 # only be set in the Fn API path. For non-cross-language pipelines this
3474 # should have only one entry. Cross-language pipelines will have two or more
3475 # entries.
3476 { # Defines an SDK harness container for executing Dataflow pipelines.
Bu Sun Kim65020912020-05-20 12:08:20 -07003477 &quot;containerImage&quot;: &quot;A String&quot;, # A docker container image that resides in Google Container Registry.
3478 &quot;useSingleCorePerContainer&quot;: True or False, # If true, recommends the Dataflow service to use only one core per SDK
Dan O'Mearadd494642020-05-01 07:42:23 -07003479 # container instance with this image. If false (or unset) recommends using
3480 # more than one core per SDK container instance with this image for
3481 # efficiency. Note that Dataflow service may choose to override this property
3482 # if needed.
3483 },
3484 ],
Bu Sun Kim65020912020-05-20 12:08:20 -07003485 &quot;subnetwork&quot;: &quot;A String&quot;, # Subnetwork to which VMs will be assigned, if desired. Expected to be of
3486 # the form &quot;regions/REGION/subnetworks/SUBNETWORK&quot;.
3487 &quot;ipConfiguration&quot;: &quot;A String&quot;, # Configuration for VM IPs.
3488 &quot;taskrunnerSettings&quot;: { # Taskrunner configuration settings. # Settings passed through to Google Compute Engine workers when
3489 # using the standard Dataflow task runner. Users should ignore
3490 # this field.
3491 &quot;alsologtostderr&quot;: True or False, # Whether to also send taskrunner log info to stderr.
3492 &quot;taskGroup&quot;: &quot;A String&quot;, # The UNIX group ID on the worker VM to use for tasks launched by
3493 # taskrunner; e.g. &quot;wheel&quot;.
3494 &quot;harnessCommand&quot;: &quot;A String&quot;, # The command to launch the worker harness.
3495 &quot;logDir&quot;: &quot;A String&quot;, # The directory on the VM to store logs.
3496 &quot;oauthScopes&quot;: [ # The OAuth2 scopes to be requested by the taskrunner in order to
3497 # access the Cloud Dataflow API.
3498 &quot;A String&quot;,
3499 ],
3500 &quot;dataflowApiVersion&quot;: &quot;A String&quot;, # The API version of endpoint, e.g. &quot;v1b3&quot;
3501 &quot;logUploadLocation&quot;: &quot;A String&quot;, # Indicates where to put logs. If this is not specified, the logs
3502 # will not be uploaded.
3503 #
3504 # The supported resource type is:
3505 #
3506 # Google Cloud Storage:
3507 # storage.googleapis.com/{bucket}/{object}
3508 # bucket.storage.googleapis.com/{object}
3509 &quot;streamingWorkerMainClass&quot;: &quot;A String&quot;, # The streaming worker main class name.
3510 &quot;workflowFileName&quot;: &quot;A String&quot;, # The file to store the workflow in.
3511 &quot;baseTaskDir&quot;: &quot;A String&quot;, # The location on the worker for task-specific subdirectories.
3512 &quot;tempStoragePrefix&quot;: &quot;A String&quot;, # The prefix of the resources the taskrunner should use for
3513 # temporary storage.
3514 #
3515 # The supported resource type is:
3516 #
3517 # Google Cloud Storage:
3518 # storage.googleapis.com/{bucket}/{object}
3519 # bucket.storage.googleapis.com/{object}
3520 &quot;commandlinesFileName&quot;: &quot;A String&quot;, # The file to store preprocessing commands in.
3521 &quot;languageHint&quot;: &quot;A String&quot;, # The suggested backend language.
3522 &quot;baseUrl&quot;: &quot;A String&quot;, # The base URL for the taskrunner to use when accessing Google Cloud APIs.
3523 #
3524 # When workers access Google Cloud APIs, they logically do so via
3525 # relative URLs. If this field is specified, it supplies the base
3526 # URL to use for resolving these relative URLs. The normative
3527 # algorithm used is defined by RFC 1808, &quot;Relative Uniform Resource
3528 # Locators&quot;.
3529 #
3530 # If not specified, the default value is &quot;http://www.googleapis.com/&quot;.
3531 &quot;logToSerialconsole&quot;: True or False, # Whether to send taskrunner log info to Google Compute Engine VM serial
3532 # console.
3533 &quot;continueOnException&quot;: True or False, # Whether to continue taskrunner if an exception is hit.
3534 &quot;parallelWorkerSettings&quot;: { # Provides data to pass through to the worker harness. # The settings to pass to the parallel worker harness.
3535 &quot;baseUrl&quot;: &quot;A String&quot;, # The base URL for accessing Google Cloud APIs.
3536 #
3537 # When workers access Google Cloud APIs, they logically do so via
3538 # relative URLs. If this field is specified, it supplies the base
3539 # URL to use for resolving these relative URLs. The normative
3540 # algorithm used is defined by RFC 1808, &quot;Relative Uniform Resource
3541 # Locators&quot;.
3542 #
3543 # If not specified, the default value is &quot;http://www.googleapis.com/&quot;.
3544 &quot;reportingEnabled&quot;: True or False, # Whether to send work progress updates to the service.
3545 &quot;servicePath&quot;: &quot;A String&quot;, # The Cloud Dataflow service path relative to the root URL, for example,
3546 # &quot;dataflow/v1b3/projects&quot;.
3547 &quot;shuffleServicePath&quot;: &quot;A String&quot;, # The Shuffle service path relative to the root URL, for example,
3548 # &quot;shuffle/v1beta1&quot;.
3549 &quot;workerId&quot;: &quot;A String&quot;, # The ID of the worker running this pipeline.
3550 &quot;tempStoragePrefix&quot;: &quot;A String&quot;, # The prefix of the resources the system should use for temporary
3551 # storage.
3552 #
3553 # The supported resource type is:
3554 #
3555 # Google Cloud Storage:
3556 #
3557 # storage.googleapis.com/{bucket}/{object}
3558 # bucket.storage.googleapis.com/{object}
3559 },
3560 &quot;vmId&quot;: &quot;A String&quot;, # The ID string of the VM.
3561 &quot;taskUser&quot;: &quot;A String&quot;, # The UNIX user ID on the worker VM to use for tasks launched by
3562 # taskrunner; e.g. &quot;root&quot;.
3563 },
3564 &quot;autoscalingSettings&quot;: { # Settings for WorkerPool autoscaling. # Settings for autoscaling of this WorkerPool.
3565 &quot;maxNumWorkers&quot;: 42, # The maximum number of workers to cap scaling at.
3566 &quot;algorithm&quot;: &quot;A String&quot;, # The algorithm to use for autoscaling.
3567 },
3568 &quot;metadata&quot;: { # Metadata to set on the Google Compute Engine VMs.
3569 &quot;a_key&quot;: &quot;A String&quot;,
3570 },
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07003571 },
3572 ],
Bu Sun Kim65020912020-05-20 12:08:20 -07003573 &quot;dataset&quot;: &quot;A String&quot;, # The dataset for the current project where various workflow
3574 # related tables are stored.
3575 #
3576 # The supported resource type is:
3577 #
3578 # Google BigQuery:
3579 # bigquery.googleapis.com/{dataset}
3580 &quot;internalExperiments&quot;: { # Experimental settings.
3581 &quot;a_key&quot;: &quot;&quot;, # Properties of the object. Contains field @type with type URL.
3582 },
3583 &quot;workerRegion&quot;: &quot;A String&quot;, # The Compute Engine region
3584 # (https://cloud.google.com/compute/docs/regions-zones/regions-zones) in
3585 # which worker processing should occur, e.g. &quot;us-west1&quot;. Mutually exclusive
3586 # with worker_zone. If neither worker_region nor worker_zone is specified,
3587 # default to the control plane&#x27;s region.
3588 &quot;serviceKmsKeyName&quot;: &quot;A String&quot;, # If set, contains the Cloud KMS key identifier used to encrypt data
3589 # at rest, AKA a Customer Managed Encryption Key (CMEK).
3590 #
3591 # Format:
3592 # projects/PROJECT_ID/locations/LOCATION/keyRings/KEY_RING/cryptoKeys/KEY
3593 &quot;userAgent&quot;: { # A description of the process that generated the request.
3594 &quot;a_key&quot;: &quot;&quot;, # Properties of the object.
3595 },
3596 &quot;workerZone&quot;: &quot;A String&quot;, # The Compute Engine zone
3597 # (https://cloud.google.com/compute/docs/regions-zones/regions-zones) in
3598 # which worker processing should occur, e.g. &quot;us-west1-a&quot;. Mutually exclusive
3599 # with worker_region. If neither worker_region nor worker_zone is specified,
3600 # a zone in the control plane&#x27;s region is chosen based on available capacity.
3601 &quot;clusterManagerApiService&quot;: &quot;A String&quot;, # The type of cluster manager API to use. If unknown or
Dan O'Mearadd494642020-05-01 07:42:23 -07003602 # unspecified, the service will attempt to choose a reasonable
3603 # default. This should be in the form of the API service name,
Bu Sun Kim65020912020-05-20 12:08:20 -07003604 # e.g. &quot;compute.googleapis.com&quot;.
3605 &quot;tempStoragePrefix&quot;: &quot;A String&quot;, # The prefix of the resources the system should use for temporary
3606 # storage. The system will append the suffix &quot;/temp-{JOBNAME}&quot; to
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07003607 # this resource prefix, where {JOBNAME} is the value of the
3608 # job_name field. The resulting bucket and object prefix is used
3609 # as the prefix of the resources used to store temporary data
3610 # needed during the job execution. NOTE: This will override the
3611 # value in taskrunner_settings.
3612 # The supported resource type is:
3613 #
3614 # Google Cloud Storage:
3615 #
3616 # storage.googleapis.com/{bucket}/{object}
3617 # bucket.storage.googleapis.com/{object}
Bu Sun Kim65020912020-05-20 12:08:20 -07003618 &quot;experiments&quot;: [ # The list of experiments to enable.
3619 &quot;A String&quot;,
3620 ],
3621 &quot;version&quot;: { # A structure describing which components and their versions of the service
3622 # are required in order to run the job.
3623 &quot;a_key&quot;: &quot;&quot;, # Properties of the object.
3624 },
3625 &quot;serviceAccountEmail&quot;: &quot;A String&quot;, # Identity to run virtual machines as. Defaults to the default account.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07003626 },
Bu Sun Kim65020912020-05-20 12:08:20 -07003627 &quot;stageStates&quot;: [ # This field may be mutated by the Cloud Dataflow service;
3628 # callers cannot mutate it.
3629 { # A message describing the state of a particular execution stage.
3630 &quot;executionStageName&quot;: &quot;A String&quot;, # The name of the execution stage.
3631 &quot;currentStateTime&quot;: &quot;A String&quot;, # The time at which the stage transitioned to this state.
3632 &quot;executionStageState&quot;: &quot;A String&quot;, # Execution stage states allow the same set of values as JobState.
3633 },
3634 ],
3635 &quot;jobMetadata&quot;: { # Metadata available primarily for filtering jobs. Will be included in the # This field is populated by the Dataflow service to support filtering jobs
3636 # by the metadata values provided here. Populated for ListJobs and all GetJob
3637 # views SUMMARY and higher.
3638 # ListJob response and Job SUMMARY view.
3639 &quot;bigTableDetails&quot;: [ # Identification of a BigTable source used in the Dataflow job.
3640 { # Metadata for a BigTable connector used by the job.
3641 &quot;tableId&quot;: &quot;A String&quot;, # TableId accessed in the connection.
3642 &quot;projectId&quot;: &quot;A String&quot;, # ProjectId accessed in the connection.
3643 &quot;instanceId&quot;: &quot;A String&quot;, # InstanceId accessed in the connection.
3644 },
3645 ],
3646 &quot;spannerDetails&quot;: [ # Identification of a Spanner source used in the Dataflow job.
3647 { # Metadata for a Spanner connector used by the job.
3648 &quot;databaseId&quot;: &quot;A String&quot;, # DatabaseId accessed in the connection.
3649 &quot;instanceId&quot;: &quot;A String&quot;, # InstanceId accessed in the connection.
3650 &quot;projectId&quot;: &quot;A String&quot;, # ProjectId accessed in the connection.
3651 },
3652 ],
3653 &quot;datastoreDetails&quot;: [ # Identification of a Datastore source used in the Dataflow job.
3654 { # Metadata for a Datastore connector used by the job.
3655 &quot;projectId&quot;: &quot;A String&quot;, # ProjectId accessed in the connection.
3656 &quot;namespace&quot;: &quot;A String&quot;, # Namespace used in the connection.
3657 },
3658 ],
3659 &quot;sdkVersion&quot;: { # The version of the SDK used to run the job. # The SDK version used to run the job.
3660 &quot;versionDisplayName&quot;: &quot;A String&quot;, # A readable string describing the version of the SDK.
3661 &quot;sdkSupportStatus&quot;: &quot;A String&quot;, # The support status for this SDK version.
3662 &quot;version&quot;: &quot;A String&quot;, # The version of the SDK used to run the job.
3663 },
3664 &quot;bigqueryDetails&quot;: [ # Identification of a BigQuery source used in the Dataflow job.
3665 { # Metadata for a BigQuery connector used by the job.
3666 &quot;table&quot;: &quot;A String&quot;, # Table accessed in the connection.
3667 &quot;dataset&quot;: &quot;A String&quot;, # Dataset accessed in the connection.
3668 &quot;projectId&quot;: &quot;A String&quot;, # Project accessed in the connection.
3669 &quot;query&quot;: &quot;A String&quot;, # Query used to access data in the connection.
3670 },
3671 ],
3672 &quot;fileDetails&quot;: [ # Identification of a File source used in the Dataflow job.
3673 { # Metadata for a File connector used by the job.
3674 &quot;filePattern&quot;: &quot;A String&quot;, # File Pattern used to access files by the connector.
3675 },
3676 ],
3677 &quot;pubsubDetails&quot;: [ # Identification of a PubSub source used in the Dataflow job.
3678 { # Metadata for a PubSub connector used by the job.
3679 &quot;subscription&quot;: &quot;A String&quot;, # Subscription used in the connection.
3680 &quot;topic&quot;: &quot;A String&quot;, # Topic accessed in the connection.
3681 },
3682 ],
3683 },
3684 &quot;createdFromSnapshotId&quot;: &quot;A String&quot;, # If this is specified, the job&#x27;s initial state is populated from the given
3685 # snapshot.
3686 &quot;projectId&quot;: &quot;A String&quot;, # The ID of the Cloud Platform project that the job belongs to.
3687 &quot;type&quot;: &quot;A String&quot;, # The type of Cloud Dataflow job.
3688 &quot;pipelineDescription&quot;: { # A descriptive representation of submitted pipeline as well as the executed # Preliminary field: The format of this data may change at any time.
3689 # A description of the user pipeline and stages through which it is executed.
3690 # Created by Cloud Dataflow service. Only retrieved with
3691 # JOB_VIEW_DESCRIPTION or JOB_VIEW_ALL.
3692 # form. This data is provided by the Dataflow service for ease of visualizing
3693 # the pipeline and interpreting Dataflow provided metrics.
3694 &quot;executionPipelineStage&quot;: [ # Description of each stage of execution of the pipeline.
3695 { # Description of the composing transforms, names/ids, and input/outputs of a
3696 # stage of execution. Some composing transforms and sources may have been
3697 # generated by the Dataflow service during execution planning.
3698 &quot;id&quot;: &quot;A String&quot;, # Dataflow service generated id for this stage.
3699 &quot;componentTransform&quot;: [ # Transforms that comprise this execution stage.
3700 { # Description of a transform executed as part of an execution stage.
3701 &quot;originalTransform&quot;: &quot;A String&quot;, # User name for the original user transform with which this transform is
3702 # most closely associated.
3703 &quot;name&quot;: &quot;A String&quot;, # Dataflow service generated name for this transform.
3704 &quot;userName&quot;: &quot;A String&quot;, # Human-readable name for this transform; may be user or system generated.
3705 },
3706 ],
3707 &quot;componentSource&quot;: [ # Collections produced and consumed by component transforms of this stage.
3708 { # Description of an interstitial value between transforms in an execution
3709 # stage.
3710 &quot;name&quot;: &quot;A String&quot;, # Dataflow service generated name for this source.
3711 &quot;userName&quot;: &quot;A String&quot;, # Human-readable name for this source; may be user or system generated.
3712 &quot;originalTransformOrCollection&quot;: &quot;A String&quot;, # User name for the original user transform or collection with which this
3713 # source is most closely associated.
3714 },
3715 ],
3716 &quot;kind&quot;: &quot;A String&quot;, # Type of transform this stage is executing.
3717 &quot;outputSource&quot;: [ # Output sources for this stage.
3718 { # Description of an input or output of an execution stage.
3719 &quot;originalTransformOrCollection&quot;: &quot;A String&quot;, # User name for the original user transform or collection with which this
3720 # source is most closely associated.
3721 &quot;name&quot;: &quot;A String&quot;, # Dataflow service generated name for this source.
3722 &quot;sizeBytes&quot;: &quot;A String&quot;, # Size of the source, if measurable.
3723 &quot;userName&quot;: &quot;A String&quot;, # Human-readable name for this source; may be user or system generated.
3724 },
3725 ],
3726 &quot;name&quot;: &quot;A String&quot;, # Dataflow service generated name for this stage.
3727 &quot;inputSource&quot;: [ # Input sources for this stage.
3728 { # Description of an input or output of an execution stage.
3729 &quot;originalTransformOrCollection&quot;: &quot;A String&quot;, # User name for the original user transform or collection with which this
3730 # source is most closely associated.
3731 &quot;name&quot;: &quot;A String&quot;, # Dataflow service generated name for this source.
3732 &quot;sizeBytes&quot;: &quot;A String&quot;, # Size of the source, if measurable.
3733 &quot;userName&quot;: &quot;A String&quot;, # Human-readable name for this source; may be user or system generated.
3734 },
3735 ],
3736 },
3737 ],
3738 &quot;originalPipelineTransform&quot;: [ # Description of each transform in the pipeline and collections between them.
3739 { # Description of the type, names/ids, and input/outputs for a transform.
3740 &quot;kind&quot;: &quot;A String&quot;, # Type of transform.
3741 &quot;inputCollectionName&quot;: [ # User names for all collection inputs to this transform.
3742 &quot;A String&quot;,
3743 ],
3744 &quot;name&quot;: &quot;A String&quot;, # User provided name for this transform instance.
3745 &quot;id&quot;: &quot;A String&quot;, # SDK generated id of this transform instance.
3746 &quot;displayData&quot;: [ # Transform-specific display data.
3747 { # Data provided with a pipeline or transform to provide descriptive info.
3748 &quot;timestampValue&quot;: &quot;A String&quot;, # Contains value if the data is of timestamp type.
3749 &quot;boolValue&quot;: True or False, # Contains value if the data is of a boolean type.
3750 &quot;javaClassValue&quot;: &quot;A String&quot;, # Contains value if the data is of java class type.
3751 &quot;strValue&quot;: &quot;A String&quot;, # Contains value if the data is of string type.
3752 &quot;int64Value&quot;: &quot;A String&quot;, # Contains value if the data is of int64 type.
3753 &quot;durationValue&quot;: &quot;A String&quot;, # Contains value if the data is of duration type.
3754 &quot;namespace&quot;: &quot;A String&quot;, # The namespace for the key. This is usually a class name or programming
3755 # language namespace (e.g. a Python module) which defines the display data.
3756 # This allows a dax monitoring system to specially handle the data
3757 # and perform custom rendering.
3758 &quot;floatValue&quot;: 3.14, # Contains value if the data is of float type.
3759 &quot;key&quot;: &quot;A String&quot;, # The key identifying the display data.
3760 # This is intended to be used as a label for the display data
3761 # when viewed in a dax monitoring system.
3762 &quot;shortStrValue&quot;: &quot;A String&quot;, # A possible additional shorter value to display.
3763 # For example a java_class_name_value of com.mypackage.MyDoFn
3764 # will be stored with MyDoFn as the short_str_value and
3765 # com.mypackage.MyDoFn as the java_class_name_value.
3766 # short_str_value can be displayed and java_class_name_value
3767 # will be displayed as a tooltip.
3768 &quot;url&quot;: &quot;A String&quot;, # An optional full URL.
3769 &quot;label&quot;: &quot;A String&quot;, # An optional label to display in a dax UI for the element.
3770 },
3771 ],
3772 &quot;outputCollectionName&quot;: [ # User names for all collection outputs to this transform.
3773 &quot;A String&quot;,
3774 ],
3775 },
3776 ],
3777 &quot;displayData&quot;: [ # Pipeline level display data.
3778 { # Data provided with a pipeline or transform to provide descriptive info.
3779 &quot;timestampValue&quot;: &quot;A String&quot;, # Contains value if the data is of timestamp type.
3780 &quot;boolValue&quot;: True or False, # Contains value if the data is of a boolean type.
3781 &quot;javaClassValue&quot;: &quot;A String&quot;, # Contains value if the data is of java class type.
3782 &quot;strValue&quot;: &quot;A String&quot;, # Contains value if the data is of string type.
3783 &quot;int64Value&quot;: &quot;A String&quot;, # Contains value if the data is of int64 type.
3784 &quot;durationValue&quot;: &quot;A String&quot;, # Contains value if the data is of duration type.
3785 &quot;namespace&quot;: &quot;A String&quot;, # The namespace for the key. This is usually a class name or programming
3786 # language namespace (e.g. a Python module) which defines the display data.
3787 # This allows a dax monitoring system to specially handle the data
3788 # and perform custom rendering.
3789 &quot;floatValue&quot;: 3.14, # Contains value if the data is of float type.
3790 &quot;key&quot;: &quot;A String&quot;, # The key identifying the display data.
3791 # This is intended to be used as a label for the display data
3792 # when viewed in a dax monitoring system.
3793 &quot;shortStrValue&quot;: &quot;A String&quot;, # A possible additional shorter value to display.
3794 # For example a java_class_name_value of com.mypackage.MyDoFn
3795 # will be stored with MyDoFn as the short_str_value and
3796 # com.mypackage.MyDoFn as the java_class_name_value.
3797 # short_str_value can be displayed and java_class_name_value
3798 # will be displayed as a tooltip.
3799 &quot;url&quot;: &quot;A String&quot;, # An optional full URL.
3800 &quot;label&quot;: &quot;A String&quot;, # An optional label to display in a dax UI for the element.
3801 },
3802 ],
3803 },
3804 &quot;replaceJobId&quot;: &quot;A String&quot;, # If this job is an update of an existing job, this field is the job ID
3805 # of the job it replaced.
3806 #
3807 # When sending a `CreateJobRequest`, you can update a job by specifying it
3808 # here. The job named here is stopped, and its intermediate state is
3809 # transferred to this job.
3810 &quot;tempFiles&quot;: [ # A set of files the system should be aware of that are used
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07003811 # for temporary storage. These temporary files will be
3812 # removed on job completion.
3813 # No duplicates are allowed.
3814 # No file patterns are supported.
3815 #
3816 # The supported files are:
3817 #
3818 # Google Cloud Storage:
3819 #
3820 # storage.googleapis.com/{bucket}/{object}
3821 # bucket.storage.googleapis.com/{object}
Bu Sun Kim65020912020-05-20 12:08:20 -07003822 &quot;A String&quot;,
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07003823 ],
Bu Sun Kim65020912020-05-20 12:08:20 -07003824 &quot;name&quot;: &quot;A String&quot;, # The user-specified Cloud Dataflow job name.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07003825 #
3826 # Only one Job with a given name may exist in a project at any
3827 # given time. If a caller attempts to create a Job with the same
3828 # name as an already-existing Job, the attempt returns the
3829 # existing Job.
3830 #
3831 # The name must match the regular expression
3832 # `[a-z]([-a-z0-9]{0,38}[a-z0-9])?`
Bu Sun Kim65020912020-05-20 12:08:20 -07003833 &quot;steps&quot;: [ # Exactly one of steps or steps_location should be specified.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07003834 #
3835 # The top-level steps that constitute the entire job.
3836 { # Defines a particular step within a Cloud Dataflow job.
3837 #
3838 # A job consists of multiple steps, each of which performs some
3839 # specific operation as part of the overall job. Data is typically
3840 # passed from one step to another as part of the job.
3841 #
Bu Sun Kim65020912020-05-20 12:08:20 -07003842 # Here&#x27;s an example of a sequence of steps which together implement a
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07003843 # Map-Reduce job:
3844 #
3845 # * Read a collection of data from some source, parsing the
3846 # collection&#x27;s elements.
3847 #
3848 # * Validate the elements.
3849 #
3850 # * Apply a user-defined function to map each element to some value
3851 # and extract an element-specific key value.
3852 #
3853 # * Group elements with the same key into a single element with
3854 # that key, transforming a multiply-keyed collection into a
3855 # uniquely-keyed collection.
3856 #
3857 # * Write the elements out to some data sink.
3858 #
3859 # Note that the Cloud Dataflow service may be used to run many different
3860 # types of jobs, not just Map-Reduce.
3861 &quot;name&quot;: &quot;A String&quot;, # The name that identifies the step. This must be unique for each
3862 # step with respect to all other steps in the Cloud Dataflow job.
3863 &quot;kind&quot;: &quot;A String&quot;, # The kind of step in the Cloud Dataflow job.
3864 &quot;properties&quot;: { # Named properties associated with the step. Each kind of
3865 # predefined step has its own required set of properties.
3866 # Must be provided on Create. Only retrieved with JOB_VIEW_ALL.
3867 &quot;a_key&quot;: &quot;&quot;, # Properties of the object.
3868 },
3869 },
3870 ],
3871 &quot;replacedByJobId&quot;: &quot;A String&quot;, # If another job is an update of this job (and thus, this job is in
3872 # `JOB_STATE_UPDATED`), this field contains the ID of that job.
3873 &quot;executionInfo&quot;: { # Deprecated. Additional information about how a Cloud Dataflow job will be
3874 # executed that isn&#x27;t contained in the submitted job.
3875 &quot;stages&quot;: { # A mapping from each stage to the information about that stage.
3876 &quot;a_key&quot;: { # Contains information about how a particular
3877 # google.dataflow.v1beta3.Step will be executed.
3878 &quot;stepName&quot;: [ # The steps associated with the execution stage.
3879 # Note that stages may have several steps, and that a given step
3880 # might be run by more than one stage.
3881 &quot;A String&quot;,
3882 ],
3883 },
3884 },
3885 },
3886 &quot;currentState&quot;: &quot;A String&quot;, # The current state of the job.
3887 #
3888 # Jobs are created in the `JOB_STATE_STOPPED` state unless otherwise
3889 # specified.
3890 #
3891 # A job in the `JOB_STATE_RUNNING` state may asynchronously enter a
3892 # terminal state. After a job has reached a terminal state, no
3893 # further state updates may be made.
3894 #
3895 # This field may be mutated by the Cloud Dataflow service;
3896 # callers cannot mutate it.
3897 &quot;location&quot;: &quot;A String&quot;, # The [regional endpoint]
3898 # (https://cloud.google.com/dataflow/docs/concepts/regional-endpoints) that
3899 # contains this job.
3900 &quot;startTime&quot;: &quot;A String&quot;, # The timestamp when the job was started (transitioned to JOB_STATE_PENDING).
3901 # Flexible resource scheduling jobs are started with some delay after job
3902 # creation, so start_time is unset before start and is updated when the
3903 # job is started by the Cloud Dataflow service. For other jobs, start_time
3904 # always equals create_time and is immutable and set by the Cloud Dataflow
3905 # service.
3906 &quot;stepsLocation&quot;: &quot;A String&quot;, # The GCS location where the steps are stored.
3907 &quot;labels&quot;: { # User-defined labels for this job.
3908 #
3909 # The labels map can contain no more than 64 entries. Entries of the labels
3910 # map are UTF8 strings that comply with the following restrictions:
3911 #
3912 # * Keys must conform to regexp: \p{Ll}\p{Lo}{0,62}
3913 # * Values must conform to regexp: [\p{Ll}\p{Lo}\p{N}_-]{0,63}
3914 # * Both keys and values are additionally constrained to be &lt;= 128 bytes in
3915 # size.
3916 &quot;a_key&quot;: &quot;A String&quot;,
3917 },
3918 &quot;createTime&quot;: &quot;A String&quot;, # The timestamp when the job was initially created. Immutable and set by the
3919 # Cloud Dataflow service.
3920 &quot;requestedState&quot;: &quot;A String&quot;, # The job&#x27;s requested state.
3921 #
3922 # `UpdateJob` may be used to switch between the `JOB_STATE_STOPPED` and
3923 # `JOB_STATE_RUNNING` states, by setting requested_state. `UpdateJob` may
3924 # also be used to directly set a job&#x27;s requested state to
3925 # `JOB_STATE_CANCELLED` or `JOB_STATE_DONE`, irrevocably terminating the
3926 # job if it has not already reached a terminal state.
3927}
3928
3929 location: string, The [regional endpoint]
3930(https://cloud.google.com/dataflow/docs/concepts/regional-endpoints) that
3931contains this job.
3932 x__xgafv: string, V1 error format.
3933 Allowed values
3934 1 - v1 error format
3935 2 - v2 error format
3936
3937Returns:
3938 An object of the form:
3939
3940 { # Defines a job to be run by the Cloud Dataflow service.
3941 &quot;clientRequestId&quot;: &quot;A String&quot;, # The client&#x27;s unique identifier of the job, re-used across retried attempts.
3942 # If this field is set, the service will ensure its uniqueness.
3943 # The request to create a job will fail if the service has knowledge of a
3944 # previously submitted job with the same client&#x27;s ID and job name.
3945 # The caller may use this field to ensure idempotence of job
3946 # creation across retried attempts to create a job.
3947 # By default, the field is empty and, in that case, the service ignores it.
3948 &quot;id&quot;: &quot;A String&quot;, # The unique ID of this job.
3949 #
3950 # This field is set by the Cloud Dataflow service when the Job is
3951 # created, and is immutable for the life of the job.
3952 &quot;currentStateTime&quot;: &quot;A String&quot;, # The timestamp associated with the current state.
3953 &quot;transformNameMapping&quot;: { # The map of transform name prefixes of the job to be replaced to the
3954 # corresponding name prefixes of the new job.
3955 &quot;a_key&quot;: &quot;A String&quot;,
3956 },
3957 &quot;environment&quot;: { # Describes the environment in which a Dataflow Job runs. # The environment for the job.
3958 &quot;sdkPipelineOptions&quot;: { # The Cloud Dataflow SDK pipeline options specified by the user. These
3959 # options are passed through the service and are used to recreate the
3960 # SDK pipeline options on the worker in a language agnostic and platform
3961 # independent way.
3962 &quot;a_key&quot;: &quot;&quot;, # Properties of the object.
3963 },
3964 &quot;flexResourceSchedulingGoal&quot;: &quot;A String&quot;, # Which Flexible Resource Scheduling mode to run in.
3965 &quot;workerPools&quot;: [ # The worker pools. At least one &quot;harness&quot; worker pool must be
3966 # specified in order for the job to have workers.
3967 { # Describes one particular pool of Cloud Dataflow workers to be
3968 # instantiated by the Cloud Dataflow service in order to perform the
3969 # computations required by a job. Note that a workflow job may use
3970 # multiple pools, in order to match the various computational
3971 # requirements of the various stages of the job.
3972 &quot;defaultPackageSet&quot;: &quot;A String&quot;, # The default package set to install. This allows the service to
3973 # select a default set of packages which are useful to worker
3974 # harnesses written in a particular language.
3975 &quot;network&quot;: &quot;A String&quot;, # Network to which VMs will be assigned. If empty or unspecified,
3976 # the service will use the network &quot;default&quot;.
3977 &quot;zone&quot;: &quot;A String&quot;, # Zone to run the worker pools in. If empty or unspecified, the service
3978 # will attempt to choose a reasonable default.
3979 &quot;numWorkers&quot;: 42, # Number of Google Compute Engine workers in this pool needed to
3980 # execute the job. If zero or unspecified, the service will
3981 # attempt to choose a reasonable default.
3982 &quot;numThreadsPerWorker&quot;: 42, # The number of threads per worker harness. If empty or unspecified, the
3983 # service will choose a number of threads (according to the number of cores
3984 # on the selected machine type for batch, or 1 by convention for streaming).
3985 &quot;diskSourceImage&quot;: &quot;A String&quot;, # Fully qualified source image for disks.
3986 &quot;packages&quot;: [ # Packages to be installed on workers.
3987 { # The packages that must be installed in order for a worker to run the
3988 # steps of the Cloud Dataflow job that will be assigned to its worker
3989 # pool.
3990 #
3991 # This is the mechanism by which the Cloud Dataflow SDK causes code to
3992 # be loaded onto the workers. For example, the Cloud Dataflow Java SDK
3993 # might use this to install jars containing the user&#x27;s code and all of the
3994 # various dependencies (libraries, data files, etc.) required in order
3995 # for that code to run.
3996 &quot;location&quot;: &quot;A String&quot;, # The resource to read the package from. The supported resource type is:
3997 #
3998 # Google Cloud Storage:
3999 #
4000 # storage.googleapis.com/{bucket}
4001 # bucket.storage.googleapis.com/
4002 &quot;name&quot;: &quot;A String&quot;, # The name of the package.
4003 },
4004 ],
4005 &quot;teardownPolicy&quot;: &quot;A String&quot;, # Sets the policy for determining when to tear down the worker pool.
4006 # Allowed values are: `TEARDOWN_ALWAYS`, `TEARDOWN_ON_SUCCESS`, and
4007 # `TEARDOWN_NEVER`.
4008 # `TEARDOWN_ALWAYS` means workers are always torn down regardless of whether
4009 # the job succeeds. `TEARDOWN_ON_SUCCESS` means workers are torn down
4010 # if the job succeeds. `TEARDOWN_NEVER` means the workers are never torn
4011 # down.
4012 #
4013 # If the workers are not torn down by the service, they will
4014 # continue to run and use Google Compute Engine VM resources in the
4015 # user&#x27;s project until they are explicitly terminated by the user.
4016 # Because of this, Google recommends using the `TEARDOWN_ALWAYS`
4017 # policy except for small, manually supervised test jobs.
4018 #
4019 # If unknown or unspecified, the service will attempt to choose a reasonable
4020 # default.
4021 &quot;onHostMaintenance&quot;: &quot;A String&quot;, # The action to take on host maintenance, as defined by the Google
4022 # Compute Engine API.
4023 &quot;poolArgs&quot;: { # Extra arguments for this worker pool.
4024 &quot;a_key&quot;: &quot;&quot;, # Properties of the object. Contains field @type with type URL.
4025 },
4026 &quot;diskSizeGb&quot;: 42, # Size of root disk for VMs, in GB. If zero or unspecified, the service will
4027 # attempt to choose a reasonable default.
4028 &quot;workerHarnessContainerImage&quot;: &quot;A String&quot;, # Required. Docker container image that executes the Cloud Dataflow worker
4029 # harness, residing in Google Container Registry.
4030 #
4031 # Deprecated for the Fn API path. Use sdk_harness_container_images instead.
4032 &quot;diskType&quot;: &quot;A String&quot;, # Type of root disk for VMs. If empty or unspecified, the service will
4033 # attempt to choose a reasonable default.
4034 &quot;machineType&quot;: &quot;A String&quot;, # Machine type (e.g. &quot;n1-standard-1&quot;). If empty or unspecified, the
4035 # service will attempt to choose a reasonable default.
4036 &quot;kind&quot;: &quot;A String&quot;, # The kind of the worker pool; currently only `harness` and `shuffle`
4037 # are supported.
4038 &quot;dataDisks&quot;: [ # Data disks that are used by a VM in this workflow.
4039 { # Describes the data disk used by a workflow job.
4040 &quot;sizeGb&quot;: 42, # Size of disk in GB. If zero or unspecified, the service will
4041 # attempt to choose a reasonable default.
4042 &quot;diskType&quot;: &quot;A String&quot;, # Disk storage type, as defined by Google Compute Engine. This
4043 # must be a disk type appropriate to the project and zone in which
4044 # the workers will run. If unknown or unspecified, the service
4045 # will attempt to choose a reasonable default.
4046 #
4047 # For example, the standard persistent disk type is a resource name
4048 # typically ending in &quot;pd-standard&quot;. If SSD persistent disks are
4049 # available, the resource name typically ends with &quot;pd-ssd&quot;. The
4050 # actual valid values are defined by the Google Compute Engine API,
4051 # not by the Cloud Dataflow API; consult the Google Compute Engine
4052 # documentation for more information about determining the set of
4053 # available disk types for a particular project and zone.
4054 #
4055 # Google Compute Engine Disk types are local to a particular
4056 # project in a particular zone, and so the resource name will
4057 # typically look something like this:
4058 #
4059 # compute.googleapis.com/projects/project-id/zones/zone/diskTypes/pd-standard
4060 &quot;mountPoint&quot;: &quot;A String&quot;, # Directory in a VM where disk is mounted.
4061 },
4062 ],
4063 &quot;sdkHarnessContainerImages&quot;: [ # Set of SDK harness containers needed to execute this pipeline. This will
4064 # only be set in the Fn API path. For non-cross-language pipelines this
4065 # should have only one entry. Cross-language pipelines will have two or more
4066 # entries.
4067 { # Defines an SDK harness container for executing Dataflow pipelines.
4068 &quot;containerImage&quot;: &quot;A String&quot;, # A Docker container image that resides in Google Container Registry.
4069 &quot;useSingleCorePerContainer&quot;: True or False, # If true, recommends that the Dataflow service use only one core per SDK
4070 # container instance with this image. If false (or unset), recommends using
4071 # more than one core per SDK container instance with this image for
4072 # efficiency. Note that the Dataflow service may choose to override this property
4073 # if needed.
4074 },
4075 ],
4076 &quot;subnetwork&quot;: &quot;A String&quot;, # Subnetwork to which VMs will be assigned, if desired. Expected to be of
4077 # the form &quot;regions/REGION/subnetworks/SUBNETWORK&quot;.
4078 &quot;ipConfiguration&quot;: &quot;A String&quot;, # Configuration for VM IPs.
4079 &quot;taskrunnerSettings&quot;: { # Taskrunner configuration settings. # Settings passed through to Google Compute Engine workers when
4080 # using the standard Dataflow task runner. Users should ignore
4081 # this field.
4082 &quot;alsologtostderr&quot;: True or False, # Whether to also send taskrunner log info to stderr.
4083 &quot;taskGroup&quot;: &quot;A String&quot;, # The UNIX group ID on the worker VM to use for tasks launched by
4084 # taskrunner; e.g. &quot;wheel&quot;.
4085 &quot;harnessCommand&quot;: &quot;A String&quot;, # The command to launch the worker harness.
4086 &quot;logDir&quot;: &quot;A String&quot;, # The directory on the VM to store logs.
4087 &quot;oauthScopes&quot;: [ # The OAuth2 scopes to be requested by the taskrunner in order to
4088 # access the Cloud Dataflow API.
4089 &quot;A String&quot;,
4090 ],
4091 &quot;dataflowApiVersion&quot;: &quot;A String&quot;, # The API version of endpoint, e.g. &quot;v1b3&quot;
4092 &quot;logUploadLocation&quot;: &quot;A String&quot;, # Indicates where to put logs. If this is not specified, the logs
4093 # will not be uploaded.
4094 #
4095 # The supported resource type is:
4096 #
4097 # Google Cloud Storage:
4098 # storage.googleapis.com/{bucket}/{object}
4099 # bucket.storage.googleapis.com/{object}
4100 &quot;streamingWorkerMainClass&quot;: &quot;A String&quot;, # The streaming worker main class name.
4101 &quot;workflowFileName&quot;: &quot;A String&quot;, # The file to store the workflow in.
4102 &quot;baseTaskDir&quot;: &quot;A String&quot;, # The location on the worker for task-specific subdirectories.
4103 &quot;tempStoragePrefix&quot;: &quot;A String&quot;, # The prefix of the resources the taskrunner should use for
4104 # temporary storage.
4105 #
4106 # The supported resource type is:
4107 #
4108 # Google Cloud Storage:
4109 # storage.googleapis.com/{bucket}/{object}
4110 # bucket.storage.googleapis.com/{object}
4111 &quot;commandlinesFileName&quot;: &quot;A String&quot;, # The file to store preprocessing commands in.
4112 &quot;languageHint&quot;: &quot;A String&quot;, # The suggested backend language.
4113 &quot;baseUrl&quot;: &quot;A String&quot;, # The base URL for the taskrunner to use when accessing Google Cloud APIs.
4114 #
4115 # When workers access Google Cloud APIs, they logically do so via
4116 # relative URLs. If this field is specified, it supplies the base
4117 # URL to use for resolving these relative URLs. The normative
4118 # algorithm used is defined by RFC 1808, &quot;Relative Uniform Resource
4119 # Locators&quot;.
4120 #
4121 # If not specified, the default value is &quot;http://www.googleapis.com/&quot;
4122 &quot;logToSerialconsole&quot;: True or False, # Whether to send taskrunner log info to Google Compute Engine VM serial
4123 # console.
4124 &quot;continueOnException&quot;: True or False, # Whether to continue taskrunner if an exception is hit.
4125 &quot;parallelWorkerSettings&quot;: { # Provides data to pass through to the worker harness. # The settings to pass to the parallel worker harness.
4126 &quot;baseUrl&quot;: &quot;A String&quot;, # The base URL for accessing Google Cloud APIs.
4127 #
4128 # When workers access Google Cloud APIs, they logically do so via
4129 # relative URLs. If this field is specified, it supplies the base
4130 # URL to use for resolving these relative URLs. The normative
4131 # algorithm used is defined by RFC 1808, &quot;Relative Uniform Resource
4132 # Locators&quot;.
4133 #
4134 # If not specified, the default value is &quot;http://www.googleapis.com/&quot;
4135 &quot;reportingEnabled&quot;: True or False, # Whether to send work progress updates to the service.
4136 &quot;servicePath&quot;: &quot;A String&quot;, # The Cloud Dataflow service path relative to the root URL, for example,
4137 # &quot;dataflow/v1b3/projects&quot;.
4138 &quot;shuffleServicePath&quot;: &quot;A String&quot;, # The Shuffle service path relative to the root URL, for example,
4139 # &quot;shuffle/v1beta1&quot;.
4140 &quot;workerId&quot;: &quot;A String&quot;, # The ID of the worker running this pipeline.
4141 &quot;tempStoragePrefix&quot;: &quot;A String&quot;, # The prefix of the resources the system should use for temporary
4142 # storage.
4143 #
4144 # The supported resource type is:
4145 #
4146 # Google Cloud Storage:
4147 #
4148 # storage.googleapis.com/{bucket}/{object}
4149 # bucket.storage.googleapis.com/{object}
4150 },
4151 &quot;vmId&quot;: &quot;A String&quot;, # The ID string of the VM.
4152 &quot;taskUser&quot;: &quot;A String&quot;, # The UNIX user ID on the worker VM to use for tasks launched by
4153 # taskrunner; e.g. &quot;root&quot;.
4154 },
4155 &quot;autoscalingSettings&quot;: { # Settings for WorkerPool autoscaling. # Settings for autoscaling of this WorkerPool.
4156 &quot;maxNumWorkers&quot;: 42, # The maximum number of workers to cap scaling at.
4157 &quot;algorithm&quot;: &quot;A String&quot;, # The algorithm to use for autoscaling.
4158 },
4159 &quot;metadata&quot;: { # Metadata to set on the Google Compute Engine VMs.
4160 &quot;a_key&quot;: &quot;A String&quot;,
4161 },
4162 },
4163 ],
4164 &quot;dataset&quot;: &quot;A String&quot;, # The dataset for the current project where various workflow
4165 # related tables are stored.
4166 #
4167 # The supported resource type is:
4168 #
4169 # Google BigQuery:
4170 # bigquery.googleapis.com/{dataset}
4171 &quot;internalExperiments&quot;: { # Experimental settings.
4172 &quot;a_key&quot;: &quot;&quot;, # Properties of the object. Contains field @type with type URL.
4173 },
4174 &quot;workerRegion&quot;: &quot;A String&quot;, # The Compute Engine region
4175 # (https://cloud.google.com/compute/docs/regions-zones/regions-zones) in
4176 # which worker processing should occur, e.g. &quot;us-west1&quot;. Mutually exclusive
4177 # with worker_zone. If neither worker_region nor worker_zone is specified,
4178 # default to the control plane&#x27;s region.
4179 &quot;serviceKmsKeyName&quot;: &quot;A String&quot;, # If set, contains the Cloud KMS key identifier used to encrypt data
4180 # at rest, AKA a Customer Managed Encryption Key (CMEK).
4181 #
4182 # Format:
4183 # projects/PROJECT_ID/locations/LOCATION/keyRings/KEY_RING/cryptoKeys/KEY
4184 &quot;userAgent&quot;: { # A description of the process that generated the request.
4185 &quot;a_key&quot;: &quot;&quot;, # Properties of the object.
4186 },
4187 &quot;workerZone&quot;: &quot;A String&quot;, # The Compute Engine zone
4188 # (https://cloud.google.com/compute/docs/regions-zones/regions-zones) in
4189 # which worker processing should occur, e.g. &quot;us-west1-a&quot;. Mutually exclusive
4190 # with worker_region. If neither worker_region nor worker_zone is specified,
4191 # a zone in the control plane&#x27;s region is chosen based on available capacity.
4192 &quot;clusterManagerApiService&quot;: &quot;A String&quot;, # The type of cluster manager API to use. If unknown or
4193 # unspecified, the service will attempt to choose a reasonable
4194 # default. This should be in the form of the API service name,
4195 # e.g. &quot;compute.googleapis.com&quot;.
4196 &quot;tempStoragePrefix&quot;: &quot;A String&quot;, # The prefix of the resources the system should use for temporary
4197 # storage. The system will append the suffix &quot;/temp-{JOBNAME}&quot; to
4198 # this resource prefix, where {JOBNAME} is the value of the
4199 # job_name field. The resulting bucket and object prefix is used
4200 # as the prefix of the resources used to store temporary data
4201 # needed during the job execution. NOTE: This will override the
4202 # value in taskrunner_settings.
4203 # The supported resource type is:
4204 #
4205 # Google Cloud Storage:
4206 #
4207 # storage.googleapis.com/{bucket}/{object}
4208 # bucket.storage.googleapis.com/{object}
4209 &quot;experiments&quot;: [ # The list of experiments to enable.
4210 &quot;A String&quot;,
4211 ],
4212 &quot;version&quot;: { # A structure describing which components and their versions of the service
4213 # are required in order to run the job.
4214 &quot;a_key&quot;: &quot;&quot;, # Properties of the object.
4215 },
4216 &quot;serviceAccountEmail&quot;: &quot;A String&quot;, # Identity to run virtual machines as. Defaults to the default account.
4217 },
4218 &quot;stageStates&quot;: [ # This field may be mutated by the Cloud Dataflow service;
4219 # callers cannot mutate it.
4220 { # A message describing the state of a particular execution stage.
4221 &quot;executionStageName&quot;: &quot;A String&quot;, # The name of the execution stage.
4222 &quot;currentStateTime&quot;: &quot;A String&quot;, # The time at which the stage transitioned to this state.
4223 &quot;executionStageState&quot;: &quot;A String&quot;, # Execution stage states allow the same set of values as JobState.
4224 },
4225 ],
4226 &quot;jobMetadata&quot;: { # Metadata available primarily for filtering jobs. Will be included in the ListJob response and Job SUMMARY view.
4227 # This field is populated by the Dataflow service to support filtering jobs
4228 # by the metadata values provided here. Populated for ListJobs and all GetJob
4229 # views SUMMARY and higher.
4230 &quot;bigTableDetails&quot;: [ # Identification of a BigTable source used in the Dataflow job.
4231 { # Metadata for a BigTable connector used by the job.
4232 &quot;tableId&quot;: &quot;A String&quot;, # TableId accessed in the connection.
4233 &quot;projectId&quot;: &quot;A String&quot;, # ProjectId accessed in the connection.
4234 &quot;instanceId&quot;: &quot;A String&quot;, # InstanceId accessed in the connection.
4235 },
4236 ],
4237 &quot;spannerDetails&quot;: [ # Identification of a Spanner source used in the Dataflow job.
4238 { # Metadata for a Spanner connector used by the job.
4239 &quot;databaseId&quot;: &quot;A String&quot;, # DatabaseId accessed in the connection.
4240 &quot;instanceId&quot;: &quot;A String&quot;, # InstanceId accessed in the connection.
4241 &quot;projectId&quot;: &quot;A String&quot;, # ProjectId accessed in the connection.
4242 },
4243 ],
4244 &quot;datastoreDetails&quot;: [ # Identification of a Datastore source used in the Dataflow job.
4245 { # Metadata for a Datastore connector used by the job.
4246 &quot;projectId&quot;: &quot;A String&quot;, # ProjectId accessed in the connection.
4247 &quot;namespace&quot;: &quot;A String&quot;, # Namespace used in the connection.
4248 },
4249 ],
4250 &quot;sdkVersion&quot;: { # The version of the SDK used to run the job. # The SDK version used to run the job.
4251 &quot;versionDisplayName&quot;: &quot;A String&quot;, # A readable string describing the version of the SDK.
4252 &quot;sdkSupportStatus&quot;: &quot;A String&quot;, # The support status for this SDK version.
4253 &quot;version&quot;: &quot;A String&quot;, # The version of the SDK used to run the job.
4254 },
4255 &quot;bigqueryDetails&quot;: [ # Identification of a BigQuery source used in the Dataflow job.
4256 { # Metadata for a BigQuery connector used by the job.
4257 &quot;table&quot;: &quot;A String&quot;, # Table accessed in the connection.
4258 &quot;dataset&quot;: &quot;A String&quot;, # Dataset accessed in the connection.
4259 &quot;projectId&quot;: &quot;A String&quot;, # Project accessed in the connection.
4260 &quot;query&quot;: &quot;A String&quot;, # Query used to access data in the connection.
4261 },
4262 ],
4263 &quot;fileDetails&quot;: [ # Identification of a File source used in the Dataflow job.
4264 { # Metadata for a File connector used by the job.
4265 &quot;filePattern&quot;: &quot;A String&quot;, # File Pattern used to access files by the connector.
4266 },
4267 ],
4268 &quot;pubsubDetails&quot;: [ # Identification of a PubSub source used in the Dataflow job.
4269 { # Metadata for a PubSub connector used by the job.
4270 &quot;subscription&quot;: &quot;A String&quot;, # Subscription used in the connection.
4271 &quot;topic&quot;: &quot;A String&quot;, # Topic accessed in the connection.
4272 },
4273 ],
4274 },
4275 &quot;createdFromSnapshotId&quot;: &quot;A String&quot;, # If this is specified, the job&#x27;s initial state is populated from the given
4276 # snapshot.
4277 &quot;projectId&quot;: &quot;A String&quot;, # The ID of the Cloud Platform project that the job belongs to.
4278 &quot;type&quot;: &quot;A String&quot;, # The type of Cloud Dataflow job.
4279 &quot;pipelineDescription&quot;: { # A descriptive representation of submitted pipeline as well as the executed
4280 # form. This data is provided by the Dataflow service for ease of visualizing
4281 # the pipeline and interpreting Dataflow provided metrics.
4282 # Preliminary field: The format of this data may change at any time.
4283 # A description of the user pipeline and stages through which it is executed.
4284 # Created by Cloud Dataflow service. Only retrieved with JOB_VIEW_DESCRIPTION or JOB_VIEW_ALL.
4285 &quot;executionPipelineStage&quot;: [ # Description of each stage of execution of the pipeline.
4286 { # Description of the composing transforms, names/ids, and input/outputs of a
4287 # stage of execution. Some composing transforms and sources may have been
4288 # generated by the Dataflow service during execution planning.
4289 &quot;id&quot;: &quot;A String&quot;, # Dataflow service generated id for this stage.
4290 &quot;componentTransform&quot;: [ # Transforms that comprise this execution stage.
4291 { # Description of a transform executed as part of an execution stage.
4292 &quot;originalTransform&quot;: &quot;A String&quot;, # User name for the original user transform with which this transform is
4293 # most closely associated.
4294 &quot;name&quot;: &quot;A String&quot;, # Dataflow service generated name for this transform.
4295 &quot;userName&quot;: &quot;A String&quot;, # Human-readable name for this transform; may be user or system generated.
4296 },
4297 ],
4298 &quot;componentSource&quot;: [ # Collections produced and consumed by component transforms of this stage.
4299 { # Description of an interstitial value between transforms in an execution
4300 # stage.
4301 &quot;name&quot;: &quot;A String&quot;, # Dataflow service generated name for this source.
4302 &quot;userName&quot;: &quot;A String&quot;, # Human-readable name for this source; may be user or system generated.
4303 &quot;originalTransformOrCollection&quot;: &quot;A String&quot;, # User name for the original user transform or collection with which this
4304 # source is most closely associated.
4305 },
4306 ],
4307 &quot;kind&quot;: &quot;A String&quot;, # Type of transform this stage is executing.
4308 &quot;outputSource&quot;: [ # Output sources for this stage.
4309 { # Description of an input or output of an execution stage.
4310 &quot;originalTransformOrCollection&quot;: &quot;A String&quot;, # User name for the original user transform or collection with which this
4311 # source is most closely associated.
4312 &quot;name&quot;: &quot;A String&quot;, # Dataflow service generated name for this source.
4313 &quot;sizeBytes&quot;: &quot;A String&quot;, # Size of the source, if measurable.
4314 &quot;userName&quot;: &quot;A String&quot;, # Human-readable name for this source; may be user or system generated.
4315 },
4316 ],
4317 &quot;name&quot;: &quot;A String&quot;, # Dataflow service generated name for this stage.
4318 &quot;inputSource&quot;: [ # Input sources for this stage.
4319 { # Description of an input or output of an execution stage.
4320 &quot;originalTransformOrCollection&quot;: &quot;A String&quot;, # User name for the original user transform or collection with which this
4321 # source is most closely associated.
4322 &quot;name&quot;: &quot;A String&quot;, # Dataflow service generated name for this source.
4323 &quot;sizeBytes&quot;: &quot;A String&quot;, # Size of the source, if measurable.
4324 &quot;userName&quot;: &quot;A String&quot;, # Human-readable name for this source; may be user or system generated.
4325 },
4326 ],
4327 },
4328 ],
4329 &quot;originalPipelineTransform&quot;: [ # Description of each transform in the pipeline and collections between them.
4330 { # Description of the type, names/ids, and input/outputs for a transform.
4331 &quot;kind&quot;: &quot;A String&quot;, # Type of transform.
4332 &quot;inputCollectionName&quot;: [ # User names for all collection inputs to this transform.
4333 &quot;A String&quot;,
4334 ],
4335 &quot;name&quot;: &quot;A String&quot;, # User provided name for this transform instance.
4336 &quot;id&quot;: &quot;A String&quot;, # SDK generated id of this transform instance.
4337 &quot;displayData&quot;: [ # Transform-specific display data.
4338 { # Data provided with a pipeline or transform to provide descriptive info.
4339 &quot;timestampValue&quot;: &quot;A String&quot;, # Contains value if the data is of timestamp type.
4340 &quot;boolValue&quot;: True or False, # Contains value if the data is of a boolean type.
4341 &quot;javaClassValue&quot;: &quot;A String&quot;, # Contains value if the data is of java class type.
4342 &quot;strValue&quot;: &quot;A String&quot;, # Contains value if the data is of string type.
4343 &quot;int64Value&quot;: &quot;A String&quot;, # Contains value if the data is of int64 type.
4344 &quot;durationValue&quot;: &quot;A String&quot;, # Contains value if the data is of duration type.
4345 &quot;namespace&quot;: &quot;A String&quot;, # The namespace for the key. This is usually a class name or programming
4346 # language namespace (e.g. a Python module) which defines the display data.
4347 # This allows a dax monitoring system to specially handle the data
4348 # and perform custom rendering.
4349 &quot;floatValue&quot;: 3.14, # Contains value if the data is of float type.
4350 &quot;key&quot;: &quot;A String&quot;, # The key identifying the display data.
4351 # This is intended to be used as a label for the display data
4352 # when viewed in a dax monitoring system.
4353 &quot;shortStrValue&quot;: &quot;A String&quot;, # A possible additional shorter value to display.
4354 # For example, a java_class_name_value of com.mypackage.MyDoFn
4355 # will be stored with MyDoFn as the short_str_value and
4356 # com.mypackage.MyDoFn as the java_class_name_value.
4357 # short_str_value can be displayed and java_class_name_value
4358 # will be displayed as a tooltip.
4359 &quot;url&quot;: &quot;A String&quot;, # An optional full URL.
4360 &quot;label&quot;: &quot;A String&quot;, # An optional label to display in a dax UI for the element.
4361 },
4362 ],
4363 &quot;outputCollectionName&quot;: [ # User names for all collection outputs to this transform.
4364 &quot;A String&quot;,
4365 ],
4366 },
4367 ],
4368 &quot;displayData&quot;: [ # Pipeline level display data.
4369 { # Data provided with a pipeline or transform to provide descriptive info.
4370 &quot;timestampValue&quot;: &quot;A String&quot;, # Contains value if the data is of timestamp type.
4371 &quot;boolValue&quot;: True or False, # Contains value if the data is of a boolean type.
4372 &quot;javaClassValue&quot;: &quot;A String&quot;, # Contains value if the data is of java class type.
4373 &quot;strValue&quot;: &quot;A String&quot;, # Contains value if the data is of string type.
4374 &quot;int64Value&quot;: &quot;A String&quot;, # Contains value if the data is of int64 type.
4375 &quot;durationValue&quot;: &quot;A String&quot;, # Contains value if the data is of duration type.
4376 &quot;namespace&quot;: &quot;A String&quot;, # The namespace for the key. This is usually a class name or programming
4377 # language namespace (e.g. a Python module) which defines the display data.
4378 # This allows a dax monitoring system to specially handle the data
4379 # and perform custom rendering.
4380 &quot;floatValue&quot;: 3.14, # Contains value if the data is of float type.
4381 &quot;key&quot;: &quot;A String&quot;, # The key identifying the display data.
4382 # This is intended to be used as a label for the display data
4383 # when viewed in a dax monitoring system.
4384 &quot;shortStrValue&quot;: &quot;A String&quot;, # A possible additional shorter value to display.
4385 # For example, a java_class_name_value of com.mypackage.MyDoFn
4386 # will be stored with MyDoFn as the short_str_value and
4387 # com.mypackage.MyDoFn as the java_class_name_value.
4388 # short_str_value can be displayed and java_class_name_value
4389 # will be displayed as a tooltip.
4390 &quot;url&quot;: &quot;A String&quot;, # An optional full URL.
4391 &quot;label&quot;: &quot;A String&quot;, # An optional label to display in a dax UI for the element.
4392 },
4393 ],
4394 },
4395 &quot;replaceJobId&quot;: &quot;A String&quot;, # If this job is an update of an existing job, this field is the job ID
4396 # of the job it replaced.
4397 #
4398 # When sending a `CreateJobRequest`, you can update a job by specifying it
4399 # here. The job named here is stopped, and its intermediate state is
4400 # transferred to this job.
4401 &quot;tempFiles&quot;: [ # A set of files the system should be aware of that are used
4402 # for temporary storage. These temporary files will be
4403 # removed on job completion.
4404 # No duplicates are allowed.
4405 # No file patterns are supported.
4406 #
4407 # The supported files are:
4408 #
4409 # Google Cloud Storage:
4410 #
4411 # storage.googleapis.com/{bucket}/{object}
4412 # bucket.storage.googleapis.com/{object}
4413 &quot;A String&quot;,
4414 ],
4415 &quot;name&quot;: &quot;A String&quot;, # The user-specified Cloud Dataflow job name.
4416 #
4417 # Only one Job with a given name may exist in a project at any
4418 # given time. If a caller attempts to create a Job with the same
4419 # name as an already-existing Job, the attempt returns the
4420 # existing Job.
4421 #
4422 # The name must match the regular expression
4423 # `[a-z]([-a-z0-9]{0,38}[a-z0-9])?`
4424 &quot;steps&quot;: [ # Exactly one of step or steps_location should be specified.
4425 #
4426 # The top-level steps that constitute the entire job.
4427 { # Defines a particular step within a Cloud Dataflow job.
4428 #
4429 # A job consists of multiple steps, each of which performs some
4430 # specific operation as part of the overall job. Data is typically
4431 # passed from one step to another as part of the job.
4432 #
4433 # Here&#x27;s an example of a sequence of steps which together implement a
4434 # Map-Reduce job:
4435 #
4436 # * Read a collection of data from some source, parsing the
4437 # collection&#x27;s elements.
4438 #
4439 # * Validate the elements.
4440 #
4441 # * Apply a user-defined function to map each element to some value
4442 # and extract an element-specific key value.
4443 #
4444 # * Group elements with the same key into a single element with
4445 # that key, transforming a multiply-keyed collection into a
4446 # uniquely-keyed collection.
4447 #
4448 # * Write the elements out to some data sink.
4449 #
4450 # Note that the Cloud Dataflow service may be used to run many different
4451 # types of jobs, not just Map-Reduce.
4452 &quot;name&quot;: &quot;A String&quot;, # The name that identifies the step. This must be unique for each
4453 # step with respect to all other steps in the Cloud Dataflow job.
4454 &quot;kind&quot;: &quot;A String&quot;, # The kind of step in the Cloud Dataflow job.
4455 &quot;properties&quot;: { # Named properties associated with the step. Each kind of
4456 # predefined step has its own required set of properties.
4457 # Must be provided on Create. Only retrieved with JOB_VIEW_ALL.
4458 &quot;a_key&quot;: &quot;&quot;, # Properties of the object.
4459 },
4460 },
4461 ],
4462 &quot;replacedByJobId&quot;: &quot;A String&quot;, # If another job is an update of this job (and thus, this job is in
4463 # `JOB_STATE_UPDATED`), this field contains the ID of that job.
4464 &quot;executionInfo&quot;: { # Additional information about how a Cloud Dataflow job will be executed that # Deprecated.
4465 # isn&#x27;t contained in the submitted job.
4466 &quot;stages&quot;: { # A mapping from each stage to the information about that stage.
4467 &quot;a_key&quot;: { # Contains information about how a particular
4468 # google.dataflow.v1beta3.Step will be executed.
4469 &quot;stepName&quot;: [ # The steps associated with the execution stage.
4470 # Note that stages may have several steps, and that a given step
4471 # might be run by more than one stage.
4472 &quot;A String&quot;,
4473 ],
4474 },
4475 },
4476 },
4477 &quot;currentState&quot;: &quot;A String&quot;, # The current state of the job.
4478 #
4479 # Jobs are created in the `JOB_STATE_STOPPED` state unless otherwise
4480 # specified.
4481 #
4482 # A job in the `JOB_STATE_RUNNING` state may asynchronously enter a
4483 # terminal state. After a job has reached a terminal state, no
4484 # further state updates may be made.
4485 #
4486 # This field may be mutated by the Cloud Dataflow service;
4487 # callers cannot mutate it.
4488 &quot;location&quot;: &quot;A String&quot;, # The [regional endpoint]
4489 # (https://cloud.google.com/dataflow/docs/concepts/regional-endpoints) that
4490 # contains this job.
4491 &quot;startTime&quot;: &quot;A String&quot;, # The timestamp when the job was started (transitioned to JOB_STATE_PENDING).
4492 # Flexible resource scheduling jobs are started with some delay after job
4493 # creation, so start_time is unset before start and is updated when the
4494 # job is started by the Cloud Dataflow service. For other jobs, start_time
4495 # always equals create_time and is immutable and set by the Cloud Dataflow
4496 # service.
4497 &quot;stepsLocation&quot;: &quot;A String&quot;, # The GCS location where the steps are stored.
4498 &quot;labels&quot;: { # User-defined labels for this job.
4499 #
4500 # The labels map can contain no more than 64 entries. Entries of the labels
4501 # map are UTF8 strings that comply with the following restrictions:
4502 #
4503 # * Keys must conform to regexp: \p{Ll}\p{Lo}{0,62}
4504 # * Values must conform to regexp: [\p{Ll}\p{Lo}\p{N}_-]{0,63}
4505 # * Both keys and values are additionally constrained to be &lt;= 128 bytes in
4506 # size.
4507 &quot;a_key&quot;: &quot;A String&quot;,
4508 },
4509 &quot;createTime&quot;: &quot;A String&quot;, # The timestamp when the job was initially created. Immutable and set by the
4510 # Cloud Dataflow service.
4511 &quot;requestedState&quot;: &quot;A String&quot;, # The job&#x27;s requested state.
4512 #
4513 # `UpdateJob` may be used to switch between the `JOB_STATE_STOPPED` and
4514 # `JOB_STATE_RUNNING` states, by setting requested_state. `UpdateJob` may
4515 # also be used to directly set a job&#x27;s requested state to
4516 # `JOB_STATE_CANCELLED` or `JOB_STATE_DONE`, irrevocably terminating the
4517 # job if it has not already reached a terminal state.
4518 }</pre>
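<p>The job <code>name</code> pattern documented above can be checked on the client before a request is sent. The following is a minimal sketch using only the Python standard library; the helper name <code>is_valid_job_name</code> is hypothetical and not part of the Dataflow client, and the service remains the authority on which names it accepts.</p>
<pre>
```python
import re

# Pattern copied verbatim from the description of the "name" field.
JOB_NAME_PATTERN = re.compile(r"[a-z]([-a-z0-9]{0,38}[a-z0-9])?")


def is_valid_job_name(name: str) -> bool:
    """Return True if the whole string matches the documented pattern.

    Hypothetical helper for illustration only; it performs a local
    syntax check and makes no call to the Dataflow service.
    """
    # fullmatch() requires the entire string to match, not just a prefix.
    return JOB_NAME_PATTERN.fullmatch(name) is not None


if __name__ == "__main__":
    for candidate in ("wordcount-2020", "Wordcount", "job-"):
        print(candidate, is_valid_job_name(candidate))
```
</pre>
<p>The pattern implies a maximum length of 40 characters: one leading lowercase letter, up to 38 letters, digits, or hyphens, and a final letter or digit.</p>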
</div>
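<p>Some of the <code>labels</code> restrictions described above can also be checked locally. The sketch below covers the entry limit, the value pattern, and the byte-size limit. It deliberately does not check the key pattern, and it assumes the byte limit refers to the UTF-8 encoding. The function name <code>label_problems</code> is hypothetical, and the Unicode categories are approximated with the standard <code>unicodedata</code> module.</p>
<pre>
```python
import unicodedata

MAX_ENTRIES = 64      # "no more than 64 entries"
MAX_VALUE_CHARS = 63  # upper bound {0,63} in the documented value pattern
MAX_BYTES = 128       # keys and values must each fit in 128 bytes


def _value_char_ok(ch: str) -> bool:
    # Allowed in values: lowercase letters (Ll), other letters (Lo),
    # any number category (Nd, Nl, No), underscore, and hyphen.
    category = unicodedata.category(ch)
    return ch in "_-" or category in ("Ll", "Lo") or category.startswith("N")


def label_problems(labels: dict) -> list:
    """Return human-readable problems found in a labels map (hypothetical helper).

    Assumption: the byte limit is measured on the UTF-8 encoding.
    The key pattern is intentionally not checked here.
    """
    problems = []
    if len(labels) > MAX_ENTRIES:
        problems.append(f"{len(labels)} entries exceeds the limit of {MAX_ENTRIES}")
    for key, value in labels.items():
        if len(value) > MAX_VALUE_CHARS or not all(_value_char_ok(c) for c in value):
            problems.append(f"value for {key!r} does not match the documented pattern")
        for text in (key, value):
            if len(text.encode("utf-8")) > MAX_BYTES:
                problems.append(f"{text!r} is longer than {MAX_BYTES} bytes in UTF-8")
    return problems


if __name__ == "__main__":
    print(label_problems({"env": "prod-1", "team": "Data"}))
```
</pre>
<p>An empty list means none of the checked restrictions were violated; the service may still reject the labels for reasons this sketch does not cover.</p>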

</body></html>