<html><body>
<style>

body, h1, h2, h3, div, span, p, pre, a {
  margin: 0;
  padding: 0;
  border: 0;
  font-weight: inherit;
  font-style: inherit;
  font-size: 100%;
  font-family: inherit;
  vertical-align: baseline;
}

body {
  font-size: 13px;
  padding: 1em;
}

h1 {
  font-size: 26px;
  margin-bottom: 1em;
}

h2 {
  font-size: 24px;
  margin-bottom: 1em;
}

h3 {
  font-size: 20px;
  margin-bottom: 1em;
  margin-top: 1em;
}

pre, code {
  line-height: 1.5;
  font-family: Monaco, 'DejaVu Sans Mono', 'Bitstream Vera Sans Mono', 'Lucida Console', monospace;
}

pre {
  margin-top: 0.5em;
}

h1, h2, h3, p {
  font-family: Arial, sans-serif;
}

h1, h2, h3 {
  border-bottom: solid #CCC 1px;
}

.toc_element {
  margin-top: 0.5em;
}

.firstline {
  margin-left: 2em;
}

.method {
  margin-top: 1em;
  border: solid 1px #CCC;
  padding: 1em;
  background: #EEE;
}

.details {
  font-weight: bold;
  font-size: 14px;
}

</style>

<h1><a href="dataflow_v1b3.html">Dataflow API</a> . <a href="dataflow_v1b3.projects.html">projects</a> . <a href="dataflow_v1b3.projects.jobs.html">jobs</a></h1>
<h2>Instance Methods</h2>
<p class="toc_element">
  <code><a href="dataflow_v1b3.projects.jobs.debug.html">debug()</a></code>
</p>
<p class="firstline">Returns the debug Resource.</p>

<p class="toc_element">
  <code><a href="dataflow_v1b3.projects.jobs.messages.html">messages()</a></code>
</p>
<p class="firstline">Returns the messages Resource.</p>

<p class="toc_element">
  <code><a href="dataflow_v1b3.projects.jobs.workItems.html">workItems()</a></code>
</p>
<p class="firstline">Returns the workItems Resource.</p>

<p class="toc_element">
  <code><a href="#aggregated">aggregated(projectId, pageSize=None, filter=None, location=None, pageToken=None, view=None, x__xgafv=None)</a></code></p>
<p class="firstline">List the jobs of a project across all regions.</p>
<p class="toc_element">
  <code><a href="#aggregated_next">aggregated_next(previous_request, previous_response)</a></code></p>
<p class="firstline">Retrieves the next page of results.</p>
<p class="toc_element">
  <code><a href="#create">create(projectId, body=None, location=None, replaceJobId=None, view=None, x__xgafv=None)</a></code></p>
<p class="firstline">Creates a Cloud Dataflow job.</p>
<p class="toc_element">
  <code><a href="#get">get(projectId, jobId, location=None, view=None, x__xgafv=None)</a></code></p>
<p class="firstline">Gets the state of the specified Cloud Dataflow job.</p>
<p class="toc_element">
  <code><a href="#getMetrics">getMetrics(projectId, jobId, startTime=None, location=None, x__xgafv=None)</a></code></p>
<p class="firstline">Request the job status.</p>
<p class="toc_element">
  <code><a href="#list">list(projectId, filter=None, pageSize=None, location=None, view=None, pageToken=None, x__xgafv=None)</a></code></p>
<p class="firstline">List the jobs of a project.</p>
<p class="toc_element">
  <code><a href="#list_next">list_next(previous_request, previous_response)</a></code></p>
<p class="firstline">Retrieves the next page of results.</p>
<p class="toc_element">
  <code><a href="#snapshot">snapshot(projectId, jobId, body=None, x__xgafv=None)</a></code></p>
<p class="firstline">Snapshot the state of a streaming job.</p>
<p class="toc_element">
  <code><a href="#update">update(projectId, jobId, body=None, location=None, x__xgafv=None)</a></code></p>
<p class="firstline">Updates the state of an existing Cloud Dataflow job.</p>
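<p>A minimal sketch of reaching this resource with the google-api-python-client library is shown below; the project ID <code>my-project</code> is a placeholder, and configured Application Default Credentials are an assumption for illustration.</p>
<pre>
# Sketch: build the Dataflow v1b3 service and reach the projects.jobs
# resource documented on this page. Assumes google-api-python-client is
# installed and Application Default Credentials are configured;
# &#x27;my-project&#x27; is a placeholder project ID.
from googleapiclient.discovery import build

service = build(&#x27;dataflow&#x27;, &#x27;v1b3&#x27;)
jobs = service.projects().jobs()

# Each method below is invoked the same way: build a request object,
# then call execute() to perform the HTTP call.
response = jobs.list(projectId=&#x27;my-project&#x27;).execute()
for job in response.get(&#x27;jobs&#x27;, []):
    print(job[&#x27;id&#x27;], job.get(&#x27;currentState&#x27;))
</pre>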
<h3>Method Details</h3>
<div class="method">
    <code class="details" id="aggregated">aggregated(projectId, pageSize=None, filter=None, location=None, pageToken=None, view=None, x__xgafv=None)</code>
  <pre>List the jobs of a project across all regions.

Args:
  projectId: string, The project which owns the jobs. (required)
  pageSize: integer, If there are many jobs, limit response to at most this many.
The actual number of jobs returned will be the lesser of max_responses
and an unspecified server-defined limit.
  filter: string, The kind of filter to use.
  location: string, The [regional endpoint]
(https://cloud.google.com/dataflow/docs/concepts/regional-endpoints) that
contains this job.
  pageToken: string, Set this to the &#x27;next_page_token&#x27; field of a previous response
to request additional results in a long list.
  view: string, Level of information requested in response. Default is `JOB_VIEW_SUMMARY`.
  x__xgafv: string, V1 error format.
    Allowed values
      1 - v1 error format
      2 - v2 error format

Returns:
  An object of the form:

    { # Response to a request to list Cloud Dataflow jobs in a project. This might
        # be a partial response, depending on the page size in the ListJobsRequest.
        # However, if the project does not have any jobs, an instance of
        # ListJobsResponse is not returned and the request&#x27;s response
        # body is empty {}.
      &quot;jobs&quot;: [ # A subset of the requested job information.
        { # Defines a job to be run by the Cloud Dataflow service.
          &quot;pipelineDescription&quot;: { # A descriptive representation of submitted pipeline as well as the executed # Preliminary field: The format of this data may change at any time.
              # A description of the user pipeline and stages through which it is executed.
              # Created by Cloud Dataflow service. Only retrieved with
              # JOB_VIEW_DESCRIPTION or JOB_VIEW_ALL.
              # form. This data is provided by the Dataflow service for ease of visualizing
              # the pipeline and interpreting Dataflow provided metrics.
            &quot;displayData&quot;: [ # Pipeline level display data.
              { # Data provided with a pipeline or transform to provide descriptive info.
                &quot;url&quot;: &quot;A String&quot;, # An optional full URL.
                &quot;javaClassValue&quot;: &quot;A String&quot;, # Contains value if the data is of java class type.
                &quot;timestampValue&quot;: &quot;A String&quot;, # Contains value if the data is of timestamp type.
                &quot;durationValue&quot;: &quot;A String&quot;, # Contains value if the data is of duration type.
                &quot;label&quot;: &quot;A String&quot;, # An optional label to display in a dax UI for the element.
                &quot;key&quot;: &quot;A String&quot;, # The key identifying the display data.
                    # This is intended to be used as a label for the display data
                    # when viewed in a dax monitoring system.
                &quot;namespace&quot;: &quot;A String&quot;, # The namespace for the key. This is usually a class name or programming
                    # language namespace (i.e. python module) which defines the display data.
                    # This allows a dax monitoring system to specially handle the data
                    # and perform custom rendering.
                &quot;floatValue&quot;: 3.14, # Contains value if the data is of float type.
                &quot;strValue&quot;: &quot;A String&quot;, # Contains value if the data is of string type.
                &quot;int64Value&quot;: &quot;A String&quot;, # Contains value if the data is of int64 type.
                &quot;boolValue&quot;: True or False, # Contains value if the data is of a boolean type.
                &quot;shortStrValue&quot;: &quot;A String&quot;, # A possible additional shorter value to display.
                    # For example a java_class_name_value of com.mypackage.MyDoFn
                    # will be stored with MyDoFn as the short_str_value and
                    # com.mypackage.MyDoFn as the java_class_name value.
                    # short_str_value can be displayed and java_class_name_value
                    # will be displayed as a tooltip.
              },
            ],
            &quot;originalPipelineTransform&quot;: [ # Description of each transform in the pipeline and collections between them.
              { # Description of the type, names/ids, and input/outputs for a transform.
                &quot;outputCollectionName&quot;: [ # User names for all collection outputs to this transform.
                  &quot;A String&quot;,
                ],
                &quot;displayData&quot;: [ # Transform-specific display data.
                  { # Data provided with a pipeline or transform to provide descriptive info.
                    &quot;url&quot;: &quot;A String&quot;, # An optional full URL.
                    &quot;javaClassValue&quot;: &quot;A String&quot;, # Contains value if the data is of java class type.
                    &quot;timestampValue&quot;: &quot;A String&quot;, # Contains value if the data is of timestamp type.
                    &quot;durationValue&quot;: &quot;A String&quot;, # Contains value if the data is of duration type.
                    &quot;label&quot;: &quot;A String&quot;, # An optional label to display in a dax UI for the element.
                    &quot;key&quot;: &quot;A String&quot;, # The key identifying the display data.
                        # This is intended to be used as a label for the display data
                        # when viewed in a dax monitoring system.
                    &quot;namespace&quot;: &quot;A String&quot;, # The namespace for the key. This is usually a class name or programming
                        # language namespace (i.e. python module) which defines the display data.
                        # This allows a dax monitoring system to specially handle the data
                        # and perform custom rendering.
                    &quot;floatValue&quot;: 3.14, # Contains value if the data is of float type.
                    &quot;strValue&quot;: &quot;A String&quot;, # Contains value if the data is of string type.
                    &quot;int64Value&quot;: &quot;A String&quot;, # Contains value if the data is of int64 type.
                    &quot;boolValue&quot;: True or False, # Contains value if the data is of a boolean type.
                    &quot;shortStrValue&quot;: &quot;A String&quot;, # A possible additional shorter value to display.
                        # For example a java_class_name_value of com.mypackage.MyDoFn
                        # will be stored with MyDoFn as the short_str_value and
                        # com.mypackage.MyDoFn as the java_class_name value.
                        # short_str_value can be displayed and java_class_name_value
                        # will be displayed as a tooltip.
                  },
                ],
                &quot;id&quot;: &quot;A String&quot;, # SDK generated id of this transform instance.
                &quot;inputCollectionName&quot;: [ # User names for all collection inputs to this transform.
                  &quot;A String&quot;,
                ],
                &quot;name&quot;: &quot;A String&quot;, # User provided name for this transform instance.
                &quot;kind&quot;: &quot;A String&quot;, # Type of transform.
              },
            ],
            &quot;executionPipelineStage&quot;: [ # Description of each stage of execution of the pipeline.
              { # Description of the composing transforms, names/ids, and input/outputs of a
                  # stage of execution. Some composing transforms and sources may have been
                  # generated by the Dataflow service during execution planning.
                &quot;componentSource&quot;: [ # Collections produced and consumed by component transforms of this stage.
                  { # Description of an interstitial value between transforms in an execution
                      # stage.
                    &quot;userName&quot;: &quot;A String&quot;, # Human-readable name for this transform; may be user or system generated.
                    &quot;name&quot;: &quot;A String&quot;, # Dataflow service generated name for this source.
                    &quot;originalTransformOrCollection&quot;: &quot;A String&quot;, # User name for the original user transform or collection with which this
                        # source is most closely associated.
                  },
                ],
                &quot;inputSource&quot;: [ # Input sources for this stage.
                  { # Description of an input or output of an execution stage.
                    &quot;userName&quot;: &quot;A String&quot;, # Human-readable name for this source; may be user or system generated.
                    &quot;originalTransformOrCollection&quot;: &quot;A String&quot;, # User name for the original user transform or collection with which this
                        # source is most closely associated.
                    &quot;sizeBytes&quot;: &quot;A String&quot;, # Size of the source, if measurable.
                    &quot;name&quot;: &quot;A String&quot;, # Dataflow service generated name for this source.
                  },
                ],
                &quot;name&quot;: &quot;A String&quot;, # Dataflow service generated name for this stage.
                &quot;componentTransform&quot;: [ # Transforms that comprise this execution stage.
                  { # Description of a transform executed as part of an execution stage.
                    &quot;name&quot;: &quot;A String&quot;, # Dataflow service generated name for this source.
                    &quot;userName&quot;: &quot;A String&quot;, # Human-readable name for this transform; may be user or system generated.
                    &quot;originalTransform&quot;: &quot;A String&quot;, # User name for the original user transform with which this transform is
                        # most closely associated.
                  },
                ],
                &quot;id&quot;: &quot;A String&quot;, # Dataflow service generated id for this stage.
                &quot;outputSource&quot;: [ # Output sources for this stage.
                  { # Description of an input or output of an execution stage.
                    &quot;userName&quot;: &quot;A String&quot;, # Human-readable name for this source; may be user or system generated.
                    &quot;originalTransformOrCollection&quot;: &quot;A String&quot;, # User name for the original user transform or collection with which this
                        # source is most closely associated.
                    &quot;sizeBytes&quot;: &quot;A String&quot;, # Size of the source, if measurable.
                    &quot;name&quot;: &quot;A String&quot;, # Dataflow service generated name for this source.
                  },
                ],
                &quot;kind&quot;: &quot;A String&quot;, # Type of transform this stage is executing.
              },
            ],
          },
          &quot;labels&quot;: { # User-defined labels for this job.
              #
              # The labels map can contain no more than 64 entries. Entries of the labels
              # map are UTF8 strings that comply with the following restrictions:
              #
              # * Keys must conform to regexp: \p{Ll}\p{Lo}{0,62}
              # * Values must conform to regexp: [\p{Ll}\p{Lo}\p{N}_-]{0,63}
              # * Both keys and values are additionally constrained to be &lt;= 128 bytes in
              # size.
            &quot;a_key&quot;: &quot;A String&quot;,
          },
          &quot;projectId&quot;: &quot;A String&quot;, # The ID of the Cloud Platform project that the job belongs to.
          &quot;environment&quot;: { # Describes the environment in which a Dataflow Job runs. # The environment for the job.
            &quot;flexResourceSchedulingGoal&quot;: &quot;A String&quot;, # Which Flexible Resource Scheduling mode to run in.
            &quot;workerRegion&quot;: &quot;A String&quot;, # The Compute Engine region
                # (https://cloud.google.com/compute/docs/regions-zones/regions-zones) in
                # which worker processing should occur, e.g. &quot;us-west1&quot;. Mutually exclusive
                # with worker_zone. If neither worker_region nor worker_zone is specified,
                # default to the control plane&#x27;s region.
            &quot;userAgent&quot;: { # A description of the process that generated the request.
              &quot;a_key&quot;: &quot;&quot;, # Properties of the object.
            },
            &quot;serviceAccountEmail&quot;: &quot;A String&quot;, # Identity to run virtual machines as. Defaults to the default account.
            &quot;version&quot;: { # A structure describing which components and their versions of the service
                # are required in order to run the job.
              &quot;a_key&quot;: &quot;&quot;, # Properties of the object.
            },
            &quot;serviceKmsKeyName&quot;: &quot;A String&quot;, # If set, contains the Cloud KMS key identifier used to encrypt data
                # at rest, AKA a Customer Managed Encryption Key (CMEK).
                #
                # Format:
                # projects/PROJECT_ID/locations/LOCATION/keyRings/KEY_RING/cryptoKeys/KEY
            &quot;experiments&quot;: [ # The list of experiments to enable.
              &quot;A String&quot;,
            ],
            &quot;workerZone&quot;: &quot;A String&quot;, # The Compute Engine zone
                # (https://cloud.google.com/compute/docs/regions-zones/regions-zones) in
                # which worker processing should occur, e.g. &quot;us-west1-a&quot;. Mutually exclusive
                # with worker_region. If neither worker_region nor worker_zone is specified,
                # a zone in the control plane&#x27;s region is chosen based on available capacity.
            &quot;workerPools&quot;: [ # The worker pools. At least one &quot;harness&quot; worker pool must be
                # specified in order for the job to have workers.
              { # Describes one particular pool of Cloud Dataflow workers to be
                  # instantiated by the Cloud Dataflow service in order to perform the
                  # computations required by a job. Note that a workflow job may use
                  # multiple pools, in order to match the various computational
                  # requirements of the various stages of the job.
                &quot;onHostMaintenance&quot;: &quot;A String&quot;, # The action to take on host maintenance, as defined by the Google
                    # Compute Engine API.
                &quot;sdkHarnessContainerImages&quot;: [ # Set of SDK harness containers needed to execute this pipeline. This will
                    # only be set in the Fn API path. For non-cross-language pipelines this
                    # should have only one entry. Cross-language pipelines will have two or more
                    # entries.
                  { # Defines an SDK harness container for executing Dataflow pipelines.
                    &quot;containerImage&quot;: &quot;A String&quot;, # A docker container image that resides in Google Container Registry.
                    &quot;useSingleCorePerContainer&quot;: True or False, # If true, recommends the Dataflow service to use only one core per SDK
                        # container instance with this image. If false (or unset) recommends using
                        # more than one core per SDK container instance with this image for
                        # efficiency. Note that Dataflow service may choose to override this property
                        # if needed.
                  },
                ],
                &quot;zone&quot;: &quot;A String&quot;, # Zone to run the worker pools in. If empty or unspecified, the service
                    # will attempt to choose a reasonable default.
                &quot;kind&quot;: &quot;A String&quot;, # The kind of the worker pool; currently only `harness` and `shuffle`
                    # are supported.
                &quot;metadata&quot;: { # Metadata to set on the Google Compute Engine VMs.
                  &quot;a_key&quot;: &quot;A String&quot;,
                },
                &quot;diskSourceImage&quot;: &quot;A String&quot;, # Fully qualified source image for disks.
                &quot;dataDisks&quot;: [ # Data disks that are used by a VM in this workflow.
                  { # Describes the data disk used by a workflow job.
                    &quot;sizeGb&quot;: 42, # Size of disk in GB. If zero or unspecified, the service will
                        # attempt to choose a reasonable default.
                    &quot;diskType&quot;: &quot;A String&quot;, # Disk storage type, as defined by Google Compute Engine. This
                        # must be a disk type appropriate to the project and zone in which
                        # the workers will run. If unknown or unspecified, the service
                        # will attempt to choose a reasonable default.
                        #
                        # For example, the standard persistent disk type is a resource name
                        # typically ending in &quot;pd-standard&quot;. If SSD persistent disks are
                        # available, the resource name typically ends with &quot;pd-ssd&quot;. The
                        # actual valid values are defined by the Google Compute Engine API,
                        # not by the Cloud Dataflow API; consult the Google Compute Engine
                        # documentation for more information about determining the set of
                        # available disk types for a particular project and zone.
                        #
                        # Google Compute Engine Disk types are local to a particular
                        # project in a particular zone, and so the resource name will
                        # typically look something like this:
                        #
                        # compute.googleapis.com/projects/project-id/zones/zone/diskTypes/pd-standard
                    &quot;mountPoint&quot;: &quot;A String&quot;, # Directory in a VM where disk is mounted.
                  },
                ],
                &quot;packages&quot;: [ # Packages to be installed on workers.
                  { # The packages that must be installed in order for a worker to run the
                      # steps of the Cloud Dataflow job that will be assigned to its worker
                      # pool.
                      #
                      # This is the mechanism by which the Cloud Dataflow SDK causes code to
                      # be loaded onto the workers. For example, the Cloud Dataflow Java SDK
                      # might use this to install jars containing the user&#x27;s code and all of the
                      # various dependencies (libraries, data files, etc.) required in order
                      # for that code to run.
                    &quot;name&quot;: &quot;A String&quot;, # The name of the package.
                    &quot;location&quot;: &quot;A String&quot;, # The resource to read the package from. The supported resource type is:
                        #
                        # Google Cloud Storage:
                        #
                        # storage.googleapis.com/{bucket}
                        # bucket.storage.googleapis.com/
                  },
                ],
                &quot;teardownPolicy&quot;: &quot;A String&quot;, # Sets the policy for determining when to turn down the worker pool.
                    # Allowed values are: `TEARDOWN_ALWAYS`, `TEARDOWN_ON_SUCCESS`, and
                    # `TEARDOWN_NEVER`.
                    # `TEARDOWN_ALWAYS` means workers are always torn down regardless of whether
                    # the job succeeds. `TEARDOWN_ON_SUCCESS` means workers are torn down
                    # if the job succeeds. `TEARDOWN_NEVER` means the workers are never torn
                    # down.
                    #
                    # If the workers are not torn down by the service, they will
                    # continue to run and use Google Compute Engine VM resources in the
                    # user&#x27;s project until they are explicitly terminated by the user.
                    # Because of this, Google recommends using the `TEARDOWN_ALWAYS`
                    # policy except for small, manually supervised test jobs.
                    #
                    # If unknown or unspecified, the service will attempt to choose a reasonable
                    # default.
                &quot;network&quot;: &quot;A String&quot;, # Network to which VMs will be assigned. If empty or unspecified,
                    # the service will use the network &quot;default&quot;.
                &quot;ipConfiguration&quot;: &quot;A String&quot;, # Configuration for VM IPs.
                &quot;diskSizeGb&quot;: 42, # Size of root disk for VMs, in GB. If zero or unspecified, the service will
                    # attempt to choose a reasonable default.
                &quot;autoscalingSettings&quot;: { # Settings for WorkerPool autoscaling. # Settings for autoscaling of this WorkerPool.
                  &quot;maxNumWorkers&quot;: 42, # The maximum number of workers to cap scaling at.
                  &quot;algorithm&quot;: &quot;A String&quot;, # The algorithm to use for autoscaling.
                },
                &quot;poolArgs&quot;: { # Extra arguments for this worker pool.
                  &quot;a_key&quot;: &quot;&quot;, # Properties of the object. Contains field @type with type URL.
                },
                &quot;subnetwork&quot;: &quot;A String&quot;, # Subnetwork to which VMs will be assigned, if desired. Expected to be of
                    # the form &quot;regions/REGION/subnetworks/SUBNETWORK&quot;.
                &quot;numWorkers&quot;: 42, # Number of Google Compute Engine workers in this pool needed to
                    # execute the job. If zero or unspecified, the service will
                    # attempt to choose a reasonable default.
                &quot;numThreadsPerWorker&quot;: 42, # The number of threads per worker harness. If empty or unspecified, the
                    # service will choose a number of threads (according to the number of cores
                    # on the selected machine type for batch, or 1 by convention for streaming).
                &quot;workerHarnessContainerImage&quot;: &quot;A String&quot;, # Required. Docker container image that executes the Cloud Dataflow worker
                    # harness, residing in Google Container Registry.
                    #
                    # Deprecated for the Fn API path. Use sdk_harness_container_images instead.
                &quot;taskrunnerSettings&quot;: { # Taskrunner configuration settings. # Settings passed through to Google Compute Engine workers when
                    # using the standard Dataflow task runner. Users should ignore
                    # this field.
                  &quot;dataflowApiVersion&quot;: &quot;A String&quot;, # The API version of endpoint, e.g. &quot;v1b3&quot;
                  &quot;oauthScopes&quot;: [ # The OAuth2 scopes to be requested by the taskrunner in order to
                      # access the Cloud Dataflow API.
                    &quot;A String&quot;,
                  ],
                  &quot;baseUrl&quot;: &quot;A String&quot;, # The base URL for the taskrunner to use when accessing Google Cloud APIs.
                      #
                      # When workers access Google Cloud APIs, they logically do so via
                      # relative URLs. If this field is specified, it supplies the base
                      # URL to use for resolving these relative URLs. The normative
                      # algorithm used is defined by RFC 1808, &quot;Relative Uniform Resource
                      # Locators&quot;.
                      #
                      # If not specified, the default value is &quot;http://www.googleapis.com/&quot;
                  &quot;workflowFileName&quot;: &quot;A String&quot;, # The file to store the workflow in.
                  &quot;logToSerialconsole&quot;: True or False, # Whether to send taskrunner log info to Google Compute Engine VM serial
                      # console.
                  &quot;baseTaskDir&quot;: &quot;A String&quot;, # The location on the worker for task-specific subdirectories.
                  &quot;taskUser&quot;: &quot;A String&quot;, # The UNIX user ID on the worker VM to use for tasks launched by
                      # taskrunner; e.g. &quot;root&quot;.
                  &quot;vmId&quot;: &quot;A String&quot;, # The ID string of the VM.
                  &quot;alsologtostderr&quot;: True or False, # Whether to also send taskrunner log info to stderr.
                  &quot;parallelWorkerSettings&quot;: { # Provides data to pass through to the worker harness. # The settings to pass to the parallel worker harness.
                    &quot;shuffleServicePath&quot;: &quot;A String&quot;, # The Shuffle service path relative to the root URL, for example,
                        # &quot;shuffle/v1beta1&quot;.
                    &quot;tempStoragePrefix&quot;: &quot;A String&quot;, # The prefix of the resources the system should use for temporary
                        # storage.
                        #
                        # The supported resource type is:
                        #
                        # Google Cloud Storage:
                        #
                        # storage.googleapis.com/{bucket}/{object}
                        # bucket.storage.googleapis.com/{object}
                    &quot;reportingEnabled&quot;: True or False, # Whether to send work progress updates to the service.
                    &quot;servicePath&quot;: &quot;A String&quot;, # The Cloud Dataflow service path relative to the root URL, for example,
                        # &quot;dataflow/v1b3/projects&quot;.
                    &quot;baseUrl&quot;: &quot;A String&quot;, # The base URL for accessing Google Cloud APIs.
                        #
                        # When workers access Google Cloud APIs, they logically do so via
                        # relative URLs. If this field is specified, it supplies the base
                        # URL to use for resolving these relative URLs. The normative
                        # algorithm used is defined by RFC 1808, &quot;Relative Uniform Resource
                        # Locators&quot;.
                        #
                        # If not specified, the default value is &quot;http://www.googleapis.com/&quot;
                    &quot;workerId&quot;: &quot;A String&quot;, # The ID of the worker running this pipeline.
                  },
                  &quot;harnessCommand&quot;: &quot;A String&quot;, # The command to launch the worker harness.
                  &quot;logDir&quot;: &quot;A String&quot;, # The directory on the VM to store logs.
                  &quot;streamingWorkerMainClass&quot;: &quot;A String&quot;, # The streaming worker main class name.
                  &quot;languageHint&quot;: &quot;A String&quot;, # The suggested backend language.
                  &quot;taskGroup&quot;: &quot;A String&quot;, # The UNIX group ID on the worker VM to use for tasks launched by
                      # taskrunner; e.g. &quot;wheel&quot;.
                  &quot;logUploadLocation&quot;: &quot;A String&quot;, # Indicates where to put logs. If this is not specified, the logs
                      # will not be uploaded.
                      #
                      # The supported resource type is:
                      #
                      # Google Cloud Storage:
                      # storage.googleapis.com/{bucket}/{object}
                      # bucket.storage.googleapis.com/{object}
                  &quot;commandlinesFileName&quot;: &quot;A String&quot;, # The file to store preprocessing commands in.
                  &quot;continueOnException&quot;: True or False, # Whether to continue taskrunner if an exception is hit.
                  &quot;tempStoragePrefix&quot;: &quot;A String&quot;, # The prefix of the resources the taskrunner should use for
                      # temporary storage.
                      #
                      # The supported resource type is:
                      #
                      # Google Cloud Storage:
                      # storage.googleapis.com/{bucket}/{object}
                      # bucket.storage.googleapis.com/{object}
                },
                &quot;diskType&quot;: &quot;A String&quot;, # Type of root disk for VMs. If empty or unspecified, the service will
                    # attempt to choose a reasonable default.
                &quot;defaultPackageSet&quot;: &quot;A String&quot;, # The default package set to install. This allows the service to
                    # select a default set of packages which are useful to worker
                    # harnesses written in a particular language.
                &quot;machineType&quot;: &quot;A String&quot;, # Machine type (e.g. &quot;n1-standard-1&quot;). If empty or unspecified, the
                    # service will attempt to choose a reasonable default.
              },
            ],
            &quot;tempStoragePrefix&quot;: &quot;A String&quot;, # The prefix of the resources the system should use for temporary
                # storage. The system will append the suffix &quot;/temp-{JOBNAME}&quot; to
                # this resource prefix, where {JOBNAME} is the value of the
                # job_name field. The resulting bucket and object prefix is used
                # as the prefix of the resources used to store temporary data
                # needed during the job execution. NOTE: This will override the
                # value in taskrunner_settings.
                # The supported resource type is:
                #
                # Google Cloud Storage:
                #
                # storage.googleapis.com/{bucket}/{object}
                # bucket.storage.googleapis.com/{object}
            &quot;internalExperiments&quot;: { # Experimental settings.
              &quot;a_key&quot;: &quot;&quot;, # Properties of the object. Contains field @type with type URL.
            },
            &quot;sdkPipelineOptions&quot;: { # The Cloud Dataflow SDK pipeline options specified by the user. These
                # options are passed through the service and are used to recreate the
                # SDK pipeline options on the worker in a language agnostic and platform
                # independent way.
              &quot;a_key&quot;: &quot;&quot;, # Properties of the object.
            },
            &quot;dataset&quot;: &quot;A String&quot;, # The dataset for the current project where various workflow
                # related tables are stored.
                #
                # The supported resource type is:
                #
                # Google BigQuery:
                # bigquery.googleapis.com/{dataset}
            &quot;clusterManagerApiService&quot;: &quot;A String&quot;, # The type of cluster manager API to use. If unknown or
                # unspecified, the service will attempt to choose a reasonable
                # default. This should be in the form of the API service name,
                # e.g. &quot;compute.googleapis.com&quot;.
          },
          &quot;stepsLocation&quot;: &quot;A String&quot;, # The GCS location where the steps are stored.
          &quot;steps&quot;: [ # Exactly one of step or steps_location should be specified.
              #
              # The top-level steps that constitute the entire job.
            { # Defines a particular step within a Cloud Dataflow job.
                #
                # A job consists of multiple steps, each of which performs some
                # specific operation as part of the overall job. Data is typically
                # passed from one step to another as part of the job.
                #
                # Here&#x27;s an example of a sequence of steps which together implement a
                # Map-Reduce job:
                #
                # * Read a collection of data from some source, parsing the
                # collection&#x27;s elements.
                #
                # * Validate the elements.
                #
                # * Apply a user-defined function to map each element to some value
                # and extract an element-specific key value.
                #
                # * Group elements with the same key into a single element with
                # that key, transforming a multiply-keyed collection into a
                # uniquely-keyed collection.
                #
                # * Write the elements out to some data sink.
                #
                # Note that the Cloud Dataflow service may be used to run many different
                # types of jobs, not just Map-Reduce.
              &quot;kind&quot;: &quot;A String&quot;, # The kind of step in the Cloud Dataflow job.
              &quot;properties&quot;: { # Named properties associated with the step. Each kind of
                  # predefined step has its own required set of properties.
                  # Must be provided on Create. Only retrieved with JOB_VIEW_ALL.
                &quot;a_key&quot;: &quot;&quot;, # Properties of the object.
              },
              &quot;name&quot;: &quot;A String&quot;, # The name that identifies the step. This must be unique for each
                  # step with respect to all other steps in the Cloud Dataflow job.
            },
          ],
          &quot;stageStates&quot;: [ # This field may be mutated by the Cloud Dataflow service;
              # callers cannot mutate it.
            { # A message describing the state of a particular execution stage.
              &quot;executionStageState&quot;: &quot;A String&quot;, # Executions stage states allow the same set of values as JobState.
              &quot;executionStageName&quot;: &quot;A String&quot;, # The name of the execution stage.
              &quot;currentStateTime&quot;: &quot;A String&quot;, # The time at which the stage transitioned to this state.
            },
          ],
          &quot;replacedByJobId&quot;: &quot;A String&quot;, # If another job is an update of this job (and thus, this job is in
              # `JOB_STATE_UPDATED`), this field contains the ID of that job.
          &quot;jobMetadata&quot;: { # Metadata available primarily for filtering jobs. Will be included in the # This field is populated by the Dataflow service to support filtering jobs
              # by the metadata values provided here. Populated for ListJobs and all GetJob
              # views SUMMARY and higher.
              # ListJob response and Job SUMMARY view.
            &quot;sdkVersion&quot;: { # The version of the SDK used to run the job. # The SDK version used to run the job.
              &quot;sdkSupportStatus&quot;: &quot;A String&quot;, # The support status for this SDK version.
              &quot;versionDisplayName&quot;: &quot;A String&quot;, # A readable string describing the version of the SDK.
              &quot;version&quot;: &quot;A String&quot;, # The version of the SDK used to run the job.
            },
            &quot;bigTableDetails&quot;: [ # Identification of a BigTable source used in the Dataflow job.
              { # Metadata for a BigTable connector used by the job.
                &quot;instanceId&quot;: &quot;A String&quot;, # InstanceId accessed in the connection.
                &quot;tableId&quot;: &quot;A String&quot;, # TableId accessed in the connection.
                &quot;projectId&quot;: &quot;A String&quot;, # ProjectId accessed in the connection.
              },
            ],
            &quot;pubsubDetails&quot;: [ # Identification of a PubSub source used in the Dataflow job.
              { # Metadata for a PubSub connector used by the job.
                &quot;subscription&quot;: &quot;A String&quot;, # Subscription used in the connection.
                &quot;topic&quot;: &quot;A String&quot;, # Topic accessed in the connection.
              },
            ],
            &quot;bigqueryDetails&quot;: [ # Identification of a BigQuery source used in the Dataflow job.
              { # Metadata for a BigQuery connector used by the job.
                &quot;dataset&quot;: &quot;A String&quot;, # Dataset accessed in the connection.
                &quot;projectId&quot;: &quot;A String&quot;, # Project accessed in the connection.
                &quot;query&quot;: &quot;A String&quot;, # Query used to access data in the connection.
                &quot;table&quot;: &quot;A String&quot;, # Table accessed in the connection.
              },
            ],
            &quot;fileDetails&quot;: [ # Identification of a File source used in the Dataflow job.
              { # Metadata for a File connector used by the job.
                &quot;filePattern&quot;: &quot;A String&quot;, # File Pattern used to access files by the connector.
              },
            ],
            &quot;datastoreDetails&quot;: [ # Identification of a Datastore source used in the Dataflow job.
              { # Metadata for a Datastore connector used by the job.
                &quot;namespace&quot;: &quot;A String&quot;, # Namespace used in the connection.
                &quot;projectId&quot;: &quot;A String&quot;, # ProjectId accessed in the connection.
              },
            ],
            &quot;spannerDetails&quot;: [ # Identification of a Spanner source used in the Dataflow job.
              { # Metadata for a Spanner connector used by the job.
                &quot;instanceId&quot;: &quot;A String&quot;, # InstanceId accessed in the connection.
                &quot;databaseId&quot;: &quot;A String&quot;, # DatabaseId accessed in the connection.
                &quot;projectId&quot;: &quot;A String&quot;, # ProjectId accessed in the connection.
              },
            ],
          },
          &quot;location&quot;: &quot;A String&quot;, # The [regional endpoint]
              # (https://cloud.google.com/dataflow/docs/concepts/regional-endpoints) that
              # contains this job.
          &quot;transformNameMapping&quot;: { # The map of transform name prefixes of the job to be replaced to the
              # corresponding name prefixes of the new job.
            &quot;a_key&quot;: &quot;A String&quot;,
          },
          &quot;startTime&quot;: &quot;A String&quot;, # The timestamp when the job was started (transitioned to JOB_STATE_PENDING).
              # Flexible resource scheduling jobs are started with some delay after job
              # creation, so start_time is unset before start and is updated when the
              # job is started by the Cloud Dataflow service. For other jobs, start_time
              # always equals create_time and is immutable and set by the Cloud Dataflow
              # service.
          &quot;clientRequestId&quot;: &quot;A String&quot;, # The client&#x27;s unique identifier of the job, re-used across retried attempts.
              # If this field is set, the service will ensure its uniqueness.
              # The request to create a job will fail if the service has knowledge of a
              # previously submitted job with the same client&#x27;s ID and job name.
              # The caller may use this field to ensure idempotence of job
              # creation across retried attempts to create a job.
              # By default, the field is empty and, in that case, the service ignores it.
          &quot;executionInfo&quot;: { # Additional information about how a Cloud Dataflow job will be executed that # Deprecated.
              # isn&#x27;t contained in the submitted job.
            &quot;stages&quot;: { # A mapping from each stage to the information about that stage.
              &quot;a_key&quot;: { # Contains information about how a particular
                  # google.dataflow.v1beta3.Step will be executed.
                &quot;stepName&quot;: [ # The steps associated with the execution stage.
                    # Note that stages may have several steps, and that a given step
                    # might be run by more than one stage.
                  &quot;A String&quot;,
                ],
              },
            },
          },
          &quot;type&quot;: &quot;A String&quot;, # The type of Cloud Dataflow job.
          &quot;createTime&quot;: &quot;A String&quot;, # The timestamp when the job was initially created. Immutable and set by the
              # Cloud Dataflow service.
          &quot;tempFiles&quot;: [ # A set of files the system should be aware of that are used
              # for temporary storage. These temporary files will be
              # removed on job completion.
              # No duplicates are allowed.
              # No file patterns are supported.
              #
              # The supported files are:
              #
              # Google Cloud Storage:
              #
              # storage.googleapis.com/{bucket}/{object}
              # bucket.storage.googleapis.com/{object}
            &quot;A String&quot;,
          ],
          &quot;id&quot;: &quot;A String&quot;, # The unique ID of this job.
              #
              # This field is set by the Cloud Dataflow service when the Job is
              # created, and is immutable for the life of the job.
          &quot;requestedState&quot;: &quot;A String&quot;, # The job&#x27;s requested state.
              #
              # `UpdateJob` may be used to switch between the `JOB_STATE_STOPPED` and
              # `JOB_STATE_RUNNING` states, by setting requested_state. `UpdateJob` may
              # also be used to directly set a job&#x27;s requested state to
              # `JOB_STATE_CANCELLED` or `JOB_STATE_DONE`, irrevocably terminating the
              # job if it has not already reached a terminal state.
          &quot;replaceJobId&quot;: &quot;A String&quot;, # If this job is an update of an existing job, this field is the job ID
              # of the job it replaced.
              #
              # When sending a `CreateJobRequest`, you can update a job by specifying it
              # here. The job named here is stopped, and its intermediate state is
              # transferred to this job.
          &quot;createdFromSnapshotId&quot;: &quot;A String&quot;, # If this is specified, the job&#x27;s initial state is populated from the given
              # snapshot.
          &quot;currentState&quot;: &quot;A String&quot;, # The current state of the job.
              #
              # Jobs are created in the `JOB_STATE_STOPPED` state unless otherwise
              # specified.
              #
              # A job in the `JOB_STATE_RUNNING` state may asynchronously enter a
              # terminal state. After a job has reached a terminal state, no
              # further state updates may be made.
              #
              # This field may be mutated by the Cloud Dataflow service;
              # callers cannot mutate it.
          &quot;name&quot;: &quot;A String&quot;, # The user-specified Cloud Dataflow job name.
              #
              # Only one Job with a given name may exist in a project at any
              # given time. If a caller attempts to create a Job with the same
              # name as an already-existing Job, the attempt returns the
              # existing Job.
              #
              # The name must match the regular expression
              # `[a-z]([-a-z0-9]{0,38}[a-z0-9])?`
          &quot;currentStateTime&quot;: &quot;A String&quot;, # The timestamp associated with the current state.
        },
      ],
      &quot;nextPageToken&quot;: &quot;A String&quot;, # Set if there may be more results than fit in this response.
      &quot;failedLocation&quot;: [ # Zero or more messages describing the [regional endpoints]
          # (https://cloud.google.com/dataflow/docs/concepts/regional-endpoints) that
          # failed to respond.
        { # Indicates which [regional endpoint]
            # (https://cloud.google.com/dataflow/docs/concepts/regional-endpoints) failed
            # to respond to a request for data.
          &quot;name&quot;: &quot;A String&quot;, # The name of the [regional endpoint]
              # (https://cloud.google.com/dataflow/docs/concepts/regional-endpoints) that
              # failed to respond.
        },
      ],
    }</pre>
</div>
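<p>A hedged sketch of calling <code>aggregated</code> through the client library follows; the project ID and the <code>ACTIVE</code> filter value are illustrative assumptions, and <code>service</code> is the object built in the earlier example.</p>
<pre>
# Sketch: request a single page of jobs across all regions.
# &#x27;my-project&#x27; is a placeholder; filter and pageSize are optional.
request = service.projects().jobs().aggregated(
    projectId=&#x27;my-project&#x27;,
    filter=&#x27;ACTIVE&#x27;,
    pageSize=25)
response = request.execute()
for job in response.get(&#x27;jobs&#x27;, []):
    print(job.get(&#x27;name&#x27;), job.get(&#x27;location&#x27;), job.get(&#x27;currentState&#x27;))
</pre>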

<div class="method">
    <code class="details" id="aggregated_next">aggregated_next(previous_request, previous_response)</code>
  <pre>Retrieves the next page of results.

Args:
  previous_request: The request for the previous page. (required)
  previous_response: The response from the request for the previous page. (required)

Returns:
  A request object that you can call &#x27;execute()&#x27; on to request the next
  page. Returns None if there are no more items in the collection.
    </pre>
</div>
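<p>In practice <code>aggregated_next</code> is used in a pagination loop with <code>aggregated</code>; the sketch below assumes the <code>service</code> object and placeholder project ID from the earlier examples.</p>
<pre>
# Sketch: iterate over every page until aggregated_next returns None.
request = service.projects().jobs().aggregated(projectId=&#x27;my-project&#x27;)
while request is not None:
    response = request.execute()
    for job in response.get(&#x27;jobs&#x27;, []):
        print(job[&#x27;id&#x27;])
    # Returns None once the collection is exhausted.
    request = service.projects().jobs().aggregated_next(
        previous_request=request, previous_response=response)
</pre>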
758
759<div class="method">
Bu Sun Kim65020912020-05-20 12:08:20 -0700760 <code class="details" id="create">create(projectId, body=None, location=None, replaceJobId=None, view=None, x__xgafv=None)</code>
Sai Cheemalapatic30d2b52017-03-13 12:12:03 -0400761 <pre>Creates a Cloud Dataflow job.
Nathaniel Manista4f877e52015-06-15 16:44:50 +0000762
Bu Sun Kim715bd7f2019-06-14 16:50:42 -0700763To create a job, we recommend using `projects.locations.jobs.create` with a
764[regional endpoint]
765(https://cloud.google.com/dataflow/docs/concepts/regional-endpoints). Using
766`projects.jobs.create` is not recommended, as your job will always start
767in `us-central1`.
768
Nathaniel Manista4f877e52015-06-15 16:44:50 +0000769Args:
Sai Cheemalapatic30d2b52017-03-13 12:12:03 -0400770 projectId: string, The ID of the Cloud Platform project that the job belongs to. (required)
Dan O'Mearadd494642020-05-01 07:42:23 -0700771 body: object, The request body.
Nathaniel Manista4f877e52015-06-15 16:44:50 +0000772 The object takes the form of:
773
Sai Cheemalapatic30d2b52017-03-13 12:12:03 -0400774{ # Defines a job to be run by the Cloud Dataflow service.
Bu Sun Kimd059ad82020-07-22 17:02:09 -0700775 &quot;pipelineDescription&quot;: { # A descriptive representation of submitted pipeline as well as the executed # Preliminary field: The format of this data may change at any time.
776 # A description of the user pipeline and stages through which it is executed.
777 # Created by Cloud Dataflow service. Only retrieved with
778 # JOB_VIEW_DESCRIPTION or JOB_VIEW_ALL.
779 # form. This data is provided by the Dataflow service for ease of visualizing
780 # the pipeline and interpreting Dataflow provided metrics.
781 &quot;displayData&quot;: [ # Pipeline level display data.
782 { # Data provided with a pipeline or transform to provide descriptive info.
783 &quot;url&quot;: &quot;A String&quot;, # An optional full URL.
784 &quot;javaClassValue&quot;: &quot;A String&quot;, # Contains value if the data is of java class type.
785 &quot;timestampValue&quot;: &quot;A String&quot;, # Contains value if the data is of timestamp type.
786 &quot;durationValue&quot;: &quot;A String&quot;, # Contains value if the data is of duration type.
787 &quot;label&quot;: &quot;A String&quot;, # An optional label to display in a dax UI for the element.
788 &quot;key&quot;: &quot;A String&quot;, # The key identifying the display data.
789 # This is intended to be used as a label for the display data
790 # when viewed in a dax monitoring system.
791 &quot;namespace&quot;: &quot;A String&quot;, # The namespace for the key. This is usually a class name or programming
792 # language namespace (i.e. python module) which defines the display data.
793 # This allows a dax monitoring system to specially handle the data
794 # and perform custom rendering.
795 &quot;floatValue&quot;: 3.14, # Contains value if the data is of float type.
796 &quot;strValue&quot;: &quot;A String&quot;, # Contains value if the data is of string type.
797 &quot;int64Value&quot;: &quot;A String&quot;, # Contains value if the data is of int64 type.
798 &quot;boolValue&quot;: True or False, # Contains value if the data is of a boolean type.
799 &quot;shortStrValue&quot;: &quot;A String&quot;, # A possible additional shorter value to display.
800 # For example a java_class_name_value of com.mypackage.MyDoFn
801 # will be stored with MyDoFn as the short_str_value and
802 # com.mypackage.MyDoFn as the java_class_name value.
803 # short_str_value can be displayed and java_class_name_value
804 # will be displayed as a tooltip.
Bu Sun Kim4ed7d3f2020-05-27 12:20:54 -0700805 },
Bu Sun Kimd059ad82020-07-22 17:02:09 -0700806 ],
807 &quot;originalPipelineTransform&quot;: [ # Description of each transform in the pipeline and collections between them.
808 { # Description of the type, names/ids, and input/outputs for a transform.
809 &quot;outputCollectionName&quot;: [ # User names for all collection outputs to this transform.
Bu Sun Kim4ed7d3f2020-05-27 12:20:54 -0700810 &quot;A String&quot;,
811 ],
Bu Sun Kimd059ad82020-07-22 17:02:09 -0700812 &quot;displayData&quot;: [ # Transform-specific display data.
813 { # Data provided with a pipeline or transform to provide descriptive info.
814 &quot;url&quot;: &quot;A String&quot;, # An optional full URL.
815 &quot;javaClassValue&quot;: &quot;A String&quot;, # Contains value if the data is of java class type.
816 &quot;timestampValue&quot;: &quot;A String&quot;, # Contains value if the data is of timestamp type.
817 &quot;durationValue&quot;: &quot;A String&quot;, # Contains value if the data is of duration type.
818 &quot;label&quot;: &quot;A String&quot;, # An optional label to display in a dax UI for the element.
819 &quot;key&quot;: &quot;A String&quot;, # The key identifying the display data.
820 # This is intended to be used as a label for the display data
821 # when viewed in a dax monitoring system.
822 &quot;namespace&quot;: &quot;A String&quot;, # The namespace for the key. This is usually a class name or programming
823 # language namespace (i.e. python module) which defines the display data.
824 # This allows a dax monitoring system to specially handle the data
825 # and perform custom rendering.
826 &quot;floatValue&quot;: 3.14, # Contains value if the data is of float type.
827 &quot;strValue&quot;: &quot;A String&quot;, # Contains value if the data is of string type.
828 &quot;int64Value&quot;: &quot;A String&quot;, # Contains value if the data is of int64 type.
829 &quot;boolValue&quot;: True or False, # Contains value if the data is of a boolean type.
830 &quot;shortStrValue&quot;: &quot;A String&quot;, # A possible additional shorter value to display.
831 # For example a java_class_name_value of com.mypackage.MyDoFn
832 # will be stored with MyDoFn as the short_str_value and
833 # com.mypackage.MyDoFn as the java_class_name value.
834 # short_str_value can be displayed and java_class_name_value
835 # will be displayed as a tooltip.
836 },
837 ],
838 &quot;id&quot;: &quot;A String&quot;, # SDK generated id of this transform instance.
839 &quot;inputCollectionName&quot;: [ # User names for all collection inputs to this transform.
840 &quot;A String&quot;,
841 ],
842 &quot;name&quot;: &quot;A String&quot;, # User provided name for this transform instance.
843 &quot;kind&quot;: &quot;A String&quot;, # Type of transform.
844 },
845 ],
846 &quot;executionPipelineStage&quot;: [ # Description of each stage of execution of the pipeline.
847 { # Description of the composing transforms, names/ids, and input/outputs of a
848 # stage of execution. Some composing transforms and sources may have been
849 # generated by the Dataflow service during execution planning.
850 &quot;componentSource&quot;: [ # Collections produced and consumed by component transforms of this stage.
851 { # Description of an interstitial value between transforms in an execution
852 # stage.
853 &quot;userName&quot;: &quot;A String&quot;, # Human-readable name for this transform; may be user or system generated.
854 &quot;name&quot;: &quot;A String&quot;, # Dataflow service generated name for this source.
855 &quot;originalTransformOrCollection&quot;: &quot;A String&quot;, # User name for the original user transform or collection with which this
856 # source is most closely associated.
857 },
858 ],
859 &quot;inputSource&quot;: [ # Input sources for this stage.
860 { # Description of an input or output of an execution stage.
861 &quot;userName&quot;: &quot;A String&quot;, # Human-readable name for this source; may be user or system generated.
862 &quot;originalTransformOrCollection&quot;: &quot;A String&quot;, # User name for the original user transform or collection with which this
863 # source is most closely associated.
864 &quot;sizeBytes&quot;: &quot;A String&quot;, # Size of the source, if measurable.
865 &quot;name&quot;: &quot;A String&quot;, # Dataflow service generated name for this source.
866 },
867 ],
868 &quot;name&quot;: &quot;A String&quot;, # Dataflow service generated name for this stage.
869 &quot;componentTransform&quot;: [ # Transforms that comprise this execution stage.
870 { # Description of a transform executed as part of an execution stage.
871 &quot;name&quot;: &quot;A String&quot;, # Dataflow service generated name for this source.
872 &quot;userName&quot;: &quot;A String&quot;, # Human-readable name for this transform; may be user or system generated.
873 &quot;originalTransform&quot;: &quot;A String&quot;, # User name for the original user transform with which this transform is
874 # most closely associated.
875 },
876 ],
877 &quot;id&quot;: &quot;A String&quot;, # Dataflow service generated id for this stage.
878 &quot;outputSource&quot;: [ # Output sources for this stage.
879 { # Description of an input or output of an execution stage.
880 &quot;userName&quot;: &quot;A String&quot;, # Human-readable name for this source; may be user or system generated.
881 &quot;originalTransformOrCollection&quot;: &quot;A String&quot;, # User name for the original user transform or collection with which this
882 # source is most closely associated.
883 &quot;sizeBytes&quot;: &quot;A String&quot;, # Size of the source, if measurable.
884 &quot;name&quot;: &quot;A String&quot;, # Dataflow service generated name for this source.
885 },
886 ],
887 &quot;kind&quot;: &quot;A String&quot;, # Type of tranform this stage is executing.
888 },
889 ],
890 },
891 &quot;labels&quot;: { # User-defined labels for this job.
892 #
893 # The labels map can contain no more than 64 entries. Entries of the labels
894 # map are UTF8 strings that comply with the following restrictions:
895 #
896 # * Keys must conform to regexp: \p{Ll}\p{Lo}{0,62}
897 # * Values must conform to regexp: [\p{Ll}\p{Lo}\p{N}_-]{0,63}
898 # * Both keys and values are additionally constrained to be &lt;= 128 bytes in
899 # size.
900 &quot;a_key&quot;: &quot;A String&quot;,
901 },
902 &quot;projectId&quot;: &quot;A String&quot;, # The ID of the Cloud Platform project that the job belongs to.
903 &quot;environment&quot;: { # Describes the environment in which a Dataflow Job runs. # The environment for the job.
904 &quot;flexResourceSchedulingGoal&quot;: &quot;A String&quot;, # Which Flexible Resource Scheduling mode to run in.
905 &quot;workerRegion&quot;: &quot;A String&quot;, # The Compute Engine region
906 # (https://cloud.google.com/compute/docs/regions-zones/regions-zones) in
907 # which worker processing should occur, e.g. &quot;us-west1&quot;. Mutually exclusive
908 # with worker_zone. If neither worker_region nor worker_zone is specified,
909 # default to the control plane&#x27;s region.
910 &quot;userAgent&quot;: { # A description of the process that generated the request.
911 &quot;a_key&quot;: &quot;&quot;, # Properties of the object.
912 },
913 &quot;serviceAccountEmail&quot;: &quot;A String&quot;, # Identity to run virtual machines as. Defaults to the default account.
914 &quot;version&quot;: { # A structure describing which components and their versions of the service
915 # are required in order to run the job.
916 &quot;a_key&quot;: &quot;&quot;, # Properties of the object.
917 },
918 &quot;serviceKmsKeyName&quot;: &quot;A String&quot;, # If set, contains the Cloud KMS key identifier used to encrypt data
919 # at rest, AKA a Customer Managed Encryption Key (CMEK).
920 #
921 # Format:
922 # projects/PROJECT_ID/locations/LOCATION/keyRings/KEY_RING/cryptoKeys/KEY
923 &quot;experiments&quot;: [ # The list of experiments to enable.
924 &quot;A String&quot;,
925 ],
926 &quot;workerZone&quot;: &quot;A String&quot;, # The Compute Engine zone
927 # (https://cloud.google.com/compute/docs/regions-zones/regions-zones) in
928 # which worker processing should occur, e.g. &quot;us-west1-a&quot;. Mutually exclusive
929 # with worker_region. If neither worker_region nor worker_zone is specified,
930 # a zone in the control plane&#x27;s region is chosen based on available capacity.
931 &quot;workerPools&quot;: [ # The worker pools. At least one &quot;harness&quot; worker pool must be
932 # specified in order for the job to have workers.
933 { # Describes one particular pool of Cloud Dataflow workers to be
934 # instantiated by the Cloud Dataflow service in order to perform the
935 # computations required by a job. Note that a workflow job may use
936 # multiple pools, in order to match the various computational
937 # requirements of the various stages of the job.
938 &quot;onHostMaintenance&quot;: &quot;A String&quot;, # The action to take on host maintenance, as defined by the Google
939 # Compute Engine API.
940 &quot;sdkHarnessContainerImages&quot;: [ # Set of SDK harness containers needed to execute this pipeline. This will
941 # only be set in the Fn API path. For non-cross-language pipelines this
942 # should have only one entry. Cross-language pipelines will have two or more
943 # entries.
944 { # Defines an SDK harness container for executing Dataflow pipelines.
945 &quot;containerImage&quot;: &quot;A String&quot;, # A Docker container image that resides in Google Container Registry.
946 &quot;useSingleCorePerContainer&quot;: True or False, # If true, recommends that the Dataflow service use only one core per SDK
947 # container instance with this image. If false (or unset), recommends using
948 # more than one core per SDK container instance with this image for
949 # efficiency. Note that the Dataflow service may choose to override this property
950 # if needed.
951 },
952 ],
953 &quot;zone&quot;: &quot;A String&quot;, # Zone to run the worker pools in. If empty or unspecified, the service
954 # will attempt to choose a reasonable default.
955 &quot;kind&quot;: &quot;A String&quot;, # The kind of the worker pool; currently only `harness` and `shuffle`
956 # are supported.
957 &quot;metadata&quot;: { # Metadata to set on the Google Compute Engine VMs.
958 &quot;a_key&quot;: &quot;A String&quot;,
959 },
960 &quot;diskSourceImage&quot;: &quot;A String&quot;, # Fully qualified source image for disks.
961 &quot;dataDisks&quot;: [ # Data disks that are used by a VM in this workflow.
962 { # Describes the data disk used by a workflow job.
963 &quot;sizeGb&quot;: 42, # Size of disk in GB. If zero or unspecified, the service will
964 # attempt to choose a reasonable default.
965 &quot;diskType&quot;: &quot;A String&quot;, # Disk storage type, as defined by Google Compute Engine. This
966 # must be a disk type appropriate to the project and zone in which
967 # the workers will run. If unknown or unspecified, the service
968 # will attempt to choose a reasonable default.
969 #
970 # For example, the standard persistent disk type is a resource name
971 # typically ending in &quot;pd-standard&quot;. If SSD persistent disks are
972 # available, the resource name typically ends with &quot;pd-ssd&quot;. The
973 # actual valid values are defined by the Google Compute Engine API,
974 # not by the Cloud Dataflow API; consult the Google Compute Engine
975 # documentation for more information about determining the set of
976 # available disk types for a particular project and zone.
977 #
978 # Google Compute Engine Disk types are local to a particular
979 # project in a particular zone, and so the resource name will
980 # typically look something like this:
981 #
982 # compute.googleapis.com/projects/project-id/zones/zone/diskTypes/pd-standard
983 &quot;mountPoint&quot;: &quot;A String&quot;, # Directory in a VM where disk is mounted.
984 },
985 ],
986 &quot;packages&quot;: [ # Packages to be installed on workers.
987 { # The packages that must be installed in order for a worker to run the
988 # steps of the Cloud Dataflow job that will be assigned to its worker
989 # pool.
990 #
991 # This is the mechanism by which the Cloud Dataflow SDK causes code to
992 # be loaded onto the workers. For example, the Cloud Dataflow Java SDK
993 # might use this to install jars containing the user&#x27;s code and all of the
994 # various dependencies (libraries, data files, etc.) required in order
995 # for that code to run.
996 &quot;name&quot;: &quot;A String&quot;, # The name of the package.
997 &quot;location&quot;: &quot;A String&quot;, # The resource to read the package from. The supported resource type is:
998 #
999 # Google Cloud Storage:
1000 #
1001 # storage.googleapis.com/{bucket}
1002 # bucket.storage.googleapis.com/
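 # For example (illustrative): storage.googleapis.com/my-staging-bucket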
1003 },
1004 ],
1005 &quot;teardownPolicy&quot;: &quot;A String&quot;, # Sets the policy for determining when to turn down the worker pool.
1006 # Allowed values are: `TEARDOWN_ALWAYS`, `TEARDOWN_ON_SUCCESS`, and
1007 # `TEARDOWN_NEVER`.
1008 # `TEARDOWN_ALWAYS` means workers are always torn down regardless of whether
1009 # the job succeeds. `TEARDOWN_ON_SUCCESS` means workers are torn down
1010 # if the job succeeds. `TEARDOWN_NEVER` means the workers are never torn
1011 # down.
1012 #
1013 # If the workers are not torn down by the service, they will
1014 # continue to run and use Google Compute Engine VM resources in the
1015 # user&#x27;s project until they are explicitly terminated by the user.
1016 # Because of this, Google recommends using the `TEARDOWN_ALWAYS`
1017 # policy except for small, manually supervised test jobs.
1018 #
1019 # If unknown or unspecified, the service will attempt to choose a reasonable
1020 # default.
1021 &quot;network&quot;: &quot;A String&quot;, # Network to which VMs will be assigned. If empty or unspecified,
1022 # the service will use the network &quot;default&quot;.
1023 &quot;ipConfiguration&quot;: &quot;A String&quot;, # Configuration for VM IPs.
1024 &quot;diskSizeGb&quot;: 42, # Size of root disk for VMs, in GB. If zero or unspecified, the service will
1025 # attempt to choose a reasonable default.
1026 &quot;autoscalingSettings&quot;: { # Settings for WorkerPool autoscaling. # Settings for autoscaling of this WorkerPool.
1027 &quot;maxNumWorkers&quot;: 42, # The maximum number of workers to cap scaling at.
1028 &quot;algorithm&quot;: &quot;A String&quot;, # The algorithm to use for autoscaling.
1029 },
1030 &quot;poolArgs&quot;: { # Extra arguments for this worker pool.
1031 &quot;a_key&quot;: &quot;&quot;, # Properties of the object. Contains field @type with type URL.
1032 },
1033 &quot;subnetwork&quot;: &quot;A String&quot;, # Subnetwork to which VMs will be assigned, if desired. Expected to be of
1034 # the form &quot;regions/REGION/subnetworks/SUBNETWORK&quot;.
1035 &quot;numWorkers&quot;: 42, # Number of Google Compute Engine workers in this pool needed to
1036 # execute the job. If zero or unspecified, the service will
1037 # attempt to choose a reasonable default.
1038 &quot;numThreadsPerWorker&quot;: 42, # The number of threads per worker harness. If empty or unspecified, the
1039 # service will choose a number of threads (according to the number of cores
1040 # on the selected machine type for batch, or 1 by convention for streaming).
1041 &quot;workerHarnessContainerImage&quot;: &quot;A String&quot;, # Required. Docker container image that executes the Cloud Dataflow worker
1042 # harness, residing in Google Container Registry.
1043 #
1044 # Deprecated for the Fn API path. Use sdk_harness_container_images instead.
1045 &quot;taskrunnerSettings&quot;: { # Taskrunner configuration settings. # Settings passed through to Google Compute Engine workers when
1046 # using the standard Dataflow task runner. Users should ignore
1047 # this field.
1048 &quot;dataflowApiVersion&quot;: &quot;A String&quot;, # The API version of the endpoint, e.g. &quot;v1b3&quot;.
1049 &quot;oauthScopes&quot;: [ # The OAuth2 scopes to be requested by the taskrunner in order to
1050 # access the Cloud Dataflow API.
1051 &quot;A String&quot;,
1052 ],
1053 &quot;baseUrl&quot;: &quot;A String&quot;, # The base URL for the taskrunner to use when accessing Google Cloud APIs.
1054 #
1055 # When workers access Google Cloud APIs, they logically do so via
1056 # relative URLs. If this field is specified, it supplies the base
1057 # URL to use for resolving these relative URLs. The normative
1058 # algorithm used is defined by RFC 1808, &quot;Relative Uniform Resource
1059 # Locators&quot;.
1060 #
1061 # If not specified, the default value is &quot;http://www.googleapis.com/&quot;
1062 &quot;workflowFileName&quot;: &quot;A String&quot;, # The file to store the workflow in.
1063 &quot;logToSerialconsole&quot;: True or False, # Whether to send taskrunner log info to Google Compute Engine VM serial
1064 # console.
1065 &quot;baseTaskDir&quot;: &quot;A String&quot;, # The location on the worker for task-specific subdirectories.
1066 &quot;taskUser&quot;: &quot;A String&quot;, # The UNIX user ID on the worker VM to use for tasks launched by
1067 # taskrunner; e.g. &quot;root&quot;.
1068 &quot;vmId&quot;: &quot;A String&quot;, # The ID string of the VM.
1069 &quot;alsologtostderr&quot;: True or False, # Whether to also send taskrunner log info to stderr.
1070 &quot;parallelWorkerSettings&quot;: { # Provides data to pass through to the worker harness. # The settings to pass to the parallel worker harness.
1071 &quot;shuffleServicePath&quot;: &quot;A String&quot;, # The Shuffle service path relative to the root URL, for example,
1072 # &quot;shuffle/v1beta1&quot;.
1073 &quot;tempStoragePrefix&quot;: &quot;A String&quot;, # The prefix of the resources the system should use for temporary
1074 # storage.
1075 #
1076 # The supported resource type is:
1077 #
1078 # Google Cloud Storage:
1079 #
1080 # storage.googleapis.com/{bucket}/{object}
1081 # bucket.storage.googleapis.com/{object}
1082 &quot;reportingEnabled&quot;: True or False, # Whether to send work progress updates to the service.
1083 &quot;servicePath&quot;: &quot;A String&quot;, # The Cloud Dataflow service path relative to the root URL, for example,
1084 # &quot;dataflow/v1b3/projects&quot;.
1085 &quot;baseUrl&quot;: &quot;A String&quot;, # The base URL for accessing Google Cloud APIs.
1086 #
1087 # When workers access Google Cloud APIs, they logically do so via
1088 # relative URLs. If this field is specified, it supplies the base
1089 # URL to use for resolving these relative URLs. The normative
1090 # algorithm used is defined by RFC 1808, &quot;Relative Uniform Resource
1091 # Locators&quot;.
1092 #
1093 # If not specified, the default value is &quot;http://www.googleapis.com/&quot;
1094 &quot;workerId&quot;: &quot;A String&quot;, # The ID of the worker running this pipeline.
1095 },
1096 &quot;harnessCommand&quot;: &quot;A String&quot;, # The command to launch the worker harness.
1097 &quot;logDir&quot;: &quot;A String&quot;, # The directory on the VM to store logs.
1098 &quot;streamingWorkerMainClass&quot;: &quot;A String&quot;, # The streaming worker main class name.
1099 &quot;languageHint&quot;: &quot;A String&quot;, # The suggested backend language.
1100 &quot;taskGroup&quot;: &quot;A String&quot;, # The UNIX group ID on the worker VM to use for tasks launched by
1101 # taskrunner; e.g. &quot;wheel&quot;.
1102 &quot;logUploadLocation&quot;: &quot;A String&quot;, # Indicates where to put logs. If this is not specified, the logs
1103 # will not be uploaded.
1104 #
1105 # The supported resource type is:
1106 #
1107 # Google Cloud Storage:
1108 # storage.googleapis.com/{bucket}/{object}
1109 # bucket.storage.googleapis.com/{object}
1110 &quot;commandlinesFileName&quot;: &quot;A String&quot;, # The file to store preprocessing commands in.
1111 &quot;continueOnException&quot;: True or False, # Whether to continue taskrunner if an exception is hit.
1112 &quot;tempStoragePrefix&quot;: &quot;A String&quot;, # The prefix of the resources the taskrunner should use for
1113 # temporary storage.
1114 #
1115 # The supported resource type is:
1116 #
1117 # Google Cloud Storage:
1118 # storage.googleapis.com/{bucket}/{object}
1119 # bucket.storage.googleapis.com/{object}
1120 },
1121 &quot;diskType&quot;: &quot;A String&quot;, # Type of root disk for VMs. If empty or unspecified, the service will
1122 # attempt to choose a reasonable default.
1123 &quot;defaultPackageSet&quot;: &quot;A String&quot;, # The default package set to install. This allows the service to
1124 # select a default set of packages which are useful to worker
1125 # harnesses written in a particular language.
1126 &quot;machineType&quot;: &quot;A String&quot;, # Machine type (e.g. &quot;n1-standard-1&quot;). If empty or unspecified, the
1127 # service will attempt to choose a reasonable default.
1128 },
1129 ],
1130 &quot;tempStoragePrefix&quot;: &quot;A String&quot;, # The prefix of the resources the system should use for temporary
1131 # storage. The system will append the suffix &quot;/temp-{JOBNAME}&quot; to
1132 # this resource prefix, where {JOBNAME} is the value of the
1133 # job_name field. The resulting bucket and object prefix is used
1134 # as the prefix of the resources used to store temporary data
1135 # needed during the job execution. NOTE: This will override the
1136 # value in taskrunner_settings.
1137 # The supported resource type is:
1138 #
1139 # Google Cloud Storage:
1140 #
1141 # storage.googleapis.com/{bucket}/{object}
1142 # bucket.storage.googleapis.com/{object}
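 # For example (illustrative): with a prefix of
 # storage.googleapis.com/my-bucket/staging and a job_name of &quot;wordcount&quot;,
 # temporary objects are written under my-bucket/staging/temp-wordcount.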
1143 &quot;internalExperiments&quot;: { # Experimental settings.
1144 &quot;a_key&quot;: &quot;&quot;, # Properties of the object. Contains field @type with type URL.
1145 },
1146 &quot;sdkPipelineOptions&quot;: { # The Cloud Dataflow SDK pipeline options specified by the user. These
1147 # options are passed through the service and are used to recreate the
1148 # SDK pipeline options on the worker in a language agnostic and platform
1149 # independent way.
1150 &quot;a_key&quot;: &quot;&quot;, # Properties of the object.
1151 },
1152 &quot;dataset&quot;: &quot;A String&quot;, # The dataset for the current project where various workflow
1153 # related tables are stored.
1154 #
1155 # The supported resource type is:
1156 #
1157 # Google BigQuery:
1158 # bigquery.googleapis.com/{dataset}
1159 &quot;clusterManagerApiService&quot;: &quot;A String&quot;, # The type of cluster manager API to use. If unknown or
1160 # unspecified, the service will attempt to choose a reasonable
1161 # default. This should be in the form of the API service name,
1162 # e.g. &quot;compute.googleapis.com&quot;.
1163 },
1164 &quot;stepsLocation&quot;: &quot;A String&quot;, # The GCS location where the steps are stored.
1165 &quot;steps&quot;: [ # Exactly one of step or steps_location should be specified.
1166 #
1167 # The top-level steps that constitute the entire job.
1168 { # Defines a particular step within a Cloud Dataflow job.
1169 #
1170 # A job consists of multiple steps, each of which performs some
1171 # specific operation as part of the overall job. Data is typically
1172 # passed from one step to another as part of the job.
1173 #
1174 # Here&#x27;s an example of a sequence of steps which together implement a
1175 # Map-Reduce job:
1176 #
1177 # * Read a collection of data from some source, parsing the
1178 # collection&#x27;s elements.
1179 #
1180 # * Validate the elements.
1181 #
1182 # * Apply a user-defined function to map each element to some value
1183 # and extract an element-specific key value.
1184 #
1185 # * Group elements with the same key into a single element with
1186 # that key, transforming a multiply-keyed collection into a
1187 # uniquely-keyed collection.
1188 #
1189 # * Write the elements out to some data sink.
1190 #
1191 # Note that the Cloud Dataflow service may be used to run many different
1192 # types of jobs, not just Map-Reduce.
1193 &quot;kind&quot;: &quot;A String&quot;, # The kind of step in the Cloud Dataflow job.
1194 &quot;properties&quot;: { # Named properties associated with the step. Each kind of
1195 # predefined step has its own required set of properties.
1196 # Must be provided on Create. Only retrieved with JOB_VIEW_ALL.
1197 &quot;a_key&quot;: &quot;&quot;, # Properties of the object.
1198 },
1199 &quot;name&quot;: &quot;A String&quot;, # The name that identifies the step. This must be unique for each
1200 # step with respect to all other steps in the Cloud Dataflow job.
1201 },
1202 ],
1203 &quot;stageStates&quot;: [ # This field may be mutated by the Cloud Dataflow service;
1204 # callers cannot mutate it.
1205 { # A message describing the state of a particular execution stage.
1206 &quot;executionStageState&quot;: &quot;A String&quot;, # Execution stage states allow the same set of values as JobState.
1207 &quot;executionStageName&quot;: &quot;A String&quot;, # The name of the execution stage.
1208 &quot;currentStateTime&quot;: &quot;A String&quot;, # The time at which the stage transitioned to this state.
1209 },
1210 ],
1211 &quot;replacedByJobId&quot;: &quot;A String&quot;, # If another job is an update of this job (and thus, this job is in
1212 # `JOB_STATE_UPDATED`), this field contains the ID of that job.
1213 &quot;jobMetadata&quot;: { # Metadata available primarily for filtering jobs. Will be included in the
1214 # ListJob response and Job SUMMARY view.
1215 # This field is populated by the Dataflow service to support filtering jobs by the
1216 # metadata values provided here. Populated for ListJobs and all GetJob views SUMMARY and higher.
1217 &quot;sdkVersion&quot;: { # The version of the SDK used to run the job. # The SDK version used to run the job.
1218 &quot;sdkSupportStatus&quot;: &quot;A String&quot;, # The support status for this SDK version.
1219 &quot;versionDisplayName&quot;: &quot;A String&quot;, # A readable string describing the version of the SDK.
1220 &quot;version&quot;: &quot;A String&quot;, # The version of the SDK used to run the job.
1221 },
1222 &quot;bigTableDetails&quot;: [ # Identification of a BigTable source used in the Dataflow job.
1223 { # Metadata for a BigTable connector used by the job.
1224 &quot;instanceId&quot;: &quot;A String&quot;, # InstanceId accessed in the connection.
1225 &quot;tableId&quot;: &quot;A String&quot;, # TableId accessed in the connection.
1226 &quot;projectId&quot;: &quot;A String&quot;, # ProjectId accessed in the connection.
1227 },
1228 ],
1229 &quot;pubsubDetails&quot;: [ # Identification of a PubSub source used in the Dataflow job.
1230 { # Metadata for a PubSub connector used by the job.
1231 &quot;subscription&quot;: &quot;A String&quot;, # Subscription used in the connection.
1232 &quot;topic&quot;: &quot;A String&quot;, # Topic accessed in the connection.
1233 },
1234 ],
1235 &quot;bigqueryDetails&quot;: [ # Identification of a BigQuery source used in the Dataflow job.
1236 { # Metadata for a BigQuery connector used by the job.
1237 &quot;dataset&quot;: &quot;A String&quot;, # Dataset accessed in the connection.
1238 &quot;projectId&quot;: &quot;A String&quot;, # Project accessed in the connection.
1239 &quot;query&quot;: &quot;A String&quot;, # Query used to access data in the connection.
1240 &quot;table&quot;: &quot;A String&quot;, # Table accessed in the connection.
1241 },
1242 ],
1243 &quot;fileDetails&quot;: [ # Identification of a File source used in the Dataflow job.
1244 { # Metadata for a File connector used by the job.
1245 &quot;filePattern&quot;: &quot;A String&quot;, # File Pattern used to access files by the connector.
1246 },
1247 ],
1248 &quot;datastoreDetails&quot;: [ # Identification of a Datastore source used in the Dataflow job.
1249 { # Metadata for a Datastore connector used by the job.
1250 &quot;namespace&quot;: &quot;A String&quot;, # Namespace used in the connection.
1251 &quot;projectId&quot;: &quot;A String&quot;, # ProjectId accessed in the connection.
1252 },
1253 ],
1254 &quot;spannerDetails&quot;: [ # Identification of a Spanner source used in the Dataflow job.
1255 { # Metadata for a Spanner connector used by the job.
1256 &quot;instanceId&quot;: &quot;A String&quot;, # InstanceId accessed in the connection.
1257 &quot;databaseId&quot;: &quot;A String&quot;, # DatabaseId accessed in the connection.
1258 &quot;projectId&quot;: &quot;A String&quot;, # ProjectId accessed in the connection.
1259 },
1260 ],
1261 },
1262 &quot;location&quot;: &quot;A String&quot;, # The [regional endpoint]
1263 # (https://cloud.google.com/dataflow/docs/concepts/regional-endpoints) that
1264 # contains this job.
1265 &quot;transformNameMapping&quot;: { # A map from transform name prefixes in the job being replaced to the
1266 # corresponding name prefixes in the new job.
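 # For example (illustrative): { &quot;oldPrefix&quot;: &quot;newPrefix&quot; } maps a transform
 # named &quot;oldPrefix/ParDo&quot; in the old job to &quot;newPrefix/ParDo&quot; in the new job.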
1267 &quot;a_key&quot;: &quot;A String&quot;,
1268 },
1269 &quot;startTime&quot;: &quot;A String&quot;, # The timestamp when the job was started (transitioned to JOB_STATE_PENDING).
1270 # Flexible resource scheduling jobs are started with some delay after job
1271 # creation, so start_time is unset before start and is updated when the
1272 # job is started by the Cloud Dataflow service. For other jobs, start_time
1273 # always equals create_time and is immutable and set by the Cloud Dataflow
1274 # service.
1275 &quot;clientRequestId&quot;: &quot;A String&quot;, # The client&#x27;s unique identifier of the job, re-used across retried attempts.
1276 # If this field is set, the service will ensure its uniqueness.
1277 # The request to create a job will fail if the service has knowledge of a
1278 # previously submitted job with the same client&#x27;s ID and job name.
1279 # The caller may use this field to ensure idempotence of job
1280 # creation across retried attempts to create a job.
1281 # By default, the field is empty and, in that case, the service ignores it.
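 # For example (illustrative): a caller could generate one UUID per logical
 # submission, e.g. str(uuid.uuid4()), and send the same value on every retry
 # of the create request so that at most one job is created.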
1282 &quot;executionInfo&quot;: { # Deprecated. Additional information about how a Cloud Dataflow job will be
1283 # executed that isn&#x27;t contained in the submitted job.
1284 &quot;stages&quot;: { # A mapping from each stage to the information about that stage.
1285 &quot;a_key&quot;: { # Contains information about how a particular
1286 # google.dataflow.v1beta3.Step will be executed.
1287 &quot;stepName&quot;: [ # The steps associated with the execution stage.
1288 # Note that stages may have several steps, and that a given step
1289 # might be run by more than one stage.
1290 &quot;A String&quot;,
1291 ],
1292 },
1293 },
1294 },
1295 &quot;type&quot;: &quot;A String&quot;, # The type of Cloud Dataflow job.
1296 &quot;createTime&quot;: &quot;A String&quot;, # The timestamp when the job was initially created. Immutable and set by the
1297 # Cloud Dataflow service.
1298 &quot;tempFiles&quot;: [ # A set of files the system should be aware of that are used
1299 # for temporary storage. These temporary files will be
1300 # removed on job completion.
1301 # No duplicates are allowed.
1302 # No file patterns are supported.
1303 #
1304 # The supported files are:
1305 #
1306 # Google Cloud Storage:
1307 #
1308 # storage.googleapis.com/{bucket}/{object}
1309 # bucket.storage.googleapis.com/{object}
1310 &quot;A String&quot;,
1311 ],
1312 &quot;id&quot;: &quot;A String&quot;, # The unique ID of this job.
1313 #
1314 # This field is set by the Cloud Dataflow service when the Job is
1315 # created, and is immutable for the life of the job.
1316 &quot;requestedState&quot;: &quot;A String&quot;, # The job&#x27;s requested state.
1317 #
1318 # `UpdateJob` may be used to switch between the `JOB_STATE_STOPPED` and
1319 # `JOB_STATE_RUNNING` states, by setting requested_state. `UpdateJob` may
1320 # also be used to directly set a job&#x27;s requested state to
1321 # `JOB_STATE_CANCELLED` or `JOB_STATE_DONE`, irrevocably terminating the
1322 # job if it has not already reached a terminal state.
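 # For example, a minimal cancellation sketch with the generated Python
 # client (placeholder names; method shape as documented in this reference):
 #
 #   dataflow.projects().jobs().update(
 #       projectId=&quot;my-project&quot;, jobId=job_id,
 #       body={&quot;requestedState&quot;: &quot;JOB_STATE_CANCELLED&quot;}).execute()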
1323 &quot;replaceJobId&quot;: &quot;A String&quot;, # If this job is an update of an existing job, this field is the job ID
1324 # of the job it replaced.
1325 #
1326 # When sending a `CreateJobRequest`, you can update a job by specifying it
1327 # here. The job named here is stopped, and its intermediate state is
1328 # transferred to this job.
1329 &quot;createdFromSnapshotId&quot;: &quot;A String&quot;, # If this is specified, the job&#x27;s initial state is populated from the given
1330 # snapshot.
1331 &quot;currentState&quot;: &quot;A String&quot;, # The current state of the job.
1332 #
1333 # Jobs are created in the `JOB_STATE_STOPPED` state unless otherwise
1334 # specified.
1335 #
1336 # A job in the `JOB_STATE_RUNNING` state may asynchronously enter a
1337 # terminal state. After a job has reached a terminal state, no
1338 # further state updates may be made.
1339 #
1340 # This field may be mutated by the Cloud Dataflow service;
1341 # callers cannot mutate it.
1342 &quot;name&quot;: &quot;A String&quot;, # The user-specified Cloud Dataflow job name.
1343 #
1344 # Only one Job with a given name may exist in a project at any
1345 # given time. If a caller attempts to create a Job with the same
1346 # name as an already-existing Job, the attempt returns the
1347 # existing Job.
1348 #
1349 # The name must match the regular expression
1350 # `[a-z]([-a-z0-9]{0,38}[a-z0-9])?`
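 # For example (illustrative): &quot;wordcount-2020-07&quot; is a valid name;
 # &quot;WordCount&quot; and &quot;-wordcount&quot; are not.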
1351 &quot;currentStateTime&quot;: &quot;A String&quot;, # The timestamp associated with the current state.
1352 }
1353
1354 location: string, The [regional endpoint]
1355(https://cloud.google.com/dataflow/docs/concepts/regional-endpoints) that
1356contains this job.
1357 replaceJobId: string, Deprecated. This field is now in the Job message.
1358 view: string, The level of information requested in response.
1359 x__xgafv: string, V1 error format.
1360 Allowed values
1361 1 - v1 error format
1362 2 - v2 error format
1363
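Example:
  A minimal, illustrative sketch of calling this method with the
  google-api-python-client library. The project, job name, bucket, and
  region below are placeholders, not values taken from this reference; a
  real Dataflow job body normally also carries the full pipeline definition
  produced by an SDK, and credentials are assumed to come from Application
  Default Credentials.

    from googleapiclient.discovery import build

    # Build a Dataflow API client for version v1b3.
    dataflow = build(&quot;dataflow&quot;, &quot;v1b3&quot;)

    # Skeleton job body; see the schema above for the full set of fields.
    body = {
        &quot;name&quot;: &quot;my-wordcount-job&quot;,  # must match [a-z]([-a-z0-9]{0,38}[a-z0-9])?
        &quot;type&quot;: &quot;JOB_TYPE_BATCH&quot;,
        &quot;environment&quot;: {
            &quot;tempStoragePrefix&quot;: &quot;storage.googleapis.com/my-bucket/tmp&quot;,
        },
    }

    response = dataflow.projects().jobs().create(
        projectId=&quot;my-project&quot;,
        body=body,
        location=&quot;us-central1&quot;,
    ).execute()

    # The response is a Job object of the form documented below.
    print(response[&quot;id&quot;], response.get(&quot;currentState&quot;))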
1364Returns:
1365 An object of the form:
1366
1367 { # Defines a job to be run by the Cloud Dataflow service.
1368 &quot;pipelineDescription&quot;: { # Preliminary field: The format of this data may change at any time.
1369 # A description of the user pipeline and stages through which it is executed.
1370 # Created by the Cloud Dataflow service. Only retrieved with
1371 # JOB_VIEW_DESCRIPTION or JOB_VIEW_ALL.
1372 # This is a descriptive representation of the submitted pipeline as well as its
1373 # executed form, provided by the Dataflow service for ease of visualizing the pipeline and interpreting Dataflow-provided metrics.
1374 &quot;displayData&quot;: [ # Pipeline level display data.
1375 { # Data provided with a pipeline or transform to provide descriptive info.
1376 &quot;url&quot;: &quot;A String&quot;, # An optional full URL.
1377 &quot;javaClassValue&quot;: &quot;A String&quot;, # Contains value if the data is of java class type.
1378 &quot;timestampValue&quot;: &quot;A String&quot;, # Contains value if the data is of timestamp type.
1379 &quot;durationValue&quot;: &quot;A String&quot;, # Contains value if the data is of duration type.
1380 &quot;label&quot;: &quot;A String&quot;, # An optional label to display in a dax UI for the element.
1381 &quot;key&quot;: &quot;A String&quot;, # The key identifying the display data.
1382 # This is intended to be used as a label for the display data
1383 # when viewed in a dax monitoring system.
1384 &quot;namespace&quot;: &quot;A String&quot;, # The namespace for the key. This is usually a class name or programming
1385 # language namespace (e.g., a Python module) which defines the display data.
1386 # This allows a dax monitoring system to specially handle the data
1387 # and perform custom rendering.
1388 &quot;floatValue&quot;: 3.14, # Contains value if the data is of float type.
1389 &quot;strValue&quot;: &quot;A String&quot;, # Contains value if the data is of string type.
1390 &quot;int64Value&quot;: &quot;A String&quot;, # Contains value if the data is of int64 type.
1391 &quot;boolValue&quot;: True or False, # Contains value if the data is of a boolean type.
1392 &quot;shortStrValue&quot;: &quot;A String&quot;, # A possible additional shorter value to display.
1393 # For example a java_class_name_value of com.mypackage.MyDoFn
1394 # will be stored with MyDoFn as the short_str_value and
1395 # com.mypackage.MyDoFn as the java_class_name value.
1396 # short_str_value can be displayed and java_class_name_value
1397 # will be displayed as a tooltip.
1398 },
1399 ],
1400 &quot;originalPipelineTransform&quot;: [ # Description of each transform in the pipeline and collections between them.
1401 { # Description of the type, names/ids, and input/outputs for a transform.
1402 &quot;outputCollectionName&quot;: [ # User names for all collection outputs to this transform.
1403 &quot;A String&quot;,
1404 ],
1405 &quot;displayData&quot;: [ # Transform-specific display data.
1406 { # Data provided with a pipeline or transform to provide descriptive info.
1407 &quot;url&quot;: &quot;A String&quot;, # An optional full URL.
1408 &quot;javaClassValue&quot;: &quot;A String&quot;, # Contains value if the data is of java class type.
1409 &quot;timestampValue&quot;: &quot;A String&quot;, # Contains value if the data is of timestamp type.
1410 &quot;durationValue&quot;: &quot;A String&quot;, # Contains value if the data is of duration type.
1411 &quot;label&quot;: &quot;A String&quot;, # An optional label to display in a dax UI for the element.
1412 &quot;key&quot;: &quot;A String&quot;, # The key identifying the display data.
1413 # This is intended to be used as a label for the display data
1414 # when viewed in a dax monitoring system.
1415 &quot;namespace&quot;: &quot;A String&quot;, # The namespace for the key. This is usually a class name or programming
1416 # language namespace (e.g., a Python module) which defines the display data.
1417 # This allows a dax monitoring system to specially handle the data
1418 # and perform custom rendering.
1419 &quot;floatValue&quot;: 3.14, # Contains value if the data is of float type.
1420 &quot;strValue&quot;: &quot;A String&quot;, # Contains value if the data is of string type.
1421 &quot;int64Value&quot;: &quot;A String&quot;, # Contains value if the data is of int64 type.
1422 &quot;boolValue&quot;: True or False, # Contains value if the data is of a boolean type.
1423 &quot;shortStrValue&quot;: &quot;A String&quot;, # A possible additional shorter value to display.
1424 # For example a java_class_name_value of com.mypackage.MyDoFn
1425 # will be stored with MyDoFn as the short_str_value and
1426 # com.mypackage.MyDoFn as the java_class_name value.
1427 # short_str_value can be displayed and java_class_name_value
1428 # will be displayed as a tooltip.
1429 },
1430 ],
1431 &quot;id&quot;: &quot;A String&quot;, # SDK generated id of this transform instance.
1432 &quot;inputCollectionName&quot;: [ # User names for all collection inputs to this transform.
1433 &quot;A String&quot;,
1434 ],
1435 &quot;name&quot;: &quot;A String&quot;, # User provided name for this transform instance.
1436 &quot;kind&quot;: &quot;A String&quot;, # Type of transform.
1437 },
1438 ],
1439 &quot;executionPipelineStage&quot;: [ # Description of each stage of execution of the pipeline.
1440 { # Description of the composing transforms, names/ids, and input/outputs of a
1441 # stage of execution. Some composing transforms and sources may have been
1442 # generated by the Dataflow service during execution planning.
1443 &quot;componentSource&quot;: [ # Collections produced and consumed by component transforms of this stage.
1444 { # Description of an interstitial value between transforms in an execution
1445 # stage.
1446 &quot;userName&quot;: &quot;A String&quot;, # Human-readable name for this transform; may be user or system generated.
1447 &quot;name&quot;: &quot;A String&quot;, # Dataflow service generated name for this source.
1448 &quot;originalTransformOrCollection&quot;: &quot;A String&quot;, # User name for the original user transform or collection with which this
1449 # source is most closely associated.
1450 },
1451 ],
1452 &quot;inputSource&quot;: [ # Input sources for this stage.
1453 { # Description of an input or output of an execution stage.
1454 &quot;userName&quot;: &quot;A String&quot;, # Human-readable name for this source; may be user or system generated.
1455 &quot;originalTransformOrCollection&quot;: &quot;A String&quot;, # User name for the original user transform or collection with which this
1456 # source is most closely associated.
1457 &quot;sizeBytes&quot;: &quot;A String&quot;, # Size of the source, if measurable.
1458 &quot;name&quot;: &quot;A String&quot;, # Dataflow service generated name for this source.
1459 },
1460 ],
1461 &quot;name&quot;: &quot;A String&quot;, # Dataflow service generated name for this stage.
1462 &quot;componentTransform&quot;: [ # Transforms that comprise this execution stage.
1463 { # Description of a transform executed as part of an execution stage.
1464 &quot;name&quot;: &quot;A String&quot;, # Dataflow service generated name for this source.
1465 &quot;userName&quot;: &quot;A String&quot;, # Human-readable name for this transform; may be user or system generated.
1466 &quot;originalTransform&quot;: &quot;A String&quot;, # User name for the original user transform with which this transform is
1467 # most closely associated.
1468 },
1469 ],
1470 &quot;id&quot;: &quot;A String&quot;, # Dataflow service generated id for this stage.
1471 &quot;outputSource&quot;: [ # Output sources for this stage.
1472 { # Description of an input or output of an execution stage.
1473 &quot;userName&quot;: &quot;A String&quot;, # Human-readable name for this source; may be user or system generated.
1474 &quot;originalTransformOrCollection&quot;: &quot;A String&quot;, # User name for the original user transform or collection with which this
1475 # source is most closely associated.
1476 &quot;sizeBytes&quot;: &quot;A String&quot;, # Size of the source, if measurable.
1477 &quot;name&quot;: &quot;A String&quot;, # Dataflow service generated name for this source.
1478 },
1479 ],
1480 &quot;kind&quot;: &quot;A String&quot;, # Type of transform this stage is executing.
1481 },
1482 ],
1483 },
1484 &quot;labels&quot;: { # User-defined labels for this job.
1485 #
1486 # The labels map can contain no more than 64 entries. Entries of the labels
1487 # map are UTF8 strings that comply with the following restrictions:
1488 #
1489 # * Keys must conform to regexp: \p{Ll}\p{Lo}{0,62}
1490 # * Values must conform to regexp: [\p{Ll}\p{Lo}\p{N}_-]{0,63}
1491 # * Both keys and values are additionally constrained to be &lt;= 128 bytes in
1492 # size.
1493 &quot;a_key&quot;: &quot;A String&quot;,
1494 },
1495 &quot;projectId&quot;: &quot;A String&quot;, # The ID of the Cloud Platform project that the job belongs to.
1496 &quot;environment&quot;: { # Describes the environment in which a Dataflow Job runs. # The environment for the job.
1497 &quot;flexResourceSchedulingGoal&quot;: &quot;A String&quot;, # Which Flexible Resource Scheduling mode to run in.
1498 &quot;workerRegion&quot;: &quot;A String&quot;, # The Compute Engine region
1499 # (https://cloud.google.com/compute/docs/regions-zones/regions-zones) in
1500 # which worker processing should occur, e.g. &quot;us-west1&quot;. Mutually exclusive
1501 # with worker_zone. If neither worker_region nor worker_zone is specified,
1502 # default to the control plane&#x27;s region.
1503 &quot;userAgent&quot;: { # A description of the process that generated the request.
1504 &quot;a_key&quot;: &quot;&quot;, # Properties of the object.
1505 },
1506 &quot;serviceAccountEmail&quot;: &quot;A String&quot;, # Identity to run virtual machines as. Defaults to the default account.
1507 &quot;version&quot;: { # A structure describing which components and their versions of the service
1508 # are required in order to run the job.
1509 &quot;a_key&quot;: &quot;&quot;, # Properties of the object.
1510 },
1511 &quot;serviceKmsKeyName&quot;: &quot;A String&quot;, # If set, contains the Cloud KMS key identifier used to encrypt data
1512 # at rest, AKA a Customer Managed Encryption Key (CMEK).
1513 #
1514 # Format:
1515 # projects/PROJECT_ID/locations/LOCATION/keyRings/KEY_RING/cryptoKeys/KEY
1516 &quot;experiments&quot;: [ # The list of experiments to enable.
1517 &quot;A String&quot;,
1518 ],
1519 &quot;workerZone&quot;: &quot;A String&quot;, # The Compute Engine zone
1520 # (https://cloud.google.com/compute/docs/regions-zones/regions-zones) in
1521 # which worker processing should occur, e.g. &quot;us-west1-a&quot;. Mutually exclusive
1522 # with worker_region. If neither worker_region nor worker_zone is specified,
1523 # a zone in the control plane&#x27;s region is chosen based on available capacity.
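 # For example (illustrative): set worker_region=&quot;us-central1&quot; or
 # worker_zone=&quot;us-central1-f&quot;, but not both.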
1524 &quot;workerPools&quot;: [ # The worker pools. At least one &quot;harness&quot; worker pool must be
1525 # specified in order for the job to have workers.
1526 { # Describes one particular pool of Cloud Dataflow workers to be
1527 # instantiated by the Cloud Dataflow service in order to perform the
1528 # computations required by a job. Note that a workflow job may use
1529 # multiple pools, in order to match the various computational
1530 # requirements of the various stages of the job.
1531 &quot;onHostMaintenance&quot;: &quot;A String&quot;, # The action to take on host maintenance, as defined by the Google
1532 # Compute Engine API.
1533 &quot;sdkHarnessContainerImages&quot;: [ # Set of SDK harness containers needed to execute this pipeline. This will
1534 # only be set in the Fn API path. For non-cross-language pipelines this
1535 # should have only one entry. Cross-language pipelines will have two or more
1536 # entries.
1537 { # Defines an SDK harness container for executing Dataflow pipelines.
1538 &quot;containerImage&quot;: &quot;A String&quot;, # A Docker container image that resides in Google Container Registry.
1539 &quot;useSingleCorePerContainer&quot;: True or False, # If true, recommends that the Dataflow service use only one core per SDK
1540 # container instance with this image. If false (or unset), recommends using
1541 # more than one core per SDK container instance with this image for
1542 # efficiency. Note that the Dataflow service may choose to override this property
1543 # if needed.
1544 },
1545 ],
1546 &quot;zone&quot;: &quot;A String&quot;, # Zone to run the worker pools in. If empty or unspecified, the service
1547 # will attempt to choose a reasonable default.
1548 &quot;kind&quot;: &quot;A String&quot;, # The kind of the worker pool; currently only `harness` and `shuffle`
1549 # are supported.
1550 &quot;metadata&quot;: { # Metadata to set on the Google Compute Engine VMs.
1551 &quot;a_key&quot;: &quot;A String&quot;,
1552 },
1553 &quot;diskSourceImage&quot;: &quot;A String&quot;, # Fully qualified source image for disks.
1554 &quot;dataDisks&quot;: [ # Data disks that are used by a VM in this workflow.
1555 { # Describes the data disk used by a workflow job.
1556 &quot;sizeGb&quot;: 42, # Size of disk in GB. If zero or unspecified, the service will
1557 # attempt to choose a reasonable default.
1558 &quot;diskType&quot;: &quot;A String&quot;, # Disk storage type, as defined by Google Compute Engine. This
1559 # must be a disk type appropriate to the project and zone in which
1560 # the workers will run. If unknown or unspecified, the service
1561 # will attempt to choose a reasonable default.
1562 #
1563 # For example, the standard persistent disk type is a resource name
1564 # typically ending in &quot;pd-standard&quot;. If SSD persistent disks are
1565 # available, the resource name typically ends with &quot;pd-ssd&quot;. The
1566 # actual valid values are defined by the Google Compute Engine API,
1567 # not by the Cloud Dataflow API; consult the Google Compute Engine
1568 # documentation for more information about determining the set of
1569 # available disk types for a particular project and zone.
1570 #
1571 # Google Compute Engine Disk types are local to a particular
1572 # project in a particular zone, and so the resource name will
1573 # typically look something like this:
1574 #
1575 # compute.googleapis.com/projects/project-id/zones/zone/diskTypes/pd-standard
1576 &quot;mountPoint&quot;: &quot;A String&quot;, # Directory in a VM where disk is mounted.
1577 },
1578 ],
1579 &quot;packages&quot;: [ # Packages to be installed on workers.
1580 { # The packages that must be installed in order for a worker to run the
1581 # steps of the Cloud Dataflow job that will be assigned to its worker
1582 # pool.
1583 #
1584 # This is the mechanism by which the Cloud Dataflow SDK causes code to
1585 # be loaded onto the workers. For example, the Cloud Dataflow Java SDK
1586 # might use this to install jars containing the user&#x27;s code and all of the
1587 # various dependencies (libraries, data files, etc.) required in order
1588 # for that code to run.
1589 &quot;name&quot;: &quot;A String&quot;, # The name of the package.
1590 &quot;location&quot;: &quot;A String&quot;, # The resource to read the package from. The supported resource type is:
1591 #
1592 # Google Cloud Storage:
1593 #
1594 # storage.googleapis.com/{bucket}
1595 # bucket.storage.googleapis.com/
1596 },
1597 ],
1598 &quot;teardownPolicy&quot;: &quot;A String&quot;, # Sets the policy for determining when to turn down the worker pool.
1599 # Allowed values are: `TEARDOWN_ALWAYS`, `TEARDOWN_ON_SUCCESS`, and
1600 # `TEARDOWN_NEVER`.
1601 # `TEARDOWN_ALWAYS` means workers are always torn down regardless of whether
1602 # the job succeeds. `TEARDOWN_ON_SUCCESS` means workers are torn down
1603 # if the job succeeds. `TEARDOWN_NEVER` means the workers are never torn
1604 # down.
1605 #
1606 # If the workers are not torn down by the service, they will
1607 # continue to run and use Google Compute Engine VM resources in the
1608 # user&#x27;s project until they are explicitly terminated by the user.
1609 # Because of this, Google recommends using the `TEARDOWN_ALWAYS`
1610 # policy except for small, manually supervised test jobs.
1611 #
1612 # If unknown or unspecified, the service will attempt to choose a reasonable
1613 # default.
1614 &quot;network&quot;: &quot;A String&quot;, # Network to which VMs will be assigned. If empty or unspecified,
1615 # the service will use the network &quot;default&quot;.
1616 &quot;ipConfiguration&quot;: &quot;A String&quot;, # Configuration for VM IPs.
1617 &quot;diskSizeGb&quot;: 42, # Size of root disk for VMs, in GB. If zero or unspecified, the service will
1618 # attempt to choose a reasonable default.
1619 &quot;autoscalingSettings&quot;: { # Settings for WorkerPool autoscaling. # Settings for autoscaling of this WorkerPool.
1620 &quot;maxNumWorkers&quot;: 42, # The maximum number of workers to cap scaling at.
1621 &quot;algorithm&quot;: &quot;A String&quot;, # The algorithm to use for autoscaling.
1622 },
1623 &quot;poolArgs&quot;: { # Extra arguments for this worker pool.
1624 &quot;a_key&quot;: &quot;&quot;, # Properties of the object. Contains field @type with type URL.
1625 },
1626 &quot;subnetwork&quot;: &quot;A String&quot;, # Subnetwork to which VMs will be assigned, if desired. Expected to be of
1627 # the form &quot;regions/REGION/subnetworks/SUBNETWORK&quot;.
1628 &quot;numWorkers&quot;: 42, # Number of Google Compute Engine workers in this pool needed to
1629 # execute the job. If zero or unspecified, the service will
1630 # attempt to choose a reasonable default.
1631 &quot;numThreadsPerWorker&quot;: 42, # The number of threads per worker harness. If empty or unspecified, the
1632 # service will choose a number of threads (according to the number of cores
1633 # on the selected machine type for batch, or 1 by convention for streaming).
1634 &quot;workerHarnessContainerImage&quot;: &quot;A String&quot;, # Required. Docker container image that executes the Cloud Dataflow worker
1635 # harness, residing in Google Container Registry.
1636 #
1637 # Deprecated for the Fn API path. Use sdk_harness_container_images instead.
1638 &quot;taskrunnerSettings&quot;: { # Taskrunner configuration settings. # Settings passed through to Google Compute Engine workers when
1639 # using the standard Dataflow task runner. Users should ignore
1640 # this field.
1641 &quot;dataflowApiVersion&quot;: &quot;A String&quot;, # The API version of the endpoint, e.g. &quot;v1b3&quot;.
1642 &quot;oauthScopes&quot;: [ # The OAuth2 scopes to be requested by the taskrunner in order to
1643 # access the Cloud Dataflow API.
1644 &quot;A String&quot;,
1645 ],
1646 &quot;baseUrl&quot;: &quot;A String&quot;, # The base URL for the taskrunner to use when accessing Google Cloud APIs.
1647 #
1648 # When workers access Google Cloud APIs, they logically do so via
1649 # relative URLs. If this field is specified, it supplies the base
1650 # URL to use for resolving these relative URLs. The normative
1651 # algorithm used is defined by RFC 1808, &quot;Relative Uniform Resource
1652 # Locators&quot;.
1653 #
1654 # If not specified, the default value is &quot;http://www.googleapis.com/&quot;
1655 &quot;workflowFileName&quot;: &quot;A String&quot;, # The file to store the workflow in.
1656 &quot;logToSerialconsole&quot;: True or False, # Whether to send taskrunner log info to Google Compute Engine VM serial
1657 # console.
1658 &quot;baseTaskDir&quot;: &quot;A String&quot;, # The location on the worker for task-specific subdirectories.
1659 &quot;taskUser&quot;: &quot;A String&quot;, # The UNIX user ID on the worker VM to use for tasks launched by
1660 # taskrunner; e.g. &quot;root&quot;.
1661 &quot;vmId&quot;: &quot;A String&quot;, # The ID string of the VM.
1662 &quot;alsologtostderr&quot;: True or False, # Whether to also send taskrunner log info to stderr.
1663 &quot;parallelWorkerSettings&quot;: { # Provides data to pass through to the worker harness. # The settings to pass to the parallel worker harness.
1664 &quot;shuffleServicePath&quot;: &quot;A String&quot;, # The Shuffle service path relative to the root URL, for example,
1665 # &quot;shuffle/v1beta1&quot;.
1666 &quot;tempStoragePrefix&quot;: &quot;A String&quot;, # The prefix of the resources the system should use for temporary
1667 # storage.
1668 #
1669 # The supported resource type is:
1670 #
1671 # Google Cloud Storage:
1672 #
1673 # storage.googleapis.com/{bucket}/{object}
1674 # bucket.storage.googleapis.com/{object}
1675 &quot;reportingEnabled&quot;: True or False, # Whether to send work progress updates to the service.
1676 &quot;servicePath&quot;: &quot;A String&quot;, # The Cloud Dataflow service path relative to the root URL, for example,
1677 # &quot;dataflow/v1b3/projects&quot;.
1678 &quot;baseUrl&quot;: &quot;A String&quot;, # The base URL for accessing Google Cloud APIs.
1679 #
1680 # When workers access Google Cloud APIs, they logically do so via
1681 # relative URLs. If this field is specified, it supplies the base
1682 # URL to use for resolving these relative URLs. The normative
1683 # algorithm used is defined by RFC 1808, &quot;Relative Uniform Resource
1684 # Locators&quot;.
1685 #
1686 # If not specified, the default value is &quot;http://www.googleapis.com/&quot;
1687 &quot;workerId&quot;: &quot;A String&quot;, # The ID of the worker running this pipeline.
1688 },
1689 &quot;harnessCommand&quot;: &quot;A String&quot;, # The command to launch the worker harness.
1690 &quot;logDir&quot;: &quot;A String&quot;, # The directory on the VM to store logs.
1691 &quot;streamingWorkerMainClass&quot;: &quot;A String&quot;, # The streaming worker main class name.
1692 &quot;languageHint&quot;: &quot;A String&quot;, # The suggested backend language.
1693 &quot;taskGroup&quot;: &quot;A String&quot;, # The UNIX group ID on the worker VM to use for tasks launched by
1694 # taskrunner; e.g. &quot;wheel&quot;.
1695 &quot;logUploadLocation&quot;: &quot;A String&quot;, # Indicates where to put logs. If this is not specified, the logs
1696 # will not be uploaded.
1697 #
1698 # The supported resource type is:
1699 #
1700 # Google Cloud Storage:
1701 # storage.googleapis.com/{bucket}/{object}
1702 # bucket.storage.googleapis.com/{object}
1703 &quot;commandlinesFileName&quot;: &quot;A String&quot;, # The file to store preprocessing commands in.
1704 &quot;continueOnException&quot;: True or False, # Whether to continue taskrunner if an exception is hit.
1705 &quot;tempStoragePrefix&quot;: &quot;A String&quot;, # The prefix of the resources the taskrunner should use for
1706 # temporary storage.
1707 #
1708 # The supported resource type is:
1709 #
1710 # Google Cloud Storage:
1711 # storage.googleapis.com/{bucket}/{object}
1712 # bucket.storage.googleapis.com/{object}
1713 },
1714 &quot;diskType&quot;: &quot;A String&quot;, # Type of root disk for VMs. If empty or unspecified, the service will
1715 # attempt to choose a reasonable default.
1716 &quot;defaultPackageSet&quot;: &quot;A String&quot;, # The default package set to install. This allows the service to
1717 # select a default set of packages which are useful to worker
1718 # harnesses written in a particular language.
1719 &quot;machineType&quot;: &quot;A String&quot;, # Machine type (e.g. &quot;n1-standard-1&quot;). If empty or unspecified, the
1720 # service will attempt to choose a reasonable default.
1721 },
1722 ],
1723 &quot;tempStoragePrefix&quot;: &quot;A String&quot;, # The prefix of the resources the system should use for temporary
1724 # storage. The system will append the suffix &quot;/temp-{JOBNAME}&quot; to
1725 # this resource prefix, where {JOBNAME} is the value of the
1726 # job_name field. The resulting bucket and object prefix is used
1727 # as the prefix of the resources used to store temporary data
1728 # needed during the job execution. NOTE: This will override the
1729 # value in taskrunner_settings.
1730 # The supported resource type is:
1731 #
1732 # Google Cloud Storage:
1733 #
1734 # storage.googleapis.com/{bucket}/{object}
1735 # bucket.storage.googleapis.com/{object}
1736 &quot;internalExperiments&quot;: { # Experimental settings.
1737 &quot;a_key&quot;: &quot;&quot;, # Properties of the object. Contains field @type with type URL.
1738 },
1739 &quot;sdkPipelineOptions&quot;: { # The Cloud Dataflow SDK pipeline options specified by the user. These
1740 # options are passed through the service and are used to recreate the
1741 # SDK pipeline options on the worker in a language agnostic and platform
1742 # independent way.
1743 &quot;a_key&quot;: &quot;&quot;, # Properties of the object.
1744 },
1745 &quot;dataset&quot;: &quot;A String&quot;, # The dataset for the current project where various workflow
1746 # related tables are stored.
1747 #
1748 # The supported resource type is:
1749 #
1750 # Google BigQuery:
1751 # bigquery.googleapis.com/{dataset}
1752 &quot;clusterManagerApiService&quot;: &quot;A String&quot;, # The type of cluster manager API to use. If unknown or
1753 # unspecified, the service will attempt to choose a reasonable
1754 # default. This should be in the form of the API service name,
1755 # e.g. &quot;compute.googleapis.com&quot;.
1756 },
1757 &quot;stepsLocation&quot;: &quot;A String&quot;, # The GCS location where the steps are stored.
1758 &quot;steps&quot;: [ # Exactly one of step or steps_location should be specified.
1759 #
1760 # The top-level steps that constitute the entire job.
1761 { # Defines a particular step within a Cloud Dataflow job.
1762 #
1763 # A job consists of multiple steps, each of which performs some
1764 # specific operation as part of the overall job. Data is typically
1765 # passed from one step to another as part of the job.
1766 #
1767 # Here&#x27;s an example of a sequence of steps which together implement a
1768 # Map-Reduce job:
1769 #
1770 # * Read a collection of data from some source, parsing the
1771 # collection&#x27;s elements.
1772 #
1773 # * Validate the elements.
1774 #
1775 # * Apply a user-defined function to map each element to some value
1776 # and extract an element-specific key value.
1777 #
1778 # * Group elements with the same key into a single element with
1779 # that key, transforming a multiply-keyed collection into a
1780 # uniquely-keyed collection.
1781 #
1782 # * Write the elements out to some data sink.
1783 #
1784 # Note that the Cloud Dataflow service may be used to run many different
1785 # types of jobs, not just Map-Reduce.
1786 &quot;kind&quot;: &quot;A String&quot;, # The kind of step in the Cloud Dataflow job.
1787 &quot;properties&quot;: { # Named properties associated with the step. Each kind of
1788 # predefined step has its own required set of properties.
1789 # Must be provided on Create. Only retrieved with JOB_VIEW_ALL.
1790 &quot;a_key&quot;: &quot;&quot;, # Properties of the object.
1791 },
1792 &quot;name&quot;: &quot;A String&quot;, # The name that identifies the step. This must be unique for each
1793 # step with respect to all other steps in the Cloud Dataflow job.
1794 },
1795 ],
1796 &quot;stageStates&quot;: [ # This field may be mutated by the Cloud Dataflow service;
1797 # callers cannot mutate it.
1798 { # A message describing the state of a particular execution stage.
1799 &quot;executionStageState&quot;: &quot;A String&quot;, # Execution stage states allow the same set of values as JobState.
1800 &quot;executionStageName&quot;: &quot;A String&quot;, # The name of the execution stage.
1801 &quot;currentStateTime&quot;: &quot;A String&quot;, # The time at which the stage transitioned to this state.
1802 },
1803 ],
1804 &quot;replacedByJobId&quot;: &quot;A String&quot;, # If another job is an update of this job (and thus, this job is in
1805 # `JOB_STATE_UPDATED`), this field contains the ID of that job.
1806 &quot;jobMetadata&quot;: { # Metadata available primarily for filtering jobs. Will be included in the
1807 # ListJob response and Job SUMMARY view.
1808 # This field is populated by the Dataflow service to support filtering jobs by the
1809 # metadata values provided here. Populated for ListJobs and all GetJob views SUMMARY and higher.
1810 &quot;sdkVersion&quot;: { # The version of the SDK used to run the job. # The SDK version used to run the job.
1811 &quot;sdkSupportStatus&quot;: &quot;A String&quot;, # The support status for this SDK version.
1812 &quot;versionDisplayName&quot;: &quot;A String&quot;, # A readable string describing the version of the SDK.
1813 &quot;version&quot;: &quot;A String&quot;, # The version of the SDK used to run the job.
1814 },
1815 &quot;bigTableDetails&quot;: [ # Identification of a BigTable source used in the Dataflow job.
1816 { # Metadata for a BigTable connector used by the job.
1817 &quot;instanceId&quot;: &quot;A String&quot;, # InstanceId accessed in the connection.
1818 &quot;tableId&quot;: &quot;A String&quot;, # TableId accessed in the connection.
1819 &quot;projectId&quot;: &quot;A String&quot;, # ProjectId accessed in the connection.
1820 },
1821 ],
1822 &quot;pubsubDetails&quot;: [ # Identification of a PubSub source used in the Dataflow job.
1823 { # Metadata for a PubSub connector used by the job.
1824 &quot;subscription&quot;: &quot;A String&quot;, # Subscription used in the connection.
1825 &quot;topic&quot;: &quot;A String&quot;, # Topic accessed in the connection.
1826 },
1827 ],
1828 &quot;bigqueryDetails&quot;: [ # Identification of a BigQuery source used in the Dataflow job.
1829 { # Metadata for a BigQuery connector used by the job.
1830 &quot;dataset&quot;: &quot;A String&quot;, # Dataset accessed in the connection.
1831 &quot;projectId&quot;: &quot;A String&quot;, # Project accessed in the connection.
1832 &quot;query&quot;: &quot;A String&quot;, # Query used to access data in the connection.
1833 &quot;table&quot;: &quot;A String&quot;, # Table accessed in the connection.
1834 },
1835 ],
1836 &quot;fileDetails&quot;: [ # Identification of a File source used in the Dataflow job.
1837 { # Metadata for a File connector used by the job.
1838 &quot;filePattern&quot;: &quot;A String&quot;, # File Pattern used to access files by the connector.
1839 },
1840 ],
1841 &quot;datastoreDetails&quot;: [ # Identification of a Datastore source used in the Dataflow job.
1842 { # Metadata for a Datastore connector used by the job.
1843 &quot;namespace&quot;: &quot;A String&quot;, # Namespace used in the connection.
1844 &quot;projectId&quot;: &quot;A String&quot;, # ProjectId accessed in the connection.
1845 },
1846 ],
1847 &quot;spannerDetails&quot;: [ # Identification of a Spanner source used in the Dataflow job.
1848 { # Metadata for a Spanner connector used by the job.
1849 &quot;instanceId&quot;: &quot;A String&quot;, # InstanceId accessed in the connection.
1850 &quot;databaseId&quot;: &quot;A String&quot;, # DatabaseId accessed in the connection.
1851 &quot;projectId&quot;: &quot;A String&quot;, # ProjectId accessed in the connection.
1852 },
1853 ],
1854 },
1855 &quot;location&quot;: &quot;A String&quot;, # The [regional endpoint]
1856 # (https://cloud.google.com/dataflow/docs/concepts/regional-endpoints) that
1857 # contains this job.
1858 &quot;transformNameMapping&quot;: { # The map of transform name prefixes of the job to be replaced to the
1859 # corresponding name prefixes of the new job.
1860 &quot;a_key&quot;: &quot;A String&quot;,
1861 },
1862 &quot;startTime&quot;: &quot;A String&quot;, # The timestamp when the job was started (transitioned to JOB_STATE_PENDING).
1863 # Flexible resource scheduling jobs are started with some delay after job
1864 # creation, so start_time is unset before start and is updated when the
1865 # job is started by the Cloud Dataflow service. For other jobs, start_time
1866 # always equals create_time and is immutable and set by the Cloud Dataflow
1867 # service.
1868 &quot;clientRequestId&quot;: &quot;A String&quot;, # The client&#x27;s unique identifier of the job, re-used across retried attempts.
1869 # If this field is set, the service will ensure its uniqueness.
1870 # The request to create a job will fail if the service has knowledge of a
1871 # previously submitted job with the same client&#x27;s ID and job name.
1872 # The caller may use this field to ensure idempotence of job
1873 # creation across retried attempts to create a job.
1874 # By default, the field is empty and, in that case, the service ignores it.
1875 &quot;executionInfo&quot;: { # Deprecated. Additional information about how a Cloud Dataflow job will be
1876 # executed that isn&#x27;t contained in the submitted job.
1877 &quot;stages&quot;: { # A mapping from each stage to the information about that stage.
1878 &quot;a_key&quot;: { # Contains information about how a particular
1879 # google.dataflow.v1beta3.Step will be executed.
1880 &quot;stepName&quot;: [ # The steps associated with the execution stage.
1881 # Note that stages may have several steps, and that a given step
1882 # might be run by more than one stage.
1883 &quot;A String&quot;,
1884 ],
1885 },
1886 },
1887 },
1888 &quot;type&quot;: &quot;A String&quot;, # The type of Cloud Dataflow job.
1889 &quot;createTime&quot;: &quot;A String&quot;, # The timestamp when the job was initially created. Immutable and set by the
1890 # Cloud Dataflow service.
1891 &quot;tempFiles&quot;: [ # A set of files the system should be aware of that are used
1892 # for temporary storage. These temporary files will be
1893 # removed on job completion.
1894 # No duplicates are allowed.
1895 # No file patterns are supported.
1896 #
1897 # The supported files are:
1898 #
1899 # Google Cloud Storage:
1900 #
1901 # storage.googleapis.com/{bucket}/{object}
1902 # bucket.storage.googleapis.com/{object}
1903 &quot;A String&quot;,
1904 ],
1905 &quot;id&quot;: &quot;A String&quot;, # The unique ID of this job.
1906 #
1907 # This field is set by the Cloud Dataflow service when the Job is
1908 # created, and is immutable for the life of the job.
1909 &quot;requestedState&quot;: &quot;A String&quot;, # The job&#x27;s requested state.
1910 #
1911 # `UpdateJob` may be used to switch between the `JOB_STATE_STOPPED` and
1912 # `JOB_STATE_RUNNING` states, by setting requested_state. `UpdateJob` may
1913 # also be used to directly set a job&#x27;s requested state to
1914 # `JOB_STATE_CANCELLED` or `JOB_STATE_DONE`, irrevocably terminating the
1915 # job if it has not already reached a terminal state.
1916 &quot;replaceJobId&quot;: &quot;A String&quot;, # If this job is an update of an existing job, this field is the job ID
1917 # of the job it replaced.
1918 #
1919 # When sending a `CreateJobRequest`, you can update a job by specifying it
1920 # here. The job named here is stopped, and its intermediate state is
1921 # transferred to this job.
1922 &quot;createdFromSnapshotId&quot;: &quot;A String&quot;, # If this is specified, the job&#x27;s initial state is populated from the given
1923 # snapshot.
1924 &quot;currentState&quot;: &quot;A String&quot;, # The current state of the job.
1925 #
1926 # Jobs are created in the `JOB_STATE_STOPPED` state unless otherwise
1927 # specified.
1928 #
1929 # A job in the `JOB_STATE_RUNNING` state may asynchronously enter a
1930 # terminal state. After a job has reached a terminal state, no
1931 # further state updates may be made.
1932 #
1933 # This field may be mutated by the Cloud Dataflow service;
1934 # callers cannot mutate it.
1935 &quot;name&quot;: &quot;A String&quot;, # The user-specified Cloud Dataflow job name.
1936 #
1937 # Only one Job with a given name may exist in a project at any
1938 # given time. If a caller attempts to create a Job with the same
1939 # name as an already-existing Job, the attempt returns the
1940 # existing Job.
1941 #
1942 # The name must match the regular expression
1943 # `[a-z]([-a-z0-9]{0,38}[a-z0-9])?`
1944 &quot;currentStateTime&quot;: &quot;A String&quot;, # The timestamp associated with the current state.
1945 }</pre>
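<p>As a brief illustration, here is a minimal sketch of submitting a job body through the generated Python client via <code>projects().jobs().create(...)</code>. The project ID, job name, and request ID are hypothetical placeholders, and the skeletal job body is illustrative only; in practice the full job body, including its steps, is produced by a Dataflow SDK rather than written by hand.</p>
<pre>
from googleapiclient.discovery import build

# Build a Dataflow API client; credentials are resolved from the environment.
dataflow = build(&#x27;dataflow&#x27;, &#x27;v1b3&#x27;)

# Skeletal job body with hypothetical values. clientRequestId makes retried
# creation attempts idempotent, as described in the schema above.
job_body = {
    &#x27;name&#x27;: &#x27;example-job&#x27;,  # must match [a-z]([-a-z0-9]{0,38}[a-z0-9])?
    &#x27;type&#x27;: &#x27;JOB_TYPE_BATCH&#x27;,
    &#x27;clientRequestId&#x27;: &#x27;example-request-id&#x27;,
}

created = dataflow.projects().jobs().create(
    projectId=&#x27;my-project&#x27;,  # hypothetical project ID
    body=job_body,
    location=&#x27;us-central1&#x27;,
).execute()
print(created[&#x27;id&#x27;], created[&#x27;currentState&#x27;])
</pre>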
1946</div>
1947
1948<div class="method">
1949 <code class="details" id="get">get(projectId, jobId, location=None, view=None, x__xgafv=None)</code>
1950 <pre>Gets the state of the specified Cloud Dataflow job.
1951
1952To get the state of a job, we recommend using `projects.locations.jobs.get`
1953with a [regional endpoint]
1954(https://cloud.google.com/dataflow/docs/concepts/regional-endpoints). Using
1955`projects.jobs.get` is not recommended, as you can only get the state of
1956jobs that are running in `us-central1`.
1957
1958Args:
1959 projectId: string, The ID of the Cloud Platform project that the job belongs to. (required)
1960 jobId: string, The job ID. (required)
1961 location: string, The [regional endpoint]
1962(https://cloud.google.com/dataflow/docs/concepts/regional-endpoints) that
1963contains this job.
1964 view: string, The level of information requested in response.
1965 x__xgafv: string, V1 error format.
1966 Allowed values
1967 1 - v1 error format
1968 2 - v2 error format
1969
1970Returns:
1971 An object of the form:
1972
1973 { # Defines a job to be run by the Cloud Dataflow service.
1974 &quot;pipelineDescription&quot;: { # Preliminary field: The format of this data may change at any time.
1975 # A description of the user pipeline and stages through which it is executed,
1976 # covering both the submitted pipeline and its executed form. This data is
1977 # provided by the Cloud Dataflow service for ease of visualizing the pipeline
1978 # and interpreting Dataflow-provided metrics. Only retrieved with
1979 # JOB_VIEW_DESCRIPTION or JOB_VIEW_ALL.
1980 &quot;displayData&quot;: [ # Pipeline level display data.
1981 { # Data provided with a pipeline or transform to provide descriptive info.
1982 &quot;url&quot;: &quot;A String&quot;, # An optional full URL.
1983 &quot;javaClassValue&quot;: &quot;A String&quot;, # Contains value if the data is of java class type.
1984 &quot;timestampValue&quot;: &quot;A String&quot;, # Contains value if the data is of timestamp type.
1985 &quot;durationValue&quot;: &quot;A String&quot;, # Contains value if the data is of duration type.
1986 &quot;label&quot;: &quot;A String&quot;, # An optional label to display in a dax UI for the element.
1987 &quot;key&quot;: &quot;A String&quot;, # The key identifying the display data.
1988 # This is intended to be used as a label for the display data
1989 # when viewed in a dax monitoring system.
1990 &quot;namespace&quot;: &quot;A String&quot;, # The namespace for the key. This is usually a class name or programming
1991 # language namespace (i.e. python module) which defines the display data.
1992 # This allows a dax monitoring system to specially handle the data
1993 # and perform custom rendering.
1994 &quot;floatValue&quot;: 3.14, # Contains value if the data is of float type.
1995 &quot;strValue&quot;: &quot;A String&quot;, # Contains value if the data is of string type.
1996 &quot;int64Value&quot;: &quot;A String&quot;, # Contains value if the data is of int64 type.
1997 &quot;boolValue&quot;: True or False, # Contains value if the data is of a boolean type.
1998 &quot;shortStrValue&quot;: &quot;A String&quot;, # A possible additional shorter value to display.
1999 # For example a java_class_name_value of com.mypackage.MyDoFn
2000 # will be stored with MyDoFn as the short_str_value and
2001 # com.mypackage.MyDoFn as the java_class_name value.
2002 # short_str_value can be displayed and java_class_name_value
2003 # will be displayed as a tooltip.
2004 },
2005 ],
2006 &quot;originalPipelineTransform&quot;: [ # Description of each transform in the pipeline and collections between them.
2007 { # Description of the type, names/ids, and input/outputs for a transform.
2008 &quot;outputCollectionName&quot;: [ # User names for all collection outputs to this transform.
2009 &quot;A String&quot;,
2010 ],
2011 &quot;displayData&quot;: [ # Transform-specific display data.
2012 { # Data provided with a pipeline or transform to provide descriptive info.
2013 &quot;url&quot;: &quot;A String&quot;, # An optional full URL.
2014 &quot;javaClassValue&quot;: &quot;A String&quot;, # Contains value if the data is of java class type.
2015 &quot;timestampValue&quot;: &quot;A String&quot;, # Contains value if the data is of timestamp type.
2016 &quot;durationValue&quot;: &quot;A String&quot;, # Contains value if the data is of duration type.
2017 &quot;label&quot;: &quot;A String&quot;, # An optional label to display in a dax UI for the element.
2018 &quot;key&quot;: &quot;A String&quot;, # The key identifying the display data.
2019 # This is intended to be used as a label for the display data
2020 # when viewed in a dax monitoring system.
2021 &quot;namespace&quot;: &quot;A String&quot;, # The namespace for the key. This is usually a class name or programming
2022 # language namespace (i.e. python module) which defines the display data.
2023 # This allows a dax monitoring system to specially handle the data
2024 # and perform custom rendering.
2025 &quot;floatValue&quot;: 3.14, # Contains value if the data is of float type.
2026 &quot;strValue&quot;: &quot;A String&quot;, # Contains value if the data is of string type.
2027 &quot;int64Value&quot;: &quot;A String&quot;, # Contains value if the data is of int64 type.
2028 &quot;boolValue&quot;: True or False, # Contains value if the data is of a boolean type.
2029 &quot;shortStrValue&quot;: &quot;A String&quot;, # A possible additional shorter value to display.
2030 # For example a java_class_name_value of com.mypackage.MyDoFn
2031 # will be stored with MyDoFn as the short_str_value and
2032 # com.mypackage.MyDoFn as the java_class_name value.
2033 # short_str_value can be displayed and java_class_name_value
2034 # will be displayed as a tooltip.
2035 },
2036 ],
2037 &quot;id&quot;: &quot;A String&quot;, # SDK generated id of this transform instance.
2038 &quot;inputCollectionName&quot;: [ # User names for all collection inputs to this transform.
2039 &quot;A String&quot;,
2040 ],
2041 &quot;name&quot;: &quot;A String&quot;, # User provided name for this transform instance.
2042 &quot;kind&quot;: &quot;A String&quot;, # Type of transform.
2043 },
2044 ],
2045 &quot;executionPipelineStage&quot;: [ # Description of each stage of execution of the pipeline.
2046 { # Description of the composing transforms, names/ids, and input/outputs of a
2047 # stage of execution. Some composing transforms and sources may have been
2048 # generated by the Dataflow service during execution planning.
2049 &quot;componentSource&quot;: [ # Collections produced and consumed by component transforms of this stage.
2050 { # Description of an interstitial value between transforms in an execution
2051 # stage.
2052 &quot;userName&quot;: &quot;A String&quot;, # Human-readable name for this transform; may be user or system generated.
2053 &quot;name&quot;: &quot;A String&quot;, # Dataflow service generated name for this source.
2054 &quot;originalTransformOrCollection&quot;: &quot;A String&quot;, # User name for the original user transform or collection with which this
2055 # source is most closely associated.
2056 },
2057 ],
2058 &quot;inputSource&quot;: [ # Input sources for this stage.
2059 { # Description of an input or output of an execution stage.
2060 &quot;userName&quot;: &quot;A String&quot;, # Human-readable name for this source; may be user or system generated.
2061 &quot;originalTransformOrCollection&quot;: &quot;A String&quot;, # User name for the original user transform or collection with which this
2062 # source is most closely associated.
2063 &quot;sizeBytes&quot;: &quot;A String&quot;, # Size of the source, if measurable.
2064 &quot;name&quot;: &quot;A String&quot;, # Dataflow service generated name for this source.
2065 },
2066 ],
2067 &quot;name&quot;: &quot;A String&quot;, # Dataflow service generated name for this stage.
2068 &quot;componentTransform&quot;: [ # Transforms that comprise this execution stage.
2069 { # Description of a transform executed as part of an execution stage.
2070 &quot;name&quot;: &quot;A String&quot;, # Dataflow service generated name for this source.
2071 &quot;userName&quot;: &quot;A String&quot;, # Human-readable name for this transform; may be user or system generated.
2072 &quot;originalTransform&quot;: &quot;A String&quot;, # User name for the original user transform with which this transform is
2073 # most closely associated.
2074 },
2075 ],
2076 &quot;id&quot;: &quot;A String&quot;, # Dataflow service generated id for this stage.
2077 &quot;outputSource&quot;: [ # Output sources for this stage.
2078 { # Description of an input or output of an execution stage.
2079 &quot;userName&quot;: &quot;A String&quot;, # Human-readable name for this source; may be user or system generated.
2080 &quot;originalTransformOrCollection&quot;: &quot;A String&quot;, # User name for the original user transform or collection with which this
2081 # source is most closely associated.
2082 &quot;sizeBytes&quot;: &quot;A String&quot;, # Size of the source, if measurable.
2083 &quot;name&quot;: &quot;A String&quot;, # Dataflow service generated name for this source.
2084 },
2085 ],
2086 &quot;kind&quot;: &quot;A String&quot;, # Type of transform this stage is executing.
2087 },
2088 ],
2089 },
2090 &quot;labels&quot;: { # User-defined labels for this job.
2091 #
2092 # The labels map can contain no more than 64 entries. Entries of the labels
2093 # map are UTF8 strings that comply with the following restrictions:
2094 #
2095 # * Keys must conform to regexp: \p{Ll}\p{Lo}{0,62}
2096 # * Values must conform to regexp: [\p{Ll}\p{Lo}\p{N}_-]{0,63}
2097 # * Both keys and values are additionally constrained to be &lt;= 128 bytes in
2098 # size.
2099 &quot;a_key&quot;: &quot;A String&quot;,
2100 },
2101 &quot;projectId&quot;: &quot;A String&quot;, # The ID of the Cloud Platform project that the job belongs to.
2102 &quot;environment&quot;: { # Describes the environment in which a Dataflow Job runs. # The environment for the job.
2103 &quot;flexResourceSchedulingGoal&quot;: &quot;A String&quot;, # Which Flexible Resource Scheduling mode to run in.
2104 &quot;workerRegion&quot;: &quot;A String&quot;, # The Compute Engine region
2105 # (https://cloud.google.com/compute/docs/regions-zones/regions-zones) in
2106 # which worker processing should occur, e.g. &quot;us-west1&quot;. Mutually exclusive
2107 # with worker_zone. If neither worker_region nor worker_zone is specified,
2108 # default to the control plane&#x27;s region.
2109 &quot;userAgent&quot;: { # A description of the process that generated the request.
2110 &quot;a_key&quot;: &quot;&quot;, # Properties of the object.
2111 },
2112 &quot;serviceAccountEmail&quot;: &quot;A String&quot;, # Identity to run virtual machines as. Defaults to the default account.
2113 &quot;version&quot;: { # A structure describing which components and their versions of the service
2114 # are required in order to run the job.
2115 &quot;a_key&quot;: &quot;&quot;, # Properties of the object.
2116 },
2117 &quot;serviceKmsKeyName&quot;: &quot;A String&quot;, # If set, contains the Cloud KMS key identifier used to encrypt data
2118 # at rest, AKA a Customer Managed Encryption Key (CMEK).
2119 #
2120 # Format:
2121 # projects/PROJECT_ID/locations/LOCATION/keyRings/KEY_RING/cryptoKeys/KEY
2122 &quot;experiments&quot;: [ # The list of experiments to enable.
2123 &quot;A String&quot;,
2124 ],
2125 &quot;workerZone&quot;: &quot;A String&quot;, # The Compute Engine zone
2126 # (https://cloud.google.com/compute/docs/regions-zones/regions-zones) in
2127 # which worker processing should occur, e.g. &quot;us-west1-a&quot;. Mutually exclusive
2128 # with worker_region. If neither worker_region nor worker_zone is specified,
2129 # a zone in the control plane&#x27;s region is chosen based on available capacity.
2130 &quot;workerPools&quot;: [ # The worker pools. At least one &quot;harness&quot; worker pool must be
2131 # specified in order for the job to have workers.
2132 { # Describes one particular pool of Cloud Dataflow workers to be
2133 # instantiated by the Cloud Dataflow service in order to perform the
2134 # computations required by a job. Note that a workflow job may use
2135 # multiple pools, in order to match the various computational
2136 # requirements of the various stages of the job.
2137 &quot;onHostMaintenance&quot;: &quot;A String&quot;, # The action to take on host maintenance, as defined by the Google
2138 # Compute Engine API.
2139 &quot;sdkHarnessContainerImages&quot;: [ # Set of SDK harness containers needed to execute this pipeline. This will
2140 # only be set in the Fn API path. For non-cross-language pipelines this
2141 # should have only one entry. Cross-language pipelines will have two or more
2142 # entries.
2143 { # Defines an SDK harness container for executing Dataflow pipelines.
2144 &quot;containerImage&quot;: &quot;A String&quot;, # A docker container image that resides in Google Container Registry.
2145 &quot;useSingleCorePerContainer&quot;: True or False, # If true, recommends that the Dataflow service use only one core per SDK
2146 # container instance with this image. If false (or unset), recommends using
2147 # more than one core per SDK container instance with this image for
2148 # efficiency. Note that the Dataflow service may choose to override this
2149 # property if needed.
2150 },
2151 ],
2152 &quot;zone&quot;: &quot;A String&quot;, # Zone to run the worker pools in. If empty or unspecified, the service
2153 # will attempt to choose a reasonable default.
2154 &quot;kind&quot;: &quot;A String&quot;, # The kind of the worker pool; currently only `harness` and `shuffle`
2155 # are supported.
2156 &quot;metadata&quot;: { # Metadata to set on the Google Compute Engine VMs.
2157 &quot;a_key&quot;: &quot;A String&quot;,
2158 },
2159 &quot;diskSourceImage&quot;: &quot;A String&quot;, # Fully qualified source image for disks.
2160 &quot;dataDisks&quot;: [ # Data disks that are used by a VM in this workflow.
2161 { # Describes the data disk used by a workflow job.
2162 &quot;sizeGb&quot;: 42, # Size of disk in GB. If zero or unspecified, the service will
2163 # attempt to choose a reasonable default.
2164 &quot;diskType&quot;: &quot;A String&quot;, # Disk storage type, as defined by Google Compute Engine. This
2165 # must be a disk type appropriate to the project and zone in which
2166 # the workers will run. If unknown or unspecified, the service
2167 # will attempt to choose a reasonable default.
2168 #
2169 # For example, the standard persistent disk type is a resource name
2170 # typically ending in &quot;pd-standard&quot;. If SSD persistent disks are
2171 # available, the resource name typically ends with &quot;pd-ssd&quot;. The
2172 # actual valid values are defined by the Google Compute Engine API,
2173 # not by the Cloud Dataflow API; consult the Google Compute Engine
2174 # documentation for more information about determining the set of
2175 # available disk types for a particular project and zone.
2176 #
2177 # Google Compute Engine Disk types are local to a particular
2178 # project in a particular zone, and so the resource name will
2179 # typically look something like this:
2180 #
2181 # compute.googleapis.com/projects/project-id/zones/zone/diskTypes/pd-standard
2182 &quot;mountPoint&quot;: &quot;A String&quot;, # Directory in a VM where disk is mounted.
2183 },
2184 ],
2185 &quot;packages&quot;: [ # Packages to be installed on workers.
2186 { # The packages that must be installed in order for a worker to run the
2187 # steps of the Cloud Dataflow job that will be assigned to its worker
2188 # pool.
2189 #
2190 # This is the mechanism by which the Cloud Dataflow SDK causes code to
2191 # be loaded onto the workers. For example, the Cloud Dataflow Java SDK
2192 # might use this to install jars containing the user&#x27;s code and all of the
2193 # various dependencies (libraries, data files, etc.) required in order
2194 # for that code to run.
2195 &quot;name&quot;: &quot;A String&quot;, # The name of the package.
2196 &quot;location&quot;: &quot;A String&quot;, # The resource to read the package from. The supported resource type is:
2197 #
2198 # Google Cloud Storage:
2199 #
2200 # storage.googleapis.com/{bucket}
2201 # bucket.storage.googleapis.com/
2202 },
2203 ],
2204 &quot;teardownPolicy&quot;: &quot;A String&quot;, # Sets the policy for determining when to turn down the worker pool.
2205 # Allowed values are: `TEARDOWN_ALWAYS`, `TEARDOWN_ON_SUCCESS`, and
2206 # `TEARDOWN_NEVER`.
2207 # `TEARDOWN_ALWAYS` means workers are always torn down regardless of whether
2208 # the job succeeds. `TEARDOWN_ON_SUCCESS` means workers are torn down
2209 # if the job succeeds. `TEARDOWN_NEVER` means the workers are never torn
2210 # down.
2211 #
2212 # If the workers are not torn down by the service, they will
2213 # continue to run and use Google Compute Engine VM resources in the
2214 # user&#x27;s project until they are explicitly terminated by the user.
2215 # Because of this, Google recommends using the `TEARDOWN_ALWAYS`
2216 # policy except for small, manually supervised test jobs.
2217 #
2218 # If unknown or unspecified, the service will attempt to choose a reasonable
2219 # default.
2220 &quot;network&quot;: &quot;A String&quot;, # Network to which VMs will be assigned. If empty or unspecified,
2221 # the service will use the network &quot;default&quot;.
2222 &quot;ipConfiguration&quot;: &quot;A String&quot;, # Configuration for VM IPs.
2223 &quot;diskSizeGb&quot;: 42, # Size of root disk for VMs, in GB. If zero or unspecified, the service will
2224 # attempt to choose a reasonable default.
2225 &quot;autoscalingSettings&quot;: { # Settings for WorkerPool autoscaling. # Settings for autoscaling of this WorkerPool.
2226 &quot;maxNumWorkers&quot;: 42, # The maximum number of workers to cap scaling at.
2227 &quot;algorithm&quot;: &quot;A String&quot;, # The algorithm to use for autoscaling.
2228 },
2229 &quot;poolArgs&quot;: { # Extra arguments for this worker pool.
2230 &quot;a_key&quot;: &quot;&quot;, # Properties of the object. Contains field @type with type URL.
2231 },
2232 &quot;subnetwork&quot;: &quot;A String&quot;, # Subnetwork to which VMs will be assigned, if desired. Expected to be of
2233 # the form &quot;regions/REGION/subnetworks/SUBNETWORK&quot;.
2234 &quot;numWorkers&quot;: 42, # Number of Google Compute Engine workers in this pool needed to
2235 # execute the job. If zero or unspecified, the service will
2236 # attempt to choose a reasonable default.
2237 &quot;numThreadsPerWorker&quot;: 42, # The number of threads per worker harness. If empty or unspecified, the
2238 # service will choose a number of threads (according to the number of cores
2239 # on the selected machine type for batch, or 1 by convention for streaming).
2240 &quot;workerHarnessContainerImage&quot;: &quot;A String&quot;, # Required. Docker container image that executes the Cloud Dataflow worker
2241 # harness, residing in Google Container Registry.
2242 #
2243 # Deprecated for the Fn API path. Use sdk_harness_container_images instead.
2244 &quot;taskrunnerSettings&quot;: { # Taskrunner configuration settings. # Settings passed through to Google Compute Engine workers when
2245 # using the standard Dataflow task runner. Users should ignore
2246 # this field.
2247 &quot;dataflowApiVersion&quot;: &quot;A String&quot;, # The API version of endpoint, e.g. &quot;v1b3&quot;
2248 &quot;oauthScopes&quot;: [ # The OAuth2 scopes to be requested by the taskrunner in order to
2249 # access the Cloud Dataflow API.
2250 &quot;A String&quot;,
2251 ],
2252 &quot;baseUrl&quot;: &quot;A String&quot;, # The base URL for the taskrunner to use when accessing Google Cloud APIs.
2253 #
2254 # When workers access Google Cloud APIs, they logically do so via
2255 # relative URLs. If this field is specified, it supplies the base
2256 # URL to use for resolving these relative URLs. The normative
2257 # algorithm used is defined by RFC 1808, &quot;Relative Uniform Resource
2258 # Locators&quot;.
2259 #
2260 # If not specified, the default value is &quot;http://www.googleapis.com/&quot;
2261 &quot;workflowFileName&quot;: &quot;A String&quot;, # The file to store the workflow in.
2262 &quot;logToSerialconsole&quot;: True or False, # Whether to send taskrunner log info to Google Compute Engine VM serial
2263 # console.
2264 &quot;baseTaskDir&quot;: &quot;A String&quot;, # The location on the worker for task-specific subdirectories.
2265 &quot;taskUser&quot;: &quot;A String&quot;, # The UNIX user ID on the worker VM to use for tasks launched by
2266 # taskrunner; e.g. &quot;root&quot;.
2267 &quot;vmId&quot;: &quot;A String&quot;, # The ID string of the VM.
2268 &quot;alsologtostderr&quot;: True or False, # Whether to also send taskrunner log info to stderr.
2269 &quot;parallelWorkerSettings&quot;: { # Provides data to pass through to the worker harness. # The settings to pass to the parallel worker harness.
2270 &quot;shuffleServicePath&quot;: &quot;A String&quot;, # The Shuffle service path relative to the root URL, for example,
2271 # &quot;shuffle/v1beta1&quot;.
2272 &quot;tempStoragePrefix&quot;: &quot;A String&quot;, # The prefix of the resources the system should use for temporary
2273 # storage.
2274 #
2275 # The supported resource type is:
2276 #
2277 # Google Cloud Storage:
2278 #
2279 # storage.googleapis.com/{bucket}/{object}
2280 # bucket.storage.googleapis.com/{object}
2281 &quot;reportingEnabled&quot;: True or False, # Whether to send work progress updates to the service.
2282 &quot;servicePath&quot;: &quot;A String&quot;, # The Cloud Dataflow service path relative to the root URL, for example,
2283 # &quot;dataflow/v1b3/projects&quot;.
2284 &quot;baseUrl&quot;: &quot;A String&quot;, # The base URL for accessing Google Cloud APIs.
2285 #
2286 # When workers access Google Cloud APIs, they logically do so via
2287 # relative URLs. If this field is specified, it supplies the base
2288 # URL to use for resolving these relative URLs. The normative
2289 # algorithm used is defined by RFC 1808, &quot;Relative Uniform Resource
2290 # Locators&quot;.
2291 #
2292 # If not specified, the default value is &quot;http://www.googleapis.com/&quot;
2293 &quot;workerId&quot;: &quot;A String&quot;, # The ID of the worker running this pipeline.
2294 },
2295 &quot;harnessCommand&quot;: &quot;A String&quot;, # The command to launch the worker harness.
2296 &quot;logDir&quot;: &quot;A String&quot;, # The directory on the VM to store logs.
2297 &quot;streamingWorkerMainClass&quot;: &quot;A String&quot;, # The streaming worker main class name.
2298 &quot;languageHint&quot;: &quot;A String&quot;, # The suggested backend language.
2299 &quot;taskGroup&quot;: &quot;A String&quot;, # The UNIX group ID on the worker VM to use for tasks launched by
2300 # taskrunner; e.g. &quot;wheel&quot;.
2301 &quot;logUploadLocation&quot;: &quot;A String&quot;, # Indicates where to put logs. If this is not specified, the logs
2302 # will not be uploaded.
2303 #
2304 # The supported resource type is:
2305 #
2306 # Google Cloud Storage:
2307 # storage.googleapis.com/{bucket}/{object}
2308 # bucket.storage.googleapis.com/{object}
2309 &quot;commandlinesFileName&quot;: &quot;A String&quot;, # The file to store preprocessing commands in.
2310 &quot;continueOnException&quot;: True or False, # Whether to continue taskrunner if an exception is hit.
2311 &quot;tempStoragePrefix&quot;: &quot;A String&quot;, # The prefix of the resources the taskrunner should use for
2312 # temporary storage.
2313 #
2314 # The supported resource type is:
2315 #
2316 # Google Cloud Storage:
2317 # storage.googleapis.com/{bucket}/{object}
2318 # bucket.storage.googleapis.com/{object}
2319 },
2320 &quot;diskType&quot;: &quot;A String&quot;, # Type of root disk for VMs. If empty or unspecified, the service will
2321 # attempt to choose a reasonable default.
2322 &quot;defaultPackageSet&quot;: &quot;A String&quot;, # The default package set to install. This allows the service to
2323 # select a default set of packages which are useful to worker
2324 # harnesses written in a particular language.
2325 &quot;machineType&quot;: &quot;A String&quot;, # Machine type (e.g. &quot;n1-standard-1&quot;). If empty or unspecified, the
2326 # service will attempt to choose a reasonable default.
2327 },
2328 ],
2329 &quot;tempStoragePrefix&quot;: &quot;A String&quot;, # The prefix of the resources the system should use for temporary
2330 # storage. The system will append the suffix &quot;/temp-{JOBNAME}&quot; to
2331 # this resource prefix, where {JOBNAME} is the value of the
2332 # job_name field. The resulting bucket and object prefix is used
2333 # as the prefix of the resources used to store temporary data
2334 # needed during the job execution. NOTE: This will override the
2335 # value in taskrunner_settings.
2336 # The supported resource type is:
2337 #
2338 # Google Cloud Storage:
2339 #
2340 # storage.googleapis.com/{bucket}/{object}
2341 # bucket.storage.googleapis.com/{object}
2342 &quot;internalExperiments&quot;: { # Experimental settings.
2343 &quot;a_key&quot;: &quot;&quot;, # Properties of the object. Contains field @type with type URL.
2344 },
2345 &quot;sdkPipelineOptions&quot;: { # The Cloud Dataflow SDK pipeline options specified by the user. These
2346 # options are passed through the service and are used to recreate the
2347 # SDK pipeline options on the worker in a language agnostic and platform
2348 # independent way.
2349 &quot;a_key&quot;: &quot;&quot;, # Properties of the object.
2350 },
2351 &quot;dataset&quot;: &quot;A String&quot;, # The dataset for the current project where various workflow
2352 # related tables are stored.
2353 #
2354 # The supported resource type is:
2355 #
2356 # Google BigQuery:
2357 # bigquery.googleapis.com/{dataset}
2358 &quot;clusterManagerApiService&quot;: &quot;A String&quot;, # The type of cluster manager API to use. If unknown or
2359 # unspecified, the service will attempt to choose a reasonable
2360 # default. This should be in the form of the API service name,
2361 # e.g. &quot;compute.googleapis.com&quot;.
2362 },
2363 &quot;stepsLocation&quot;: &quot;A String&quot;, # The GCS location where the steps are stored.
2364 &quot;steps&quot;: [ # Exactly one of step or steps_location should be specified.
2365 #
2366 # The top-level steps that constitute the entire job.
2367 { # Defines a particular step within a Cloud Dataflow job.
2368 #
2369 # A job consists of multiple steps, each of which performs some
2370 # specific operation as part of the overall job. Data is typically
2371 # passed from one step to another as part of the job.
2372 #
2373 # Here&#x27;s an example of a sequence of steps which together implement a
2374 # Map-Reduce job:
2375 #
2376 # * Read a collection of data from some source, parsing the
2377 # collection&#x27;s elements.
2378 #
2379 # * Validate the elements.
2380 #
2381 # * Apply a user-defined function to map each element to some value
2382 # and extract an element-specific key value.
2383 #
2384 # * Group elements with the same key into a single element with
2385 # that key, transforming a multiply-keyed collection into a
2386 # uniquely-keyed collection.
2387 #
2388 # * Write the elements out to some data sink.
2389 #
2390 # Note that the Cloud Dataflow service may be used to run many different
2391 # types of jobs, not just Map-Reduce.
2392 &quot;kind&quot;: &quot;A String&quot;, # The kind of step in the Cloud Dataflow job.
2393 &quot;properties&quot;: { # Named properties associated with the step. Each kind of
2394 # predefined step has its own required set of properties.
2395 # Must be provided on Create. Only retrieved with JOB_VIEW_ALL.
2396 &quot;a_key&quot;: &quot;&quot;, # Properties of the object.
2397 },
2398 &quot;name&quot;: &quot;A String&quot;, # The name that identifies the step. This must be unique for each
2399 # step with respect to all other steps in the Cloud Dataflow job.
2400 },
2401 ],
2402 &quot;stageStates&quot;: [ # This field may be mutated by the Cloud Dataflow service;
2403 # callers cannot mutate it.
2404 { # A message describing the state of a particular execution stage.
2405 &quot;executionStageState&quot;: &quot;A String&quot;, # Execution stage states allow the same set of values as JobState.
2406 &quot;executionStageName&quot;: &quot;A String&quot;, # The name of the execution stage.
2407 &quot;currentStateTime&quot;: &quot;A String&quot;, # The time at which the stage transitioned to this state.
2408 },
2409 ],
2410 &quot;replacedByJobId&quot;: &quot;A String&quot;, # If another job is an update of this job (and thus, this job is in
2411 # `JOB_STATE_UPDATED`), this field contains the ID of that job.
2412 &quot;jobMetadata&quot;: { # Metadata available primarily for filtering jobs. Will be included in the
2413 # ListJob response and Job SUMMARY view. This field is populated by the
2414 # Dataflow service to support filtering jobs by the metadata values provided
2415 # here. Populated for ListJobs and all GetJob views SUMMARY and higher.
2416 &quot;sdkVersion&quot;: { # The version of the SDK used to run the job. # The SDK version used to run the job.
2417 &quot;sdkSupportStatus&quot;: &quot;A String&quot;, # The support status for this SDK version.
2418 &quot;versionDisplayName&quot;: &quot;A String&quot;, # A readable string describing the version of the SDK.
2419 &quot;version&quot;: &quot;A String&quot;, # The version of the SDK used to run the job.
2420 },
2421 &quot;bigTableDetails&quot;: [ # Identification of a BigTable source used in the Dataflow job.
2422 { # Metadata for a BigTable connector used by the job.
2423 &quot;instanceId&quot;: &quot;A String&quot;, # InstanceId accessed in the connection.
2424 &quot;tableId&quot;: &quot;A String&quot;, # TableId accessed in the connection.
2425 &quot;projectId&quot;: &quot;A String&quot;, # ProjectId accessed in the connection.
2426 },
2427 ],
2428 &quot;pubsubDetails&quot;: [ # Identification of a PubSub source used in the Dataflow job.
2429 { # Metadata for a PubSub connector used by the job.
2430 &quot;subscription&quot;: &quot;A String&quot;, # Subscription used in the connection.
2431 &quot;topic&quot;: &quot;A String&quot;, # Topic accessed in the connection.
2432 },
2433 ],
2434 &quot;bigqueryDetails&quot;: [ # Identification of a BigQuery source used in the Dataflow job.
2435 { # Metadata for a BigQuery connector used by the job.
2436 &quot;dataset&quot;: &quot;A String&quot;, # Dataset accessed in the connection.
2437 &quot;projectId&quot;: &quot;A String&quot;, # Project accessed in the connection.
2438 &quot;query&quot;: &quot;A String&quot;, # Query used to access data in the connection.
2439 &quot;table&quot;: &quot;A String&quot;, # Table accessed in the connection.
2440 },
2441 ],
2442 &quot;fileDetails&quot;: [ # Identification of a File source used in the Dataflow job.
2443 { # Metadata for a File connector used by the job.
2444 &quot;filePattern&quot;: &quot;A String&quot;, # File Pattern used to access files by the connector.
2445 },
2446 ],
2447 &quot;datastoreDetails&quot;: [ # Identification of a Datastore source used in the Dataflow job.
2448 { # Metadata for a Datastore connector used by the job.
2449 &quot;namespace&quot;: &quot;A String&quot;, # Namespace used in the connection.
2450 &quot;projectId&quot;: &quot;A String&quot;, # ProjectId accessed in the connection.
2451 },
2452 ],
2453 &quot;spannerDetails&quot;: [ # Identification of a Spanner source used in the Dataflow job.
2454 { # Metadata for a Spanner connector used by the job.
2455 &quot;instanceId&quot;: &quot;A String&quot;, # InstanceId accessed in the connection.
2456 &quot;databaseId&quot;: &quot;A String&quot;, # DatabaseId accessed in the connection.
2457 &quot;projectId&quot;: &quot;A String&quot;, # ProjectId accessed in the connection.
2458 },
2459 ],
2460 },
2461 &quot;location&quot;: &quot;A String&quot;, # The [regional endpoint]
2462 # (https://cloud.google.com/dataflow/docs/concepts/regional-endpoints) that
2463 # contains this job.
2464 &quot;transformNameMapping&quot;: { # The map of transform name prefixes of the job to be replaced to the
2465 # corresponding name prefixes of the new job.
2466 &quot;a_key&quot;: &quot;A String&quot;,
2467 },
2468 &quot;startTime&quot;: &quot;A String&quot;, # The timestamp when the job was started (transitioned to JOB_STATE_PENDING).
2469 # Flexible resource scheduling jobs are started with some delay after job
2470 # creation, so start_time is unset before start and is updated when the
2471 # job is started by the Cloud Dataflow service. For other jobs, start_time
2472 # always equals create_time and is immutable and set by the Cloud Dataflow
2473 # service.
2474 &quot;clientRequestId&quot;: &quot;A String&quot;, # The client&#x27;s unique identifier of the job, re-used across retried attempts.
2475 # If this field is set, the service will ensure its uniqueness.
2476 # The request to create a job will fail if the service has knowledge of a
2477 # previously submitted job with the same client&#x27;s ID and job name.
2478 # The caller may use this field to ensure idempotence of job
2479 # creation across retried attempts to create a job.
2480 # By default, the field is empty and, in that case, the service ignores it.
2481 &quot;executionInfo&quot;: { # Deprecated. Additional information about how a Cloud Dataflow job will be
2482 # executed that isn&#x27;t contained in the submitted job.
2483 &quot;stages&quot;: { # A mapping from each stage to the information about that stage.
2484 &quot;a_key&quot;: { # Contains information about how a particular
2485 # google.dataflow.v1beta3.Step will be executed.
2486 &quot;stepName&quot;: [ # The steps associated with the execution stage.
2487 # Note that stages may have several steps, and that a given step
2488 # might be run by more than one stage.
2489 &quot;A String&quot;,
2490 ],
2491 },
2492 },
2493 },
2494 &quot;type&quot;: &quot;A String&quot;, # The type of Cloud Dataflow job.
2495 &quot;createTime&quot;: &quot;A String&quot;, # The timestamp when the job was initially created. Immutable and set by the
2496 # Cloud Dataflow service.
2497 &quot;tempFiles&quot;: [ # A set of files the system should be aware of that are used
2498 # for temporary storage. These temporary files will be
2499 # removed on job completion.
2500 # No duplicates are allowed.
2501 # No file patterns are supported.
2502 #
2503 # The supported files are:
2504 #
2505 # Google Cloud Storage:
2506 #
2507 # storage.googleapis.com/{bucket}/{object}
2508 # bucket.storage.googleapis.com/{object}
2509 &quot;A String&quot;,
2510 ],
2511 &quot;id&quot;: &quot;A String&quot;, # The unique ID of this job.
2512 #
2513 # This field is set by the Cloud Dataflow service when the Job is
2514 # created, and is immutable for the life of the job.
2515 &quot;requestedState&quot;: &quot;A String&quot;, # The job&#x27;s requested state.
2516 #
2517 # `UpdateJob` may be used to switch between the `JOB_STATE_STOPPED` and
2518 # `JOB_STATE_RUNNING` states, by setting requested_state. `UpdateJob` may
2519 # also be used to directly set a job&#x27;s requested state to
2520 # `JOB_STATE_CANCELLED` or `JOB_STATE_DONE`, irrevocably terminating the
2521 # job if it has not already reached a terminal state.
2522 &quot;replaceJobId&quot;: &quot;A String&quot;, # If this job is an update of an existing job, this field is the job ID
2523 # of the job it replaced.
2524 #
2525 # When sending a `CreateJobRequest`, you can update a job by specifying it
2526 # here. The job named here is stopped, and its intermediate state is
2527 # transferred to this job.
2528 &quot;createdFromSnapshotId&quot;: &quot;A String&quot;, # If this is specified, the job&#x27;s initial state is populated from the given
2529 # snapshot.
2530 &quot;currentState&quot;: &quot;A String&quot;, # The current state of the job.
2531 #
2532 # Jobs are created in the `JOB_STATE_STOPPED` state unless otherwise
2533 # specified.
2534 #
2535 # A job in the `JOB_STATE_RUNNING` state may asynchronously enter a
2536 # terminal state. After a job has reached a terminal state, no
2537 # further state updates may be made.
2538 #
2539 # This field may be mutated by the Cloud Dataflow service;
2540 # callers cannot mutate it.
2541 &quot;name&quot;: &quot;A String&quot;, # The user-specified Cloud Dataflow job name.
2542 #
2543 # Only one Job with a given name may exist in a project at any
2544 # given time. If a caller attempts to create a Job with the same
2545 # name as an already-existing Job, the attempt returns the
2546 # existing Job.
2547 #
2548 # The name must match the regular expression
2549 # `[a-z]([-a-z0-9]{0,38}[a-z0-9])?`
2550 &quot;currentStateTime&quot;: &quot;A String&quot;, # The timestamp associated with the current state.
2551 }</pre>
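<p>A minimal usage sketch for this method follows; the project and job IDs are hypothetical placeholders.</p>
<pre>
from googleapiclient.discovery import build

dataflow = build(&#x27;dataflow&#x27;, &#x27;v1b3&#x27;)

# Fetch the job. JOB_VIEW_SUMMARY keeps the response small; note that this
# legacy method only sees jobs running in us-central1, as explained above.
job = dataflow.projects().jobs().get(
    projectId=&#x27;my-project&#x27;,                  # hypothetical project ID
    jobId=&#x27;2020-01-01_00_00_00-1234567890&#x27;,  # hypothetical job ID
    view=&#x27;JOB_VIEW_SUMMARY&#x27;,
).execute()
print(job[&#x27;name&#x27;], job[&#x27;currentState&#x27;])
</pre>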
2552</div>
2553
2554<div class="method">
2555 <code class="details" id="getMetrics">getMetrics(projectId, jobId, startTime=None, location=None, x__xgafv=None)</code>
2556 <pre>Request the job status.
2557
2558To request the status of a job, we recommend using
2559`projects.locations.jobs.getMetrics` with a [regional endpoint]
2560(https://cloud.google.com/dataflow/docs/concepts/regional-endpoints). Using
2561`projects.jobs.getMetrics` is not recommended, as you can only request the
2562status of jobs that are running in `us-central1`.
2563
2564Args:
2565 projectId: string, A project id. (required)
2566 jobId: string, The job to get messages for. (required)
2567 startTime: string, Return only metric data that has changed since this time.
2568Default is to return all information about all metrics for the job.
2569 location: string, The [regional endpoint]
2570(https://cloud.google.com/dataflow/docs/concepts/regional-endpoints) that
2571contains the job specified by job_id.
2572 x__xgafv: string, V1 error format.
2573 Allowed values
2574 1 - v1 error format
2575 2 - v2 error format
2576
2577Returns:
2578 An object of the form:
2579
2580 { # JobMetrics contains a collection of metrics describing the detailed progress
2581 # of a Dataflow job. Metrics correspond to user-defined and system-defined
2582 # metrics in the job.
2583 #
2584 # This resource captures only the most recent values of each metric;
2585 # time-series data can be queried for them (under the same metric names)
2586 # from Cloud Monitoring.
2587 &quot;metricTime&quot;: &quot;A String&quot;, # Timestamp as of which metric values are current.
2588 &quot;metrics&quot;: [ # All metrics for this job.
2589 { # Describes the state of a metric.
2590 &quot;distribution&quot;: &quot;&quot;, # A struct value describing properties of a distribution of numeric values.
2591 &quot;kind&quot;: &quot;A String&quot;, # Metric aggregation kind. The possible metric aggregation kinds are
2592 # &quot;Sum&quot;, &quot;Max&quot;, &quot;Min&quot;, &quot;Mean&quot;, &quot;Set&quot;, &quot;And&quot;, &quot;Or&quot;, and &quot;Distribution&quot;.
2593 # The specified aggregation kind is case-insensitive.
2594 #
2595 # If omitted, this is not an aggregated value but instead
2596 # a single metric sample value.
2597 &quot;gauge&quot;: &quot;&quot;, # A struct value describing properties of a Gauge.
2598 # Metrics of gauge type show the value of a metric across time, and are
2599 # aggregated based on the newest value.
2600 &quot;updateTime&quot;: &quot;A String&quot;, # Timestamp associated with the metric value. Optional when workers are
2601 # reporting work progress; it will be filled in responses from the
2602 # metrics API.
2603 &quot;scalar&quot;: &quot;&quot;, # Worker-computed aggregate value for aggregation kinds &quot;Sum&quot;, &quot;Max&quot;, &quot;Min&quot;,
2604 # &quot;And&quot;, and &quot;Or&quot;. The possible value types are Long, Double, and Boolean.
2605 &quot;cumulative&quot;: True or False, # True if this metric is reported as the total cumulative aggregate
2606 # value accumulated since the worker started working on this WorkItem.
2607 # By default this is false, indicating that this metric is reported
2608 # as a delta that is not associated with any WorkItem.
2609 &quot;name&quot;: { # Name of the metric. Identifies a metric by describing the source
2610 # which generated the metric.
2611 &quot;context&quot;: { # Zero or more labeled fields which identify the part of the job this
2612 # metric is associated with, such as the name of a step or collection.
2613 #
2614 # For example, built-in counters associated with steps will have
2615 # context[&#x27;step&#x27;] = &lt;step-name&gt;. Counters associated with PCollections
2616 # in the SDK will have context[&#x27;pcollection&#x27;] = &lt;pcollection-name&gt;.
2617 &quot;a_key&quot;: &quot;A String&quot;,
2618 },
2619 &quot;name&quot;: &quot;A String&quot;, # Worker-defined metric name.
2620 &quot;origin&quot;: &quot;A String&quot;, # Origin (namespace) of metric name. May be blank for user-defined metrics;
2621 # will be &quot;dataflow&quot; for metrics defined by the Dataflow service or SDK.
2622 },
2623 &quot;meanCount&quot;: &quot;&quot;, # Worker-computed aggregate value for the &quot;Mean&quot; aggregation kind.
2624 # This holds the count of the aggregated values and is used in combination
2625 # with mean_sum below to obtain the actual mean aggregate value.
2626 # The only possible value type is Long.
2627 &quot;meanSum&quot;: &quot;&quot;, # Worker-computed aggregate value for the &quot;Mean&quot; aggregation kind.
2628 # This holds the sum of the aggregated values and is used in combination
2629 # with mean_count above to obtain the actual mean aggregate value.
2630 # The only possible value types are Long and Double.
2631 &quot;set&quot;: &quot;&quot;, # Worker-computed aggregate value for the &quot;Set&quot; aggregation kind. The only
2632 # possible value type is a list of Values whose type can be Long, Double,
2633 # or String, according to the metric&#x27;s type. All Values in the list must
2634 # be of the same type.
2635 &quot;internal&quot;: &quot;&quot;, # Worker-computed aggregate value for internal use by the Dataflow
2636 # service.
2637 },
2638 ],
2639 }</pre>
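<p>A minimal sketch of reading these metrics (hypothetical project and job IDs), including how a &quot;Mean&quot; value can be recovered from its meanSum and meanCount parts as described above:</p>
<pre>
from googleapiclient.discovery import build

dataflow = build(&#x27;dataflow&#x27;, &#x27;v1b3&#x27;)

metrics = dataflow.projects().jobs().getMetrics(
    projectId=&#x27;my-project&#x27;,                  # hypothetical project ID
    jobId=&#x27;2020-01-01_00_00_00-1234567890&#x27;,  # hypothetical job ID
).execute()

for m in metrics.get(&#x27;metrics&#x27;, []):
    name = m[&#x27;name&#x27;][&#x27;name&#x27;]
    if m.get(&#x27;kind&#x27;) == &#x27;Mean&#x27; and m.get(&#x27;meanCount&#x27;):
        # Mean metrics report their sum and count separately; int64 values
        # arrive as JSON strings, hence the float() conversions.
        print(name, float(m[&#x27;meanSum&#x27;]) / float(m[&#x27;meanCount&#x27;]))
    elif &#x27;scalar&#x27; in m:
        print(name, m[&#x27;scalar&#x27;])
</pre>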
2640</div>
2641
2642<div class="method">
2643 <code class="details" id="list">list(projectId, filter=None, pageSize=None, location=None, view=None, pageToken=None, x__xgafv=None)</code>
2644 <pre>List the jobs of a project.
2645
2646To list the jobs of a project in a region, we recommend using
2647`projects.locations.jobs.list` with a [regional endpoint]
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07002648(https://cloud.google.com/dataflow/docs/concepts/regional-endpoints). To
2649list all jobs across all regions, use `projects.jobs.aggregated`. Using
2650`projects.jobs.list` is not recommended, as you can only get the list of
2651jobs that are running in `us-central1`.
2652
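For example, a minimal pagination sketch (hypothetical project ID) that follows
the &#x27;next_page_token&#x27;/`pageToken` convention described under Args below:

  from googleapiclient.discovery import build

  dataflow = build(&#x27;dataflow&#x27;, &#x27;v1b3&#x27;)

  jobs, token = [], None
  while True:
    resp = dataflow.projects().jobs().list(
        projectId=&#x27;my-project&#x27;,  # hypothetical project ID
        pageToken=token).execute()
    jobs.extend(resp.get(&#x27;jobs&#x27;, []))
    token = resp.get(&#x27;nextPageToken&#x27;)
    if not token:
      break
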
2653Args:
2654 projectId: string, The project which owns the jobs. (required)
2655 filter: string, The kind of filter to use.
2656 pageSize: integer, If there are many jobs, limit response to at most this many.
2657The actual number of jobs returned will be the lesser of max_responses
2658and an unspecified server-defined limit.
2659 location: string, The [regional endpoint]
2660(https://cloud.google.com/dataflow/docs/concepts/regional-endpoints) that
2661contains this job.
2662 view: string, Level of information requested in response. Default is `JOB_VIEW_SUMMARY`.
2663 pageToken: string, Set this to the &#x27;next_page_token&#x27; field of a previous response
2664to request additional results in a long list.
2665 x__xgafv: string, V1 error format.
2666 Allowed values
2667 1 - v1 error format
2668 2 - v2 error format
2669
2670Returns:
2671 An object of the form:
2672
2673 { # Response to a request to list Cloud Dataflow jobs in a project. This might
2674 # be a partial response, depending on the page size in the ListJobsRequest.
2675 # However, if the project does not have any jobs, an instance of
2676 # ListJobsResponse is not returned and the request&#x27;s response
2677 # body is empty {}.
2678 &quot;jobs&quot;: [ # A subset of the requested job information.
2679 { # Defines a job to be run by the Cloud Dataflow service.
2680 &quot;pipelineDescription&quot;: { # Preliminary field: The format of this data may change at any time.
2681 # A description of the user pipeline and stages through which it is executed,
2682 # covering both the submitted pipeline and its executed form. This data is
2683 # provided by the Cloud Dataflow service for ease of visualizing the pipeline
2684 # and interpreting Dataflow-provided metrics. Only retrieved with
2685 # JOB_VIEW_DESCRIPTION or JOB_VIEW_ALL.
2686 &quot;displayData&quot;: [ # Pipeline level display data.
2687 { # Data provided with a pipeline or transform to provide descriptive info.
2688 &quot;url&quot;: &quot;A String&quot;, # An optional full URL.
2689 &quot;javaClassValue&quot;: &quot;A String&quot;, # Contains value if the data is of java class type.
2690 &quot;timestampValue&quot;: &quot;A String&quot;, # Contains value if the data is of timestamp type.
2691 &quot;durationValue&quot;: &quot;A String&quot;, # Contains value if the data is of duration type.
2692 &quot;label&quot;: &quot;A String&quot;, # An optional label to display in a dax UI for the element.
2693 &quot;key&quot;: &quot;A String&quot;, # The key identifying the display data.
2694 # This is intended to be used as a label for the display data
2695 # when viewed in a dax monitoring system.
2696 &quot;namespace&quot;: &quot;A String&quot;, # The namespace for the key. This is usually a class name or programming
2697 # language namespace (i.e. python module) which defines the display data.
2698 # This allows a dax monitoring system to specially handle the data
2699 # and perform custom rendering.
2700 &quot;floatValue&quot;: 3.14, # Contains value if the data is of float type.
2701 &quot;strValue&quot;: &quot;A String&quot;, # Contains value if the data is of string type.
2702 &quot;int64Value&quot;: &quot;A String&quot;, # Contains value if the data is of int64 type.
2703 &quot;boolValue&quot;: True or False, # Contains value if the data is of a boolean type.
2704 &quot;shortStrValue&quot;: &quot;A String&quot;, # A possible additional shorter value to display.
2705 # For example a java_class_name_value of com.mypackage.MyDoFn
2706 # will be stored with MyDoFn as the short_str_value and
2707 # com.mypackage.MyDoFn as the java_class_name value.
2708 # short_str_value can be displayed and java_class_name_value
2709 # will be displayed as a tooltip.
2710 },
2711 ],
2712 &quot;originalPipelineTransform&quot;: [ # Description of each transform in the pipeline and collections between them.
2713 { # Description of the type, names/ids, and input/outputs for a transform.
2714 &quot;outputCollectionName&quot;: [ # User names for all collection outputs to this transform.
2715 &quot;A String&quot;,
2716 ],
2717 &quot;displayData&quot;: [ # Transform-specific display data.
2718 { # Data provided with a pipeline or transform to provide descriptive info.
2719 &quot;url&quot;: &quot;A String&quot;, # An optional full URL.
2720 &quot;javaClassValue&quot;: &quot;A String&quot;, # Contains value if the data is of java class type.
2721 &quot;timestampValue&quot;: &quot;A String&quot;, # Contains value if the data is of timestamp type.
2722 &quot;durationValue&quot;: &quot;A String&quot;, # Contains value if the data is of duration type.
2723 &quot;label&quot;: &quot;A String&quot;, # An optional label to display in a dax UI for the element.
2724 &quot;key&quot;: &quot;A String&quot;, # The key identifying the display data.
2725 # This is intended to be used as a label for the display data
2726 # when viewed in a dax monitoring system.
2727 &quot;namespace&quot;: &quot;A String&quot;, # The namespace for the key. This is usually a class name or programming
2728 # language namespace (i.e. python module) which defines the display data.
2729 # This allows a dax monitoring system to specially handle the data
2730 # and perform custom rendering.
2731 &quot;floatValue&quot;: 3.14, # Contains value if the data is of float type.
2732 &quot;strValue&quot;: &quot;A String&quot;, # Contains value if the data is of string type.
2733 &quot;int64Value&quot;: &quot;A String&quot;, # Contains value if the data is of int64 type.
2734 &quot;boolValue&quot;: True or False, # Contains value if the data is of a boolean type.
2735 &quot;shortStrValue&quot;: &quot;A String&quot;, # A possible additional shorter value to display.
2736 # For example a java_class_name_value of com.mypackage.MyDoFn
2737 # will be stored with MyDoFn as the short_str_value and
2738 # com.mypackage.MyDoFn as the java_class_name value.
2739 # short_str_value can be displayed and java_class_name_value
2740 # will be displayed as a tooltip.
2741 },
2742 ],
2743 &quot;id&quot;: &quot;A String&quot;, # SDK generated id of this transform instance.
2744 &quot;inputCollectionName&quot;: [ # User names for all collection inputs to this transform.
2745 &quot;A String&quot;,
2746 ],
2747 &quot;name&quot;: &quot;A String&quot;, # User provided name for this transform instance.
2748 &quot;kind&quot;: &quot;A String&quot;, # Type of transform.
2749 },
2750 ],
2751 &quot;executionPipelineStage&quot;: [ # Description of each stage of execution of the pipeline.
2752 { # Description of the composing transforms, names/ids, and input/outputs of a
2753 # stage of execution. Some composing transforms and sources may have been
2754 # generated by the Dataflow service during execution planning.
2755 &quot;componentSource&quot;: [ # Collections produced and consumed by component transforms of this stage.
2756 { # Description of an interstitial value between transforms in an execution
2757 # stage.
2758 &quot;userName&quot;: &quot;A String&quot;, # Human-readable name for this transform; may be user or system generated.
2759 &quot;name&quot;: &quot;A String&quot;, # Dataflow service generated name for this source.
2760 &quot;originalTransformOrCollection&quot;: &quot;A String&quot;, # User name for the original user transform or collection with which this
2761 # source is most closely associated.
2762 },
2763 ],
2764 &quot;inputSource&quot;: [ # Input sources for this stage.
2765 { # Description of an input or output of an execution stage.
2766 &quot;userName&quot;: &quot;A String&quot;, # Human-readable name for this source; may be user or system generated.
2767 &quot;originalTransformOrCollection&quot;: &quot;A String&quot;, # User name for the original user transform or collection with which this
2768 # source is most closely associated.
2769 &quot;sizeBytes&quot;: &quot;A String&quot;, # Size of the source, if measurable.
2770 &quot;name&quot;: &quot;A String&quot;, # Dataflow service generated name for this source.
2771 },
2772 ],
2773 &quot;name&quot;: &quot;A String&quot;, # Dataflow service generated name for this stage.
2774 &quot;componentTransform&quot;: [ # Transforms that comprise this execution stage.
2775 { # Description of a transform executed as part of an execution stage.
2776 &quot;name&quot;: &quot;A String&quot;, # Dataflow service generated name for this source.
2777 &quot;userName&quot;: &quot;A String&quot;, # Human-readable name for this transform; may be user or system generated.
2778 &quot;originalTransform&quot;: &quot;A String&quot;, # User name for the original user transform with which this transform is
2779 # most closely associated.
2780 },
2781 ],
2782 &quot;id&quot;: &quot;A String&quot;, # Dataflow service generated id for this stage.
2783 &quot;outputSource&quot;: [ # Output sources for this stage.
2784 { # Description of an input or output of an execution stage.
2785 &quot;userName&quot;: &quot;A String&quot;, # Human-readable name for this source; may be user or system generated.
2786 &quot;originalTransformOrCollection&quot;: &quot;A String&quot;, # User name for the original user transform or collection with which this
2787 # source is most closely associated.
2788 &quot;sizeBytes&quot;: &quot;A String&quot;, # Size of the source, if measurable.
2789 &quot;name&quot;: &quot;A String&quot;, # Dataflow service generated name for this source.
2790 },
2791 ],
2792        &quot;kind&quot;: &quot;A String&quot;, # Type of transform this stage is executing.
2793 },
2794 ],
2795 },
2796 &quot;labels&quot;: { # User-defined labels for this job.
2797 #
2798 # The labels map can contain no more than 64 entries. Entries of the labels
2799 # map are UTF8 strings that comply with the following restrictions:
2800 #
2801 # * Keys must conform to regexp: \p{Ll}\p{Lo}{0,62}
2802 # * Values must conform to regexp: [\p{Ll}\p{Lo}\p{N}_-]{0,63}
2803 # * Both keys and values are additionally constrained to be &lt;= 128 bytes in
2804 # size.
2805 &quot;a_key&quot;: &quot;A String&quot;,
2806 },
2807 &quot;projectId&quot;: &quot;A String&quot;, # The ID of the Cloud Platform project that the job belongs to.
2808 &quot;environment&quot;: { # Describes the environment in which a Dataflow Job runs. # The environment for the job.
2809 &quot;flexResourceSchedulingGoal&quot;: &quot;A String&quot;, # Which Flexible Resource Scheduling mode to run in.
2810 &quot;workerRegion&quot;: &quot;A String&quot;, # The Compute Engine region
2811 # (https://cloud.google.com/compute/docs/regions-zones/regions-zones) in
2812 # which worker processing should occur, e.g. &quot;us-west1&quot;. Mutually exclusive
2813 # with worker_zone. If neither worker_region nor worker_zone is specified,
2814 # default to the control plane&#x27;s region.
2815 &quot;userAgent&quot;: { # A description of the process that generated the request.
2816 &quot;a_key&quot;: &quot;&quot;, # Properties of the object.
2817 },
2818 &quot;serviceAccountEmail&quot;: &quot;A String&quot;, # Identity to run virtual machines as. Defaults to the default account.
2819 &quot;version&quot;: { # A structure describing which components and their versions of the service
2820 # are required in order to run the job.
2821 &quot;a_key&quot;: &quot;&quot;, # Properties of the object.
2822 },
2823 &quot;serviceKmsKeyName&quot;: &quot;A String&quot;, # If set, contains the Cloud KMS key identifier used to encrypt data
2824 # at rest, AKA a Customer Managed Encryption Key (CMEK).
2825 #
2826 # Format:
2827 # projects/PROJECT_ID/locations/LOCATION/keyRings/KEY_RING/cryptoKeys/KEY
2828 &quot;experiments&quot;: [ # The list of experiments to enable.
2829 &quot;A String&quot;,
2830 ],
2831 &quot;workerZone&quot;: &quot;A String&quot;, # The Compute Engine zone
2832 # (https://cloud.google.com/compute/docs/regions-zones/regions-zones) in
2833 # which worker processing should occur, e.g. &quot;us-west1-a&quot;. Mutually exclusive
2834 # with worker_region. If neither worker_region nor worker_zone is specified,
2835 # a zone in the control plane&#x27;s region is chosen based on available capacity.
2836 &quot;workerPools&quot;: [ # The worker pools. At least one &quot;harness&quot; worker pool must be
2837 # specified in order for the job to have workers.
2838 { # Describes one particular pool of Cloud Dataflow workers to be
2839 # instantiated by the Cloud Dataflow service in order to perform the
2840 # computations required by a job. Note that a workflow job may use
2841 # multiple pools, in order to match the various computational
2842 # requirements of the various stages of the job.
2843 &quot;onHostMaintenance&quot;: &quot;A String&quot;, # The action to take on host maintenance, as defined by the Google
2844 # Compute Engine API.
2845 &quot;sdkHarnessContainerImages&quot;: [ # Set of SDK harness containers needed to execute this pipeline. This will
2846 # only be set in the Fn API path. For non-cross-language pipelines this
2847 # should have only one entry. Cross-language pipelines will have two or more
2848 # entries.
2849          { # Defines an SDK harness container for executing Dataflow pipelines.
2850 &quot;containerImage&quot;: &quot;A String&quot;, # A docker container image that resides in Google Container Registry.
2851            &quot;useSingleCorePerContainer&quot;: True or False, # If true, recommends that the Dataflow service use only one core per SDK
2852                # container instance with this image. If false (or unset), recommends using
2853                # more than one core per SDK container instance with this image for
2854                # efficiency. Note that the Dataflow service may choose to override this
2855                # property if needed.
2856 },
2857 ],
2858 &quot;zone&quot;: &quot;A String&quot;, # Zone to run the worker pools in. If empty or unspecified, the service
2859 # will attempt to choose a reasonable default.
2860 &quot;kind&quot;: &quot;A String&quot;, # The kind of the worker pool; currently only `harness` and `shuffle`
2861 # are supported.
2862 &quot;metadata&quot;: { # Metadata to set on the Google Compute Engine VMs.
2863 &quot;a_key&quot;: &quot;A String&quot;,
2864 },
2865 &quot;diskSourceImage&quot;: &quot;A String&quot;, # Fully qualified source image for disks.
2866 &quot;dataDisks&quot;: [ # Data disks that are used by a VM in this workflow.
2867 { # Describes the data disk used by a workflow job.
2868 &quot;sizeGb&quot;: 42, # Size of disk in GB. If zero or unspecified, the service will
2869 # attempt to choose a reasonable default.
2870 &quot;diskType&quot;: &quot;A String&quot;, # Disk storage type, as defined by Google Compute Engine. This
2871 # must be a disk type appropriate to the project and zone in which
2872 # the workers will run. If unknown or unspecified, the service
2873 # will attempt to choose a reasonable default.
2874 #
2875 # For example, the standard persistent disk type is a resource name
2876 # typically ending in &quot;pd-standard&quot;. If SSD persistent disks are
2877 # available, the resource name typically ends with &quot;pd-ssd&quot;. The
2878                # actual valid values are defined by the Google Compute Engine API,
2879 # not by the Cloud Dataflow API; consult the Google Compute Engine
2880 # documentation for more information about determining the set of
2881 # available disk types for a particular project and zone.
2882 #
2883 # Google Compute Engine Disk types are local to a particular
2884 # project in a particular zone, and so the resource name will
2885 # typically look something like this:
2886 #
2887 # compute.googleapis.com/projects/project-id/zones/zone/diskTypes/pd-standard
2888 &quot;mountPoint&quot;: &quot;A String&quot;, # Directory in a VM where disk is mounted.
2889 },
2890 ],
2891 &quot;packages&quot;: [ # Packages to be installed on workers.
2892 { # The packages that must be installed in order for a worker to run the
2893 # steps of the Cloud Dataflow job that will be assigned to its worker
2894 # pool.
Bu Sun Kim4ed7d3f2020-05-27 12:20:54 -07002895 #
Bu Sun Kimd059ad82020-07-22 17:02:09 -07002896 # This is the mechanism by which the Cloud Dataflow SDK causes code to
2897 # be loaded onto the workers. For example, the Cloud Dataflow Java SDK
2898 # might use this to install jars containing the user&#x27;s code and all of the
2899 # various dependencies (libraries, data files, etc.) required in order
2900 # for that code to run.
2901 &quot;name&quot;: &quot;A String&quot;, # The name of the package.
2902 &quot;location&quot;: &quot;A String&quot;, # The resource to read the package from. The supported resource type is:
2903 #
2904 # Google Cloud Storage:
2905 #
2906 # storage.googleapis.com/{bucket}
2907 # bucket.storage.googleapis.com/
2908 },
2909 ],
2910          &quot;teardownPolicy&quot;: &quot;A String&quot;, # Sets the policy for determining when to turn down the worker pool.
2911 # Allowed values are: `TEARDOWN_ALWAYS`, `TEARDOWN_ON_SUCCESS`, and
2912 # `TEARDOWN_NEVER`.
2913 # `TEARDOWN_ALWAYS` means workers are always torn down regardless of whether
2914 # the job succeeds. `TEARDOWN_ON_SUCCESS` means workers are torn down
2915 # if the job succeeds. `TEARDOWN_NEVER` means the workers are never torn
2916 # down.
2917 #
2918 # If the workers are not torn down by the service, they will
2919 # continue to run and use Google Compute Engine VM resources in the
2920 # user&#x27;s project until they are explicitly terminated by the user.
2921 # Because of this, Google recommends using the `TEARDOWN_ALWAYS`
2922 # policy except for small, manually supervised test jobs.
2923 #
2924 # If unknown or unspecified, the service will attempt to choose a reasonable
2925 # default.
2926 &quot;network&quot;: &quot;A String&quot;, # Network to which VMs will be assigned. If empty or unspecified,
2927 # the service will use the network &quot;default&quot;.
2928 &quot;ipConfiguration&quot;: &quot;A String&quot;, # Configuration for VM IPs.
2929 &quot;diskSizeGb&quot;: 42, # Size of root disk for VMs, in GB. If zero or unspecified, the service will
2930 # attempt to choose a reasonable default.
2931 &quot;autoscalingSettings&quot;: { # Settings for WorkerPool autoscaling. # Settings for autoscaling of this WorkerPool.
2932 &quot;maxNumWorkers&quot;: 42, # The maximum number of workers to cap scaling at.
2933 &quot;algorithm&quot;: &quot;A String&quot;, # The algorithm to use for autoscaling.
2934 },
2935 &quot;poolArgs&quot;: { # Extra arguments for this worker pool.
2936 &quot;a_key&quot;: &quot;&quot;, # Properties of the object. Contains field @type with type URL.
2937 },
2938 &quot;subnetwork&quot;: &quot;A String&quot;, # Subnetwork to which VMs will be assigned, if desired. Expected to be of
2939 # the form &quot;regions/REGION/subnetworks/SUBNETWORK&quot;.
2940 &quot;numWorkers&quot;: 42, # Number of Google Compute Engine workers in this pool needed to
2941 # execute the job. If zero or unspecified, the service will
2942 # attempt to choose a reasonable default.
2943 &quot;numThreadsPerWorker&quot;: 42, # The number of threads per worker harness. If empty or unspecified, the
2944 # service will choose a number of threads (according to the number of cores
2945 # on the selected machine type for batch, or 1 by convention for streaming).
2946 &quot;workerHarnessContainerImage&quot;: &quot;A String&quot;, # Required. Docker container image that executes the Cloud Dataflow worker
2947 # harness, residing in Google Container Registry.
2948 #
2949 # Deprecated for the Fn API path. Use sdk_harness_container_images instead.
2950 &quot;taskrunnerSettings&quot;: { # Taskrunner configuration settings. # Settings passed through to Google Compute Engine workers when
2951 # using the standard Dataflow task runner. Users should ignore
2952 # this field.
2953            &quot;dataflowApiVersion&quot;: &quot;A String&quot;, # The API version of the endpoint, e.g. &quot;v1b3&quot;.
2954 &quot;oauthScopes&quot;: [ # The OAuth2 scopes to be requested by the taskrunner in order to
2955 # access the Cloud Dataflow API.
2956 &quot;A String&quot;,
2957 ],
2958 &quot;baseUrl&quot;: &quot;A String&quot;, # The base URL for the taskrunner to use when accessing Google Cloud APIs.
Bu Sun Kim4ed7d3f2020-05-27 12:20:54 -07002959 #
2960 # When workers access Google Cloud APIs, they logically do so via
2961 # relative URLs. If this field is specified, it supplies the base
2962 # URL to use for resolving these relative URLs. The normative
2963 # algorithm used is defined by RFC 1808, &quot;Relative Uniform Resource
2964 # Locators&quot;.
2965 #
2966 # If not specified, the default value is &quot;http://www.googleapis.com/&quot;
Bu Sun Kimd059ad82020-07-22 17:02:09 -07002967 &quot;workflowFileName&quot;: &quot;A String&quot;, # The file to store the workflow in.
2968 &quot;logToSerialconsole&quot;: True or False, # Whether to send taskrunner log info to Google Compute Engine VM serial
2969 # console.
2970 &quot;baseTaskDir&quot;: &quot;A String&quot;, # The location on the worker for task-specific subdirectories.
2971 &quot;taskUser&quot;: &quot;A String&quot;, # The UNIX user ID on the worker VM to use for tasks launched by
2972 # taskrunner; e.g. &quot;root&quot;.
2973 &quot;vmId&quot;: &quot;A String&quot;, # The ID string of the VM.
2974 &quot;alsologtostderr&quot;: True or False, # Whether to also send taskrunner log info to stderr.
2975 &quot;parallelWorkerSettings&quot;: { # Provides data to pass through to the worker harness. # The settings to pass to the parallel worker harness.
2976 &quot;shuffleServicePath&quot;: &quot;A String&quot;, # The Shuffle service path relative to the root URL, for example,
2977 # &quot;shuffle/v1beta1&quot;.
2978 &quot;tempStoragePrefix&quot;: &quot;A String&quot;, # The prefix of the resources the system should use for temporary
2979 # storage.
2980 #
2981 # The supported resource type is:
2982 #
2983 # Google Cloud Storage:
2984 #
2985 # storage.googleapis.com/{bucket}/{object}
2986 # bucket.storage.googleapis.com/{object}
2987 &quot;reportingEnabled&quot;: True or False, # Whether to send work progress updates to the service.
2988 &quot;servicePath&quot;: &quot;A String&quot;, # The Cloud Dataflow service path relative to the root URL, for example,
2989 # &quot;dataflow/v1b3/projects&quot;.
2990 &quot;baseUrl&quot;: &quot;A String&quot;, # The base URL for accessing Google Cloud APIs.
2991 #
2992 # When workers access Google Cloud APIs, they logically do so via
2993 # relative URLs. If this field is specified, it supplies the base
2994 # URL to use for resolving these relative URLs. The normative
2995 # algorithm used is defined by RFC 1808, &quot;Relative Uniform Resource
2996 # Locators&quot;.
2997 #
2998 # If not specified, the default value is &quot;http://www.googleapis.com/&quot;
2999 &quot;workerId&quot;: &quot;A String&quot;, # The ID of the worker running this pipeline.
3000 },
3001 &quot;harnessCommand&quot;: &quot;A String&quot;, # The command to launch the worker harness.
3002 &quot;logDir&quot;: &quot;A String&quot;, # The directory on the VM to store logs.
3003 &quot;streamingWorkerMainClass&quot;: &quot;A String&quot;, # The streaming worker main class name.
3004 &quot;languageHint&quot;: &quot;A String&quot;, # The suggested backend language.
3005 &quot;taskGroup&quot;: &quot;A String&quot;, # The UNIX group ID on the worker VM to use for tasks launched by
3006 # taskrunner; e.g. &quot;wheel&quot;.
3007 &quot;logUploadLocation&quot;: &quot;A String&quot;, # Indicates where to put logs. If this is not specified, the logs
3008 # will not be uploaded.
3009 #
3010 # The supported resource type is:
3011 #
3012 # Google Cloud Storage:
3013 # storage.googleapis.com/{bucket}/{object}
3014 # bucket.storage.googleapis.com/{object}
3015 &quot;commandlinesFileName&quot;: &quot;A String&quot;, # The file to store preprocessing commands in.
3016 &quot;continueOnException&quot;: True or False, # Whether to continue taskrunner if an exception is hit.
3017 &quot;tempStoragePrefix&quot;: &quot;A String&quot;, # The prefix of the resources the taskrunner should use for
3018 # temporary storage.
3019 #
3020 # The supported resource type is:
3021 #
3022 # Google Cloud Storage:
3023 # storage.googleapis.com/{bucket}/{object}
3024 # bucket.storage.googleapis.com/{object}
Bu Sun Kim4ed7d3f2020-05-27 12:20:54 -07003025 },
Bu Sun Kimd059ad82020-07-22 17:02:09 -07003026 &quot;diskType&quot;: &quot;A String&quot;, # Type of root disk for VMs. If empty or unspecified, the service will
3027 # attempt to choose a reasonable default.
3028 &quot;defaultPackageSet&quot;: &quot;A String&quot;, # The default package set to install. This allows the service to
3029 # select a default set of packages which are useful to worker
3030 # harnesses written in a particular language.
3031 &quot;machineType&quot;: &quot;A String&quot;, # Machine type (e.g. &quot;n1-standard-1&quot;). If empty or unspecified, the
3032 # service will attempt to choose a reasonable default.
Bu Sun Kim4ed7d3f2020-05-27 12:20:54 -07003033 },
Bu Sun Kimd059ad82020-07-22 17:02:09 -07003034 ],
3035 &quot;tempStoragePrefix&quot;: &quot;A String&quot;, # The prefix of the resources the system should use for temporary
3036        # storage. The system will append the suffix &quot;/temp-{JOBNAME}&quot; to
3037 # this resource prefix, where {JOBNAME} is the value of the
3038 # job_name field. The resulting bucket and object prefix is used
3039 # as the prefix of the resources used to store temporary data
3040 # needed during the job execution. NOTE: This will override the
3041 # value in taskrunner_settings.
3042 # The supported resource type is:
3043 #
3044 # Google Cloud Storage:
3045 #
3046 # storage.googleapis.com/{bucket}/{object}
3047 # bucket.storage.googleapis.com/{object}
3048 &quot;internalExperiments&quot;: { # Experimental settings.
3049 &quot;a_key&quot;: &quot;&quot;, # Properties of the object. Contains field @type with type URL.
Bu Sun Kim4ed7d3f2020-05-27 12:20:54 -07003050 },
Bu Sun Kimd059ad82020-07-22 17:02:09 -07003051 &quot;sdkPipelineOptions&quot;: { # The Cloud Dataflow SDK pipeline options specified by the user. These
3052 # options are passed through the service and are used to recreate the
3053 # SDK pipeline options on the worker in a language agnostic and platform
3054 # independent way.
Bu Sun Kim65020912020-05-20 12:08:20 -07003055 &quot;a_key&quot;: &quot;&quot;, # Properties of the object.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07003056 },
Bu Sun Kimd059ad82020-07-22 17:02:09 -07003057 &quot;dataset&quot;: &quot;A String&quot;, # The dataset for the current project where various workflow
3058 # related tables are stored.
3059 #
3060 # The supported resource type is:
3061 #
3062 # Google BigQuery:
3063 # bigquery.googleapis.com/{dataset}
3064 &quot;clusterManagerApiService&quot;: &quot;A String&quot;, # The type of cluster manager API to use. If unknown or
3065 # unspecified, the service will attempt to choose a reasonable
3066 # default. This should be in the form of the API service name,
3067 # e.g. &quot;compute.googleapis.com&quot;.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07003068 },
Bu Sun Kimd059ad82020-07-22 17:02:09 -07003069 &quot;stepsLocation&quot;: &quot;A String&quot;, # The GCS location where the steps are stored.
3070 &quot;steps&quot;: [ # Exactly one of step or steps_location should be specified.
3071 #
3072 # The top-level steps that constitute the entire job.
3073 { # Defines a particular step within a Cloud Dataflow job.
3074 #
3075 # A job consists of multiple steps, each of which performs some
3076 # specific operation as part of the overall job. Data is typically
3077 # passed from one step to another as part of the job.
3078 #
3079 # Here&#x27;s an example of a sequence of steps which together implement a
3080 # Map-Reduce job:
3081 #
3082 # * Read a collection of data from some source, parsing the
3083 # collection&#x27;s elements.
3084 #
3085 # * Validate the elements.
3086 #
3087 # * Apply a user-defined function to map each element to some value
3088 # and extract an element-specific key value.
3089 #
3090 # * Group elements with the same key into a single element with
3091 # that key, transforming a multiply-keyed collection into a
3092 # uniquely-keyed collection.
3093 #
3094 # * Write the elements out to some data sink.
3095 #
3096 # Note that the Cloud Dataflow service may be used to run many different
3097 # types of jobs, not just Map-Reduce.
3098 &quot;kind&quot;: &quot;A String&quot;, # The kind of step in the Cloud Dataflow job.
3099 &quot;properties&quot;: { # Named properties associated with the step. Each kind of
3100 # predefined step has its own required set of properties.
3101 # Must be provided on Create. Only retrieved with JOB_VIEW_ALL.
3102 &quot;a_key&quot;: &quot;&quot;, # Properties of the object.
3103 },
3104 &quot;name&quot;: &quot;A String&quot;, # The name that identifies the step. This must be unique for each
3105 # step with respect to all other steps in the Cloud Dataflow job.
3106 },
3107 ],
3108 &quot;stageStates&quot;: [ # This field may be mutated by the Cloud Dataflow service;
3109 # callers cannot mutate it.
3110 { # A message describing the state of a particular execution stage.
3111          &quot;executionStageState&quot;: &quot;A String&quot;, # Execution stage states allow the same set of values as JobState.
3112 &quot;executionStageName&quot;: &quot;A String&quot;, # The name of the execution stage.
3113 &quot;currentStateTime&quot;: &quot;A String&quot;, # The time at which the stage transitioned to this state.
3114 },
3115 ],
3116 &quot;replacedByJobId&quot;: &quot;A String&quot;, # If another job is an update of this job (and thus, this job is in
3117 # `JOB_STATE_UPDATED`), this field contains the ID of that job.
3118 &quot;jobMetadata&quot;: { # Metadata available primarily for filtering jobs. Will be included in the # This field is populated by the Dataflow service to support filtering jobs
3119 # by the metadata values provided here. Populated for ListJobs and all GetJob
3120 # views SUMMARY and higher.
3121 # ListJob response and Job SUMMARY view.
3122 &quot;sdkVersion&quot;: { # The version of the SDK used to run the job. # The SDK version used to run the job.
3123 &quot;sdkSupportStatus&quot;: &quot;A String&quot;, # The support status for this SDK version.
3124 &quot;versionDisplayName&quot;: &quot;A String&quot;, # A readable string describing the version of the SDK.
3125 &quot;version&quot;: &quot;A String&quot;, # The version of the SDK used to run the job.
3126 },
3127 &quot;bigTableDetails&quot;: [ # Identification of a BigTable source used in the Dataflow job.
3128 { # Metadata for a BigTable connector used by the job.
3129 &quot;instanceId&quot;: &quot;A String&quot;, # InstanceId accessed in the connection.
3130 &quot;tableId&quot;: &quot;A String&quot;, # TableId accessed in the connection.
3131 &quot;projectId&quot;: &quot;A String&quot;, # ProjectId accessed in the connection.
3132 },
3133 ],
3134 &quot;pubsubDetails&quot;: [ # Identification of a PubSub source used in the Dataflow job.
3135 { # Metadata for a PubSub connector used by the job.
3136 &quot;subscription&quot;: &quot;A String&quot;, # Subscription used in the connection.
3137 &quot;topic&quot;: &quot;A String&quot;, # Topic accessed in the connection.
3138 },
3139 ],
3140 &quot;bigqueryDetails&quot;: [ # Identification of a BigQuery source used in the Dataflow job.
3141 { # Metadata for a BigQuery connector used by the job.
3142 &quot;dataset&quot;: &quot;A String&quot;, # Dataset accessed in the connection.
3143 &quot;projectId&quot;: &quot;A String&quot;, # Project accessed in the connection.
3144 &quot;query&quot;: &quot;A String&quot;, # Query used to access data in the connection.
3145 &quot;table&quot;: &quot;A String&quot;, # Table accessed in the connection.
3146 },
3147 ],
3148 &quot;fileDetails&quot;: [ # Identification of a File source used in the Dataflow job.
3149 { # Metadata for a File connector used by the job.
3150 &quot;filePattern&quot;: &quot;A String&quot;, # File Pattern used to access files by the connector.
3151 },
3152 ],
3153 &quot;datastoreDetails&quot;: [ # Identification of a Datastore source used in the Dataflow job.
3154 { # Metadata for a Datastore connector used by the job.
3155 &quot;namespace&quot;: &quot;A String&quot;, # Namespace used in the connection.
3156 &quot;projectId&quot;: &quot;A String&quot;, # ProjectId accessed in the connection.
3157 },
3158 ],
3159 &quot;spannerDetails&quot;: [ # Identification of a Spanner source used in the Dataflow job.
3160 { # Metadata for a Spanner connector used by the job.
3161 &quot;instanceId&quot;: &quot;A String&quot;, # InstanceId accessed in the connection.
3162 &quot;databaseId&quot;: &quot;A String&quot;, # DatabaseId accessed in the connection.
3163 &quot;projectId&quot;: &quot;A String&quot;, # ProjectId accessed in the connection.
3164 },
3165 ],
3166 },
3167 &quot;location&quot;: &quot;A String&quot;, # The [regional endpoint]
3168 # (https://cloud.google.com/dataflow/docs/concepts/regional-endpoints) that
3169 # contains this job.
3170 &quot;transformNameMapping&quot;: { # The map of transform name prefixes of the job to be replaced to the
3171 # corresponding name prefixes of the new job.
3172 &quot;a_key&quot;: &quot;A String&quot;,
3173 },
3174 &quot;startTime&quot;: &quot;A String&quot;, # The timestamp when the job was started (transitioned to JOB_STATE_PENDING).
3175 # Flexible resource scheduling jobs are started with some delay after job
3176 # creation, so start_time is unset before start and is updated when the
3177 # job is started by the Cloud Dataflow service. For other jobs, start_time
3178    # always equals create_time and is immutable and set by the Cloud Dataflow
3179 # service.
3180 &quot;clientRequestId&quot;: &quot;A String&quot;, # The client&#x27;s unique identifier of the job, re-used across retried attempts.
3181 # If this field is set, the service will ensure its uniqueness.
3182 # The request to create a job will fail if the service has knowledge of a
3183 # previously submitted job with the same client&#x27;s ID and job name.
3184 # The caller may use this field to ensure idempotence of job
3185 # creation across retried attempts to create a job.
3186 # By default, the field is empty and, in that case, the service ignores it.
3187    &quot;executionInfo&quot;: { # Deprecated. Additional information about how a Cloud Dataflow job will be
3188        # executed that isn&#x27;t contained in the submitted job.
3189 &quot;stages&quot;: { # A mapping from each stage to the information about that stage.
3190 &quot;a_key&quot;: { # Contains information about how a particular
3191 # google.dataflow.v1beta3.Step will be executed.
3192 &quot;stepName&quot;: [ # The steps associated with the execution stage.
3193 # Note that stages may have several steps, and that a given step
3194 # might be run by more than one stage.
3195 &quot;A String&quot;,
3196 ],
3197 },
Bu Sun Kim65020912020-05-20 12:08:20 -07003198 },
3199 },
Bu Sun Kimd059ad82020-07-22 17:02:09 -07003200 &quot;type&quot;: &quot;A String&quot;, # The type of Cloud Dataflow job.
3201 &quot;createTime&quot;: &quot;A String&quot;, # The timestamp when the job was initially created. Immutable and set by the
3202 # Cloud Dataflow service.
3203 &quot;tempFiles&quot;: [ # A set of files the system should be aware of that are used
3204 # for temporary storage. These temporary files will be
3205 # removed on job completion.
3206 # No duplicates are allowed.
3207 # No file patterns are supported.
3208 #
3209 # The supported files are:
3210 #
3211 # Google Cloud Storage:
3212 #
3213 # storage.googleapis.com/{bucket}/{object}
3214 # bucket.storage.googleapis.com/{object}
3215 &quot;A String&quot;,
3216 ],
3217 &quot;id&quot;: &quot;A String&quot;, # The unique ID of this job.
3218 #
3219 # This field is set by the Cloud Dataflow service when the Job is
3220 # created, and is immutable for the life of the job.
3221 &quot;requestedState&quot;: &quot;A String&quot;, # The job&#x27;s requested state.
3222 #
3223 # `UpdateJob` may be used to switch between the `JOB_STATE_STOPPED` and
3224 # `JOB_STATE_RUNNING` states, by setting requested_state. `UpdateJob` may
3225 # also be used to directly set a job&#x27;s requested state to
3226 # `JOB_STATE_CANCELLED` or `JOB_STATE_DONE`, irrevocably terminating the
3227 # job if it has not already reached a terminal state.
3228 &quot;replaceJobId&quot;: &quot;A String&quot;, # If this job is an update of an existing job, this field is the job ID
3229 # of the job it replaced.
3230 #
3231 # When sending a `CreateJobRequest`, you can update a job by specifying it
3232 # here. The job named here is stopped, and its intermediate state is
3233 # transferred to this job.
3234 &quot;createdFromSnapshotId&quot;: &quot;A String&quot;, # If this is specified, the job&#x27;s initial state is populated from the given
3235 # snapshot.
3236 &quot;currentState&quot;: &quot;A String&quot;, # The current state of the job.
3237 #
3238 # Jobs are created in the `JOB_STATE_STOPPED` state unless otherwise
3239 # specified.
3240 #
3241 # A job in the `JOB_STATE_RUNNING` state may asynchronously enter a
3242 # terminal state. After a job has reached a terminal state, no
3243 # further state updates may be made.
3244 #
3245 # This field may be mutated by the Cloud Dataflow service;
3246 # callers cannot mutate it.
3247 &quot;name&quot;: &quot;A String&quot;, # The user-specified Cloud Dataflow job name.
3248 #
3249 # Only one Job with a given name may exist in a project at any
3250 # given time. If a caller attempts to create a Job with the same
3251 # name as an already-existing Job, the attempt returns the
3252 # existing Job.
3253 #
3254 # The name must match the regular expression
3255 # `[a-z]([-a-z0-9]{0,38}[a-z0-9])?`
3256 &quot;currentStateTime&quot;: &quot;A String&quot;, # The timestamp associated with the current state.
Bu Sun Kim65020912020-05-20 12:08:20 -07003257 },
Nathaniel Manista4f877e52015-06-15 16:44:50 +00003258 ],
Bu Sun Kimd059ad82020-07-22 17:02:09 -07003259 &quot;nextPageToken&quot;: &quot;A String&quot;, # Set if there may be more results than fit in this response.
Bu Sun Kim4ed7d3f2020-05-27 12:20:54 -07003260 &quot;failedLocation&quot;: [ # Zero or more messages describing the [regional endpoints]
3261 # (https://cloud.google.com/dataflow/docs/concepts/regional-endpoints) that
3262 # failed to respond.
3263 { # Indicates which [regional endpoint]
3264 # (https://cloud.google.com/dataflow/docs/concepts/regional-endpoints) failed
3265 # to respond to a request for data.
3266 &quot;name&quot;: &quot;A String&quot;, # The name of the [regional endpoint]
3267 # (https://cloud.google.com/dataflow/docs/concepts/regional-endpoints) that
3268 # failed to respond.
3269 },
3270 ],
Nathaniel Manista4f877e52015-06-15 16:44:50 +00003271 }</pre>
3272</div>
3273
3274<div class="method">
3275 <code class="details" id="list_next">list_next(previous_request, previous_response)</code>
3276 <pre>Retrieves the next page of results.
3277
3278Args:
3279 previous_request: The request for the previous page. (required)
3280 previous_response: The response from the request for the previous page. (required)
3281
3282Returns:
Bu Sun Kim65020912020-05-20 12:08:20 -07003283 A request object that you can call &#x27;execute()&#x27; on to request the next
Nathaniel Manista4f877e52015-06-15 16:44:50 +00003284 page. Returns None if there are no more items in the collection.
3285 </pre>
3286</div>
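<p>Below is a minimal, hedged sketch of paging through jobs with <code>list()</code> and
<code>list_next()</code>. It assumes a <code>service</code> handle already created with
<code>googleapiclient.discovery.build(&#x27;dataflow&#x27;, &#x27;v1b3&#x27;)</code>; the project ID is a
placeholder, and the snippet illustrates the pagination pattern rather than being part of the
generated reference itself.</p>
<pre>
# A minimal sketch, assuming `service` was built with
# googleapiclient.discovery.build(&#x27;dataflow&#x27;, &#x27;v1b3&#x27;) and that
# &#x27;my-project&#x27; is a placeholder project ID.
request = service.projects().jobs().list(projectId=&#x27;my-project&#x27;)
while request is not None:
    response = request.execute()
    for job in response.get(&#x27;jobs&#x27;, []):
        print(job[&#x27;id&#x27;], job.get(&#x27;currentState&#x27;))
    # list_next() returns None when there are no more pages.
    request = service.projects().jobs().list_next(
        previous_request=request, previous_response=response)
</pre>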
3287
3288<div class="method">
Dan O'Mearadd494642020-05-01 07:42:23 -07003289 <code class="details" id="snapshot">snapshot(projectId, jobId, body=None, x__xgafv=None)</code>
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07003290 <pre>Snapshot the state of a streaming job.
3291
3292Args:
3293 projectId: string, The project which owns the job to be snapshotted. (required)
3294 jobId: string, The job to be snapshotted. (required)
Dan O'Mearadd494642020-05-01 07:42:23 -07003295 body: object, The request body.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07003296 The object takes the form of:
3297
3298{ # Request to create a snapshot of a job.
Bu Sun Kim65020912020-05-20 12:08:20 -07003299 &quot;snapshotSources&quot;: True or False, # If true, perform snapshots for sources which support this.
Bu Sun Kim65020912020-05-20 12:08:20 -07003300 &quot;location&quot;: &quot;A String&quot;, # The location that contains this job.
Bu Sun Kimd059ad82020-07-22 17:02:09 -07003301    &quot;description&quot;: &quot;A String&quot;, # User-specified description of the snapshot. May be empty.
3302 &quot;ttl&quot;: &quot;A String&quot;, # TTL for the snapshot.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07003303 }
3304
3305 x__xgafv: string, V1 error format.
3306 Allowed values
3307 1 - v1 error format
3308 2 - v2 error format
3309
3310Returns:
3311 An object of the form:
3312
3313 { # Represents a snapshot of a job.
Bu Sun Kimd059ad82020-07-22 17:02:09 -07003314 &quot;ttl&quot;: &quot;A String&quot;, # The time after which this snapshot will be automatically deleted.
3315 &quot;state&quot;: &quot;A String&quot;, # State of the snapshot.
3316 &quot;id&quot;: &quot;A String&quot;, # The unique ID of this snapshot.
3317 &quot;sourceJobId&quot;: &quot;A String&quot;, # The job this snapshot was created from.
3318 &quot;creationTime&quot;: &quot;A String&quot;, # The time this snapshot was created.
3319    &quot;description&quot;: &quot;A String&quot;, # User-specified description of the snapshot. May be empty.
Bu Sun Kim65020912020-05-20 12:08:20 -07003320 &quot;pubsubMetadata&quot;: [ # PubSub snapshot metadata.
Dan O'Mearadd494642020-05-01 07:42:23 -07003321 { # Represents a Pubsub snapshot.
Bu Sun Kim65020912020-05-20 12:08:20 -07003322 &quot;snapshotName&quot;: &quot;A String&quot;, # The name of the Pubsub snapshot.
Bu Sun Kim4ed7d3f2020-05-27 12:20:54 -07003323 &quot;expireTime&quot;: &quot;A String&quot;, # The expire time of the Pubsub snapshot.
Bu Sun Kimd059ad82020-07-22 17:02:09 -07003324 &quot;topicName&quot;: &quot;A String&quot;, # The name of the Pubsub topic.
Dan O'Mearadd494642020-05-01 07:42:23 -07003325 },
3326 ],
Bu Sun Kim4ed7d3f2020-05-27 12:20:54 -07003327 &quot;projectId&quot;: &quot;A String&quot;, # The project this snapshot belongs to.
Bu Sun Kim4ed7d3f2020-05-27 12:20:54 -07003328 &quot;diskSizeBytes&quot;: &quot;A String&quot;, # The disk byte size of the snapshot. Only available for snapshots in READY
3329 # state.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07003330 }</pre>
3331</div>
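<p>Below is a minimal, hedged sketch of requesting a snapshot of a streaming job. It assumes the same
<code>service</code> handle as above; the project ID, job ID, location, and TTL are placeholder
values, and the exact TTL duration format should be checked against the service documentation.</p>
<pre>
# A minimal sketch, assuming `service` was built with
# googleapiclient.discovery.build(&#x27;dataflow&#x27;, &#x27;v1b3&#x27;); the IDs and the
# &#x27;604800s&#x27; (7 day) TTL below are placeholders.
body = {
    &#x27;location&#x27;: &#x27;us-central1&#x27;,
    &#x27;ttl&#x27;: &#x27;604800s&#x27;,
    &#x27;snapshotSources&#x27;: True,
    &#x27;description&#x27;: &#x27;pre-update snapshot&#x27;,
}
snapshot = service.projects().jobs().snapshot(
    projectId=&#x27;my-project&#x27;, jobId=&#x27;my-job-id&#x27;, body=body).execute()
print(snapshot[&#x27;id&#x27;], snapshot[&#x27;state&#x27;])
</pre>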
3332
3333<div class="method">
Dan O'Mearadd494642020-05-01 07:42:23 -07003334 <code class="details" id="update">update(projectId, jobId, body=None, location=None, x__xgafv=None)</code>
Sai Cheemalapatic30d2b52017-03-13 12:12:03 -04003335 <pre>Updates the state of an existing Cloud Dataflow job.
Nathaniel Manista4f877e52015-06-15 16:44:50 +00003336
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07003337To update the state of an existing job, we recommend using
3338`projects.locations.jobs.update` with a [regional endpoint]
3339(https://cloud.google.com/dataflow/docs/concepts/regional-endpoints). Using
3340`projects.jobs.update` is not recommended, as you can only update the state
3341of jobs that are running in `us-central1`.
3342
Nathaniel Manista4f877e52015-06-15 16:44:50 +00003343Args:
Sai Cheemalapatic30d2b52017-03-13 12:12:03 -04003344 projectId: string, The ID of the Cloud Platform project that the job belongs to. (required)
3345 jobId: string, The job ID. (required)
Dan O'Mearadd494642020-05-01 07:42:23 -07003346 body: object, The request body.
Nathaniel Manista4f877e52015-06-15 16:44:50 +00003347 The object takes the form of:
3348
Sai Cheemalapatic30d2b52017-03-13 12:12:03 -04003349{ # Defines a job to be run by the Cloud Dataflow service.
Bu Sun Kimd059ad82020-07-22 17:02:09 -07003350  &quot;pipelineDescription&quot;: { # Preliminary field: The format of this data may change at any time.
3351      # A description of the user pipeline and stages through which it is executed.
3352      # Created by Cloud Dataflow service. Only retrieved with
3353      # JOB_VIEW_DESCRIPTION or JOB_VIEW_ALL. This is a descriptive representation
3354      # of the submitted pipeline as well as the executed form. This data is provided
3355      # by the Dataflow service for ease of visualizing the pipeline and interpreting Dataflow provided metrics.
3356 &quot;displayData&quot;: [ # Pipeline level display data.
3357 { # Data provided with a pipeline or transform to provide descriptive info.
3358 &quot;url&quot;: &quot;A String&quot;, # An optional full URL.
3359 &quot;javaClassValue&quot;: &quot;A String&quot;, # Contains value if the data is of java class type.
3360 &quot;timestampValue&quot;: &quot;A String&quot;, # Contains value if the data is of timestamp type.
3361 &quot;durationValue&quot;: &quot;A String&quot;, # Contains value if the data is of duration type.
3362 &quot;label&quot;: &quot;A String&quot;, # An optional label to display in a dax UI for the element.
3363 &quot;key&quot;: &quot;A String&quot;, # The key identifying the display data.
3364 # This is intended to be used as a label for the display data
3365 # when viewed in a dax monitoring system.
3366 &quot;namespace&quot;: &quot;A String&quot;, # The namespace for the key. This is usually a class name or programming
3367          # language namespace (e.g. a Python module) which defines the display data.
3368 # This allows a dax monitoring system to specially handle the data
3369 # and perform custom rendering.
3370 &quot;floatValue&quot;: 3.14, # Contains value if the data is of float type.
3371 &quot;strValue&quot;: &quot;A String&quot;, # Contains value if the data is of string type.
3372 &quot;int64Value&quot;: &quot;A String&quot;, # Contains value if the data is of int64 type.
3373 &quot;boolValue&quot;: True or False, # Contains value if the data is of a boolean type.
3374 &quot;shortStrValue&quot;: &quot;A String&quot;, # A possible additional shorter value to display.
3375 # For example a java_class_name_value of com.mypackage.MyDoFn
3376 # will be stored with MyDoFn as the short_str_value and
3377 # com.mypackage.MyDoFn as the java_class_name value.
3378 # short_str_value can be displayed and java_class_name_value
3379 # will be displayed as a tooltip.
Bu Sun Kim4ed7d3f2020-05-27 12:20:54 -07003380 },
Bu Sun Kimd059ad82020-07-22 17:02:09 -07003381 ],
3382 &quot;originalPipelineTransform&quot;: [ # Description of each transform in the pipeline and collections between them.
3383 { # Description of the type, names/ids, and input/outputs for a transform.
3384 &quot;outputCollectionName&quot;: [ # User names for all collection outputs to this transform.
Bu Sun Kim4ed7d3f2020-05-27 12:20:54 -07003385 &quot;A String&quot;,
3386 ],
Bu Sun Kimd059ad82020-07-22 17:02:09 -07003387 &quot;displayData&quot;: [ # Transform-specific display data.
3388 { # Data provided with a pipeline or transform to provide descriptive info.
3389 &quot;url&quot;: &quot;A String&quot;, # An optional full URL.
3390 &quot;javaClassValue&quot;: &quot;A String&quot;, # Contains value if the data is of java class type.
3391 &quot;timestampValue&quot;: &quot;A String&quot;, # Contains value if the data is of timestamp type.
3392 &quot;durationValue&quot;: &quot;A String&quot;, # Contains value if the data is of duration type.
3393 &quot;label&quot;: &quot;A String&quot;, # An optional label to display in a dax UI for the element.
3394 &quot;key&quot;: &quot;A String&quot;, # The key identifying the display data.
3395 # This is intended to be used as a label for the display data
3396 # when viewed in a dax monitoring system.
3397 &quot;namespace&quot;: &quot;A String&quot;, # The namespace for the key. This is usually a class name or programming
3398 # language namespace (i.e. python module) which defines the display data.
3399 # This allows a dax monitoring system to specially handle the data
3400 # and perform custom rendering.
3401 &quot;floatValue&quot;: 3.14, # Contains value if the data is of float type.
3402 &quot;strValue&quot;: &quot;A String&quot;, # Contains value if the data is of string type.
3403 &quot;int64Value&quot;: &quot;A String&quot;, # Contains value if the data is of int64 type.
3404 &quot;boolValue&quot;: True or False, # Contains value if the data is of a boolean type.
3405 &quot;shortStrValue&quot;: &quot;A String&quot;, # A possible additional shorter value to display.
3406 # For example a java_class_name_value of com.mypackage.MyDoFn
3407 # will be stored with MyDoFn as the short_str_value and
3408 # com.mypackage.MyDoFn as the java_class_name value.
3409 # short_str_value can be displayed and java_class_name_value
3410 # will be displayed as a tooltip.
3411 },
3412 ],
3413 &quot;id&quot;: &quot;A String&quot;, # SDK generated id of this transform instance.
3414 &quot;inputCollectionName&quot;: [ # User names for all collection inputs to this transform.
3415 &quot;A String&quot;,
3416 ],
3417 &quot;name&quot;: &quot;A String&quot;, # User provided name for this transform instance.
3418 &quot;kind&quot;: &quot;A String&quot;, # Type of transform.
Bu Sun Kim4ed7d3f2020-05-27 12:20:54 -07003419 },
Bu Sun Kimd059ad82020-07-22 17:02:09 -07003420 ],
3421 &quot;executionPipelineStage&quot;: [ # Description of each stage of execution of the pipeline.
3422 { # Description of the composing transforms, names/ids, and input/outputs of a
3423 # stage of execution. Some composing transforms and sources may have been
3424 # generated by the Dataflow service during execution planning.
3425 &quot;componentSource&quot;: [ # Collections produced and consumed by component transforms of this stage.
3426 { # Description of an interstitial value between transforms in an execution
3427 # stage.
3428 &quot;userName&quot;: &quot;A String&quot;, # Human-readable name for this transform; may be user or system generated.
3429 &quot;name&quot;: &quot;A String&quot;, # Dataflow service generated name for this source.
3430 &quot;originalTransformOrCollection&quot;: &quot;A String&quot;, # User name for the original user transform or collection with which this
3431 # source is most closely associated.
3432 },
3433 ],
3434 &quot;inputSource&quot;: [ # Input sources for this stage.
3435 { # Description of an input or output of an execution stage.
3436 &quot;userName&quot;: &quot;A String&quot;, # Human-readable name for this source; may be user or system generated.
3437 &quot;originalTransformOrCollection&quot;: &quot;A String&quot;, # User name for the original user transform or collection with which this
3438 # source is most closely associated.
3439 &quot;sizeBytes&quot;: &quot;A String&quot;, # Size of the source, if measurable.
3440 &quot;name&quot;: &quot;A String&quot;, # Dataflow service generated name for this source.
3441 },
3442 ],
3443 &quot;name&quot;: &quot;A String&quot;, # Dataflow service generated name for this stage.
3444 &quot;componentTransform&quot;: [ # Transforms that comprise this execution stage.
3445 { # Description of a transform executed as part of an execution stage.
3446 &quot;name&quot;: &quot;A String&quot;, # Dataflow service generated name for this source.
3447 &quot;userName&quot;: &quot;A String&quot;, # Human-readable name for this transform; may be user or system generated.
3448 &quot;originalTransform&quot;: &quot;A String&quot;, # User name for the original user transform with which this transform is
3449 # most closely associated.
3450 },
3451 ],
3452 &quot;id&quot;: &quot;A String&quot;, # Dataflow service generated id for this stage.
3453 &quot;outputSource&quot;: [ # Output sources for this stage.
3454 { # Description of an input or output of an execution stage.
3455 &quot;userName&quot;: &quot;A String&quot;, # Human-readable name for this source; may be user or system generated.
3456 &quot;originalTransformOrCollection&quot;: &quot;A String&quot;, # User name for the original user transform or collection with which this
3457 # source is most closely associated.
3458 &quot;sizeBytes&quot;: &quot;A String&quot;, # Size of the source, if measurable.
3459 &quot;name&quot;: &quot;A String&quot;, # Dataflow service generated name for this source.
3460 },
3461 ],
3462      &quot;kind&quot;: &quot;A String&quot;, # Type of transform this stage is executing.
Bu Sun Kim4ed7d3f2020-05-27 12:20:54 -07003463 },
Bu Sun Kimd059ad82020-07-22 17:02:09 -07003464 ],
Bu Sun Kim65020912020-05-20 12:08:20 -07003465 },
Bu Sun Kimd059ad82020-07-22 17:02:09 -07003466 &quot;labels&quot;: { # User-defined labels for this job.
3467 #
3468 # The labels map can contain no more than 64 entries. Entries of the labels
3469 # map are UTF8 strings that comply with the following restrictions:
3470 #
3471 # * Keys must conform to regexp: \p{Ll}\p{Lo}{0,62}
3472 # * Values must conform to regexp: [\p{Ll}\p{Lo}\p{N}_-]{0,63}
3473 # * Both keys and values are additionally constrained to be &lt;= 128 bytes in
3474 # size.
Bu Sun Kim65020912020-05-20 12:08:20 -07003475 &quot;a_key&quot;: &quot;A String&quot;,
Nathaniel Manista4f877e52015-06-15 16:44:50 +00003476 },
Bu Sun Kimd059ad82020-07-22 17:02:09 -07003477 &quot;projectId&quot;: &quot;A String&quot;, # The ID of the Cloud Platform project that the job belongs to.
Bu Sun Kim65020912020-05-20 12:08:20 -07003478 &quot;environment&quot;: { # Describes the environment in which a Dataflow Job runs. # The environment for the job.
Bu Sun Kimd059ad82020-07-22 17:02:09 -07003479 &quot;flexResourceSchedulingGoal&quot;: &quot;A String&quot;, # Which Flexible Resource Scheduling mode to run in.
Bu Sun Kim65020912020-05-20 12:08:20 -07003480 &quot;workerRegion&quot;: &quot;A String&quot;, # The Compute Engine region
3481 # (https://cloud.google.com/compute/docs/regions-zones/regions-zones) in
3482 # which worker processing should occur, e.g. &quot;us-west1&quot;. Mutually exclusive
3483 # with worker_zone. If neither worker_region nor worker_zone is specified,
3484 # default to the control plane&#x27;s region.
Bu Sun Kimd059ad82020-07-22 17:02:09 -07003485 &quot;userAgent&quot;: { # A description of the process that generated the request.
3486 &quot;a_key&quot;: &quot;&quot;, # Properties of the object.
3487 },
3488 &quot;serviceAccountEmail&quot;: &quot;A String&quot;, # Identity to run virtual machines as. Defaults to the default account.
3489 &quot;version&quot;: { # A structure describing which components and their versions of the service
3490 # are required in order to run the job.
3491 &quot;a_key&quot;: &quot;&quot;, # Properties of the object.
3492 },
Bu Sun Kim65020912020-05-20 12:08:20 -07003493 &quot;serviceKmsKeyName&quot;: &quot;A String&quot;, # If set, contains the Cloud KMS key identifier used to encrypt data
3494 # at rest, AKA a Customer Managed Encryption Key (CMEK).
3495 #
3496 # Format:
3497 # projects/PROJECT_ID/locations/LOCATION/keyRings/KEY_RING/cryptoKeys/KEY
Bu Sun Kimd059ad82020-07-22 17:02:09 -07003498 &quot;experiments&quot;: [ # The list of experiments to enable.
3499 &quot;A String&quot;,
3500 ],
Bu Sun Kim65020912020-05-20 12:08:20 -07003501 &quot;workerZone&quot;: &quot;A String&quot;, # The Compute Engine zone
3502 # (https://cloud.google.com/compute/docs/regions-zones/regions-zones) in
3503 # which worker processing should occur, e.g. &quot;us-west1-a&quot;. Mutually exclusive
3504 # with worker_region. If neither worker_region nor worker_zone is specified,
3505 # a zone in the control plane&#x27;s region is chosen based on available capacity.
Bu Sun Kim4ed7d3f2020-05-27 12:20:54 -07003506 &quot;workerPools&quot;: [ # The worker pools. At least one &quot;harness&quot; worker pool must be
3507 # specified in order for the job to have workers.
3508 { # Describes one particular pool of Cloud Dataflow workers to be
3509 # instantiated by the Cloud Dataflow service in order to perform the
3510 # computations required by a job. Note that a workflow job may use
3511 # multiple pools, in order to match the various computational
3512 # requirements of the various stages of the job.
Bu Sun Kimd059ad82020-07-22 17:02:09 -07003513 &quot;onHostMaintenance&quot;: &quot;A String&quot;, # The action to take on host maintenance, as defined by the Google
3514 # Compute Engine API.
3515 &quot;sdkHarnessContainerImages&quot;: [ # Set of SDK harness containers needed to execute this pipeline. This will
3516 # only be set in the Fn API path. For non-cross-language pipelines this
3517 # should have only one entry. Cross-language pipelines will have two or more
3518 # entries.
3519        { # Defines an SDK harness container for executing Dataflow pipelines.
3520 &quot;containerImage&quot;: &quot;A String&quot;, # A docker container image that resides in Google Container Registry.
3521          &quot;useSingleCorePerContainer&quot;: True or False, # If true, recommends that the Dataflow service use only one core per SDK
3522              # container instance with this image. If false (or unset), recommends using
3523              # more than one core per SDK container instance with this image for
3524              # efficiency. Note that the Dataflow service may choose to override this
3525              # property if needed.
3526 },
3527 ],
Bu Sun Kim4ed7d3f2020-05-27 12:20:54 -07003528 &quot;zone&quot;: &quot;A String&quot;, # Zone to run the worker pools in. If empty or unspecified, the service
3529 # will attempt to choose a reasonable default.
Bu Sun Kimd059ad82020-07-22 17:02:09 -07003530 &quot;kind&quot;: &quot;A String&quot;, # The kind of the worker pool; currently only `harness` and `shuffle`
3531 # are supported.
3532 &quot;metadata&quot;: { # Metadata to set on the Google Compute Engine VMs.
3533 &quot;a_key&quot;: &quot;A String&quot;,
3534 },
Bu Sun Kim4ed7d3f2020-05-27 12:20:54 -07003535 &quot;diskSourceImage&quot;: &quot;A String&quot;, # Fully qualified source image for disks.
Bu Sun Kimd059ad82020-07-22 17:02:09 -07003536 &quot;dataDisks&quot;: [ # Data disks that are used by a VM in this workflow.
3537 { # Describes the data disk used by a workflow job.
3538 &quot;sizeGb&quot;: 42, # Size of disk in GB. If zero or unspecified, the service will
3539 # attempt to choose a reasonable default.
3540 &quot;diskType&quot;: &quot;A String&quot;, # Disk storage type, as defined by Google Compute Engine. This
3541 # must be a disk type appropriate to the project and zone in which
3542 # the workers will run. If unknown or unspecified, the service
3543 # will attempt to choose a reasonable default.
3544 #
3545 # For example, the standard persistent disk type is a resource name
3546 # typically ending in &quot;pd-standard&quot;. If SSD persistent disks are
3547 # available, the resource name typically ends with &quot;pd-ssd&quot;. The
3548              # actual valid values are defined by the Google Compute Engine API,
3549 # not by the Cloud Dataflow API; consult the Google Compute Engine
3550 # documentation for more information about determining the set of
3551 # available disk types for a particular project and zone.
3552 #
3553 # Google Compute Engine Disk types are local to a particular
3554 # project in a particular zone, and so the resource name will
3555 # typically look something like this:
3556 #
3557 # compute.googleapis.com/projects/project-id/zones/zone/diskTypes/pd-standard
3558 &quot;mountPoint&quot;: &quot;A String&quot;, # Directory in a VM where disk is mounted.
3559 },
3560 ],
Bu Sun Kim4ed7d3f2020-05-27 12:20:54 -07003561 &quot;packages&quot;: [ # Packages to be installed on workers.
3562 { # The packages that must be installed in order for a worker to run the
3563 # steps of the Cloud Dataflow job that will be assigned to its worker
3564 # pool.
3565 #
3566 # This is the mechanism by which the Cloud Dataflow SDK causes code to
3567 # be loaded onto the workers. For example, the Cloud Dataflow Java SDK
3568 # might use this to install jars containing the user&#x27;s code and all of the
3569 # various dependencies (libraries, data files, etc.) required in order
3570 # for that code to run.
3571 &quot;name&quot;: &quot;A String&quot;, # The name of the package.
3572 &quot;location&quot;: &quot;A String&quot;, # The resource to read the package from. The supported resource type is:
3573 #
3574 # Google Cloud Storage:
3575 #
3576 # storage.googleapis.com/{bucket}
3577 # bucket.storage.googleapis.com/
3578 },
3579 ],
3580        &quot;teardownPolicy&quot;: &quot;A String&quot;, # Sets the policy for determining when to turn down the worker pool.
3581 # Allowed values are: `TEARDOWN_ALWAYS`, `TEARDOWN_ON_SUCCESS`, and
3582 # `TEARDOWN_NEVER`.
3583 # `TEARDOWN_ALWAYS` means workers are always torn down regardless of whether
3584 # the job succeeds. `TEARDOWN_ON_SUCCESS` means workers are torn down
3585 # if the job succeeds. `TEARDOWN_NEVER` means the workers are never torn
3586 # down.
3587 #
3588 # If the workers are not torn down by the service, they will
3589 # continue to run and use Google Compute Engine VM resources in the
3590 # user&#x27;s project until they are explicitly terminated by the user.
3591 # Because of this, Google recommends using the `TEARDOWN_ALWAYS`
3592 # policy except for small, manually supervised test jobs.
3593 #
3594 # If unknown or unspecified, the service will attempt to choose a reasonable
3595 # default.
        &quot;network&quot;: &quot;A String&quot;, # Network to which VMs will be assigned. If empty or unspecified,
            # the service will use the network &quot;default&quot;.
        &quot;ipConfiguration&quot;: &quot;A String&quot;, # Configuration for VM IPs.
        &quot;diskSizeGb&quot;: 42, # Size of root disk for VMs, in GB. If zero or unspecified, the service will
            # attempt to choose a reasonable default.
        &quot;autoscalingSettings&quot;: { # Settings for WorkerPool autoscaling. # Settings for autoscaling of this WorkerPool.
          &quot;maxNumWorkers&quot;: 42, # The maximum number of workers to cap scaling at.
          &quot;algorithm&quot;: &quot;A String&quot;, # The algorithm to use for autoscaling.
        },
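            # Illustrative sketch (hypothetical values; the enum name below is an
            # assumption, so check the AutoscalingAlgorithm reference): capping
            # autoscaling at ten workers might look like
            #   &quot;autoscalingSettings&quot;: {
            #     &quot;algorithm&quot;: &quot;AUTOSCALING_ALGORITHM_BASIC&quot;,
            #     &quot;maxNumWorkers&quot;: 10,
            #   },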
        &quot;poolArgs&quot;: { # Extra arguments for this worker pool.
          &quot;a_key&quot;: &quot;&quot;, # Properties of the object. Contains field @type with type URL.
        },
        &quot;subnetwork&quot;: &quot;A String&quot;, # Subnetwork to which VMs will be assigned, if desired. Expected to be of
            # the form &quot;regions/REGION/subnetworks/SUBNETWORK&quot;.
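            # For example (hypothetical names), a worker subnetwork might be
            # referenced as
            #   &quot;subnetwork&quot;: &quot;regions/us-central1/subnetworks/dataflow-workers&quot;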
        &quot;numWorkers&quot;: 42, # Number of Google Compute Engine workers in this pool needed to
            # execute the job. If zero or unspecified, the service will
            # attempt to choose a reasonable default.
        &quot;numThreadsPerWorker&quot;: 42, # The number of threads per worker harness. If empty or unspecified, the
            # service will choose a number of threads (according to the number of cores
            # on the selected machine type for batch, or 1 by convention for streaming).
        &quot;workerHarnessContainerImage&quot;: &quot;A String&quot;, # Required. Docker container image that executes the Cloud Dataflow worker
            # harness, residing in Google Container Registry.
            #
            # Deprecated for the Fn API path. Use sdk_harness_container_images instead.
        &quot;taskrunnerSettings&quot;: { # Taskrunner configuration settings. # Settings passed through to Google Compute Engine workers when
            # using the standard Dataflow task runner. Users should ignore
            # this field.
          &quot;dataflowApiVersion&quot;: &quot;A String&quot;, # The API version of the endpoint, e.g. &quot;v1b3&quot;.
          &quot;oauthScopes&quot;: [ # The OAuth2 scopes to be requested by the taskrunner in order to
              # access the Cloud Dataflow API.
            &quot;A String&quot;,
          ],
          &quot;baseUrl&quot;: &quot;A String&quot;, # The base URL for the taskrunner to use when accessing Google Cloud APIs.
              #
              # When workers access Google Cloud APIs, they logically do so via
              # relative URLs. If this field is specified, it supplies the base
              # URL to use for resolving these relative URLs. The normative
              # algorithm used is defined by RFC 1808, &quot;Relative Uniform Resource
              # Locators&quot;.
              #
              # If not specified, the default value is &quot;http://www.googleapis.com/&quot;
          &quot;workflowFileName&quot;: &quot;A String&quot;, # The file to store the workflow in.
          &quot;logToSerialconsole&quot;: True or False, # Whether to send taskrunner log info to the Google Compute Engine VM serial
              # console.
          &quot;baseTaskDir&quot;: &quot;A String&quot;, # The location on the worker for task-specific subdirectories.
          &quot;taskUser&quot;: &quot;A String&quot;, # The UNIX user ID on the worker VM to use for tasks launched by
              # taskrunner; e.g. &quot;root&quot;.
          &quot;vmId&quot;: &quot;A String&quot;, # The ID string of the VM.
          &quot;alsologtostderr&quot;: True or False, # Whether to also send taskrunner log info to stderr.
          &quot;parallelWorkerSettings&quot;: { # Provides data to pass through to the worker harness. # The settings to pass to the parallel worker harness.
            &quot;shuffleServicePath&quot;: &quot;A String&quot;, # The Shuffle service path relative to the root URL, for example,
                # &quot;shuffle/v1beta1&quot;.
            &quot;tempStoragePrefix&quot;: &quot;A String&quot;, # The prefix of the resources the system should use for temporary
                # storage.
                #
                # The supported resource type is:
                #
                # Google Cloud Storage:
                #
                # storage.googleapis.com/{bucket}/{object}
                # bucket.storage.googleapis.com/{object}
            &quot;reportingEnabled&quot;: True or False, # Whether to send work progress updates to the service.
            &quot;servicePath&quot;: &quot;A String&quot;, # The Cloud Dataflow service path relative to the root URL, for example,
                # &quot;dataflow/v1b3/projects&quot;.
            &quot;baseUrl&quot;: &quot;A String&quot;, # The base URL for accessing Google Cloud APIs.
                #
                # When workers access Google Cloud APIs, they logically do so via
                # relative URLs. If this field is specified, it supplies the base
                # URL to use for resolving these relative URLs. The normative
                # algorithm used is defined by RFC 1808, &quot;Relative Uniform Resource
                # Locators&quot;.
                #
                # If not specified, the default value is &quot;http://www.googleapis.com/&quot;
            &quot;workerId&quot;: &quot;A String&quot;, # The ID of the worker running this pipeline.
          },
          &quot;harnessCommand&quot;: &quot;A String&quot;, # The command to launch the worker harness.
          &quot;logDir&quot;: &quot;A String&quot;, # The directory on the VM to store logs.
          &quot;streamingWorkerMainClass&quot;: &quot;A String&quot;, # The streaming worker main class name.
          &quot;languageHint&quot;: &quot;A String&quot;, # The suggested backend language.
          &quot;taskGroup&quot;: &quot;A String&quot;, # The UNIX group ID on the worker VM to use for tasks launched by
              # taskrunner; e.g. &quot;wheel&quot;.
          &quot;logUploadLocation&quot;: &quot;A String&quot;, # Indicates where to put logs. If this is not specified, the logs
              # will not be uploaded.
              #
              # The supported resource type is:
              #
              # Google Cloud Storage:
              # storage.googleapis.com/{bucket}/{object}
              # bucket.storage.googleapis.com/{object}
          &quot;commandlinesFileName&quot;: &quot;A String&quot;, # The file to store preprocessing commands in.
          &quot;continueOnException&quot;: True or False, # Whether the taskrunner should continue if an exception is hit.
          &quot;tempStoragePrefix&quot;: &quot;A String&quot;, # The prefix of the resources the taskrunner should use for
              # temporary storage.
              #
              # The supported resource type is:
              #
              # Google Cloud Storage:
              # storage.googleapis.com/{bucket}/{object}
              # bucket.storage.googleapis.com/{object}
        },
        &quot;diskType&quot;: &quot;A String&quot;, # Type of root disk for VMs. If empty or unspecified, the service will
            # attempt to choose a reasonable default.
        &quot;defaultPackageSet&quot;: &quot;A String&quot;, # The default package set to install. This allows the service to
            # select a default set of packages which are useful to worker
            # harnesses written in a particular language.
        &quot;machineType&quot;: &quot;A String&quot;, # Machine type (e.g. &quot;n1-standard-1&quot;). If empty or unspecified, the
            # service will attempt to choose a reasonable default.
      },
    ],
    &quot;tempStoragePrefix&quot;: &quot;A String&quot;, # The prefix of the resources the system should use for temporary
        # storage. The system will append the suffix &quot;/temp-{JOBNAME}&quot; to
        # this resource prefix, where {JOBNAME} is the value of the
        # job_name field. The resulting bucket and object prefix is used
        # as the prefix of the resources used to store temporary data
        # needed during the job execution. NOTE: This will override the
        # value in taskrunner_settings.
        # The supported resource type is:
        #
        # Google Cloud Storage:
        #
        # storage.googleapis.com/{bucket}/{object}
        # bucket.storage.googleapis.com/{object}
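        # Worked example (hypothetical bucket and job name): with
        #   &quot;tempStoragePrefix&quot;: &quot;storage.googleapis.com/my-bucket/dataflow&quot;
        # and a job_name of &quot;wordcount&quot;, temporary data is staged under
        #   storage.googleapis.com/my-bucket/dataflow/temp-wordcount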
    &quot;internalExperiments&quot;: { # Experimental settings.
      &quot;a_key&quot;: &quot;&quot;, # Properties of the object. Contains field @type with type URL.
    },
    &quot;sdkPipelineOptions&quot;: { # The Cloud Dataflow SDK pipeline options specified by the user. These
        # options are passed through the service and are used to recreate the
        # SDK pipeline options on the worker in a language agnostic and platform
        # independent way.
      &quot;a_key&quot;: &quot;&quot;, # Properties of the object.
    },
    &quot;dataset&quot;: &quot;A String&quot;, # The dataset for the current project where various workflow
        # related tables are stored.
        #
        # The supported resource type is:
        #
        # Google BigQuery:
        # bigquery.googleapis.com/{dataset}
    &quot;clusterManagerApiService&quot;: &quot;A String&quot;, # The type of cluster manager API to use. If unknown or
        # unspecified, the service will attempt to choose a reasonable
        # default. This should be in the form of the API service name,
        # e.g. &quot;compute.googleapis.com&quot;.
  },
  &quot;stepsLocation&quot;: &quot;A String&quot;, # The GCS location where the steps are stored.
  &quot;steps&quot;: [ # Exactly one of step or steps_location should be specified.
      #
      # The top-level steps that constitute the entire job.
    { # Defines a particular step within a Cloud Dataflow job.
        #
        # A job consists of multiple steps, each of which performs some
        # specific operation as part of the overall job. Data is typically
        # passed from one step to another as part of the job.
        #
        # Here&#x27;s an example of a sequence of steps which together implement a
        # Map-Reduce job:
        #
        # * Read a collection of data from some source, parsing the
        # collection&#x27;s elements.
        #
        # * Validate the elements.
        #
        # * Apply a user-defined function to map each element to some value
        # and extract an element-specific key value.
        #
        # * Group elements with the same key into a single element with
        # that key, transforming a multiply-keyed collection into a
        # uniquely-keyed collection.
        #
        # * Write the elements out to some data sink.
        #
        # Note that the Cloud Dataflow service may be used to run many different
        # types of jobs, not just Map-Reduce.
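        # Illustrative sketch (the kind and property names below are
        # hypothetical, chosen only to show the shape of a step): a read step
        # might look like
        #   { &quot;kind&quot;: &quot;ParallelRead&quot;, &quot;name&quot;: &quot;ReadInput&quot;,
        #     &quot;properties&quot;: { &quot;format&quot;: &quot;text&quot; } }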
      &quot;kind&quot;: &quot;A String&quot;, # The kind of step in the Cloud Dataflow job.
      &quot;properties&quot;: { # Named properties associated with the step. Each kind of
          # predefined step has its own required set of properties.
          # Must be provided on Create. Only retrieved with JOB_VIEW_ALL.
        &quot;a_key&quot;: &quot;&quot;, # Properties of the object.
      },
      &quot;name&quot;: &quot;A String&quot;, # The name that identifies the step. This must be unique for each
          # step with respect to all other steps in the Cloud Dataflow job.
    },
  ],
  &quot;stageStates&quot;: [ # This field may be mutated by the Cloud Dataflow service;
      # callers cannot mutate it.
    { # A message describing the state of a particular execution stage.
      &quot;executionStageState&quot;: &quot;A String&quot;, # Execution stage states allow the same set of values as JobState.
      &quot;executionStageName&quot;: &quot;A String&quot;, # The name of the execution stage.
      &quot;currentStateTime&quot;: &quot;A String&quot;, # The time at which the stage transitioned to this state.
    },
  ],
  &quot;replacedByJobId&quot;: &quot;A String&quot;, # If another job is an update of this job (and thus, this job is in
      # `JOB_STATE_UPDATED`), this field contains the ID of that job.
  &quot;jobMetadata&quot;: { # Metadata available primarily for filtering jobs. Will be included in the # This field is populated by the Dataflow service to support filtering jobs
      # by the metadata values provided here. Populated for ListJobs and all GetJob
      # views SUMMARY and higher.
      # ListJob response and Job SUMMARY view.
    &quot;sdkVersion&quot;: { # The version of the SDK used to run the job. # The SDK version used to run the job.
      &quot;sdkSupportStatus&quot;: &quot;A String&quot;, # The support status for this SDK version.
      &quot;versionDisplayName&quot;: &quot;A String&quot;, # A readable string describing the version of the SDK.
      &quot;version&quot;: &quot;A String&quot;, # The version of the SDK used to run the job.
    },
    &quot;bigTableDetails&quot;: [ # Identification of a BigTable source used in the Dataflow job.
      { # Metadata for a BigTable connector used by the job.
        &quot;instanceId&quot;: &quot;A String&quot;, # InstanceId accessed in the connection.
        &quot;tableId&quot;: &quot;A String&quot;, # TableId accessed in the connection.
        &quot;projectId&quot;: &quot;A String&quot;, # ProjectId accessed in the connection.
      },
    ],
    &quot;pubsubDetails&quot;: [ # Identification of a PubSub source used in the Dataflow job.
      { # Metadata for a PubSub connector used by the job.
        &quot;subscription&quot;: &quot;A String&quot;, # Subscription used in the connection.
        &quot;topic&quot;: &quot;A String&quot;, # Topic accessed in the connection.
      },
    ],
    &quot;bigqueryDetails&quot;: [ # Identification of a BigQuery source used in the Dataflow job.
      { # Metadata for a BigQuery connector used by the job.
        &quot;dataset&quot;: &quot;A String&quot;, # Dataset accessed in the connection.
        &quot;projectId&quot;: &quot;A String&quot;, # Project accessed in the connection.
        &quot;query&quot;: &quot;A String&quot;, # Query used to access data in the connection.
        &quot;table&quot;: &quot;A String&quot;, # Table accessed in the connection.
      },
    ],
    &quot;fileDetails&quot;: [ # Identification of a File source used in the Dataflow job.
      { # Metadata for a File connector used by the job.
        &quot;filePattern&quot;: &quot;A String&quot;, # File pattern used to access files by the connector.
      },
    ],
    &quot;datastoreDetails&quot;: [ # Identification of a Datastore source used in the Dataflow job.
      { # Metadata for a Datastore connector used by the job.
        &quot;namespace&quot;: &quot;A String&quot;, # Namespace used in the connection.
        &quot;projectId&quot;: &quot;A String&quot;, # ProjectId accessed in the connection.
      },
    ],
    &quot;spannerDetails&quot;: [ # Identification of a Spanner source used in the Dataflow job.
      { # Metadata for a Spanner connector used by the job.
        &quot;instanceId&quot;: &quot;A String&quot;, # InstanceId accessed in the connection.
        &quot;databaseId&quot;: &quot;A String&quot;, # DatabaseId accessed in the connection.
        &quot;projectId&quot;: &quot;A String&quot;, # ProjectId accessed in the connection.
      },
    ],
  },
  &quot;location&quot;: &quot;A String&quot;, # The [regional endpoint]
      # (https://cloud.google.com/dataflow/docs/concepts/regional-endpoints) that
      # contains this job.
  &quot;transformNameMapping&quot;: { # The map of transform name prefixes of the job to be replaced to the
      # corresponding name prefixes of the new job.
    &quot;a_key&quot;: &quot;A String&quot;,
  },
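  # For example (hypothetical prefixes), if an updated pipeline renamed its
  # read transform, the request might include
  #   &quot;transformNameMapping&quot;: { &quot;ReadOld&quot;: &quot;ReadNew&quot; }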
  &quot;startTime&quot;: &quot;A String&quot;, # The timestamp when the job was started (transitioned to JOB_STATE_PENDING).
      # Flexible resource scheduling jobs are started with some delay after job
      # creation, so start_time is unset before start and is updated when the
      # job is started by the Cloud Dataflow service. For other jobs, start_time
      # always equals create_time and is immutable and set by the Cloud Dataflow
      # service.
  &quot;clientRequestId&quot;: &quot;A String&quot;, # The client&#x27;s unique identifier of the job, re-used across retried attempts.
      # If this field is set, the service will ensure its uniqueness.
      # The request to create a job will fail if the service has knowledge of a
      # previously submitted job with the same client&#x27;s ID and job name.
      # The caller may use this field to ensure idempotence of job
      # creation across retried attempts to create a job.
      # By default, the field is empty and, in that case, the service ignores it.
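      # Illustrative note: a client that retries job creation can generate one
      # identifier per logical submission (e.g. a UUID) and send it on every
      # retry; the service then returns the already-created job rather than
      # creating a duplicate.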
  &quot;executionInfo&quot;: { # Additional information about how a Cloud Dataflow job will be executed that # Deprecated.
      # isn&#x27;t contained in the submitted job.
    &quot;stages&quot;: { # A mapping from each stage to the information about that stage.
      &quot;a_key&quot;: { # Contains information about how a particular
          # google.dataflow.v1beta3.Step will be executed.
        &quot;stepName&quot;: [ # The steps associated with the execution stage.
            # Note that stages may have several steps, and that a given step
            # might be run by more than one stage.
          &quot;A String&quot;,
        ],
      },
    },
  },
  &quot;type&quot;: &quot;A String&quot;, # The type of Cloud Dataflow job.
  &quot;createTime&quot;: &quot;A String&quot;, # The timestamp when the job was initially created. Immutable and set by the
      # Cloud Dataflow service.
  &quot;tempFiles&quot;: [ # A set of files the system should be aware of that are used
      # for temporary storage. These temporary files will be
      # removed on job completion.
      # No duplicates are allowed.
      # No file patterns are supported.
      #
      # The supported files are:
      #
      # Google Cloud Storage:
      #
      # storage.googleapis.com/{bucket}/{object}
      # bucket.storage.googleapis.com/{object}
    &quot;A String&quot;,
  ],
  &quot;id&quot;: &quot;A String&quot;, # The unique ID of this job.
      #
      # This field is set by the Cloud Dataflow service when the Job is
      # created, and is immutable for the life of the job.
  &quot;requestedState&quot;: &quot;A String&quot;, # The job&#x27;s requested state.
      #
      # `UpdateJob` may be used to switch between the `JOB_STATE_STOPPED` and
      # `JOB_STATE_RUNNING` states, by setting requested_state. `UpdateJob` may
      # also be used to directly set a job&#x27;s requested state to
      # `JOB_STATE_CANCELLED` or `JOB_STATE_DONE`, irrevocably terminating the
      # job if it has not already reached a terminal state.
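      # Illustrative sketch (assuming the projects.jobs.update method):
      # cancelling a running job amounts to updating it with a body of
      #   { &quot;requestedState&quot;: &quot;JOB_STATE_CANCELLED&quot; }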
  &quot;replaceJobId&quot;: &quot;A String&quot;, # If this job is an update of an existing job, this field is the job ID
      # of the job it replaced.
      #
      # When sending a `CreateJobRequest`, you can update a job by specifying it
      # here. The job named here is stopped, and its intermediate state is
      # transferred to this job.
  &quot;createdFromSnapshotId&quot;: &quot;A String&quot;, # If this is specified, the job&#x27;s initial state is populated from the given
      # snapshot.
  &quot;currentState&quot;: &quot;A String&quot;, # The current state of the job.
      #
      # Jobs are created in the `JOB_STATE_STOPPED` state unless otherwise
      # specified.
      #
      # A job in the `JOB_STATE_RUNNING` state may asynchronously enter a
      # terminal state. After a job has reached a terminal state, no
      # further state updates may be made.
      #
      # This field may be mutated by the Cloud Dataflow service;
      # callers cannot mutate it.
  &quot;name&quot;: &quot;A String&quot;, # The user-specified Cloud Dataflow job name.
      #
      # Only one Job with a given name may exist in a project at any
      # given time. If a caller attempts to create a Job with the same
      # name as an already-existing Job, the attempt returns the
      # existing Job.
      #
      # The name must match the regular expression
      # `[a-z]([-a-z0-9]{0,38}[a-z0-9])?`
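      # For example, &quot;wordcount-2020-07&quot; matches the expression above, while
      # &quot;WordCount&quot; (uppercase) and &quot;-wordcount&quot; (leading hyphen) do not.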
  &quot;currentStateTime&quot;: &quot;A String&quot;, # The timestamp associated with the current state.
}

  location: string, The [regional endpoint]
(https://cloud.google.com/dataflow/docs/concepts/regional-endpoints) that
contains this job.
  x__xgafv: string, V1 error format.
    Allowed values
      1 - v1 error format
      2 - v2 error format

Returns:
  An object of the form:

    { # Defines a job to be run by the Cloud Dataflow service.
      &quot;pipelineDescription&quot;: { # A descriptive representation of the submitted pipeline as well as the executed # Preliminary field: The format of this data may change at any time.
          # A description of the user pipeline and stages through which it is executed.
          # Created by Cloud Dataflow service. Only retrieved with
          # JOB_VIEW_DESCRIPTION or JOB_VIEW_ALL.
          # form. This data is provided by the Dataflow service for ease of visualizing
          # the pipeline and interpreting Dataflow provided metrics.
        &quot;displayData&quot;: [ # Pipeline level display data.
          { # Data provided with a pipeline or transform to provide descriptive info.
            &quot;url&quot;: &quot;A String&quot;, # An optional full URL.
            &quot;javaClassValue&quot;: &quot;A String&quot;, # Contains value if the data is of java class type.
            &quot;timestampValue&quot;: &quot;A String&quot;, # Contains value if the data is of timestamp type.
            &quot;durationValue&quot;: &quot;A String&quot;, # Contains value if the data is of duration type.
            &quot;label&quot;: &quot;A String&quot;, # An optional label to display in a dax UI for the element.
            &quot;key&quot;: &quot;A String&quot;, # The key identifying the display data.
                # This is intended to be used as a label for the display data
                # when viewed in a dax monitoring system.
            &quot;namespace&quot;: &quot;A String&quot;, # The namespace for the key. This is usually a class name or programming
                # language namespace (e.g. a Python module) which defines the display data.
                # This allows a dax monitoring system to specially handle the data
                # and perform custom rendering.
            &quot;floatValue&quot;: 3.14, # Contains value if the data is of float type.
            &quot;strValue&quot;: &quot;A String&quot;, # Contains value if the data is of string type.
            &quot;int64Value&quot;: &quot;A String&quot;, # Contains value if the data is of int64 type.
            &quot;boolValue&quot;: True or False, # Contains value if the data is of a boolean type.
            &quot;shortStrValue&quot;: &quot;A String&quot;, # A possible additional shorter value to display.
                # For example a java_class_name_value of com.mypackage.MyDoFn
                # will be stored with MyDoFn as the short_str_value and
                # com.mypackage.MyDoFn as the java_class_name value.
                # short_str_value can be displayed and java_class_name_value
                # will be displayed as a tooltip.
          },
        ],
        &quot;originalPipelineTransform&quot;: [ # Description of each transform in the pipeline and collections between them.
          { # Description of the type, names/ids, and input/outputs for a transform.
            &quot;outputCollectionName&quot;: [ # User names for all collection outputs to this transform.
              &quot;A String&quot;,
            ],
            &quot;displayData&quot;: [ # Transform-specific display data.
              { # Data provided with a pipeline or transform to provide descriptive info.
                &quot;url&quot;: &quot;A String&quot;, # An optional full URL.
                &quot;javaClassValue&quot;: &quot;A String&quot;, # Contains value if the data is of java class type.
                &quot;timestampValue&quot;: &quot;A String&quot;, # Contains value if the data is of timestamp type.
                &quot;durationValue&quot;: &quot;A String&quot;, # Contains value if the data is of duration type.
                &quot;label&quot;: &quot;A String&quot;, # An optional label to display in a dax UI for the element.
                &quot;key&quot;: &quot;A String&quot;, # The key identifying the display data.
                    # This is intended to be used as a label for the display data
                    # when viewed in a dax monitoring system.
                &quot;namespace&quot;: &quot;A String&quot;, # The namespace for the key. This is usually a class name or programming
                    # language namespace (e.g. a Python module) which defines the display data.
                    # This allows a dax monitoring system to specially handle the data
                    # and perform custom rendering.
                &quot;floatValue&quot;: 3.14, # Contains value if the data is of float type.
                &quot;strValue&quot;: &quot;A String&quot;, # Contains value if the data is of string type.
                &quot;int64Value&quot;: &quot;A String&quot;, # Contains value if the data is of int64 type.
                &quot;boolValue&quot;: True or False, # Contains value if the data is of a boolean type.
                &quot;shortStrValue&quot;: &quot;A String&quot;, # A possible additional shorter value to display.
                    # For example a java_class_name_value of com.mypackage.MyDoFn
                    # will be stored with MyDoFn as the short_str_value and
                    # com.mypackage.MyDoFn as the java_class_name value.
                    # short_str_value can be displayed and java_class_name_value
                    # will be displayed as a tooltip.
              },
            ],
            &quot;id&quot;: &quot;A String&quot;, # SDK generated id of this transform instance.
            &quot;inputCollectionName&quot;: [ # User names for all collection inputs to this transform.
              &quot;A String&quot;,
            ],
            &quot;name&quot;: &quot;A String&quot;, # User provided name for this transform instance.
            &quot;kind&quot;: &quot;A String&quot;, # Type of transform.
          },
        ],
        &quot;executionPipelineStage&quot;: [ # Description of each stage of execution of the pipeline.
          { # Description of the composing transforms, names/ids, and input/outputs of a
              # stage of execution. Some composing transforms and sources may have been
              # generated by the Dataflow service during execution planning.
            &quot;componentSource&quot;: [ # Collections produced and consumed by component transforms of this stage.
              { # Description of an interstitial value between transforms in an execution
                  # stage.
                &quot;userName&quot;: &quot;A String&quot;, # Human-readable name for this transform; may be user or system generated.
                &quot;name&quot;: &quot;A String&quot;, # Dataflow service generated name for this source.
                &quot;originalTransformOrCollection&quot;: &quot;A String&quot;, # User name for the original user transform or collection with which this
                    # source is most closely associated.
              },
            ],
            &quot;inputSource&quot;: [ # Input sources for this stage.
              { # Description of an input or output of an execution stage.
                &quot;userName&quot;: &quot;A String&quot;, # Human-readable name for this source; may be user or system generated.
                &quot;originalTransformOrCollection&quot;: &quot;A String&quot;, # User name for the original user transform or collection with which this
                    # source is most closely associated.
                &quot;sizeBytes&quot;: &quot;A String&quot;, # Size of the source, if measurable.
                &quot;name&quot;: &quot;A String&quot;, # Dataflow service generated name for this source.
              },
            ],
            &quot;name&quot;: &quot;A String&quot;, # Dataflow service generated name for this stage.
            &quot;componentTransform&quot;: [ # Transforms that comprise this execution stage.
              { # Description of a transform executed as part of an execution stage.
                &quot;name&quot;: &quot;A String&quot;, # Dataflow service generated name for this source.
                &quot;userName&quot;: &quot;A String&quot;, # Human-readable name for this transform; may be user or system generated.
                &quot;originalTransform&quot;: &quot;A String&quot;, # User name for the original user transform with which this transform is
                    # most closely associated.
              },
            ],
            &quot;id&quot;: &quot;A String&quot;, # Dataflow service generated id for this stage.
            &quot;outputSource&quot;: [ # Output sources for this stage.
              { # Description of an input or output of an execution stage.
                &quot;userName&quot;: &quot;A String&quot;, # Human-readable name for this source; may be user or system generated.
                &quot;originalTransformOrCollection&quot;: &quot;A String&quot;, # User name for the original user transform or collection with which this
                    # source is most closely associated.
                &quot;sizeBytes&quot;: &quot;A String&quot;, # Size of the source, if measurable.
                &quot;name&quot;: &quot;A String&quot;, # Dataflow service generated name for this source.
              },
            ],
            &quot;kind&quot;: &quot;A String&quot;, # Type of transform this stage is executing.
          },
        ],
      },
      &quot;labels&quot;: { # User-defined labels for this job.
          #
          # The labels map can contain no more than 64 entries. Entries of the labels
          # map are UTF8 strings that comply with the following restrictions:
          #
          # * Keys must conform to regexp: \p{Ll}\p{Lo}{0,62}
          # * Values must conform to regexp: [\p{Ll}\p{Lo}\p{N}_-]{0,63}
          # * Both keys and values are additionally constrained to be &lt;= 128 bytes in
          # size.
        &quot;a_key&quot;: &quot;A String&quot;,
      },
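      # For example (hypothetical labels), a job might carry
      #   &quot;labels&quot;: { &quot;team&quot;: &quot;analytics&quot;, &quot;env&quot;: &quot;test&quot; }
      # subject to the key/value restrictions above.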
      &quot;projectId&quot;: &quot;A String&quot;, # The ID of the Cloud Platform project that the job belongs to.
      &quot;environment&quot;: { # Describes the environment in which a Dataflow Job runs. # The environment for the job.
        &quot;flexResourceSchedulingGoal&quot;: &quot;A String&quot;, # Which Flexible Resource Scheduling mode to run in.
        &quot;workerRegion&quot;: &quot;A String&quot;, # The Compute Engine region
            # (https://cloud.google.com/compute/docs/regions-zones/regions-zones) in
            # which worker processing should occur, e.g. &quot;us-west1&quot;. Mutually exclusive
            # with worker_zone. If neither worker_region nor worker_zone is specified,
            # default to the control plane&#x27;s region.
        &quot;userAgent&quot;: { # A description of the process that generated the request.
          &quot;a_key&quot;: &quot;&quot;, # Properties of the object.
        },
        &quot;serviceAccountEmail&quot;: &quot;A String&quot;, # Identity to run virtual machines as. Defaults to the default account.
        &quot;version&quot;: { # A structure describing which components and their versions of the service
            # are required in order to run the job.
          &quot;a_key&quot;: &quot;&quot;, # Properties of the object.
        },
        &quot;serviceKmsKeyName&quot;: &quot;A String&quot;, # If set, contains the Cloud KMS key identifier used to encrypt data
            # at rest, AKA a Customer Managed Encryption Key (CMEK).
            #
            # Format:
            # projects/PROJECT_ID/locations/LOCATION/keyRings/KEY_RING/cryptoKeys/KEY
        &quot;experiments&quot;: [ # The list of experiments to enable.
          &quot;A String&quot;,
        ],
        &quot;workerZone&quot;: &quot;A String&quot;, # The Compute Engine zone
            # (https://cloud.google.com/compute/docs/regions-zones/regions-zones) in
            # which worker processing should occur, e.g. &quot;us-west1-a&quot;. Mutually exclusive
            # with worker_region. If neither worker_region nor worker_zone is specified,
            # a zone in the control plane&#x27;s region is chosen based on available capacity.
        &quot;workerPools&quot;: [ # The worker pools. At least one &quot;harness&quot; worker pool must be
            # specified in order for the job to have workers.
          { # Describes one particular pool of Cloud Dataflow workers to be
              # instantiated by the Cloud Dataflow service in order to perform the
              # computations required by a job. Note that a workflow job may use
              # multiple pools, in order to match the various computational
              # requirements of the various stages of the job.
            &quot;onHostMaintenance&quot;: &quot;A String&quot;, # The action to take on host maintenance, as defined by the Google
                # Compute Engine API.
            &quot;sdkHarnessContainerImages&quot;: [ # Set of SDK harness containers needed to execute this pipeline. This will
                # only be set in the Fn API path. For non-cross-language pipelines this
                # should have only one entry. Cross-language pipelines will have two or more
                # entries.
              { # Defines an SDK harness container for executing Dataflow pipelines.
                &quot;containerImage&quot;: &quot;A String&quot;, # A docker container image that resides in Google Container Registry.
                &quot;useSingleCorePerContainer&quot;: True or False, # If true, recommends that the Dataflow service use only one core per SDK
                    # container instance with this image. If false (or unset) recommends using
                    # more than one core per SDK container instance with this image for
                    # efficiency. Note that the Dataflow service may choose to override this property
                    # if needed.
              },
            ],
            &quot;zone&quot;: &quot;A String&quot;, # Zone to run the worker pools in. If empty or unspecified, the service
                # will attempt to choose a reasonable default.
            &quot;kind&quot;: &quot;A String&quot;, # The kind of the worker pool; currently only `harness` and `shuffle`
                # are supported.
            &quot;metadata&quot;: { # Metadata to set on the Google Compute Engine VMs.
              &quot;a_key&quot;: &quot;A String&quot;,
            },
            &quot;diskSourceImage&quot;: &quot;A String&quot;, # Fully qualified source image for disks.
            &quot;dataDisks&quot;: [ # Data disks that are used by a VM in this workflow.
              { # Describes the data disk used by a workflow job.
                &quot;sizeGb&quot;: 42, # Size of disk in GB. If zero or unspecified, the service will
                    # attempt to choose a reasonable default.
                &quot;diskType&quot;: &quot;A String&quot;, # Disk storage type, as defined by Google Compute Engine. This
                    # must be a disk type appropriate to the project and zone in which
                    # the workers will run. If unknown or unspecified, the service
                    # will attempt to choose a reasonable default.
                    #
                    # For example, the standard persistent disk type is a resource name
                    # typically ending in &quot;pd-standard&quot;. If SSD persistent disks are
                    # available, the resource name typically ends with &quot;pd-ssd&quot;. The
                    # actual valid values are defined by the Google Compute Engine API,
                    # not by the Cloud Dataflow API; consult the Google Compute Engine
                    # documentation for more information about determining the set of
                    # available disk types for a particular project and zone.
                    #
                    # Google Compute Engine disk types are local to a particular
                    # project in a particular zone, and so the resource name will
                    # typically look something like this:
                    #
                    # compute.googleapis.com/projects/project-id/zones/zone/diskTypes/pd-standard
                &quot;mountPoint&quot;: &quot;A String&quot;, # Directory in a VM where disk is mounted.
              },
            ],
            &quot;packages&quot;: [ # Packages to be installed on workers.
              { # The packages that must be installed in order for a worker to run the
                  # steps of the Cloud Dataflow job that will be assigned to its worker
                  # pool.
                  #
                  # This is the mechanism by which the Cloud Dataflow SDK causes code to
                  # be loaded onto the workers. For example, the Cloud Dataflow Java SDK
                  # might use this to install jars containing the user&#x27;s code and all of the
                  # various dependencies (libraries, data files, etc.) required in order
                  # for that code to run.
                &quot;name&quot;: &quot;A String&quot;, # The name of the package.
                &quot;location&quot;: &quot;A String&quot;, # The resource to read the package from. The supported resource type is:
                    #
                    # Google Cloud Storage:
                    #
                    # storage.googleapis.com/{bucket}
                    # bucket.storage.googleapis.com/
              },
            ],
            &quot;teardownPolicy&quot;: &quot;A String&quot;, # Sets the policy for determining when to tear down the worker pool.
                # Allowed values are: `TEARDOWN_ALWAYS`, `TEARDOWN_ON_SUCCESS`, and
                # `TEARDOWN_NEVER`.
                # `TEARDOWN_ALWAYS` means workers are always torn down regardless of whether
                # the job succeeds. `TEARDOWN_ON_SUCCESS` means workers are torn down
                # if the job succeeds. `TEARDOWN_NEVER` means the workers are never torn
                # down.
                #
                # If the workers are not torn down by the service, they will
                # continue to run and use Google Compute Engine VM resources in the
                # user&#x27;s project until they are explicitly terminated by the user.
                # Because of this, Google recommends using the `TEARDOWN_ALWAYS`
                # policy except for small, manually supervised test jobs.
                #
                # If unknown or unspecified, the service will attempt to choose a reasonable
                # default.
            &quot;network&quot;: &quot;A String&quot;, # Network to which VMs will be assigned. If empty or unspecified,
                # the service will use the network &quot;default&quot;.
            &quot;ipConfiguration&quot;: &quot;A String&quot;, # Configuration for VM IPs.
            &quot;diskSizeGb&quot;: 42, # Size of root disk for VMs, in GB. If zero or unspecified, the service will
                # attempt to choose a reasonable default.
            &quot;autoscalingSettings&quot;: { # Settings for WorkerPool autoscaling. # Settings for autoscaling of this WorkerPool.
              &quot;maxNumWorkers&quot;: 42, # The maximum number of workers to cap scaling at.
              &quot;algorithm&quot;: &quot;A String&quot;, # The algorithm to use for autoscaling.
            },
            &quot;poolArgs&quot;: { # Extra arguments for this worker pool.
              &quot;a_key&quot;: &quot;&quot;, # Properties of the object. Contains field @type with type URL.
            },
            &quot;subnetwork&quot;: &quot;A String&quot;, # Subnetwork to which VMs will be assigned, if desired. Expected to be of
                # the form &quot;regions/REGION/subnetworks/SUBNETWORK&quot;.
            &quot;numWorkers&quot;: 42, # Number of Google Compute Engine workers in this pool needed to
                # execute the job. If zero or unspecified, the service will
                # attempt to choose a reasonable default.
            &quot;numThreadsPerWorker&quot;: 42, # The number of threads per worker harness. If empty or unspecified, the
                # service will choose a number of threads (according to the number of cores
                # on the selected machine type for batch, or 1 by convention for streaming).
            &quot;workerHarnessContainerImage&quot;: &quot;A String&quot;, # Required. Docker container image that executes the Cloud Dataflow worker
                # harness, residing in Google Container Registry.
                #
                # Deprecated for the Fn API path. Use sdk_harness_container_images instead.
            &quot;taskrunnerSettings&quot;: { # Taskrunner configuration settings. # Settings passed through to Google Compute Engine workers when
                # using the standard Dataflow task runner. Users should ignore
                # this field.
              &quot;dataflowApiVersion&quot;: &quot;A String&quot;, # The API version of the endpoint, e.g. &quot;v1b3&quot;.
              &quot;oauthScopes&quot;: [ # The OAuth2 scopes to be requested by the taskrunner in order to
                  # access the Cloud Dataflow API.
                &quot;A String&quot;,
              ],
              &quot;baseUrl&quot;: &quot;A String&quot;, # The base URL for the taskrunner to use when accessing Google Cloud APIs.
                  #
                  # When workers access Google Cloud APIs, they logically do so via
                  # relative URLs. If this field is specified, it supplies the base
                  # URL to use for resolving these relative URLs. The normative
                  # algorithm used is defined by RFC 1808, &quot;Relative Uniform Resource
                  # Locators&quot;.
                  #
                  # If not specified, the default value is &quot;http://www.googleapis.com/&quot;
              &quot;workflowFileName&quot;: &quot;A String&quot;, # The file to store the workflow in.
              &quot;logToSerialconsole&quot;: True or False, # Whether to send taskrunner log info to the Google Compute Engine VM serial
                  # console.
              &quot;baseTaskDir&quot;: &quot;A String&quot;, # The location on the worker for task-specific subdirectories.
              &quot;taskUser&quot;: &quot;A String&quot;, # The UNIX user ID on the worker VM to use for tasks launched by
                  # taskrunner; e.g. &quot;root&quot;.
              &quot;vmId&quot;: &quot;A String&quot;, # The ID string of the VM.
              &quot;alsologtostderr&quot;: True or False, # Whether to also send taskrunner log info to stderr.
              &quot;parallelWorkerSettings&quot;: { # Provides data to pass through to the worker harness. # The settings to pass to the parallel worker harness.
                &quot;shuffleServicePath&quot;: &quot;A String&quot;, # The Shuffle service path relative to the root URL, for example,
                    # &quot;shuffle/v1beta1&quot;.
                &quot;tempStoragePrefix&quot;: &quot;A String&quot;, # The prefix of the resources the system should use for temporary
                    # storage.
                    #
                    # The supported resource type is:
                    #
                    # Google Cloud Storage:
                    #
                    # storage.googleapis.com/{bucket}/{object}
                    # bucket.storage.googleapis.com/{object}
                &quot;reportingEnabled&quot;: True or False, # Whether to send work progress updates to the service.
                &quot;servicePath&quot;: &quot;A String&quot;, # The Cloud Dataflow service path relative to the root URL, for example,
                    # &quot;dataflow/v1b3/projects&quot;.
                &quot;baseUrl&quot;: &quot;A String&quot;, # The base URL for accessing Google Cloud APIs.
                    #
                    # When workers access Google Cloud APIs, they logically do so via
                    # relative URLs. If this field is specified, it supplies the base
                    # URL to use for resolving these relative URLs. The normative
                    # algorithm used is defined by RFC 1808, &quot;Relative Uniform Resource
                    # Locators&quot;.
                    #
                    # If not specified, the default value is &quot;http://www.googleapis.com/&quot;
                &quot;workerId&quot;: &quot;A String&quot;, # The ID of the worker running this pipeline.
              },
              &quot;harnessCommand&quot;: &quot;A String&quot;, # The command to launch the worker harness.
              &quot;logDir&quot;: &quot;A String&quot;, # The directory on the VM to store logs.
              &quot;streamingWorkerMainClass&quot;: &quot;A String&quot;, # The streaming worker main class name.
              &quot;languageHint&quot;: &quot;A String&quot;, # The suggested backend language.
              &quot;taskGroup&quot;: &quot;A String&quot;, # The UNIX group ID on the worker VM to use for tasks launched by
                  # taskrunner; e.g. &quot;wheel&quot;.
              &quot;logUploadLocation&quot;: &quot;A String&quot;, # Indicates where to put logs. If this is not specified, the logs
                  # will not be uploaded.
                  #
                  # The supported resource type is:
                  #
                  # Google Cloud Storage:
                  # storage.googleapis.com/{bucket}/{object}
                  # bucket.storage.googleapis.com/{object}
              &quot;commandlinesFileName&quot;: &quot;A String&quot;, # The file to store preprocessing commands in.
              &quot;continueOnException&quot;: True or False, # Whether the taskrunner should continue if an exception is hit.
              &quot;tempStoragePrefix&quot;: &quot;A String&quot;, # The prefix of the resources the taskrunner should use for
                  # temporary storage.
                  #
                  # The supported resource type is:
                  #
                  # Google Cloud Storage:
                  # storage.googleapis.com/{bucket}/{object}
                  # bucket.storage.googleapis.com/{object}
            },
            &quot;diskType&quot;: &quot;A String&quot;, # Type of root disk for VMs. If empty or unspecified, the service will
                # attempt to choose a reasonable default.
            &quot;defaultPackageSet&quot;: &quot;A String&quot;, # The default package set to install. This allows the service to
                # select a default set of packages which are useful to worker
                # harnesses written in a particular language.
            &quot;machineType&quot;: &quot;A String&quot;, # Machine type (e.g. &quot;n1-standard-1&quot;). If empty or unspecified, the
                # service will attempt to choose a reasonable default.
          },
        ],
        &quot;tempStoragePrefix&quot;: &quot;A String&quot;, # The prefix of the resources the system should use for temporary
            # storage. The system will append the suffix &quot;/temp-{JOBNAME}&quot; to
            # this resource prefix, where {JOBNAME} is the value of the
            # job_name field. The resulting bucket and object prefix is used
            # as the prefix of the resources used to store temporary data
            # needed during the job execution. NOTE: This will override the
            # value in taskrunner_settings.
            # The supported resource type is:
            #
            # Google Cloud Storage:
            #
            # storage.googleapis.com/{bucket}/{object}
            # bucket.storage.googleapis.com/{object}
        &quot;internalExperiments&quot;: { # Experimental settings.
          &quot;a_key&quot;: &quot;&quot;, # Properties of the object. Contains field @type with type URL.
        },
        &quot;sdkPipelineOptions&quot;: { # The Cloud Dataflow SDK pipeline options specified by the user. These
            # options are passed through the service and are used to recreate the
            # SDK pipeline options on the worker in a language agnostic and platform
            # independent way.
          &quot;a_key&quot;: &quot;&quot;, # Properties of the object.
        },
        &quot;dataset&quot;: &quot;A String&quot;, # The dataset for the current project where various workflow
            # related tables are stored.
            #
            # The supported resource type is:
            #
            # Google BigQuery:
            # bigquery.googleapis.com/{dataset}
        &quot;clusterManagerApiService&quot;: &quot;A String&quot;, # The type of cluster manager API to use. If unknown or
            # unspecified, the service will attempt to choose a reasonable
            # default. This should be in the form of the API service name,
            # e.g. &quot;compute.googleapis.com&quot;.
      },
      &quot;stepsLocation&quot;: &quot;A String&quot;, # The GCS location where the steps are stored.
      &quot;steps&quot;: [ # Exactly one of step or steps_location should be specified.
          #
          # The top-level steps that constitute the entire job.
        { # Defines a particular step within a Cloud Dataflow job.
            #
            # A job consists of multiple steps, each of which performs some
            # specific operation as part of the overall job. Data is typically
            # passed from one step to another as part of the job.
            #
            # Here&#x27;s an example of a sequence of steps which together implement a
            # Map-Reduce job:
            #
            # * Read a collection of data from some source, parsing the
            # collection&#x27;s elements.
            #
            # * Validate the elements.
            #
            # * Apply a user-defined function to map each element to some value
            # and extract an element-specific key value.
            #
            # * Group elements with the same key into a single element with
            # that key, transforming a multiply-keyed collection into a
            # uniquely-keyed collection.
            #
            # * Write the elements out to some data sink.
            #
            # Note that the Cloud Dataflow service may be used to run many different
            # types of jobs, not just Map-Reduce.
          &quot;kind&quot;: &quot;A String&quot;, # The kind of step in the Cloud Dataflow job.
          &quot;properties&quot;: { # Named properties associated with the step. Each kind of
              # predefined step has its own required set of properties.
              # Must be provided on Create. Only retrieved with JOB_VIEW_ALL.
            &quot;a_key&quot;: &quot;&quot;, # Properties of the object.
          },
          &quot;name&quot;: &quot;A String&quot;, # The name that identifies the step. This must be unique for each
              # step with respect to all other steps in the Cloud Dataflow job.
        },
      ],
      &quot;stageStates&quot;: [ # This field may be mutated by the Cloud Dataflow service;
          # callers cannot mutate it.
        { # A message describing the state of a particular execution stage.
          &quot;executionStageState&quot;: &quot;A String&quot;, # Execution stage states allow the same set of values as JobState.
          &quot;executionStageName&quot;: &quot;A String&quot;, # The name of the execution stage.
          &quot;currentStateTime&quot;: &quot;A String&quot;, # The time at which the stage transitioned to this state.
        },
      ],
      &quot;replacedByJobId&quot;: &quot;A String&quot;, # If another job is an update of this job (and thus, this job is in
          # `JOB_STATE_UPDATED`), this field contains the ID of that job.
      &quot;jobMetadata&quot;: { # Metadata available primarily for filtering jobs. Will be included in the # This field is populated by the Dataflow service to support filtering jobs
          # by the metadata values provided here. Populated for ListJobs and all GetJob
          # views SUMMARY and higher.
          # ListJob response and Job SUMMARY view.
        &quot;sdkVersion&quot;: { # The version of the SDK used to run the job. # The SDK version used to run the job.
          &quot;sdkSupportStatus&quot;: &quot;A String&quot;, # The support status for this SDK version.
          &quot;versionDisplayName&quot;: &quot;A String&quot;, # A readable string describing the version of the SDK.
          &quot;version&quot;: &quot;A String&quot;, # The version of the SDK used to run the job.
        },
        &quot;bigTableDetails&quot;: [ # Identification of a BigTable source used in the Dataflow job.
          { # Metadata for a BigTable connector used by the job.
            &quot;instanceId&quot;: &quot;A String&quot;, # InstanceId accessed in the connection.
            &quot;tableId&quot;: &quot;A String&quot;, # TableId accessed in the connection.
            &quot;projectId&quot;: &quot;A String&quot;, # ProjectId accessed in the connection.
          },
        ],
        &quot;pubsubDetails&quot;: [ # Identification of a PubSub source used in the Dataflow job.
          { # Metadata for a PubSub connector used by the job.
            &quot;subscription&quot;: &quot;A String&quot;, # Subscription used in the connection.
            &quot;topic&quot;: &quot;A String&quot;, # Topic accessed in the connection.
          },
        ],
        &quot;bigqueryDetails&quot;: [ # Identification of a BigQuery source used in the Dataflow job.
          { # Metadata for a BigQuery connector used by the job.
            &quot;dataset&quot;: &quot;A String&quot;, # Dataset accessed in the connection.
            &quot;projectId&quot;: &quot;A String&quot;, # Project accessed in the connection.
            &quot;query&quot;: &quot;A String&quot;, # Query used to access data in the connection.
            &quot;table&quot;: &quot;A String&quot;, # Table accessed in the connection.
          },
        ],
        &quot;fileDetails&quot;: [ # Identification of a File source used in the Dataflow job.
          { # Metadata for a File connector used by the job.
            &quot;filePattern&quot;: &quot;A String&quot;, # File pattern used to access files by the connector.
          },
        ],
        &quot;datastoreDetails&quot;: [ # Identification of a Datastore source used in the Dataflow job.
          { # Metadata for a Datastore connector used by the job.
            &quot;namespace&quot;: &quot;A String&quot;, # Namespace used in the connection.
            &quot;projectId&quot;: &quot;A String&quot;, # ProjectId accessed in the connection.
          },
        ],
        &quot;spannerDetails&quot;: [ # Identification of a Spanner source used in the Dataflow job.
          { # Metadata for a Spanner connector used by the job.
            &quot;instanceId&quot;: &quot;A String&quot;, # InstanceId accessed in the connection.
            &quot;databaseId&quot;: &quot;A String&quot;, # DatabaseId accessed in the connection.
            &quot;projectId&quot;: &quot;A String&quot;, # ProjectId accessed in the connection.
          },
        ],
      },
      &quot;location&quot;: &quot;A String&quot;, # The [regional endpoint]
          # (https://cloud.google.com/dataflow/docs/concepts/regional-endpoints) that
          # contains this job.
      &quot;transformNameMapping&quot;: { # The map of transform name prefixes of the job to be replaced to the
          # corresponding name prefixes of the new job.
        &quot;a_key&quot;: &quot;A String&quot;,
      },
      &quot;startTime&quot;: &quot;A String&quot;, # The timestamp when the job was started (transitioned to JOB_STATE_PENDING).
          # Flexible resource scheduling jobs are started with some delay after job
          # creation, so start_time is unset before start and is updated when the
          # job is started by the Cloud Dataflow service. For other jobs, start_time
          # always equals create_time and is immutable and set by the Cloud Dataflow
          # service.
      &quot;clientRequestId&quot;: &quot;A String&quot;, # The client&#x27;s unique identifier of the job, re-used across retried attempts.
          # If this field is set, the service will ensure its uniqueness.
          # The request to create a job will fail if the service has knowledge of a
          # previously submitted job with the same client&#x27;s ID and job name.
          # The caller may use this field to ensure idempotence of job
          # creation across retried attempts to create a job.
          # By default, the field is empty and, in that case, the service ignores it.
      &quot;executionInfo&quot;: { # Additional information about how a Cloud Dataflow job will be executed that # Deprecated.
          # isn&#x27;t contained in the submitted job.
        &quot;stages&quot;: { # A mapping from each stage to the information about that stage.
          &quot;a_key&quot;: { # Contains information about how a particular
              # google.dataflow.v1beta3.Step will be executed.
            &quot;stepName&quot;: [ # The steps associated with the execution stage.
                # Note that stages may have several steps, and that a given step
                # might be run by more than one stage.
              &quot;A String&quot;,
            ],
          },
        },
      },
      &quot;type&quot;: &quot;A String&quot;, # The type of Cloud Dataflow job.
      &quot;createTime&quot;: &quot;A String&quot;, # The timestamp when the job was initially created. Immutable and set by the
          # Cloud Dataflow service.
      &quot;tempFiles&quot;: [ # A set of files the system should be aware of that are used
          # for temporary storage. These temporary files will be
          # removed on job completion.
          # No duplicates are allowed.
          # No file patterns are supported.
          #
          # The supported files are:
          #
          # Google Cloud Storage:
          #
          # storage.googleapis.com/{bucket}/{object}
          # bucket.storage.googleapis.com/{object}
        &quot;A String&quot;,
      ],
      &quot;id&quot;: &quot;A String&quot;, # The unique ID of this job.
          #
          # This field is set by the Cloud Dataflow service when the Job is
          # created, and is immutable for the life of the job.
      &quot;requestedState&quot;: &quot;A String&quot;, # The job&#x27;s requested state.
          #
          # `UpdateJob` may be used to switch between the `JOB_STATE_STOPPED` and
          # `JOB_STATE_RUNNING` states, by setting requested_state. `UpdateJob` may
          # also be used to directly set a job&#x27;s requested state to
          # `JOB_STATE_CANCELLED` or `JOB_STATE_DONE`, irrevocably terminating the
          # job if it has not already reached a terminal state.
      &quot;replaceJobId&quot;: &quot;A String&quot;, # If this job is an update of an existing job, this field is the job ID
          # of the job it replaced.
          #
          # When sending a `CreateJobRequest`, you can update a job by specifying it
          # here. The job named here is stopped, and its intermediate state is
          # transferred to this job.
      &quot;createdFromSnapshotId&quot;: &quot;A String&quot;, # If this is specified, the job&#x27;s initial state is populated from the given
          # snapshot.
      &quot;currentState&quot;: &quot;A String&quot;, # The current state of the job.
          #
          # Jobs are created in the `JOB_STATE_STOPPED` state unless otherwise
          # specified.
          #
          # A job in the `JOB_STATE_RUNNING` state may asynchronously enter a
          # terminal state. After a job has reached a terminal state, no
          # further state updates may be made.
          #
          # This field may be mutated by the Cloud Dataflow service;
          # callers cannot mutate it.
      &quot;name&quot;: &quot;A String&quot;, # The user-specified Cloud Dataflow job name.
          #
          # Only one Job with a given name may exist in a project at any
          # given time. If a caller attempts to create a Job with the same
          # name as an already-existing Job, the attempt returns the
          # existing Job.
          #
          # The name must match the regular expression
          # `[a-z]([-a-z0-9]{0,38}[a-z0-9])?`
      &quot;currentStateTime&quot;: &quot;A String&quot;, # The timestamp associated with the current state.
    }</pre>
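<p>A minimal usage sketch (not part of the generated reference; it assumes application-default credentials, the google-api-python-client library, and hypothetical project, job, and location values):</p>
<pre>
# Illustrative only: build the Dataflow client, fetch a job, and poll its
# currentState until the job reaches a terminal state.
import time

from googleapiclient.discovery import build

dataflow = build(&#x27;dataflow&#x27;, &#x27;v1b3&#x27;)  # uses application-default credentials

project_id = &#x27;my-project&#x27;  # hypothetical
job_id = &#x27;2020-07-22_00_00_00-1234567890123456789&#x27;  # hypothetical
region = &#x27;us-central1&#x27;  # hypothetical regional endpoint

# A common set of terminal states (see the JobState enum for the full list).
TERMINAL_STATES = {
    &#x27;JOB_STATE_DONE&#x27;, &#x27;JOB_STATE_FAILED&#x27;, &#x27;JOB_STATE_CANCELLED&#x27;,
    &#x27;JOB_STATE_UPDATED&#x27;, &#x27;JOB_STATE_DRAINED&#x27;,
}

while True:
    job = dataflow.projects().jobs().get(
        projectId=project_id, jobId=job_id, location=region).execute()
    state = job.get(&#x27;currentState&#x27;)
    print(state, job.get(&#x27;currentStateTime&#x27;))
    if state in TERMINAL_STATES:
        break
    time.sleep(30)  # poll at a modest interval to stay within API quotas
</pre>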
</div>

</body></html>