<html><body>
<style>

body, h1, h2, h3, div, span, p, pre, a {
  margin: 0;
  padding: 0;
  border: 0;
  font-weight: inherit;
  font-style: inherit;
  font-size: 100%;
  font-family: inherit;
  vertical-align: baseline;
}

body {
  font-size: 13px;
  padding: 1em;
}

h1 {
  font-size: 26px;
  margin-bottom: 1em;
}

h2 {
  font-size: 24px;
  margin-bottom: 1em;
}

h3 {
  font-size: 20px;
  margin-bottom: 1em;
  margin-top: 1em;
}

pre, code {
  line-height: 1.5;
  font-family: Monaco, 'DejaVu Sans Mono', 'Bitstream Vera Sans Mono', 'Lucida Console', monospace;
}

pre {
  margin-top: 0.5em;
}

h1, h2, h3, p {
  font-family: Arial, sans-serif;
}

h1, h2, h3 {
  border-bottom: solid #CCC 1px;
}

.toc_element {
  margin-top: 0.5em;
}

.firstline {
  margin-left: 2em;
}

.method {
  margin-top: 1em;
  border: solid 1px #CCC;
  padding: 1em;
  background: #EEE;
}

.details {
  font-weight: bold;
  font-size: 14px;
}

</style>

<h1><a href="dataflow_v1b3.html">Dataflow API</a> . <a href="dataflow_v1b3.projects.html">projects</a> . <a href="dataflow_v1b3.projects.locations.html">locations</a> . <a href="dataflow_v1b3.projects.locations.templates.html">templates</a></h1>
<h2>Instance Methods</h2>
<p class="toc_element">
  <code><a href="#create">create(projectId, location, body=None, x__xgafv=None)</a></code></p>
<p class="firstline">Creates a Cloud Dataflow job from a template.</p>
<p class="toc_element">
  <code><a href="#get">get(projectId, location, gcsPath=None, view=None, x__xgafv=None)</a></code></p>
<p class="firstline">Get the template associated with a template file.</p>
<p class="toc_element">
  <code><a href="#launch">launch(projectId, location, body=None, dynamicTemplate_gcsPath=None, dynamicTemplate_stagingLocation=None, validateOnly=None, gcsPath=None, x__xgafv=None)</a></code></p>
<p class="firstline">Launch a template.</p>
<h3>Method Details</h3>
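<p>A minimal setup sketch for the examples that follow, assuming the <code>google-api-python-client</code> package is installed and Application Default Credentials are available in the environment:</p>
<pre>
# Sketch: build the Dataflow API client and resolve this resource.
from googleapiclient.discovery import build

dataflow = build(&#x27;dataflow&#x27;, &#x27;v1b3&#x27;)
templates = dataflow.projects().locations().templates()
</pre>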
<div class="method">
    <code class="details" id="create">create(projectId, location, body=None, x__xgafv=None)</code>
  <pre>Creates a Cloud Dataflow job from a template.

Args:
  projectId: string, Required. The ID of the Cloud Platform project that the job belongs to. (required)
  location: string, The [regional endpoint]
(https://cloud.google.com/dataflow/docs/concepts/regional-endpoints) to
which to direct the request. (required)
  body: object, The request body.
    The object takes the form of:

{ # A request to create a Cloud Dataflow job from a template.
    &quot;location&quot;: &quot;A String&quot;, # The [regional endpoint]
        # (https://cloud.google.com/dataflow/docs/concepts/regional-endpoints) to
        # which to direct the request.
    &quot;environment&quot;: { # The environment values to set at runtime. # The runtime environment for the job.
      &quot;bypassTempDirValidation&quot;: True or False, # Whether to bypass the safety checks for the job&#x27;s temporary directory.
          # Use with caution.
      &quot;tempLocation&quot;: &quot;A String&quot;, # The Cloud Storage path to use for temporary files.
          # Must be a valid Cloud Storage URL, beginning with `gs://`.
      &quot;network&quot;: &quot;A String&quot;, # Network to which VMs will be assigned. If empty or unspecified,
          # the service will use the network &quot;default&quot;.
      &quot;subnetwork&quot;: &quot;A String&quot;, # Subnetwork to which VMs will be assigned, if desired. Expected to be of
          # the form &quot;regions/REGION/subnetworks/SUBNETWORK&quot;.
      &quot;workerRegion&quot;: &quot;A String&quot;, # The Compute Engine region
          # (https://cloud.google.com/compute/docs/regions-zones/regions-zones) in
          # which worker processing should occur, e.g. &quot;us-west1&quot;. Mutually exclusive
          # with worker_zone. If neither worker_region nor worker_zone is specified,
          # default to the control plane&#x27;s region.
      &quot;numWorkers&quot;: 42, # The initial number of Google Compute Engine instances for the job.
      &quot;additionalExperiments&quot;: [ # Additional experiment flags for the job.
        &quot;A String&quot;,
      ],
      &quot;zone&quot;: &quot;A String&quot;, # The Compute Engine [availability
          # zone](https://cloud.google.com/compute/docs/regions-zones/regions-zones)
          # for launching worker instances to run your pipeline.
          # In the future, worker_zone will take precedence.
      &quot;serviceAccountEmail&quot;: &quot;A String&quot;, # The email address of the service account to run the job as.
      &quot;maxWorkers&quot;: 42, # The maximum number of Google Compute Engine instances to be made
          # available to your pipeline during execution, from 1 to 1000.
      &quot;workerZone&quot;: &quot;A String&quot;, # The Compute Engine zone
          # (https://cloud.google.com/compute/docs/regions-zones/regions-zones) in
          # which worker processing should occur, e.g. &quot;us-west1-a&quot;. Mutually exclusive
          # with worker_region. If neither worker_region nor worker_zone is specified,
          # a zone in the control plane&#x27;s region is chosen based on available capacity.
          # If both `worker_zone` and `zone` are set, `worker_zone` takes precedence.
      &quot;additionalUserLabels&quot;: { # Additional user labels to be specified for the job.
          # Keys and values should follow the restrictions specified in the [labeling
          # restrictions](https://cloud.google.com/compute/docs/labeling-resources#restrictions)
          # page.
        &quot;a_key&quot;: &quot;A String&quot;,
      },
      &quot;machineType&quot;: &quot;A String&quot;, # The machine type to use for the job. Defaults to the value from the
          # template if not specified.
      &quot;ipConfiguration&quot;: &quot;A String&quot;, # Configuration for VM IPs.
      &quot;kmsKeyName&quot;: &quot;A String&quot;, # Optional. Name for the Cloud KMS key for the job.
          # Key format is:
          # projects/&lt;project&gt;/locations/&lt;location&gt;/keyRings/&lt;keyring&gt;/cryptoKeys/&lt;key&gt;
    },
    &quot;gcsPath&quot;: &quot;A String&quot;, # Required. A Cloud Storage path to the template from which to
        # create the job.
        # Must be a valid Cloud Storage URL, beginning with `gs://`.
    &quot;jobName&quot;: &quot;A String&quot;, # Required. The job name to use for the created job.
    &quot;parameters&quot;: { # The runtime parameters to pass to the job.
      &quot;a_key&quot;: &quot;A String&quot;,
    },
  }

  x__xgafv: string, V1 error format.
    Allowed values
      1 - v1 error format
      2 - v2 error format

Returns:
  An object of the form:

    { # Defines a job to be run by the Cloud Dataflow service.
      &quot;pipelineDescription&quot;: { # A descriptive representation of submitted pipeline as well as the executed # Preliminary field: The format of this data may change at any time.
          # A description of the user pipeline and stages through which it is executed.
          # Created by Cloud Dataflow service. Only retrieved with
          # JOB_VIEW_DESCRIPTION or JOB_VIEW_ALL.
          # form. This data is provided by the Dataflow service for ease of visualizing
          # the pipeline and interpreting Dataflow provided metrics.
        &quot;displayData&quot;: [ # Pipeline level display data.
          { # Data provided with a pipeline or transform to provide descriptive info.
            &quot;url&quot;: &quot;A String&quot;, # An optional full URL.
            &quot;javaClassValue&quot;: &quot;A String&quot;, # Contains value if the data is of java class type.
            &quot;timestampValue&quot;: &quot;A String&quot;, # Contains value if the data is of timestamp type.
            &quot;durationValue&quot;: &quot;A String&quot;, # Contains value if the data is of duration type.
            &quot;label&quot;: &quot;A String&quot;, # An optional label to display in a dax UI for the element.
            &quot;key&quot;: &quot;A String&quot;, # The key identifying the display data.
                # This is intended to be used as a label for the display data
                # when viewed in a dax monitoring system.
            &quot;namespace&quot;: &quot;A String&quot;, # The namespace for the key. This is usually a class name or programming
                # language namespace (i.e. python module) which defines the display data.
                # This allows a dax monitoring system to specially handle the data
                # and perform custom rendering.
            &quot;floatValue&quot;: 3.14, # Contains value if the data is of float type.
            &quot;strValue&quot;: &quot;A String&quot;, # Contains value if the data is of string type.
            &quot;int64Value&quot;: &quot;A String&quot;, # Contains value if the data is of int64 type.
            &quot;boolValue&quot;: True or False, # Contains value if the data is of a boolean type.
            &quot;shortStrValue&quot;: &quot;A String&quot;, # A possible additional shorter value to display.
                # For example a java_class_name_value of com.mypackage.MyDoFn
                # will be stored with MyDoFn as the short_str_value and
                # com.mypackage.MyDoFn as the java_class_name value.
                # short_str_value can be displayed and java_class_name_value
                # will be displayed as a tooltip.
          },
        ],
        &quot;originalPipelineTransform&quot;: [ # Description of each transform in the pipeline and collections between them.
          { # Description of the type, names/ids, and input/outputs for a transform.
            &quot;outputCollectionName&quot;: [ # User names for all collection outputs to this transform.
              &quot;A String&quot;,
            ],
            &quot;displayData&quot;: [ # Transform-specific display data.
              { # Data provided with a pipeline or transform to provide descriptive info.
                &quot;url&quot;: &quot;A String&quot;, # An optional full URL.
                &quot;javaClassValue&quot;: &quot;A String&quot;, # Contains value if the data is of java class type.
                &quot;timestampValue&quot;: &quot;A String&quot;, # Contains value if the data is of timestamp type.
                &quot;durationValue&quot;: &quot;A String&quot;, # Contains value if the data is of duration type.
                &quot;label&quot;: &quot;A String&quot;, # An optional label to display in a dax UI for the element.
                &quot;key&quot;: &quot;A String&quot;, # The key identifying the display data.
                    # This is intended to be used as a label for the display data
                    # when viewed in a dax monitoring system.
                &quot;namespace&quot;: &quot;A String&quot;, # The namespace for the key. This is usually a class name or programming
                    # language namespace (i.e. python module) which defines the display data.
                    # This allows a dax monitoring system to specially handle the data
                    # and perform custom rendering.
                &quot;floatValue&quot;: 3.14, # Contains value if the data is of float type.
                &quot;strValue&quot;: &quot;A String&quot;, # Contains value if the data is of string type.
                &quot;int64Value&quot;: &quot;A String&quot;, # Contains value if the data is of int64 type.
                &quot;boolValue&quot;: True or False, # Contains value if the data is of a boolean type.
                &quot;shortStrValue&quot;: &quot;A String&quot;, # A possible additional shorter value to display.
                    # For example a java_class_name_value of com.mypackage.MyDoFn
                    # will be stored with MyDoFn as the short_str_value and
                    # com.mypackage.MyDoFn as the java_class_name value.
                    # short_str_value can be displayed and java_class_name_value
                    # will be displayed as a tooltip.
              },
            ],
            &quot;id&quot;: &quot;A String&quot;, # SDK generated id of this transform instance.
            &quot;inputCollectionName&quot;: [ # User names for all collection inputs to this transform.
              &quot;A String&quot;,
            ],
            &quot;name&quot;: &quot;A String&quot;, # User provided name for this transform instance.
            &quot;kind&quot;: &quot;A String&quot;, # Type of transform.
          },
        ],
        &quot;executionPipelineStage&quot;: [ # Description of each stage of execution of the pipeline.
          { # Description of the composing transforms, names/ids, and input/outputs of a
              # stage of execution. Some composing transforms and sources may have been
              # generated by the Dataflow service during execution planning.
            &quot;componentSource&quot;: [ # Collections produced and consumed by component transforms of this stage.
              { # Description of an interstitial value between transforms in an execution
                  # stage.
                &quot;userName&quot;: &quot;A String&quot;, # Human-readable name for this transform; may be user or system generated.
                &quot;name&quot;: &quot;A String&quot;, # Dataflow service generated name for this source.
                &quot;originalTransformOrCollection&quot;: &quot;A String&quot;, # User name for the original user transform or collection with which this
                    # source is most closely associated.
              },
            ],
            &quot;inputSource&quot;: [ # Input sources for this stage.
              { # Description of an input or output of an execution stage.
                &quot;userName&quot;: &quot;A String&quot;, # Human-readable name for this source; may be user or system generated.
                &quot;originalTransformOrCollection&quot;: &quot;A String&quot;, # User name for the original user transform or collection with which this
                    # source is most closely associated.
                &quot;sizeBytes&quot;: &quot;A String&quot;, # Size of the source, if measurable.
                &quot;name&quot;: &quot;A String&quot;, # Dataflow service generated name for this source.
              },
            ],
            &quot;name&quot;: &quot;A String&quot;, # Dataflow service generated name for this stage.
            &quot;componentTransform&quot;: [ # Transforms that comprise this execution stage.
              { # Description of a transform executed as part of an execution stage.
                &quot;name&quot;: &quot;A String&quot;, # Dataflow service generated name for this source.
                &quot;userName&quot;: &quot;A String&quot;, # Human-readable name for this transform; may be user or system generated.
                &quot;originalTransform&quot;: &quot;A String&quot;, # User name for the original user transform with which this transform is
                    # most closely associated.
              },
            ],
            &quot;id&quot;: &quot;A String&quot;, # Dataflow service generated id for this stage.
            &quot;outputSource&quot;: [ # Output sources for this stage.
              { # Description of an input or output of an execution stage.
                &quot;userName&quot;: &quot;A String&quot;, # Human-readable name for this source; may be user or system generated.
                &quot;originalTransformOrCollection&quot;: &quot;A String&quot;, # User name for the original user transform or collection with which this
                    # source is most closely associated.
                &quot;sizeBytes&quot;: &quot;A String&quot;, # Size of the source, if measurable.
                &quot;name&quot;: &quot;A String&quot;, # Dataflow service generated name for this source.
              },
            ],
            &quot;kind&quot;: &quot;A String&quot;, # Type of transform this stage is executing.
          },
        ],
      },
      &quot;labels&quot;: { # User-defined labels for this job.
          #
          # The labels map can contain no more than 64 entries. Entries of the labels
          # map are UTF8 strings that comply with the following restrictions:
          #
          # * Keys must conform to regexp: \p{Ll}\p{Lo}{0,62}
          # * Values must conform to regexp: [\p{Ll}\p{Lo}\p{N}_-]{0,63}
          # * Both keys and values are additionally constrained to be &lt;= 128 bytes in
          # size.
        &quot;a_key&quot;: &quot;A String&quot;,
      },
      &quot;projectId&quot;: &quot;A String&quot;, # The ID of the Cloud Platform project that the job belongs to.
      &quot;environment&quot;: { # Describes the environment in which a Dataflow Job runs. # The environment for the job.
        &quot;flexResourceSchedulingGoal&quot;: &quot;A String&quot;, # Which Flexible Resource Scheduling mode to run in.
        &quot;workerRegion&quot;: &quot;A String&quot;, # The Compute Engine region
            # (https://cloud.google.com/compute/docs/regions-zones/regions-zones) in
            # which worker processing should occur, e.g. &quot;us-west1&quot;. Mutually exclusive
            # with worker_zone. If neither worker_region nor worker_zone is specified,
            # default to the control plane&#x27;s region.
        &quot;userAgent&quot;: { # A description of the process that generated the request.
          &quot;a_key&quot;: &quot;&quot;, # Properties of the object.
        },
        &quot;serviceAccountEmail&quot;: &quot;A String&quot;, # Identity to run virtual machines as. Defaults to the default account.
        &quot;version&quot;: { # A structure describing which components and their versions of the service
            # are required in order to run the job.
          &quot;a_key&quot;: &quot;&quot;, # Properties of the object.
        },
        &quot;serviceKmsKeyName&quot;: &quot;A String&quot;, # If set, contains the Cloud KMS key identifier used to encrypt data
            # at rest, AKA a Customer Managed Encryption Key (CMEK).
            #
            # Format:
            # projects/PROJECT_ID/locations/LOCATION/keyRings/KEY_RING/cryptoKeys/KEY
        &quot;experiments&quot;: [ # The list of experiments to enable.
          &quot;A String&quot;,
        ],
        &quot;workerZone&quot;: &quot;A String&quot;, # The Compute Engine zone
            # (https://cloud.google.com/compute/docs/regions-zones/regions-zones) in
            # which worker processing should occur, e.g. &quot;us-west1-a&quot;. Mutually exclusive
            # with worker_region. If neither worker_region nor worker_zone is specified,
            # a zone in the control plane&#x27;s region is chosen based on available capacity.
        &quot;workerPools&quot;: [ # The worker pools. At least one &quot;harness&quot; worker pool must be
            # specified in order for the job to have workers.
          { # Describes one particular pool of Cloud Dataflow workers to be
              # instantiated by the Cloud Dataflow service in order to perform the
              # computations required by a job. Note that a workflow job may use
              # multiple pools, in order to match the various computational
              # requirements of the various stages of the job.
            &quot;onHostMaintenance&quot;: &quot;A String&quot;, # The action to take on host maintenance, as defined by the Google
                # Compute Engine API.
            &quot;sdkHarnessContainerImages&quot;: [ # Set of SDK harness containers needed to execute this pipeline. This will
                # only be set in the Fn API path. For non-cross-language pipelines this
                # should have only one entry. Cross-language pipelines will have two or more
                # entries.
              { # Defines a SDK harness container for executing Dataflow pipelines.
                &quot;containerImage&quot;: &quot;A String&quot;, # A docker container image that resides in Google Container Registry.
                &quot;useSingleCorePerContainer&quot;: True or False, # If true, recommends the Dataflow service to use only one core per SDK
                    # container instance with this image. If false (or unset) recommends using
                    # more than one core per SDK container instance with this image for
                    # efficiency. Note that Dataflow service may choose to override this property
                    # if needed.
              },
            ],
            &quot;zone&quot;: &quot;A String&quot;, # Zone to run the worker pools in. If empty or unspecified, the service
                # will attempt to choose a reasonable default.
            &quot;kind&quot;: &quot;A String&quot;, # The kind of the worker pool; currently only `harness` and `shuffle`
                # are supported.
            &quot;metadata&quot;: { # Metadata to set on the Google Compute Engine VMs.
              &quot;a_key&quot;: &quot;A String&quot;,
            },
            &quot;diskSourceImage&quot;: &quot;A String&quot;, # Fully qualified source image for disks.
            &quot;dataDisks&quot;: [ # Data disks that are used by a VM in this workflow.
              { # Describes the data disk used by a workflow job.
                &quot;sizeGb&quot;: 42, # Size of disk in GB. If zero or unspecified, the service will
                    # attempt to choose a reasonable default.
                &quot;diskType&quot;: &quot;A String&quot;, # Disk storage type, as defined by Google Compute Engine. This
                    # must be a disk type appropriate to the project and zone in which
                    # the workers will run. If unknown or unspecified, the service
                    # will attempt to choose a reasonable default.
                    #
                    # For example, the standard persistent disk type is a resource name
                    # typically ending in &quot;pd-standard&quot;. If SSD persistent disks are
                    # available, the resource name typically ends with &quot;pd-ssd&quot;. The
                    # actual valid values are defined by the Google Compute Engine API,
                    # not by the Cloud Dataflow API; consult the Google Compute Engine
                    # documentation for more information about determining the set of
                    # available disk types for a particular project and zone.
                    #
                    # Google Compute Engine Disk types are local to a particular
                    # project in a particular zone, and so the resource name will
                    # typically look something like this:
                    #
                    # compute.googleapis.com/projects/project-id/zones/zone/diskTypes/pd-standard
                &quot;mountPoint&quot;: &quot;A String&quot;, # Directory in a VM where disk is mounted.
              },
            ],
            &quot;packages&quot;: [ # Packages to be installed on workers.
              { # The packages that must be installed in order for a worker to run the
                  # steps of the Cloud Dataflow job that will be assigned to its worker
                  # pool.
                  #
                  # This is the mechanism by which the Cloud Dataflow SDK causes code to
                  # be loaded onto the workers. For example, the Cloud Dataflow Java SDK
                  # might use this to install jars containing the user&#x27;s code and all of the
                  # various dependencies (libraries, data files, etc.) required in order
                  # for that code to run.
                &quot;name&quot;: &quot;A String&quot;, # The name of the package.
                &quot;location&quot;: &quot;A String&quot;, # The resource to read the package from. The supported resource type is:
                    #
                    # Google Cloud Storage:
                    #
                    # storage.googleapis.com/{bucket}
                    # bucket.storage.googleapis.com/
              },
            ],
            &quot;teardownPolicy&quot;: &quot;A String&quot;, # Sets the policy for determining when to turn down the worker pool.
                # Allowed values are: `TEARDOWN_ALWAYS`, `TEARDOWN_ON_SUCCESS`, and
                # `TEARDOWN_NEVER`.
                # `TEARDOWN_ALWAYS` means workers are always torn down regardless of whether
                # the job succeeds. `TEARDOWN_ON_SUCCESS` means workers are torn down
                # if the job succeeds. `TEARDOWN_NEVER` means the workers are never torn
                # down.
                #
                # If the workers are not torn down by the service, they will
                # continue to run and use Google Compute Engine VM resources in the
                # user&#x27;s project until they are explicitly terminated by the user.
                # Because of this, Google recommends using the `TEARDOWN_ALWAYS`
                # policy except for small, manually supervised test jobs.
                #
                # If unknown or unspecified, the service will attempt to choose a reasonable
                # default.
            &quot;network&quot;: &quot;A String&quot;, # Network to which VMs will be assigned. If empty or unspecified,
                # the service will use the network &quot;default&quot;.
            &quot;ipConfiguration&quot;: &quot;A String&quot;, # Configuration for VM IPs.
            &quot;diskSizeGb&quot;: 42, # Size of root disk for VMs, in GB. If zero or unspecified, the service will
                # attempt to choose a reasonable default.
            &quot;autoscalingSettings&quot;: { # Settings for WorkerPool autoscaling. # Settings for autoscaling of this WorkerPool.
              &quot;maxNumWorkers&quot;: 42, # The maximum number of workers to cap scaling at.
              &quot;algorithm&quot;: &quot;A String&quot;, # The algorithm to use for autoscaling.
            },
            &quot;poolArgs&quot;: { # Extra arguments for this worker pool.
              &quot;a_key&quot;: &quot;&quot;, # Properties of the object. Contains field @type with type URL.
            },
            &quot;subnetwork&quot;: &quot;A String&quot;, # Subnetwork to which VMs will be assigned, if desired. Expected to be of
                # the form &quot;regions/REGION/subnetworks/SUBNETWORK&quot;.
            &quot;numWorkers&quot;: 42, # Number of Google Compute Engine workers in this pool needed to
                # execute the job. If zero or unspecified, the service will
                # attempt to choose a reasonable default.
            &quot;numThreadsPerWorker&quot;: 42, # The number of threads per worker harness. If empty or unspecified, the
                # service will choose a number of threads (according to the number of cores
                # on the selected machine type for batch, or 1 by convention for streaming).
            &quot;workerHarnessContainerImage&quot;: &quot;A String&quot;, # Required. Docker container image that executes the Cloud Dataflow worker
                # harness, residing in Google Container Registry.
                #
                # Deprecated for the Fn API path. Use sdk_harness_container_images instead.
            &quot;taskrunnerSettings&quot;: { # Taskrunner configuration settings. # Settings passed through to Google Compute Engine workers when
                # using the standard Dataflow task runner. Users should ignore
                # this field.
              &quot;dataflowApiVersion&quot;: &quot;A String&quot;, # The API version of endpoint, e.g. &quot;v1b3&quot;
              &quot;oauthScopes&quot;: [ # The OAuth2 scopes to be requested by the taskrunner in order to
                  # access the Cloud Dataflow API.
                &quot;A String&quot;,
              ],
              &quot;baseUrl&quot;: &quot;A String&quot;, # The base URL for the taskrunner to use when accessing Google Cloud APIs.
                  #
                  # When workers access Google Cloud APIs, they logically do so via
                  # relative URLs. If this field is specified, it supplies the base
                  # URL to use for resolving these relative URLs. The normative
                  # algorithm used is defined by RFC 1808, &quot;Relative Uniform Resource
                  # Locators&quot;.
                  #
                  # If not specified, the default value is &quot;http://www.googleapis.com/&quot;
              &quot;workflowFileName&quot;: &quot;A String&quot;, # The file to store the workflow in.
              &quot;logToSerialconsole&quot;: True or False, # Whether to send taskrunner log info to Google Compute Engine VM serial
                  # console.
              &quot;baseTaskDir&quot;: &quot;A String&quot;, # The location on the worker for task-specific subdirectories.
              &quot;taskUser&quot;: &quot;A String&quot;, # The UNIX user ID on the worker VM to use for tasks launched by
                  # taskrunner; e.g. &quot;root&quot;.
              &quot;vmId&quot;: &quot;A String&quot;, # The ID string of the VM.
              &quot;alsologtostderr&quot;: True or False, # Whether to also send taskrunner log info to stderr.
              &quot;parallelWorkerSettings&quot;: { # Provides data to pass through to the worker harness. # The settings to pass to the parallel worker harness.
                &quot;shuffleServicePath&quot;: &quot;A String&quot;, # The Shuffle service path relative to the root URL, for example,
                    # &quot;shuffle/v1beta1&quot;.
                &quot;tempStoragePrefix&quot;: &quot;A String&quot;, # The prefix of the resources the system should use for temporary
                    # storage.
                    #
                    # The supported resource type is:
                    #
                    # Google Cloud Storage:
                    #
                    # storage.googleapis.com/{bucket}/{object}
                    # bucket.storage.googleapis.com/{object}
                &quot;reportingEnabled&quot;: True or False, # Whether to send work progress updates to the service.
                &quot;servicePath&quot;: &quot;A String&quot;, # The Cloud Dataflow service path relative to the root URL, for example,
                    # &quot;dataflow/v1b3/projects&quot;.
                &quot;baseUrl&quot;: &quot;A String&quot;, # The base URL for accessing Google Cloud APIs.
                    #
                    # When workers access Google Cloud APIs, they logically do so via
                    # relative URLs. If this field is specified, it supplies the base
                    # URL to use for resolving these relative URLs. The normative
                    # algorithm used is defined by RFC 1808, &quot;Relative Uniform Resource
                    # Locators&quot;.
                    #
                    # If not specified, the default value is &quot;http://www.googleapis.com/&quot;
                &quot;workerId&quot;: &quot;A String&quot;, # The ID of the worker running this pipeline.
              },
              &quot;harnessCommand&quot;: &quot;A String&quot;, # The command to launch the worker harness.
              &quot;logDir&quot;: &quot;A String&quot;, # The directory on the VM to store logs.
              &quot;streamingWorkerMainClass&quot;: &quot;A String&quot;, # The streaming worker main class name.
              &quot;languageHint&quot;: &quot;A String&quot;, # The suggested backend language.
              &quot;taskGroup&quot;: &quot;A String&quot;, # The UNIX group ID on the worker VM to use for tasks launched by
                  # taskrunner; e.g. &quot;wheel&quot;.
              &quot;logUploadLocation&quot;: &quot;A String&quot;, # Indicates where to put logs. If this is not specified, the logs
                  # will not be uploaded.
                  #
                  # The supported resource type is:
                  #
                  # Google Cloud Storage:
                  # storage.googleapis.com/{bucket}/{object}
                  # bucket.storage.googleapis.com/{object}
              &quot;commandlinesFileName&quot;: &quot;A String&quot;, # The file to store preprocessing commands in.
              &quot;continueOnException&quot;: True or False, # Whether to continue taskrunner if an exception is hit.
              &quot;tempStoragePrefix&quot;: &quot;A String&quot;, # The prefix of the resources the taskrunner should use for
                  # temporary storage.
                  #
                  # The supported resource type is:
                  #
                  # Google Cloud Storage:
                  # storage.googleapis.com/{bucket}/{object}
                  # bucket.storage.googleapis.com/{object}
            },
            &quot;diskType&quot;: &quot;A String&quot;, # Type of root disk for VMs. If empty or unspecified, the service will
                # attempt to choose a reasonable default.
            &quot;defaultPackageSet&quot;: &quot;A String&quot;, # The default package set to install. This allows the service to
                # select a default set of packages which are useful to worker
                # harnesses written in a particular language.
            &quot;machineType&quot;: &quot;A String&quot;, # Machine type (e.g. &quot;n1-standard-1&quot;). If empty or unspecified, the
                # service will attempt to choose a reasonable default.
          },
        ],
        &quot;tempStoragePrefix&quot;: &quot;A String&quot;, # The prefix of the resources the system should use for temporary
            # storage. The system will append the suffix &quot;/temp-{JOBNAME}&quot; to
            # this resource prefix, where {JOBNAME} is the value of the
            # job_name field. The resulting bucket and object prefix is used
            # as the prefix of the resources used to store temporary data
            # needed during the job execution. NOTE: This will override the
            # value in taskrunner_settings.
            # The supported resource type is:
            #
            # Google Cloud Storage:
            #
            # storage.googleapis.com/{bucket}/{object}
            # bucket.storage.googleapis.com/{object}
        &quot;internalExperiments&quot;: { # Experimental settings.
          &quot;a_key&quot;: &quot;&quot;, # Properties of the object. Contains field @type with type URL.
        },
        &quot;sdkPipelineOptions&quot;: { # The Cloud Dataflow SDK pipeline options specified by the user. These
            # options are passed through the service and are used to recreate the
            # SDK pipeline options on the worker in a language agnostic and platform
            # independent way.
          &quot;a_key&quot;: &quot;&quot;, # Properties of the object.
        },
        &quot;dataset&quot;: &quot;A String&quot;, # The dataset for the current project where various workflow
            # related tables are stored.
            #
            # The supported resource type is:
            #
            # Google BigQuery:
            # bigquery.googleapis.com/{dataset}
        &quot;clusterManagerApiService&quot;: &quot;A String&quot;, # The type of cluster manager API to use. If unknown or
            # unspecified, the service will attempt to choose a reasonable
            # default. This should be in the form of the API service name,
            # e.g. &quot;compute.googleapis.com&quot;.
      },
      &quot;stepsLocation&quot;: &quot;A String&quot;, # The GCS location where the steps are stored.
      &quot;steps&quot;: [ # Exactly one of step or steps_location should be specified.
          #
          # The top-level steps that constitute the entire job.
        { # Defines a particular step within a Cloud Dataflow job.
            #
            # A job consists of multiple steps, each of which performs some
            # specific operation as part of the overall job. Data is typically
            # passed from one step to another as part of the job.
            #
            # Here&#x27;s an example of a sequence of steps which together implement a
            # Map-Reduce job:
            #
            # * Read a collection of data from some source, parsing the
            # collection&#x27;s elements.
            #
            # * Validate the elements.
            #
            # * Apply a user-defined function to map each element to some value
            # and extract an element-specific key value.
            #
            # * Group elements with the same key into a single element with
            # that key, transforming a multiply-keyed collection into a
            # uniquely-keyed collection.
            #
            # * Write the elements out to some data sink.
            #
            # Note that the Cloud Dataflow service may be used to run many different
            # types of jobs, not just Map-Reduce.
          &quot;kind&quot;: &quot;A String&quot;, # The kind of step in the Cloud Dataflow job.
          &quot;properties&quot;: { # Named properties associated with the step. Each kind of
              # predefined step has its own required set of properties.
              # Must be provided on Create. Only retrieved with JOB_VIEW_ALL.
            &quot;a_key&quot;: &quot;&quot;, # Properties of the object.
          },
          &quot;name&quot;: &quot;A String&quot;, # The name that identifies the step. This must be unique for each
              # step with respect to all other steps in the Cloud Dataflow job.
        },
      ],
      &quot;stageStates&quot;: [ # This field may be mutated by the Cloud Dataflow service;
          # callers cannot mutate it.
        { # A message describing the state of a particular execution stage.
          &quot;executionStageState&quot;: &quot;A String&quot;, # Execution stage states allow the same set of values as JobState.
          &quot;executionStageName&quot;: &quot;A String&quot;, # The name of the execution stage.
          &quot;currentStateTime&quot;: &quot;A String&quot;, # The time at which the stage transitioned to this state.
        },
      ],
      &quot;replacedByJobId&quot;: &quot;A String&quot;, # If another job is an update of this job (and thus, this job is in
          # `JOB_STATE_UPDATED`), this field contains the ID of that job.
      &quot;jobMetadata&quot;: { # Metadata available primarily for filtering jobs. Will be included in the # This field is populated by the Dataflow service to support filtering jobs
          # by the metadata values provided here. Populated for ListJobs and all GetJob
          # views SUMMARY and higher.
          # ListJob response and Job SUMMARY view.
        &quot;sdkVersion&quot;: { # The version of the SDK used to run the job. # The SDK version used to run the job.
          &quot;sdkSupportStatus&quot;: &quot;A String&quot;, # The support status for this SDK version.
          &quot;versionDisplayName&quot;: &quot;A String&quot;, # A readable string describing the version of the SDK.
          &quot;version&quot;: &quot;A String&quot;, # The version of the SDK used to run the job.
        },
        &quot;bigTableDetails&quot;: [ # Identification of a BigTable source used in the Dataflow job.
          { # Metadata for a BigTable connector used by the job.
            &quot;instanceId&quot;: &quot;A String&quot;, # InstanceId accessed in the connection.
            &quot;tableId&quot;: &quot;A String&quot;, # TableId accessed in the connection.
            &quot;projectId&quot;: &quot;A String&quot;, # ProjectId accessed in the connection.
          },
        ],
        &quot;pubsubDetails&quot;: [ # Identification of a PubSub source used in the Dataflow job.
          { # Metadata for a PubSub connector used by the job.
            &quot;subscription&quot;: &quot;A String&quot;, # Subscription used in the connection.
            &quot;topic&quot;: &quot;A String&quot;, # Topic accessed in the connection.
          },
        ],
        &quot;bigqueryDetails&quot;: [ # Identification of a BigQuery source used in the Dataflow job.
          { # Metadata for a BigQuery connector used by the job.
            &quot;dataset&quot;: &quot;A String&quot;, # Dataset accessed in the connection.
            &quot;projectId&quot;: &quot;A String&quot;, # Project accessed in the connection.
            &quot;query&quot;: &quot;A String&quot;, # Query used to access data in the connection.
            &quot;table&quot;: &quot;A String&quot;, # Table accessed in the connection.
          },
        ],
        &quot;fileDetails&quot;: [ # Identification of a File source used in the Dataflow job.
          { # Metadata for a File connector used by the job.
            &quot;filePattern&quot;: &quot;A String&quot;, # File Pattern used to access files by the connector.
          },
        ],
        &quot;datastoreDetails&quot;: [ # Identification of a Datastore source used in the Dataflow job.
          { # Metadata for a Datastore connector used by the job.
            &quot;namespace&quot;: &quot;A String&quot;, # Namespace used in the connection.
            &quot;projectId&quot;: &quot;A String&quot;, # ProjectId accessed in the connection.
          },
        ],
        &quot;spannerDetails&quot;: [ # Identification of a Spanner source used in the Dataflow job.
          { # Metadata for a Spanner connector used by the job.
            &quot;instanceId&quot;: &quot;A String&quot;, # InstanceId accessed in the connection.
            &quot;databaseId&quot;: &quot;A String&quot;, # DatabaseId accessed in the connection.
            &quot;projectId&quot;: &quot;A String&quot;, # ProjectId accessed in the connection.
          },
        ],
      },
      &quot;location&quot;: &quot;A String&quot;, # The [regional endpoint]
          # (https://cloud.google.com/dataflow/docs/concepts/regional-endpoints) that
          # contains this job.
      &quot;transformNameMapping&quot;: { # The map of transform name prefixes of the job to be replaced to the
          # corresponding name prefixes of the new job.
        &quot;a_key&quot;: &quot;A String&quot;,
      },
      &quot;startTime&quot;: &quot;A String&quot;, # The timestamp when the job was started (transitioned to JOB_STATE_PENDING).
          # Flexible resource scheduling jobs are started with some delay after job
          # creation, so start_time is unset before start and is updated when the
          # job is started by the Cloud Dataflow service. For other jobs, start_time
          # always equals create_time and is immutable and set by the Cloud Dataflow
          # service.
      &quot;clientRequestId&quot;: &quot;A String&quot;, # The client&#x27;s unique identifier of the job, re-used across retried attempts.
          # If this field is set, the service will ensure its uniqueness.
          # The request to create a job will fail if the service has knowledge of a
          # previously submitted job with the same client&#x27;s ID and job name.
          # The caller may use this field to ensure idempotence of job
          # creation across retried attempts to create a job.
          # By default, the field is empty and, in that case, the service ignores it.
      &quot;executionInfo&quot;: { # Additional information about how a Cloud Dataflow job will be executed that # Deprecated.
          # isn&#x27;t contained in the submitted job.
        &quot;stages&quot;: { # A mapping from each stage to the information about that stage.
          &quot;a_key&quot;: { # Contains information about how a particular
              # google.dataflow.v1beta3.Step will be executed.
            &quot;stepName&quot;: [ # The steps associated with the execution stage.
                # Note that stages may have several steps, and that a given step
                # might be run by more than one stage.
              &quot;A String&quot;,
            ],
          },
        },
      },
      &quot;type&quot;: &quot;A String&quot;, # The type of Cloud Dataflow job.
      &quot;createTime&quot;: &quot;A String&quot;, # The timestamp when the job was initially created. Immutable and set by the
          # Cloud Dataflow service.
      &quot;tempFiles&quot;: [ # A set of files the system should be aware of that are used
          # for temporary storage. These temporary files will be
          # removed on job completion.
          # No duplicates are allowed.
          # No file patterns are supported.
          #
          # The supported files are:
          #
          # Google Cloud Storage:
          #
          # storage.googleapis.com/{bucket}/{object}
          # bucket.storage.googleapis.com/{object}
        &quot;A String&quot;,
      ],
      &quot;id&quot;: &quot;A String&quot;, # The unique ID of this job.
          #
          # This field is set by the Cloud Dataflow service when the Job is
          # created, and is immutable for the life of the job.
      &quot;requestedState&quot;: &quot;A String&quot;, # The job&#x27;s requested state.
          #
          # `UpdateJob` may be used to switch between the `JOB_STATE_STOPPED` and
          # `JOB_STATE_RUNNING` states, by setting requested_state. `UpdateJob` may
          # also be used to directly set a job&#x27;s requested state to
          # `JOB_STATE_CANCELLED` or `JOB_STATE_DONE`, irrevocably terminating the
          # job if it has not already reached a terminal state.
      &quot;replaceJobId&quot;: &quot;A String&quot;, # If this job is an update of an existing job, this field is the job ID
          # of the job it replaced.
          #
          # When sending a `CreateJobRequest`, you can update a job by specifying it
          # here. The job named here is stopped, and its intermediate state is
          # transferred to this job.
      &quot;createdFromSnapshotId&quot;: &quot;A String&quot;, # If this is specified, the job&#x27;s initial state is populated from the given
          # snapshot.
      &quot;currentState&quot;: &quot;A String&quot;, # The current state of the job.
          #
          # Jobs are created in the `JOB_STATE_STOPPED` state unless otherwise
          # specified.
          #
          # A job in the `JOB_STATE_RUNNING` state may asynchronously enter a
          # terminal state. After a job has reached a terminal state, no
          # further state updates may be made.
          #
          # This field may be mutated by the Cloud Dataflow service;
          # callers cannot mutate it.
      &quot;name&quot;: &quot;A String&quot;, # The user-specified Cloud Dataflow job name.
          #
          # Only one Job with a given name may exist in a project at any
          # given time. If a caller attempts to create a Job with the same
          # name as an already-existing Job, the attempt returns the
          # existing Job.
          #
          # The name must match the regular expression
          # `[a-z]([-a-z0-9]{0,38}[a-z0-9])?`
      &quot;currentStateTime&quot;: &quot;A String&quot;, # The timestamp associated with the current state.
    }</pre>
</div>
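<p>A minimal sketch of calling <code>create</code> with the <code>templates</code> resource built under Method Details above; every project ID, bucket, and parameter value below is a hypothetical placeholder:</p>
<pre>
# Hypothetical values for illustration; substitute your own project,
# region, template path, and template parameters.
body = {
    &#x27;jobName&#x27;: &#x27;example-wordcount&#x27;,
    &#x27;gcsPath&#x27;: &#x27;gs://example-bucket/templates/wordcount&#x27;,
    &#x27;parameters&#x27;: {&#x27;inputFile&#x27;: &#x27;gs://example-bucket/input.txt&#x27;},
    &#x27;environment&#x27;: {&#x27;tempLocation&#x27;: &#x27;gs://example-bucket/temp&#x27;},
}
job = templates.create(
    projectId=&#x27;example-project&#x27;,
    location=&#x27;us-central1&#x27;,
    body=body,
).execute()
print(job.get(&#x27;id&#x27;), job.get(&#x27;currentState&#x27;))
</pre>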

<div class="method">
    <code class="details" id="get">get(projectId, location, gcsPath=None, view=None, x__xgafv=None)</code>
  <pre>Get the template associated with a template file.

Args:
  projectId: string, Required. The ID of the Cloud Platform project that the job belongs to. (required)
  location: string, The [regional endpoint]
(https://cloud.google.com/dataflow/docs/concepts/regional-endpoints) to
which to direct the request. (required)
  gcsPath: string, Required. A Cloud Storage path to the template from which to
create the job.
Must be a valid Cloud Storage URL, beginning with &#x27;gs://&#x27;.
  view: string, The view to retrieve. Defaults to METADATA_ONLY.
  x__xgafv: string, V1 error format.
    Allowed values
      1 - v1 error format
      2 - v2 error format

Returns:
  An object of the form:

    { # The response to a GetTemplate request.
      &quot;runtimeMetadata&quot;: { # RuntimeMetadata describing a runtime environment. # Describes the runtime metadata with SDKInfo and available parameters.
        &quot;parameters&quot;: [ # The parameters for the template.
          { # Metadata for a specific parameter.
            &quot;label&quot;: &quot;A String&quot;, # Required. The label to display for the parameter.
            &quot;helpText&quot;: &quot;A String&quot;, # Required. The help text to display for the parameter.
            &quot;regexes&quot;: [ # Optional. Regexes that the parameter must match.
              &quot;A String&quot;,
            ],
            &quot;paramType&quot;: &quot;A String&quot;, # Optional. The type of the parameter.
                # Used for selecting input picker.
            &quot;isOptional&quot;: True or False, # Optional. Whether the parameter is optional. Defaults to false.
            &quot;name&quot;: &quot;A String&quot;, # Required. The name of the parameter.
          },
        ],
        &quot;sdkInfo&quot;: { # SDK Information. # SDK Info for the template.
          &quot;language&quot;: &quot;A String&quot;, # Required. The SDK Language.
          &quot;version&quot;: &quot;A String&quot;, # Optional. The SDK version.
        },
      },
      &quot;status&quot;: { # The `Status` type defines a logical error model that is suitable for # The status of the get template request. Any problems with the
          # request will be indicated in the error_details.
          # different programming environments, including REST APIs and RPC APIs. It is
          # used by [gRPC](https://github.com/grpc). Each `Status` message contains
          # three pieces of data: error code, error message, and error details.
          #
          # You can find out more about this error model and how to work with it in the
          # [API Design Guide](https://cloud.google.com/apis/design/errors).
        &quot;details&quot;: [ # A list of messages that carry the error details. There is a common set of
            # message types for APIs to use.
          {
            &quot;a_key&quot;: &quot;&quot;, # Properties of the object. Contains field @type with type URL.
          },
        ],
        &quot;code&quot;: 42, # The status code, which should be an enum value of google.rpc.Code.
        &quot;message&quot;: &quot;A String&quot;, # A developer-facing error message, which should be in English. Any
            # user-facing error message should be localized and sent in the
            # google.rpc.Status.details field, or localized by the client.
      },
      &quot;metadata&quot;: { # Metadata describing a template. # The template metadata describing the template name, available
          # parameters, etc.
        &quot;description&quot;: &quot;A String&quot;, # Optional. A description of the template.
        &quot;parameters&quot;: [ # The parameters for the template.
          { # Metadata for a specific parameter.
            &quot;label&quot;: &quot;A String&quot;, # Required. The label to display for the parameter.
            &quot;helpText&quot;: &quot;A String&quot;, # Required. The help text to display for the parameter.
            &quot;regexes&quot;: [ # Optional. Regexes that the parameter must match.
              &quot;A String&quot;,
            ],
            &quot;paramType&quot;: &quot;A String&quot;, # Optional. The type of the parameter.
                # Used for selecting input picker.
            &quot;isOptional&quot;: True or False, # Optional. Whether the parameter is optional. Defaults to false.
            &quot;name&quot;: &quot;A String&quot;, # Required. The name of the parameter.
          },
        ],
        &quot;name&quot;: &quot;A String&quot;, # Required. The name of the template.
      },
      &quot;templateType&quot;: &quot;A String&quot;, # Template Type.
    }</pre>
</div>
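<p>A minimal sketch of calling <code>get</code> to inspect a template&#x27;s declared parameters before launching it, again with placeholder project and path values:</p>
<pre>
# Fetch template metadata; view defaults to METADATA_ONLY.
template = templates.get(
    projectId=&#x27;example-project&#x27;,
    location=&#x27;us-central1&#x27;,
    gcsPath=&#x27;gs://example-bucket/templates/wordcount&#x27;,
).execute()
for param in template.get(&#x27;metadata&#x27;, {}).get(&#x27;parameters&#x27;, []):
    print(param[&#x27;name&#x27;], param.get(&#x27;helpText&#x27;, &#x27;&#x27;))
</pre>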

<div class="method">
    <code class="details" id="launch">launch(projectId, location, body=None, dynamicTemplate_gcsPath=None, dynamicTemplate_stagingLocation=None, validateOnly=None, gcsPath=None, x__xgafv=None)</code>
  <pre>Launch a template.

Args:
  projectId: string, Required. The ID of the Cloud Platform project that the job belongs to. (required)
  location: string, The [regional endpoint]
(https://cloud.google.com/dataflow/docs/concepts/regional-endpoints) to
which to direct the request. (required)
  body: object, The request body.
    The object takes the form of:

{ # Parameters to provide to the template being launched.
    &quot;environment&quot;: { # The environment values to set at runtime. # The runtime environment for the job.
      &quot;bypassTempDirValidation&quot;: True or False, # Whether to bypass the safety checks for the job&#x27;s temporary directory.
          # Use with caution.
      &quot;tempLocation&quot;: &quot;A String&quot;, # The Cloud Storage path to use for temporary files.
          # Must be a valid Cloud Storage URL, beginning with `gs://`.
      &quot;network&quot;: &quot;A String&quot;, # Network to which VMs will be assigned. If empty or unspecified,
          # the service will use the network &quot;default&quot;.
      &quot;subnetwork&quot;: &quot;A String&quot;, # Subnetwork to which VMs will be assigned, if desired. Expected to be of
          # the form &quot;regions/REGION/subnetworks/SUBNETWORK&quot;.
      &quot;workerRegion&quot;: &quot;A String&quot;, # The Compute Engine region
          # (https://cloud.google.com/compute/docs/regions-zones/regions-zones) in
          # which worker processing should occur, e.g. &quot;us-west1&quot;. Mutually exclusive
          # with worker_zone. If neither worker_region nor worker_zone is specified,
          # default to the control plane&#x27;s region.
      &quot;numWorkers&quot;: 42, # The initial number of Google Compute Engine instances for the job.
      &quot;additionalExperiments&quot;: [ # Additional experiment flags for the job.
        &quot;A String&quot;,
      ],
      &quot;zone&quot;: &quot;A String&quot;, # The Compute Engine [availability
          # zone](https://cloud.google.com/compute/docs/regions-zones/regions-zones)
          # for launching worker instances to run your pipeline.
          # In the future, worker_zone will take precedence.
      &quot;serviceAccountEmail&quot;: &quot;A String&quot;, # The email address of the service account to run the job as.
      &quot;maxWorkers&quot;: 42, # The maximum number of Google Compute Engine instances to be made
          # available to your pipeline during execution, from 1 to 1000.
      &quot;workerZone&quot;: &quot;A String&quot;, # The Compute Engine zone
          # (https://cloud.google.com/compute/docs/regions-zones/regions-zones) in
          # which worker processing should occur, e.g. &quot;us-west1-a&quot;. Mutually exclusive
          # with worker_region. If neither worker_region nor worker_zone is specified,
          # a zone in the control plane&#x27;s region is chosen based on available capacity.
          # If both `worker_zone` and `zone` are set, `worker_zone` takes precedence.
      &quot;additionalUserLabels&quot;: { # Additional user labels to be specified for the job.
          # Keys and values should follow the restrictions specified in the [labeling
          # restrictions](https://cloud.google.com/compute/docs/labeling-resources#restrictions)
          # page.
        &quot;a_key&quot;: &quot;A String&quot;,
      },
      &quot;machineType&quot;: &quot;A String&quot;, # The machine type to use for the job. Defaults to the value from the
          # template if not specified.
      &quot;ipConfiguration&quot;: &quot;A String&quot;, # Configuration for VM IPs.
      &quot;kmsKeyName&quot;: &quot;A String&quot;, # Optional. Name for the Cloud KMS key for the job.
          # Key format is:
          # projects/&lt;project&gt;/locations/&lt;location&gt;/keyRings/&lt;keyring&gt;/cryptoKeys/&lt;key&gt;
    },
    &quot;transformNameMapping&quot;: { # Only applicable when updating a pipeline. Map of transform name prefixes of
        # the job to be replaced to the corresponding name prefixes of the new job.
      &quot;a_key&quot;: &quot;A String&quot;,
    },
    &quot;update&quot;: True or False, # If set, replace the existing pipeline with the name specified by jobName
        # with this pipeline, preserving state.
    &quot;jobName&quot;: &quot;A String&quot;, # Required. The job name to use for the created job.
    &quot;parameters&quot;: { # The runtime parameters to pass to the job.
      &quot;a_key&quot;: &quot;A String&quot;,
    },
  }

  dynamicTemplate_gcsPath: string, Path to dynamic template spec file on GCS.
The file must be a JSON-serialized DynamicTemplateFileSpec object.
  dynamicTemplate_stagingLocation: string, Cloud Storage path for staging dependencies.
Must be a valid Cloud Storage URL, beginning with `gs://`.
  validateOnly: boolean, If true, the request is validated but not actually executed.
Defaults to false.
  gcsPath: string, A Cloud Storage path to the template from which to create
the job.
904Must be a valid Cloud Storage URL, beginning with &#x27;gs://&#x27;.
905 x__xgafv: string, V1 error format.
906 Allowed values
907 1 - v1 error format
908 2 - v2 error format
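
  # --- Example (illustrative sketch, not generated reference) ---
  # Launching a template with the google-api-python-client. Assumes
  # application-default credentials; the project, region, and template path
  # are hypothetical, and &#x27;body&#x27; is the dict sketched above.
  from googleapiclient.discovery import build

  dataflow = build(&quot;dataflow&quot;, &quot;v1b3&quot;)
  response = dataflow.projects().locations().templates().launch(
      projectId=&quot;my-example-project&quot;,
      location=&quot;us-central1&quot;,
      gcsPath=&quot;gs://my-example-bucket/templates/wordcount&quot;,  # or dynamicTemplate_gcsPath, not both
      validateOnly=False,
      body=body,
  ).execute()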
909
910Returns:
911 An object of the form:
912
913 { # Response to the request to launch a template.
914 &quot;job&quot;: { # Defines a job to be run by the Cloud Dataflow service. # The job that was launched, if the request was not a dry run and
915 # the job was successfully launched.
916     &quot;pipelineDescription&quot;: { # Preliminary field: The format of this data may change at any time.
917         # A description of the user pipeline and the stages through which it is
918         # executed, created by the Cloud Dataflow service. Only retrieved with
919         # JOB_VIEW_DESCRIPTION or JOB_VIEW_ALL. It is a descriptive representation
920         # of the submitted pipeline as well as its executed form, provided for ease
921         # of visualizing the pipeline and interpreting Dataflow-provided metrics.
922 &quot;displayData&quot;: [ # Pipeline level display data.
923 { # Data provided with a pipeline or transform to provide descriptive info.
924 &quot;url&quot;: &quot;A String&quot;, # An optional full URL.
925 &quot;javaClassValue&quot;: &quot;A String&quot;, # Contains value if the data is of java class type.
926 &quot;timestampValue&quot;: &quot;A String&quot;, # Contains value if the data is of timestamp type.
927 &quot;durationValue&quot;: &quot;A String&quot;, # Contains value if the data is of duration type.
928 &quot;label&quot;: &quot;A String&quot;, # An optional label to display in a dax UI for the element.
929 &quot;key&quot;: &quot;A String&quot;, # The key identifying the display data.
930 # This is intended to be used as a label for the display data
931 # when viewed in a dax monitoring system.
932 &quot;namespace&quot;: &quot;A String&quot;, # The namespace for the key. This is usually a class name or programming
933           # language namespace (e.g. a Python module) which defines the display data.
934 # This allows a dax monitoring system to specially handle the data
935 # and perform custom rendering.
936 &quot;floatValue&quot;: 3.14, # Contains value if the data is of float type.
937 &quot;strValue&quot;: &quot;A String&quot;, # Contains value if the data is of string type.
938 &quot;int64Value&quot;: &quot;A String&quot;, # Contains value if the data is of int64 type.
939 &quot;boolValue&quot;: True or False, # Contains value if the data is of a boolean type.
940 &quot;shortStrValue&quot;: &quot;A String&quot;, # A possible additional shorter value to display.
941           # For example, a java_class_name_value of com.mypackage.MyDoFn
942           # will be stored with MyDoFn as the short_str_value and
943           # com.mypackage.MyDoFn as the java_class_name_value.
944           # The short_str_value can be displayed, and the java_class_name_value
945           # will be displayed as a tooltip.
946 },
947 ],
948 &quot;originalPipelineTransform&quot;: [ # Description of each transform in the pipeline and collections between them.
949 { # Description of the type, names/ids, and input/outputs for a transform.
950 &quot;outputCollectionName&quot;: [ # User names for all collection outputs to this transform.
951 &quot;A String&quot;,
952 ],
953 &quot;displayData&quot;: [ # Transform-specific display data.
954 { # Data provided with a pipeline or transform to provide descriptive info.
955 &quot;url&quot;: &quot;A String&quot;, # An optional full URL.
956 &quot;javaClassValue&quot;: &quot;A String&quot;, # Contains value if the data is of java class type.
957 &quot;timestampValue&quot;: &quot;A String&quot;, # Contains value if the data is of timestamp type.
958 &quot;durationValue&quot;: &quot;A String&quot;, # Contains value if the data is of duration type.
959 &quot;label&quot;: &quot;A String&quot;, # An optional label to display in a dax UI for the element.
960 &quot;key&quot;: &quot;A String&quot;, # The key identifying the display data.
961 # This is intended to be used as a label for the display data
962 # when viewed in a dax monitoring system.
963 &quot;namespace&quot;: &quot;A String&quot;, # The namespace for the key. This is usually a class name or programming
964               # language namespace (e.g. a Python module) which defines the display data.
965 # This allows a dax monitoring system to specially handle the data
966 # and perform custom rendering.
967 &quot;floatValue&quot;: 3.14, # Contains value if the data is of float type.
968 &quot;strValue&quot;: &quot;A String&quot;, # Contains value if the data is of string type.
969 &quot;int64Value&quot;: &quot;A String&quot;, # Contains value if the data is of int64 type.
970 &quot;boolValue&quot;: True or False, # Contains value if the data is of a boolean type.
971 &quot;shortStrValue&quot;: &quot;A String&quot;, # A possible additional shorter value to display.
972               # For example, a java_class_name_value of com.mypackage.MyDoFn
973               # will be stored with MyDoFn as the short_str_value and
974               # com.mypackage.MyDoFn as the java_class_name_value.
975               # The short_str_value can be displayed, and the java_class_name_value
976               # will be displayed as a tooltip.
977 },
978 ],
979 &quot;id&quot;: &quot;A String&quot;, # SDK generated id of this transform instance.
980 &quot;inputCollectionName&quot;: [ # User names for all collection inputs to this transform.
981 &quot;A String&quot;,
982 ],
983 &quot;name&quot;: &quot;A String&quot;, # User provided name for this transform instance.
984 &quot;kind&quot;: &quot;A String&quot;, # Type of transform.
985 },
986 ],
987 &quot;executionPipelineStage&quot;: [ # Description of each stage of execution of the pipeline.
988 { # Description of the composing transforms, names/ids, and input/outputs of a
989 # stage of execution. Some composing transforms and sources may have been
990 # generated by the Dataflow service during execution planning.
991 &quot;componentSource&quot;: [ # Collections produced and consumed by component transforms of this stage.
992 { # Description of an interstitial value between transforms in an execution
993 # stage.
994 &quot;userName&quot;: &quot;A String&quot;, # Human-readable name for this transform; may be user or system generated.
995 &quot;name&quot;: &quot;A String&quot;, # Dataflow service generated name for this source.
996 &quot;originalTransformOrCollection&quot;: &quot;A String&quot;, # User name for the original user transform or collection with which this
997 # source is most closely associated.
998 },
999 ],
1000 &quot;inputSource&quot;: [ # Input sources for this stage.
1001 { # Description of an input or output of an execution stage.
1002 &quot;userName&quot;: &quot;A String&quot;, # Human-readable name for this source; may be user or system generated.
1003 &quot;originalTransformOrCollection&quot;: &quot;A String&quot;, # User name for the original user transform or collection with which this
1004 # source is most closely associated.
1005 &quot;sizeBytes&quot;: &quot;A String&quot;, # Size of the source, if measurable.
1006 &quot;name&quot;: &quot;A String&quot;, # Dataflow service generated name for this source.
1007 },
1008 ],
1009 &quot;name&quot;: &quot;A String&quot;, # Dataflow service generated name for this stage.
1010 &quot;componentTransform&quot;: [ # Transforms that comprise this execution stage.
1011 { # Description of a transform executed as part of an execution stage.
1012 &quot;name&quot;: &quot;A String&quot;, # Dataflow service generated name for this source.
1013 &quot;userName&quot;: &quot;A String&quot;, # Human-readable name for this transform; may be user or system generated.
1014 &quot;originalTransform&quot;: &quot;A String&quot;, # User name for the original user transform with which this transform is
1015 # most closely associated.
1016 },
1017 ],
1018 &quot;id&quot;: &quot;A String&quot;, # Dataflow service generated id for this stage.
1019 &quot;outputSource&quot;: [ # Output sources for this stage.
1020 { # Description of an input or output of an execution stage.
1021 &quot;userName&quot;: &quot;A String&quot;, # Human-readable name for this source; may be user or system generated.
1022 &quot;originalTransformOrCollection&quot;: &quot;A String&quot;, # User name for the original user transform or collection with which this
1023 # source is most closely associated.
1024 &quot;sizeBytes&quot;: &quot;A String&quot;, # Size of the source, if measurable.
1025 &quot;name&quot;: &quot;A String&quot;, # Dataflow service generated name for this source.
1026 },
1027 ],
1028          &quot;kind&quot;: &quot;A String&quot;, # Type of transform this stage is executing.
1029 },
1030 ],
1031 },
1032 &quot;labels&quot;: { # User-defined labels for this job.
1033 #
1034 # The labels map can contain no more than 64 entries. Entries of the labels
1035 # map are UTF8 strings that comply with the following restrictions:
1036 #
1037 # * Keys must conform to regexp: \p{Ll}\p{Lo}{0,62}
1038 # * Values must conform to regexp: [\p{Ll}\p{Lo}\p{N}_-]{0,63}
1039 # * Both keys and values are additionally constrained to be &lt;= 128 bytes in
1040 # size.
1041 &quot;a_key&quot;: &quot;A String&quot;,
1042 },
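      # --- Example (illustrative sketch): validating one labels entry against
      # --- the constraints above. Uses the third-party &#x27;regex&#x27; module, since
      # --- the stdlib &#x27;re&#x27; module does not support \p{...} Unicode classes.
      #
      #   import regex
      #   KEY_RE = regex.compile(r&quot;^\p{Ll}\p{Lo}{0,62}$&quot;)
      #   VALUE_RE = regex.compile(r&quot;^[\p{Ll}\p{Lo}\p{N}_-]{0,63}$&quot;)
      #   def label_entry_is_valid(key, value):
      #       return (len(key.encode(&quot;utf-8&quot;)) &lt;= 128
      #               and len(value.encode(&quot;utf-8&quot;)) &lt;= 128
      #               and KEY_RE.match(key) is not None
      #               and VALUE_RE.match(value) is not None)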
1043 &quot;projectId&quot;: &quot;A String&quot;, # The ID of the Cloud Platform project that the job belongs to.
1044 &quot;environment&quot;: { # Describes the environment in which a Dataflow Job runs. # The environment for the job.
1045 &quot;flexResourceSchedulingGoal&quot;: &quot;A String&quot;, # Which Flexible Resource Scheduling mode to run in.
1046 &quot;workerRegion&quot;: &quot;A String&quot;, # The Compute Engine region
1047 # (https://cloud.google.com/compute/docs/regions-zones/regions-zones) in
1048 # which worker processing should occur, e.g. &quot;us-west1&quot;. Mutually exclusive
1049 # with worker_zone. If neither worker_region nor worker_zone is specified,
1050 # default to the control plane&#x27;s region.
1051 &quot;userAgent&quot;: { # A description of the process that generated the request.
1052 &quot;a_key&quot;: &quot;&quot;, # Properties of the object.
1053 },
1054 &quot;serviceAccountEmail&quot;: &quot;A String&quot;, # Identity to run virtual machines as. Defaults to the default account.
1055 &quot;version&quot;: { # A structure describing which components and their versions of the service
1056 # are required in order to run the job.
1057 &quot;a_key&quot;: &quot;&quot;, # Properties of the object.
1058 },
1059 &quot;serviceKmsKeyName&quot;: &quot;A String&quot;, # If set, contains the Cloud KMS key identifier used to encrypt data
1060 # at rest, AKA a Customer Managed Encryption Key (CMEK).
1061 #
1062 # Format:
1063 # projects/PROJECT_ID/locations/LOCATION/keyRings/KEY_RING/cryptoKeys/KEY
1064 &quot;experiments&quot;: [ # The list of experiments to enable.
1065 &quot;A String&quot;,
1066 ],
1067 &quot;workerZone&quot;: &quot;A String&quot;, # The Compute Engine zone
1068 # (https://cloud.google.com/compute/docs/regions-zones/regions-zones) in
1069 # which worker processing should occur, e.g. &quot;us-west1-a&quot;. Mutually exclusive
1070 # with worker_region. If neither worker_region nor worker_zone is specified,
1071 # a zone in the control plane&#x27;s region is chosen based on available capacity.
1072 &quot;workerPools&quot;: [ # The worker pools. At least one &quot;harness&quot; worker pool must be
1073 # specified in order for the job to have workers.
1074 { # Describes one particular pool of Cloud Dataflow workers to be
1075 # instantiated by the Cloud Dataflow service in order to perform the
1076 # computations required by a job. Note that a workflow job may use
1077 # multiple pools, in order to match the various computational
1078 # requirements of the various stages of the job.
1079 &quot;onHostMaintenance&quot;: &quot;A String&quot;, # The action to take on host maintenance, as defined by the Google
1080 # Compute Engine API.
1081 &quot;sdkHarnessContainerImages&quot;: [ # Set of SDK harness containers needed to execute this pipeline. This will
1082 # only be set in the Fn API path. For non-cross-language pipelines this
1083 # should have only one entry. Cross-language pipelines will have two or more
1084 # entries.
1085            { # Defines an SDK harness container for executing Dataflow pipelines.
1086 &quot;containerImage&quot;: &quot;A String&quot;, # A docker container image that resides in Google Container Registry.
1087              &quot;useSingleCorePerContainer&quot;: True or False, # If true, recommends that the Dataflow service use only one core per SDK
1088                  # container instance with this image. If false (or unset), recommends using
1089                  # more than one core per SDK container instance with this image for
1090                  # efficiency. Note that the Dataflow service may choose to override this
1091                  # property if needed.
1092 },
1093 ],
1094 &quot;zone&quot;: &quot;A String&quot;, # Zone to run the worker pools in. If empty or unspecified, the service
1095 # will attempt to choose a reasonable default.
1096 &quot;kind&quot;: &quot;A String&quot;, # The kind of the worker pool; currently only `harness` and `shuffle`
1097 # are supported.
1098 &quot;metadata&quot;: { # Metadata to set on the Google Compute Engine VMs.
1099 &quot;a_key&quot;: &quot;A String&quot;,
1100 },
1101 &quot;diskSourceImage&quot;: &quot;A String&quot;, # Fully qualified source image for disks.
1102 &quot;dataDisks&quot;: [ # Data disks that are used by a VM in this workflow.
1103 { # Describes the data disk used by a workflow job.
1104 &quot;sizeGb&quot;: 42, # Size of disk in GB. If zero or unspecified, the service will
1105 # attempt to choose a reasonable default.
1106 &quot;diskType&quot;: &quot;A String&quot;, # Disk storage type, as defined by Google Compute Engine. This
1107 # must be a disk type appropriate to the project and zone in which
1108 # the workers will run. If unknown or unspecified, the service
1109 # will attempt to choose a reasonable default.
1110 #
1111 # For example, the standard persistent disk type is a resource name
1112 # typically ending in &quot;pd-standard&quot;. If SSD persistent disks are
1113 # available, the resource name typically ends with &quot;pd-ssd&quot;. The
1114                # actual valid values are defined by the Google Compute Engine API,
1115 # not by the Cloud Dataflow API; consult the Google Compute Engine
1116 # documentation for more information about determining the set of
1117 # available disk types for a particular project and zone.
1118 #
1119 # Google Compute Engine Disk types are local to a particular
1120 # project in a particular zone, and so the resource name will
1121 # typically look something like this:
1122 #
1123 # compute.googleapis.com/projects/project-id/zones/zone/diskTypes/pd-standard
1124 &quot;mountPoint&quot;: &quot;A String&quot;, # Directory in a VM where disk is mounted.
1125 },
1126 ],
1127 &quot;packages&quot;: [ # Packages to be installed on workers.
1128 { # The packages that must be installed in order for a worker to run the
1129 # steps of the Cloud Dataflow job that will be assigned to its worker
1130 # pool.
1131 #
1132 # This is the mechanism by which the Cloud Dataflow SDK causes code to
1133 # be loaded onto the workers. For example, the Cloud Dataflow Java SDK
1134 # might use this to install jars containing the user&#x27;s code and all of the
1135 # various dependencies (libraries, data files, etc.) required in order
1136 # for that code to run.
1137 &quot;name&quot;: &quot;A String&quot;, # The name of the package.
1138 &quot;location&quot;: &quot;A String&quot;, # The resource to read the package from. The supported resource type is:
1139 #
1140 # Google Cloud Storage:
1141 #
1142 # storage.googleapis.com/{bucket}
1143 # bucket.storage.googleapis.com/
1144 },
1145 ],
1146          &quot;teardownPolicy&quot;: &quot;A String&quot;, # Sets the policy for determining when to turn down the worker pool.
1147 # Allowed values are: `TEARDOWN_ALWAYS`, `TEARDOWN_ON_SUCCESS`, and
1148 # `TEARDOWN_NEVER`.
1149 # `TEARDOWN_ALWAYS` means workers are always torn down regardless of whether
1150 # the job succeeds. `TEARDOWN_ON_SUCCESS` means workers are torn down
1151 # if the job succeeds. `TEARDOWN_NEVER` means the workers are never torn
1152 # down.
1153 #
1154 # If the workers are not torn down by the service, they will
1155 # continue to run and use Google Compute Engine VM resources in the
1156 # user&#x27;s project until they are explicitly terminated by the user.
1157 # Because of this, Google recommends using the `TEARDOWN_ALWAYS`
1158 # policy except for small, manually supervised test jobs.
1159 #
1160 # If unknown or unspecified, the service will attempt to choose a reasonable
1161 # default.
1162 &quot;network&quot;: &quot;A String&quot;, # Network to which VMs will be assigned. If empty or unspecified,
1163 # the service will use the network &quot;default&quot;.
1164 &quot;ipConfiguration&quot;: &quot;A String&quot;, # Configuration for VM IPs.
1165 &quot;diskSizeGb&quot;: 42, # Size of root disk for VMs, in GB. If zero or unspecified, the service will
1166 # attempt to choose a reasonable default.
1167 &quot;autoscalingSettings&quot;: { # Settings for WorkerPool autoscaling. # Settings for autoscaling of this WorkerPool.
1168 &quot;maxNumWorkers&quot;: 42, # The maximum number of workers to cap scaling at.
1169 &quot;algorithm&quot;: &quot;A String&quot;, # The algorithm to use for autoscaling.
1170 },
1171 &quot;poolArgs&quot;: { # Extra arguments for this worker pool.
1172 &quot;a_key&quot;: &quot;&quot;, # Properties of the object. Contains field @type with type URL.
1173 },
1174 &quot;subnetwork&quot;: &quot;A String&quot;, # Subnetwork to which VMs will be assigned, if desired. Expected to be of
1175 # the form &quot;regions/REGION/subnetworks/SUBNETWORK&quot;.
1176 &quot;numWorkers&quot;: 42, # Number of Google Compute Engine workers in this pool needed to
1177 # execute the job. If zero or unspecified, the service will
1178 # attempt to choose a reasonable default.
1179 &quot;numThreadsPerWorker&quot;: 42, # The number of threads per worker harness. If empty or unspecified, the
1180 # service will choose a number of threads (according to the number of cores
1181 # on the selected machine type for batch, or 1 by convention for streaming).
1182 &quot;workerHarnessContainerImage&quot;: &quot;A String&quot;, # Required. Docker container image that executes the Cloud Dataflow worker
1183 # harness, residing in Google Container Registry.
1184 #
1185 # Deprecated for the Fn API path. Use sdk_harness_container_images instead.
1186 &quot;taskrunnerSettings&quot;: { # Taskrunner configuration settings. # Settings passed through to Google Compute Engine workers when
1187 # using the standard Dataflow task runner. Users should ignore
1188 # this field.
1189            &quot;dataflowApiVersion&quot;: &quot;A String&quot;, # The API version of the endpoint, e.g. &quot;v1b3&quot;.
1190 &quot;oauthScopes&quot;: [ # The OAuth2 scopes to be requested by the taskrunner in order to
1191 # access the Cloud Dataflow API.
1192 &quot;A String&quot;,
1193 ],
1194 &quot;baseUrl&quot;: &quot;A String&quot;, # The base URL for the taskrunner to use when accessing Google Cloud APIs.
1195 #
1196 # When workers access Google Cloud APIs, they logically do so via
1197 # relative URLs. If this field is specified, it supplies the base
1198 # URL to use for resolving these relative URLs. The normative
1199 # algorithm used is defined by RFC 1808, &quot;Relative Uniform Resource
1200 # Locators&quot;.
1201 #
1202 # If not specified, the default value is &quot;http://www.googleapis.com/&quot;
1203 &quot;workflowFileName&quot;: &quot;A String&quot;, # The file to store the workflow in.
1204 &quot;logToSerialconsole&quot;: True or False, # Whether to send taskrunner log info to Google Compute Engine VM serial
1205 # console.
1206 &quot;baseTaskDir&quot;: &quot;A String&quot;, # The location on the worker for task-specific subdirectories.
1207 &quot;taskUser&quot;: &quot;A String&quot;, # The UNIX user ID on the worker VM to use for tasks launched by
1208 # taskrunner; e.g. &quot;root&quot;.
1209 &quot;vmId&quot;: &quot;A String&quot;, # The ID string of the VM.
1210 &quot;alsologtostderr&quot;: True or False, # Whether to also send taskrunner log info to stderr.
1211 &quot;parallelWorkerSettings&quot;: { # Provides data to pass through to the worker harness. # The settings to pass to the parallel worker harness.
1212 &quot;shuffleServicePath&quot;: &quot;A String&quot;, # The Shuffle service path relative to the root URL, for example,
1213 # &quot;shuffle/v1beta1&quot;.
1214 &quot;tempStoragePrefix&quot;: &quot;A String&quot;, # The prefix of the resources the system should use for temporary
1215 # storage.
1216 #
1217 # The supported resource type is:
1218 #
1219 # Google Cloud Storage:
1220 #
1221 # storage.googleapis.com/{bucket}/{object}
1222 # bucket.storage.googleapis.com/{object}
1223 &quot;reportingEnabled&quot;: True or False, # Whether to send work progress updates to the service.
1224 &quot;servicePath&quot;: &quot;A String&quot;, # The Cloud Dataflow service path relative to the root URL, for example,
1225 # &quot;dataflow/v1b3/projects&quot;.
1226 &quot;baseUrl&quot;: &quot;A String&quot;, # The base URL for accessing Google Cloud APIs.
1227 #
1228 # When workers access Google Cloud APIs, they logically do so via
1229 # relative URLs. If this field is specified, it supplies the base
1230 # URL to use for resolving these relative URLs. The normative
1231 # algorithm used is defined by RFC 1808, &quot;Relative Uniform Resource
1232 # Locators&quot;.
1233 #
1234 # If not specified, the default value is &quot;http://www.googleapis.com/&quot;
1235 &quot;workerId&quot;: &quot;A String&quot;, # The ID of the worker running this pipeline.
1236 },
1237 &quot;harnessCommand&quot;: &quot;A String&quot;, # The command to launch the worker harness.
1238 &quot;logDir&quot;: &quot;A String&quot;, # The directory on the VM to store logs.
1239 &quot;streamingWorkerMainClass&quot;: &quot;A String&quot;, # The streaming worker main class name.
1240 &quot;languageHint&quot;: &quot;A String&quot;, # The suggested backend language.
1241 &quot;taskGroup&quot;: &quot;A String&quot;, # The UNIX group ID on the worker VM to use for tasks launched by
1242 # taskrunner; e.g. &quot;wheel&quot;.
1243 &quot;logUploadLocation&quot;: &quot;A String&quot;, # Indicates where to put logs. If this is not specified, the logs
1244 # will not be uploaded.
1245 #
1246 # The supported resource type is:
1247 #
1248 # Google Cloud Storage:
1249 # storage.googleapis.com/{bucket}/{object}
1250 # bucket.storage.googleapis.com/{object}
1251 &quot;commandlinesFileName&quot;: &quot;A String&quot;, # The file to store preprocessing commands in.
1252 &quot;continueOnException&quot;: True or False, # Whether to continue taskrunner if an exception is hit.
1253 &quot;tempStoragePrefix&quot;: &quot;A String&quot;, # The prefix of the resources the taskrunner should use for
1254 # temporary storage.
1255 #
1256 # The supported resource type is:
1257 #
1258 # Google Cloud Storage:
1259 # storage.googleapis.com/{bucket}/{object}
1260 # bucket.storage.googleapis.com/{object}
1261 },
1262 &quot;diskType&quot;: &quot;A String&quot;, # Type of root disk for VMs. If empty or unspecified, the service will
1263 # attempt to choose a reasonable default.
1264 &quot;defaultPackageSet&quot;: &quot;A String&quot;, # The default package set to install. This allows the service to
1265 # select a default set of packages which are useful to worker
1266 # harnesses written in a particular language.
1267 &quot;machineType&quot;: &quot;A String&quot;, # Machine type (e.g. &quot;n1-standard-1&quot;). If empty or unspecified, the
1268 # service will attempt to choose a reasonable default.
1269 },
1270 ],
1271 &quot;tempStoragePrefix&quot;: &quot;A String&quot;, # The prefix of the resources the system should use for temporary
1272        # storage. The system will append the suffix &quot;/temp-{JOBNAME}&quot; to
1273 # this resource prefix, where {JOBNAME} is the value of the
1274 # job_name field. The resulting bucket and object prefix is used
1275 # as the prefix of the resources used to store temporary data
1276 # needed during the job execution. NOTE: This will override the
1277 # value in taskrunner_settings.
1278 # The supported resource type is:
1279 #
1280 # Google Cloud Storage:
1281 #
1282 # storage.googleapis.com/{bucket}/{object}
1283 # bucket.storage.googleapis.com/{object}
1284 &quot;internalExperiments&quot;: { # Experimental settings.
1285 &quot;a_key&quot;: &quot;&quot;, # Properties of the object. Contains field @type with type URL.
1286 },
1287 &quot;sdkPipelineOptions&quot;: { # The Cloud Dataflow SDK pipeline options specified by the user. These
1288 # options are passed through the service and are used to recreate the
1289 # SDK pipeline options on the worker in a language agnostic and platform
1290 # independent way.
1291 &quot;a_key&quot;: &quot;&quot;, # Properties of the object.
1292 },
1293 &quot;dataset&quot;: &quot;A String&quot;, # The dataset for the current project where various workflow
1294 # related tables are stored.
1295 #
1296 # The supported resource type is:
1297 #
1298 # Google BigQuery:
1299 # bigquery.googleapis.com/{dataset}
1300 &quot;clusterManagerApiService&quot;: &quot;A String&quot;, # The type of cluster manager API to use. If unknown or
1301 # unspecified, the service will attempt to choose a reasonable
1302 # default. This should be in the form of the API service name,
1303 # e.g. &quot;compute.googleapis.com&quot;.
1304 },
1305 &quot;stepsLocation&quot;: &quot;A String&quot;, # The GCS location where the steps are stored.
1306 &quot;steps&quot;: [ # Exactly one of step or steps_location should be specified.
1307 #
1308 # The top-level steps that constitute the entire job.
1309 { # Defines a particular step within a Cloud Dataflow job.
1310 #
1311 # A job consists of multiple steps, each of which performs some
1312 # specific operation as part of the overall job. Data is typically
1313 # passed from one step to another as part of the job.
1314 #
1315 # Here&#x27;s an example of a sequence of steps which together implement a
1316 # Map-Reduce job:
1317 #
1318 # * Read a collection of data from some source, parsing the
1319 # collection&#x27;s elements.
1320 #
1321 # * Validate the elements.
1322 #
1323 # * Apply a user-defined function to map each element to some value
1324 # and extract an element-specific key value.
1325 #
1326 # * Group elements with the same key into a single element with
1327 # that key, transforming a multiply-keyed collection into a
1328 # uniquely-keyed collection.
1329 #
1330 # * Write the elements out to some data sink.
1331 #
1332 # Note that the Cloud Dataflow service may be used to run many different
1333 # types of jobs, not just Map-Reduce.
1334 &quot;kind&quot;: &quot;A String&quot;, # The kind of step in the Cloud Dataflow job.
1335 &quot;properties&quot;: { # Named properties associated with the step. Each kind of
1336 # predefined step has its own required set of properties.
1337 # Must be provided on Create. Only retrieved with JOB_VIEW_ALL.
1338 &quot;a_key&quot;: &quot;&quot;, # Properties of the object.
1339 },
1340 &quot;name&quot;: &quot;A String&quot;, # The name that identifies the step. This must be unique for each
1341 # step with respect to all other steps in the Cloud Dataflow job.
1342 },
1343 ],
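    # --- Example (illustrative only): the general shape of one entry in
    # --- &#x27;steps&#x27;. Step kinds and their properties are service-defined; the
    # --- values below are hypothetical placeholders, not a real step.
    #
    #   example_step = {
    #       &quot;kind&quot;: &quot;ParallelRead&quot;,
    #       &quot;name&quot;: &quot;s1&quot;,
    #       &quot;properties&quot;: {},  # kind-specific; required on Create
    #   }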
1344 &quot;stageStates&quot;: [ # This field may be mutated by the Cloud Dataflow service;
1345 # callers cannot mutate it.
1346 { # A message describing the state of a particular execution stage.
1347        &quot;executionStageState&quot;: &quot;A String&quot;, # Execution stage states allow the same set of values as JobState.
1348 &quot;executionStageName&quot;: &quot;A String&quot;, # The name of the execution stage.
1349 &quot;currentStateTime&quot;: &quot;A String&quot;, # The time at which the stage transitioned to this state.
1350 },
1351 ],
1352 &quot;replacedByJobId&quot;: &quot;A String&quot;, # If another job is an update of this job (and thus, this job is in
1353 # `JOB_STATE_UPDATED`), this field contains the ID of that job.
1354    &quot;jobMetadata&quot;: { # This field is populated by the Dataflow service to support filtering jobs
1355        # by the metadata values provided here. Populated for ListJobs and all GetJob
1356        # views SUMMARY and higher. The metadata is intended primarily for filtering
1357        # jobs and will be included in the ListJob response and the Job SUMMARY view.
1358 &quot;sdkVersion&quot;: { # The version of the SDK used to run the job. # The SDK version used to run the job.
1359 &quot;sdkSupportStatus&quot;: &quot;A String&quot;, # The support status for this SDK version.
1360 &quot;versionDisplayName&quot;: &quot;A String&quot;, # A readable string describing the version of the SDK.
1361 &quot;version&quot;: &quot;A String&quot;, # The version of the SDK used to run the job.
1362 },
1363 &quot;bigTableDetails&quot;: [ # Identification of a BigTable source used in the Dataflow job.
1364 { # Metadata for a BigTable connector used by the job.
1365 &quot;instanceId&quot;: &quot;A String&quot;, # InstanceId accessed in the connection.
1366 &quot;tableId&quot;: &quot;A String&quot;, # TableId accessed in the connection.
1367 &quot;projectId&quot;: &quot;A String&quot;, # ProjectId accessed in the connection.
1368 },
1369 ],
1370 &quot;pubsubDetails&quot;: [ # Identification of a PubSub source used in the Dataflow job.
1371 { # Metadata for a PubSub connector used by the job.
1372 &quot;subscription&quot;: &quot;A String&quot;, # Subscription used in the connection.
1373 &quot;topic&quot;: &quot;A String&quot;, # Topic accessed in the connection.
1374 },
1375 ],
1376 &quot;bigqueryDetails&quot;: [ # Identification of a BigQuery source used in the Dataflow job.
1377 { # Metadata for a BigQuery connector used by the job.
1378 &quot;dataset&quot;: &quot;A String&quot;, # Dataset accessed in the connection.
1379 &quot;projectId&quot;: &quot;A String&quot;, # Project accessed in the connection.
1380 &quot;query&quot;: &quot;A String&quot;, # Query used to access data in the connection.
1381 &quot;table&quot;: &quot;A String&quot;, # Table accessed in the connection.
1382 },
1383 ],
1384 &quot;fileDetails&quot;: [ # Identification of a File source used in the Dataflow job.
1385 { # Metadata for a File connector used by the job.
1386 &quot;filePattern&quot;: &quot;A String&quot;, # File Pattern used to access files by the connector.
1387 },
1388 ],
1389 &quot;datastoreDetails&quot;: [ # Identification of a Datastore source used in the Dataflow job.
1390 { # Metadata for a Datastore connector used by the job.
1391 &quot;namespace&quot;: &quot;A String&quot;, # Namespace used in the connection.
1392 &quot;projectId&quot;: &quot;A String&quot;, # ProjectId accessed in the connection.
1393 },
1394 ],
1395 &quot;spannerDetails&quot;: [ # Identification of a Spanner source used in the Dataflow job.
1396 { # Metadata for a Spanner connector used by the job.
1397 &quot;instanceId&quot;: &quot;A String&quot;, # InstanceId accessed in the connection.
1398 &quot;databaseId&quot;: &quot;A String&quot;, # DatabaseId accessed in the connection.
1399 &quot;projectId&quot;: &quot;A String&quot;, # ProjectId accessed in the connection.
1400 },
1401 ],
1402 },
1403 &quot;location&quot;: &quot;A String&quot;, # The [regional endpoint]
1404 # (https://cloud.google.com/dataflow/docs/concepts/regional-endpoints) that
1405 # contains this job.
1406 &quot;transformNameMapping&quot;: { # The map of transform name prefixes of the job to be replaced to the
1407 # corresponding name prefixes of the new job.
1408 &quot;a_key&quot;: &quot;A String&quot;,
1409 },
1410 &quot;startTime&quot;: &quot;A String&quot;, # The timestamp when the job was started (transitioned to JOB_STATE_PENDING).
1411 # Flexible resource scheduling jobs are started with some delay after job
1412 # creation, so start_time is unset before start and is updated when the
1413 # job is started by the Cloud Dataflow service. For other jobs, start_time
1414        # always equals create_time and is immutable and set by the Cloud Dataflow
1415 # service.
1416 &quot;clientRequestId&quot;: &quot;A String&quot;, # The client&#x27;s unique identifier of the job, re-used across retried attempts.
1417 # If this field is set, the service will ensure its uniqueness.
1418 # The request to create a job will fail if the service has knowledge of a
1419 # previously submitted job with the same client&#x27;s ID and job name.
1420 # The caller may use this field to ensure idempotence of job
1421 # creation across retried attempts to create a job.
1422 # By default, the field is empty and, in that case, the service ignores it.
1423    &quot;executionInfo&quot;: { # Deprecated. Additional information about how a Cloud Dataflow job
1424        # will be executed that isn&#x27;t contained in the submitted job.
1425 &quot;stages&quot;: { # A mapping from each stage to the information about that stage.
1426 &quot;a_key&quot;: { # Contains information about how a particular
1427 # google.dataflow.v1beta3.Step will be executed.
1428 &quot;stepName&quot;: [ # The steps associated with the execution stage.
1429 # Note that stages may have several steps, and that a given step
1430 # might be run by more than one stage.
1431 &quot;A String&quot;,
1432 ],
1433 },
1434 },
1435 },
1436 &quot;type&quot;: &quot;A String&quot;, # The type of Cloud Dataflow job.
1437 &quot;createTime&quot;: &quot;A String&quot;, # The timestamp when the job was initially created. Immutable and set by the
1438 # Cloud Dataflow service.
1439 &quot;tempFiles&quot;: [ # A set of files the system should be aware of that are used
1440 # for temporary storage. These temporary files will be
1441 # removed on job completion.
1442 # No duplicates are allowed.
1443 # No file patterns are supported.
1444 #
1445 # The supported files are:
1446 #
1447 # Google Cloud Storage:
1448 #
1449 # storage.googleapis.com/{bucket}/{object}
1450 # bucket.storage.googleapis.com/{object}
1451 &quot;A String&quot;,
1452 ],
1453 &quot;id&quot;: &quot;A String&quot;, # The unique ID of this job.
1454 #
1455 # This field is set by the Cloud Dataflow service when the Job is
1456 # created, and is immutable for the life of the job.
1457 &quot;requestedState&quot;: &quot;A String&quot;, # The job&#x27;s requested state.
1458 #
1459 # `UpdateJob` may be used to switch between the `JOB_STATE_STOPPED` and
1460 # `JOB_STATE_RUNNING` states, by setting requested_state. `UpdateJob` may
1461 # also be used to directly set a job&#x27;s requested state to
1462 # `JOB_STATE_CANCELLED` or `JOB_STATE_DONE`, irrevocably terminating the
1463 # job if it has not already reached a terminal state.
1464 &quot;replaceJobId&quot;: &quot;A String&quot;, # If this job is an update of an existing job, this field is the job ID
1465 # of the job it replaced.
1466 #
1467 # When sending a `CreateJobRequest`, you can update a job by specifying it
1468 # here. The job named here is stopped, and its intermediate state is
1469 # transferred to this job.
1470 &quot;createdFromSnapshotId&quot;: &quot;A String&quot;, # If this is specified, the job&#x27;s initial state is populated from the given
1471 # snapshot.
1472 &quot;currentState&quot;: &quot;A String&quot;, # The current state of the job.
1473 #
1474 # Jobs are created in the `JOB_STATE_STOPPED` state unless otherwise
1475 # specified.
1476 #
1477 # A job in the `JOB_STATE_RUNNING` state may asynchronously enter a
1478 # terminal state. After a job has reached a terminal state, no
1479 # further state updates may be made.
1480 #
1481 # This field may be mutated by the Cloud Dataflow service;
1482 # callers cannot mutate it.
1483 &quot;name&quot;: &quot;A String&quot;, # The user-specified Cloud Dataflow job name.
1484 #
1485 # Only one Job with a given name may exist in a project at any
1486 # given time. If a caller attempts to create a Job with the same
1487 # name as an already-existing Job, the attempt returns the
1488 # existing Job.
1489 #
1490 # The name must match the regular expression
1491 # `[a-z]([-a-z0-9]{0,38}[a-z0-9])?`
1492 &quot;currentStateTime&quot;: &quot;A String&quot;, # The timestamp associated with the current state.
1493 },
1494  }</pre>
1495</div>
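<h3>Usage Example</h3>
<div class="method">
    <pre># Illustrative sketch, not generated reference material. It builds on the
# launch call shown above: it inspects the returned job, polls its state via
# projects.locations.jobs.get, and cancels it via projects.locations.jobs.update.
# The project and region are hypothetical; application-default credentials
# are assumed.
import time

from googleapiclient.discovery import build

dataflow = build(&quot;dataflow&quot;, &quot;v1b3&quot;)
jobs = dataflow.projects().locations().jobs()

job = response[&quot;job&quot;]  # absent when the request was a dry run (validateOnly=True)
print(job[&quot;id&quot;], job.get(&quot;currentState&quot;))

# Poll until the job leaves JOB_STATE_PENDING. Pass view=&quot;JOB_VIEW_DESCRIPTION&quot;
# or view=&quot;JOB_VIEW_ALL&quot; if pipelineDescription should also be populated.
while True:
    latest = jobs.get(projectId=&quot;my-example-project&quot;,
                      location=&quot;us-central1&quot;,
                      jobId=job[&quot;id&quot;]).execute()
    if latest.get(&quot;currentState&quot;) != &quot;JOB_STATE_PENDING&quot;:
        break
    time.sleep(10)

# To cancel, set requestedState; this irrevocably terminates the job once it
# takes effect.
jobs.update(projectId=&quot;my-example-project&quot;,
            location=&quot;us-central1&quot;,
            jobId=job[&quot;id&quot;],
            body={&quot;requestedState&quot;: &quot;JOB_STATE_CANCELLED&quot;}).execute()</pre>
</div>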
1496
1497</body></html>