<html><body>
<style>

body, h1, h2, h3, div, span, p, pre, a {
  margin: 0;
  padding: 0;
  border: 0;
  font-weight: inherit;
  font-style: inherit;
  font-size: 100%;
  font-family: inherit;
  vertical-align: baseline;
}

body {
  font-size: 13px;
  padding: 1em;
}

h1 {
  font-size: 26px;
  margin-bottom: 1em;
}

h2 {
  font-size: 24px;
  margin-bottom: 1em;
}

h3 {
  font-size: 20px;
  margin-bottom: 1em;
  margin-top: 1em;
}

pre, code {
  line-height: 1.5;
  font-family: Monaco, 'DejaVu Sans Mono', 'Bitstream Vera Sans Mono', 'Lucida Console', monospace;
}

pre {
  margin-top: 0.5em;
}

h1, h2, h3, p {
  font-family: Arial, sans-serif;
}

h1, h2, h3 {
  border-bottom: solid #CCC 1px;
}

.toc_element {
  margin-top: 0.5em;
}

.firstline {
  margin-left: 2em;
}

.method {
  margin-top: 1em;
  border: solid 1px #CCC;
  padding: 1em;
  background: #EEE;
}

.details {
  font-weight: bold;
  font-size: 14px;
}

</style>

<h1><a href="dataflow_v1b3.html">Dataflow API</a> . <a href="dataflow_v1b3.projects.html">projects</a> . <a href="dataflow_v1b3.projects.jobs.html">jobs</a></h1>
<h2>Instance Methods</h2>
<p class="toc_element">
  <code><a href="dataflow_v1b3.projects.jobs.debug.html">debug()</a></code>
</p>
<p class="firstline">Returns the debug Resource.</p>

<p class="toc_element">
  <code><a href="dataflow_v1b3.projects.jobs.messages.html">messages()</a></code>
</p>
<p class="firstline">Returns the messages Resource.</p>

<p class="toc_element">
  <code><a href="dataflow_v1b3.projects.jobs.workItems.html">workItems()</a></code>
</p>
<p class="firstline">Returns the workItems Resource.</p>

<p class="toc_element">
  <code><a href="#aggregated">aggregated(projectId, pageSize=None, pageToken=None, x__xgafv=None, location=None, filter=None, view=None)</a></code></p>
<p class="firstline">List the jobs of a project across all regions.</p>
<p class="toc_element">
  <code><a href="#aggregated_next">aggregated_next(previous_request, previous_response)</a></code></p>
<p class="firstline">Retrieves the next page of results.</p>
<p class="toc_element">
  <code><a href="#create">create(projectId, body=None, location=None, x__xgafv=None, replaceJobId=None, view=None)</a></code></p>
<p class="firstline">Creates a Cloud Dataflow job.</p>
<p class="toc_element">
  <code><a href="#get">get(projectId, jobId, location=None, x__xgafv=None, view=None)</a></code></p>
<p class="firstline">Gets the state of the specified Cloud Dataflow job.</p>
<p class="toc_element">
  <code><a href="#getMetrics">getMetrics(projectId, jobId, startTime=None, location=None, x__xgafv=None)</a></code></p>
<p class="firstline">Request the job status.</p>
<p class="toc_element">
  <code><a href="#list">list(projectId, pageSize=None, pageToken=None, x__xgafv=None, location=None, filter=None, view=None)</a></code></p>
<p class="firstline">List the jobs of a project.</p>
<p class="toc_element">
  <code><a href="#list_next">list_next(previous_request, previous_response)</a></code></p>
<p class="firstline">Retrieves the next page of results.</p>
<p class="toc_element">
  <code><a href="#snapshot">snapshot(projectId, jobId, body=None, x__xgafv=None)</a></code></p>
<p class="firstline">Snapshot the state of a streaming job.</p>
<p class="toc_element">
  <code><a href="#update">update(projectId, jobId, body=None, location=None, x__xgafv=None)</a></code></p>
<p class="firstline">Updates the state of an existing Cloud Dataflow job.</p>
<h3>Method Details</h3>
<div class="method">
    <code class="details" id="aggregated">aggregated(projectId, pageSize=None, pageToken=None, x__xgafv=None, location=None, filter=None, view=None)</code>
  <pre>List the jobs of a project across all regions.

Args:
  projectId: string, The project which owns the jobs. (required)
  pageSize: integer, If there are many jobs, limit response to at most this many.
The actual number of jobs returned will be the lesser of max_responses
and an unspecified server-defined limit.
  pageToken: string, Set this to the 'next_page_token' field of a previous response
to request additional results in a long list.
  x__xgafv: string, V1 error format.
    Allowed values
    1 - v1 error format
    2 - v2 error format
  location: string, The [regional endpoint]
(https://cloud.google.com/dataflow/docs/concepts/regional-endpoints) that
contains this job.
  filter: string, The kind of filter to use.
  view: string, Level of information requested in response. Default is `JOB_VIEW_SUMMARY`.

Returns:
  An object of the form:

Dan O'Mearadd494642020-05-01 07:42:23 -0700144 { # Response to a request to list Cloud Dataflow jobs in a project. This might
145 # be a partial response, depending on the page size in the ListJobsRequest.
146 # However, if the project does not have any jobs, an instance of
147 # ListJobsResponse is not returned and the request's response
148 # body is empty {}.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -0700149 "nextPageToken": "A String", # Set if there may be more results than fit in this response.
150 "failedLocation": [ # Zero or more messages describing the [regional endpoints]
151 # (https://cloud.google.com/dataflow/docs/concepts/regional-endpoints) that
152 # failed to respond.
153 { # Indicates which [regional endpoint]
154 # (https://cloud.google.com/dataflow/docs/concepts/regional-endpoints) failed
155 # to respond to a request for data.
156 "name": "A String", # The name of the [regional endpoint]
157 # (https://cloud.google.com/dataflow/docs/concepts/regional-endpoints) that
158 # failed to respond.
159 },
160 ],
161 "jobs": [ # A subset of the requested job information.
162 { # Defines a job to be run by the Cloud Dataflow service.
163 "labels": { # User-defined labels for this job.
164 #
165 # The labels map can contain no more than 64 entries. Entries of the labels
166 # map are UTF8 strings that comply with the following restrictions:
167 #
168 # * Keys must conform to regexp: \p{Ll}\p{Lo}{0,62}
169 # * Values must conform to regexp: [\p{Ll}\p{Lo}\p{N}_-]{0,63}
Dan O'Mearadd494642020-05-01 07:42:23 -0700170 # * Both keys and values are additionally constrained to be &lt;= 128 bytes in
Bu Sun Kim715bd7f2019-06-14 16:50:42 -0700171 # size.
172 "a_key": "A String",
173 },
174 "jobMetadata": { # Metadata available primarily for filtering jobs. Will be included in the # This field is populated by the Dataflow service to support filtering jobs
175 # by the metadata values provided here. Populated for ListJobs and all GetJob
176 # views SUMMARY and higher.
177 # ListJob response and Job SUMMARY view.
178 "sdkVersion": { # The version of the SDK used to run the job. # The SDK version used to run the job.
179 "versionDisplayName": "A String", # A readable string describing the version of the SDK.
180 "version": "A String", # The version of the SDK used to run the job.
181 "sdkSupportStatus": "A String", # The support status for this SDK version.
182 },
183 "pubsubDetails": [ # Identification of a PubSub source used in the Dataflow job.
184 { # Metadata for a PubSub connector used by the job.
185 "topic": "A String", # Topic accessed in the connection.
186 "subscription": "A String", # Subscription used in the connection.
187 },
188 ],
189 "datastoreDetails": [ # Identification of a Datastore source used in the Dataflow job.
190 { # Metadata for a Datastore connector used by the job.
191 "projectId": "A String", # ProjectId accessed in the connection.
192 "namespace": "A String", # Namespace used in the connection.
193 },
194 ],
195 "fileDetails": [ # Identification of a File source used in the Dataflow job.
196 { # Metadata for a File connector used by the job.
197 "filePattern": "A String", # File Pattern used to access files by the connector.
198 },
199 ],
200 "spannerDetails": [ # Identification of a Spanner source used in the Dataflow job.
201 { # Metadata for a Spanner connector used by the job.
202 "instanceId": "A String", # InstanceId accessed in the connection.
203 "projectId": "A String", # ProjectId accessed in the connection.
204 "databaseId": "A String", # DatabaseId accessed in the connection.
205 },
206 ],
207 "bigTableDetails": [ # Identification of a BigTable source used in the Dataflow job.
208 { # Metadata for a BigTable connector used by the job.
209 "instanceId": "A String", # InstanceId accessed in the connection.
210 "projectId": "A String", # ProjectId accessed in the connection.
211 "tableId": "A String", # TableId accessed in the connection.
212 },
213 ],
214 "bigqueryDetails": [ # Identification of a BigQuery source used in the Dataflow job.
215 { # Metadata for a BigQuery connector used by the job.
216 "projectId": "A String", # Project accessed in the connection.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -0700217 "query": "A String", # Query used to access data in the connection.
Dan O'Mearadd494642020-05-01 07:42:23 -0700218 "table": "A String", # Table accessed in the connection.
219 "dataset": "A String", # Dataset accessed in the connection.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -0700220 },
221 ],
222 },
223 "pipelineDescription": { # A descriptive representation of submitted pipeline as well as the executed # Preliminary field: The format of this data may change at any time.
224 # A description of the user pipeline and stages through which it is executed.
225 # Created by Cloud Dataflow service. Only retrieved with
226 # JOB_VIEW_DESCRIPTION or JOB_VIEW_ALL.
227 # form. This data is provided by the Dataflow service for ease of visualizing
228 # the pipeline and interpreting Dataflow provided metrics.
229 "originalPipelineTransform": [ # Description of each transform in the pipeline and collections between them.
230 { # Description of the type, names/ids, and input/outputs for a transform.
231 "kind": "A String", # Type of transform.
232 "name": "A String", # User provided name for this transform instance.
233 "inputCollectionName": [ # User names for all collection inputs to this transform.
234 "A String",
235 ],
236 "displayData": [ # Transform-specific display data.
237 { # Data provided with a pipeline or transform to provide descriptive info.
Dan O'Mearadd494642020-05-01 07:42:23 -0700238 "key": "A String", # The key identifying the display data.
239 # This is intended to be used as a label for the display data
240 # when viewed in a dax monitoring system.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -0700241 "shortStrValue": "A String", # A possible additional shorter value to display.
242 # For example a java_class_name_value of com.mypackage.MyDoFn
243 # will be stored with MyDoFn as the short_str_value and
244 # com.mypackage.MyDoFn as the java_class_name value.
245 # short_str_value can be displayed and java_class_name_value
246 # will be displayed as a tooltip.
Dan O'Mearadd494642020-05-01 07:42:23 -0700247 "timestampValue": "A String", # Contains value if the data is of timestamp type.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -0700248 "url": "A String", # An optional full URL.
249 "floatValue": 3.14, # Contains value if the data is of float type.
250 "namespace": "A String", # The namespace for the key. This is usually a class name or programming
251 # language namespace (i.e. python module) which defines the display data.
252 # This allows a dax monitoring system to specially handle the data
253 # and perform custom rendering.
254 "javaClassValue": "A String", # Contains value if the data is of java class type.
255 "label": "A String", # An optional label to display in a dax UI for the element.
256 "boolValue": True or False, # Contains value if the data is of a boolean type.
257 "strValue": "A String", # Contains value if the data is of string type.
Dan O'Mearadd494642020-05-01 07:42:23 -0700258 "durationValue": "A String", # Contains value if the data is of duration type.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -0700259 "int64Value": "A String", # Contains value if the data is of int64 type.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -0700260 },
261 ],
262 "outputCollectionName": [ # User names for all collection outputs to this transform.
263 "A String",
264 ],
265 "id": "A String", # SDK generated id of this transform instance.
266 },
267 ],
268 "executionPipelineStage": [ # Description of each stage of execution of the pipeline.
269 { # Description of the composing transforms, names/ids, and input/outputs of a
270 # stage of execution. Some composing transforms and sources may have been
271 # generated by the Dataflow service during execution planning.
272 "componentSource": [ # Collections produced and consumed by component transforms of this stage.
273 { # Description of an interstitial value between transforms in an execution
274 # stage.
275 "userName": "A String", # Human-readable name for this transform; may be user or system generated.
276 "originalTransformOrCollection": "A String", # User name for the original user transform or collection with which this
277 # source is most closely associated.
278 "name": "A String", # Dataflow service generated name for this source.
279 },
280 ],
281 "kind": "A String", # Type of transform this stage is executing.
282 "name": "A String", # Dataflow service generated name for this stage.
283 "outputSource": [ # Output sources for this stage.
284 { # Description of an input or output of an execution stage.
285 "userName": "A String", # Human-readable name for this source; may be user or system generated.
286 "sizeBytes": "A String", # Size of the source, if measurable.
287 "name": "A String", # Dataflow service generated name for this source.
288 "originalTransformOrCollection": "A String", # User name for the original user transform or collection with which this
289 # source is most closely associated.
290 },
291 ],
292 "inputSource": [ # Input sources for this stage.
293 { # Description of an input or output of an execution stage.
294 "userName": "A String", # Human-readable name for this source; may be user or system generated.
295 "sizeBytes": "A String", # Size of the source, if measurable.
296 "name": "A String", # Dataflow service generated name for this source.
297 "originalTransformOrCollection": "A String", # User name for the original user transform or collection with which this
298 # source is most closely associated.
299 },
300 ],
301 "componentTransform": [ # Transforms that comprise this execution stage.
302 { # Description of a transform executed as part of an execution stage.
303 "userName": "A String", # Human-readable name for this transform; may be user or system generated.
304 "originalTransform": "A String", # User name for the original user transform with which this transform is
305 # most closely associated.
306 "name": "A String", # Dataflow service generated name for this source.
307 },
308 ],
309 "id": "A String", # Dataflow service generated id for this stage.
310 },
311 ],
312 "displayData": [ # Pipeline level display data.
313 { # Data provided with a pipeline or transform to provide descriptive info.
Dan O'Mearadd494642020-05-01 07:42:23 -0700314 "key": "A String", # The key identifying the display data.
315 # This is intended to be used as a label for the display data
316 # when viewed in a dax monitoring system.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -0700317 "shortStrValue": "A String", # A possible additional shorter value to display.
318 # For example a java_class_name_value of com.mypackage.MyDoFn
319 # will be stored with MyDoFn as the short_str_value and
320 # com.mypackage.MyDoFn as the java_class_name value.
321 # short_str_value can be displayed and java_class_name_value
322 # will be displayed as a tooltip.
Dan O'Mearadd494642020-05-01 07:42:23 -0700323 "timestampValue": "A String", # Contains value if the data is of timestamp type.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -0700324 "url": "A String", # An optional full URL.
325 "floatValue": 3.14, # Contains value if the data is of float type.
326 "namespace": "A String", # The namespace for the key. This is usually a class name or programming
327 # language namespace (i.e. python module) which defines the display data.
328 # This allows a dax monitoring system to specially handle the data
329 # and perform custom rendering.
330 "javaClassValue": "A String", # Contains value if the data is of java class type.
331 "label": "A String", # An optional label to display in a dax UI for the element.
332 "boolValue": True or False, # Contains value if the data is of a boolean type.
333 "strValue": "A String", # Contains value if the data is of string type.
Dan O'Mearadd494642020-05-01 07:42:23 -0700334 "durationValue": "A String", # Contains value if the data is of duration type.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -0700335 "int64Value": "A String", # Contains value if the data is of int64 type.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -0700336 },
337 ],
338 },
339 "stageStates": [ # This field may be mutated by the Cloud Dataflow service;
340 # callers cannot mutate it.
341 { # A message describing the state of a particular execution stage.
342 "executionStageName": "A String", # The name of the execution stage.
343 "executionStageState": "A String", # Execution stage states allow the same set of values as JobState.
344 "currentStateTime": "A String", # The time at which the stage transitioned to this state.
345 },
346 ],
347 "id": "A String", # The unique ID of this job.
348 #
349 # This field is set by the Cloud Dataflow service when the Job is
350 # created, and is immutable for the life of the job.
351 "replacedByJobId": "A String", # If another job is an update of this job (and thus, this job is in
352 # `JOB_STATE_UPDATED`), this field contains the ID of that job.
353 "projectId": "A String", # The ID of the Cloud Platform project that the job belongs to.
354 "transformNameMapping": { # The map of transform name prefixes of the job to be replaced to the
355 # corresponding name prefixes of the new job.
356 "a_key": "A String",
357 },
358 "environment": { # Describes the environment in which a Dataflow Job runs. # The environment for the job.
Dan O'Mearadd494642020-05-01 07:42:23 -0700359 "workerRegion": "A String", # The Compute Engine region
360 # (https://cloud.google.com/compute/docs/regions-zones/regions-zones) in
361 # which worker processing should occur, e.g. "us-west1". Mutually exclusive
362 # with worker_zone. If neither worker_region nor worker_zone is specified,
363 # default to the control plane's region.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -0700364 "version": { # A structure describing which components and their versions of the service
365 # are required in order to run the job.
366 "a_key": "", # Properties of the object.
367 },
368 "flexResourceSchedulingGoal": "A String", # Which Flexible Resource Scheduling mode to run in.
369 "serviceKmsKeyName": "A String", # If set, contains the Cloud KMS key identifier used to encrypt data
370 # at rest, AKA a Customer Managed Encryption Key (CMEK).
371 #
372 # Format:
373 # projects/PROJECT_ID/locations/LOCATION/keyRings/KEY_RING/cryptoKeys/KEY
374 "internalExperiments": { # Experimental settings.
375 "a_key": "", # Properties of the object. Contains field @type with type URL.
376 },
377 "dataset": "A String", # The dataset for the current project where various workflow
378 # related tables are stored.
379 #
380 # The supported resource type is:
381 #
382 # Google BigQuery:
383 # bigquery.googleapis.com/{dataset}
384 "experiments": [ # The list of experiments to enable.
385 "A String",
386 ],
387 "serviceAccountEmail": "A String", # Identity to run virtual machines as. Defaults to the default account.
388 "sdkPipelineOptions": { # The Cloud Dataflow SDK pipeline options specified by the user. These
389 # options are passed through the service and are used to recreate the
390 # SDK pipeline options on the worker in a language agnostic and platform
391 # independent way.
392 "a_key": "", # Properties of the object.
393 },
394 "userAgent": { # A description of the process that generated the request.
395 "a_key": "", # Properties of the object.
396 },
Dan O'Mearadd494642020-05-01 07:42:23 -0700397 "workerZone": "A String", # The Compute Engine zone
398 # (https://cloud.google.com/compute/docs/regions-zones/regions-zones) in
399 # which worker processing should occur, e.g. "us-west1-a". Mutually exclusive
400 # with worker_region. If neither worker_region nor worker_zone is specified,
401 # a zone in the control plane's region is chosen based on available capacity.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -0700402 "workerPools": [ # The worker pools. At least one "harness" worker pool must be
403 # specified in order for the job to have workers.
404 { # Describes one particular pool of Cloud Dataflow workers to be
405 # instantiated by the Cloud Dataflow service in order to perform the
406 # computations required by a job. Note that a workflow job may use
407 # multiple pools, in order to match the various computational
408 # requirements of the various stages of the job.
Dan O'Mearadd494642020-05-01 07:42:23 -0700409 "workerHarnessContainerImage": "A String", # Required. Docker container image that executes the Cloud Dataflow worker
410 # harness, residing in Google Container Registry.
411 #
412 # Deprecated for the Fn API path. Use sdk_harness_container_images instead.
413 "ipConfiguration": "A String", # Configuration for VM IPs.
414 "autoscalingSettings": { # Settings for WorkerPool autoscaling. # Settings for autoscaling of this WorkerPool.
415 "maxNumWorkers": 42, # The maximum number of workers to cap scaling at.
416 "algorithm": "A String", # The algorithm to use for autoscaling.
417 },
Bu Sun Kim715bd7f2019-06-14 16:50:42 -0700418 "diskSourceImage": "A String", # Fully qualified source image for disks.
Dan O'Mearadd494642020-05-01 07:42:23 -0700419 "network": "A String", # Network to which VMs will be assigned. If empty or unspecified,
420 # the service will use the network "default".
421 "zone": "A String", # Zone to run the worker pools in. If empty or unspecified, the service
422 # will attempt to choose a reasonable default.
423 "metadata": { # Metadata to set on the Google Compute Engine VMs.
424 "a_key": "A String",
425 },
426 "machineType": "A String", # Machine type (e.g. "n1-standard-1"). If empty or unspecified, the
427 # service will attempt to choose a reasonable default.
428 "onHostMaintenance": "A String", # The action to take on host maintenance, as defined by the Google
429 # Compute Engine API.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -0700430 "taskrunnerSettings": { # Taskrunner configuration settings. # Settings passed through to Google Compute Engine workers when
431 # using the standard Dataflow task runner. Users should ignore
432 # this field.
433 "workflowFileName": "A String", # The file to store the workflow in.
434 "logUploadLocation": "A String", # Indicates where to put logs. If this is not specified, the logs
435 # will not be uploaded.
436 #
437 # The supported resource type is:
438 #
439 # Google Cloud Storage:
440 # storage.googleapis.com/{bucket}/{object}
441 # bucket.storage.googleapis.com/{object}
442 "commandlinesFileName": "A String", # The file to store preprocessing commands in.
Dan O'Mearadd494642020-05-01 07:42:23 -0700443 "alsologtostderr": True or False, # Whether to also send taskrunner log info to stderr.
444 "continueOnException": True or False, # Whether to continue taskrunner if an exception is hit.
445 "baseTaskDir": "A String", # The location on the worker for task-specific subdirectories.
446 "vmId": "A String", # The ID string of the VM.
447 "taskGroup": "A String", # The UNIX group ID on the worker VM to use for tasks launched by
448 # taskrunner; e.g. "wheel".
449 "taskUser": "A String", # The UNIX user ID on the worker VM to use for tasks launched by
450 # taskrunner; e.g. "root".
451 "oauthScopes": [ # The OAuth2 scopes to be requested by the taskrunner in order to
452 # access the Cloud Dataflow API.
453 "A String",
454 ],
455 "languageHint": "A String", # The suggested backend language.
456 "logToSerialconsole": True or False, # Whether to send taskrunner log info to Google Compute Engine VM serial
457 # console.
458 "streamingWorkerMainClass": "A String", # The streaming worker main class name.
459 "logDir": "A String", # The directory on the VM to store logs.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -0700460 "parallelWorkerSettings": { # Provides data to pass through to the worker harness. # The settings to pass to the parallel worker harness.
461 "reportingEnabled": True or False, # Whether to send work progress updates to the service.
462 "shuffleServicePath": "A String", # The Shuffle service path relative to the root URL, for example,
463 # "shuffle/v1beta1".
464 "workerId": "A String", # The ID of the worker running this pipeline.
465 "baseUrl": "A String", # The base URL for accessing Google Cloud APIs.
466 #
467 # When workers access Google Cloud APIs, they logically do so via
468 # relative URLs. If this field is specified, it supplies the base
469 # URL to use for resolving these relative URLs. The normative
470 # algorithm used is defined by RFC 1808, "Relative Uniform Resource
471 # Locators".
472 #
473 # If not specified, the default value is "http://www.googleapis.com/"
474 "servicePath": "A String", # The Cloud Dataflow service path relative to the root URL, for example,
475 # "dataflow/v1b3/projects".
476 "tempStoragePrefix": "A String", # The prefix of the resources the system should use for temporary
477 # storage.
478 #
479 # The supported resource type is:
480 #
481 # Google Cloud Storage:
482 #
483 # storage.googleapis.com/{bucket}/{object}
484 # bucket.storage.googleapis.com/{object}
485 },
Bu Sun Kim715bd7f2019-06-14 16:50:42 -0700486 "dataflowApiVersion": "A String", # The API version of endpoint, e.g. "v1b3"
487 "harnessCommand": "A String", # The command to launch the worker harness.
488 "tempStoragePrefix": "A String", # The prefix of the resources the taskrunner should use for
489 # temporary storage.
490 #
491 # The supported resource type is:
492 #
493 # Google Cloud Storage:
494 # storage.googleapis.com/{bucket}/{object}
495 # bucket.storage.googleapis.com/{object}
Dan O'Mearadd494642020-05-01 07:42:23 -0700496 "baseUrl": "A String", # The base URL for the taskrunner to use when accessing Google Cloud APIs.
497 #
498 # When workers access Google Cloud APIs, they logically do so via
499 # relative URLs. If this field is specified, it supplies the base
500 # URL to use for resolving these relative URLs. The normative
501 # algorithm used is defined by RFC 1808, "Relative Uniform Resource
502 # Locators".
503 #
504 # If not specified, the default value is "http://www.googleapis.com/"
Bu Sun Kim715bd7f2019-06-14 16:50:42 -0700505 },
Dan O'Mearadd494642020-05-01 07:42:23 -0700506 "numThreadsPerWorker": 42, # The number of threads per worker harness. If empty or unspecified, the
507 # service will choose a number of threads (according to the number of cores
508 # on the selected machine type for batch, or 1 by convention for streaming).
509 "poolArgs": { # Extra arguments for this worker pool.
510 "a_key": "", # Properties of the object. Contains field @type with type URL.
511 },
Bu Sun Kim715bd7f2019-06-14 16:50:42 -0700512 "packages": [ # Packages to be installed on workers.
513 { # The packages that must be installed in order for a worker to run the
514 # steps of the Cloud Dataflow job that will be assigned to its worker
515 # pool.
516 #
517 # This is the mechanism by which the Cloud Dataflow SDK causes code to
518 # be loaded onto the workers. For example, the Cloud Dataflow Java SDK
519 # might use this to install jars containing the user's code and all of the
520 # various dependencies (libraries, data files, etc.) required in order
521 # for that code to run.
522 "location": "A String", # The resource to read the package from. The supported resource type is:
523 #
524 # Google Cloud Storage:
525 #
526 # storage.googleapis.com/{bucket}
527 # bucket.storage.googleapis.com/
528 "name": "A String", # The name of the package.
529 },
530 ],
Dan O'Mearadd494642020-05-01 07:42:23 -0700531 "defaultPackageSet": "A String", # The default package set to install. This allows the service to
532 # select a default set of packages which are useful to worker
533 # harnesses written in a particular language.
534 "kind": "A String", # The kind of the worker pool; currently only `harness` and `shuffle`
535 # are supported.
536 "diskType": "A String", # Type of root disk for VMs. If empty or unspecified, the service will
Bu Sun Kim715bd7f2019-06-14 16:50:42 -0700537 # attempt to choose a reasonable default.
538 "teardownPolicy": "A String", # Sets the policy for determining when to turn down the worker pool.
539 # Allowed values are: `TEARDOWN_ALWAYS`, `TEARDOWN_ON_SUCCESS`, and
540 # `TEARDOWN_NEVER`.
541 # `TEARDOWN_ALWAYS` means workers are always torn down regardless of whether
542 # the job succeeds. `TEARDOWN_ON_SUCCESS` means workers are torn down
543 # if the job succeeds. `TEARDOWN_NEVER` means the workers are never torn
544 # down.
545 #
546 # If the workers are not torn down by the service, they will
547 # continue to run and use Google Compute Engine VM resources in the
548 # user's project until they are explicitly terminated by the user.
549 # Because of this, Google recommends using the `TEARDOWN_ALWAYS`
550 # policy except for small, manually supervised test jobs.
551 #
552 # If unknown or unspecified, the service will attempt to choose a reasonable
553 # default.
Dan O'Mearadd494642020-05-01 07:42:23 -0700554 "diskSizeGb": 42, # Size of root disk for VMs, in GB. If zero or unspecified, the service will
555 # attempt to choose a reasonable default.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -0700556 "numWorkers": 42, # Number of Google Compute Engine workers in this pool needed to
557 # execute the job. If zero or unspecified, the service will
558 # attempt to choose a reasonable default.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -0700559 "subnetwork": "A String", # Subnetwork to which VMs will be assigned, if desired. Expected to be of
560 # the form "regions/REGION/subnetworks/SUBNETWORK".
561 "dataDisks": [ # Data disks that are used by a VM in this workflow.
562 { # Describes the data disk used by a workflow job.
563 "mountPoint": "A String", # Directory in a VM where disk is mounted.
564 "sizeGb": 42, # Size of disk in GB. If zero or unspecified, the service will
565 # attempt to choose a reasonable default.
566 "diskType": "A String", # Disk storage type, as defined by Google Compute Engine. This
567 # must be a disk type appropriate to the project and zone in which
568 # the workers will run. If unknown or unspecified, the service
569 # will attempt to choose a reasonable default.
570 #
571 # For example, the standard persistent disk type is a resource name
572 # typically ending in "pd-standard". If SSD persistent disks are
573 # available, the resource name typically ends with "pd-ssd". The
574 # actual valid values are defined the Google Compute Engine API,
575 # not by the Cloud Dataflow API; consult the Google Compute Engine
576 # documentation for more information about determining the set of
577 # available disk types for a particular project and zone.
578 #
579 # Google Compute Engine Disk types are local to a particular
580 # project in a particular zone, and so the resource name will
581 # typically look something like this:
582 #
583 # compute.googleapis.com/projects/project-id/zones/zone/diskTypes/pd-standard
584 },
585 ],
Dan O'Mearadd494642020-05-01 07:42:23 -0700586 "sdkHarnessContainerImages": [ # Set of SDK harness containers needed to execute this pipeline. This will
587 # only be set in the Fn API path. For non-cross-language pipelines this
588 # should have only one entry. Cross-language pipelines will have two or more
589 # entries.
590 { # Defines a SDK harness container for executing Dataflow pipelines.
591 "containerImage": "A String", # A docker container image that resides in Google Container Registry.
592 "useSingleCorePerContainer": True or False, # If true, recommends the Dataflow service to use only one core per SDK
593 # container instance with this image. If false (or unset) recommends using
594 # more than one core per SDK container instance with this image for
595 # efficiency. Note that Dataflow service may choose to override this property
596 # if needed.
597 },
598 ],
Bu Sun Kim715bd7f2019-06-14 16:50:42 -0700599 },
600 ],
Dan O'Mearadd494642020-05-01 07:42:23 -0700601 "clusterManagerApiService": "A String", # The type of cluster manager API to use. If unknown or
602 # unspecified, the service will attempt to choose a reasonable
603 # default. This should be in the form of the API service name,
604 # e.g. "compute.googleapis.com".
Bu Sun Kim715bd7f2019-06-14 16:50:42 -0700605 "tempStoragePrefix": "A String", # The prefix of the resources the system should use for temporary
606 # storage. The system will append the suffix "/temp-{JOBNAME} to
607 # this resource prefix, where {JOBNAME} is the value of the
608 # job_name field. The resulting bucket and object prefix is used
609 # as the prefix of the resources used to store temporary data
610 # needed during the job execution. NOTE: This will override the
611 # value in taskrunner_settings.
612 # The supported resource type is:
613 #
614 # Google Cloud Storage:
615 #
616 # storage.googleapis.com/{bucket}/{object}
617 # bucket.storage.googleapis.com/{object}
618 },
619 "location": "A String", # The [regional endpoint]
620 # (https://cloud.google.com/dataflow/docs/concepts/regional-endpoints) that
621 # contains this job.
622 "tempFiles": [ # A set of files the system should be aware of that are used
623 # for temporary storage. These temporary files will be
624 # removed on job completion.
625 # No duplicates are allowed.
626 # No file patterns are supported.
627 #
628 # The supported files are:
629 #
630 # Google Cloud Storage:
631 #
632 # storage.googleapis.com/{bucket}/{object}
633 # bucket.storage.googleapis.com/{object}
634 "A String",
635 ],
636 "type": "A String", # The type of Cloud Dataflow job.
637 "clientRequestId": "A String", # The client's unique identifier of the job, re-used across retried attempts.
638 # If this field is set, the service will ensure its uniqueness.
639 # The request to create a job will fail if the service has knowledge of a
640 # previously submitted job with the same client's ID and job name.
641 # The caller may use this field to ensure idempotence of job
642 # creation across retried attempts to create a job.
643 # By default, the field is empty and, in that case, the service ignores it.
644 "createdFromSnapshotId": "A String", # If this is specified, the job's initial state is populated from the given
645 # snapshot.
646 "stepsLocation": "A String", # The GCS location where the steps are stored.
647 "currentStateTime": "A String", # The timestamp associated with the current state.
648 "startTime": "A String", # The timestamp when the job was started (transitioned to JOB_STATE_PENDING).
649 # Flexible resource scheduling jobs are started with some delay after job
650 # creation, so start_time is unset before start and is updated when the
651 # job is started by the Cloud Dataflow service. For other jobs, start_time
652 # always equals to create_time and is immutable and set by the Cloud Dataflow
653 # service.
654 "createTime": "A String", # The timestamp when the job was initially created. Immutable and set by the
655 # Cloud Dataflow service.
656 "requestedState": "A String", # The job's requested state.
657 #
658 # `UpdateJob` may be used to switch between the `JOB_STATE_STOPPED` and
659 # `JOB_STATE_RUNNING` states, by setting requested_state. `UpdateJob` may
660 # also be used to directly set a job's requested state to
661 # `JOB_STATE_CANCELLED` or `JOB_STATE_DONE`, irrevocably terminating the
662 # job if it has not already reached a terminal state.
663 "name": "A String", # The user-specified Cloud Dataflow job name.
664 #
665 # Only one Job with a given name may exist in a project at any
666 # given time. If a caller attempts to create a Job with the same
667 # name as an already-existing Job, the attempt returns the
668 # existing Job.
669 #
670 # The name must match the regular expression
671 # `[a-z]([-a-z0-9]{0,38}[a-z0-9])?`
672 "steps": [ # Exactly one of step or steps_location should be specified.
673 #
674 # The top-level steps that constitute the entire job.
675 { # Defines a particular step within a Cloud Dataflow job.
676 #
677 # A job consists of multiple steps, each of which performs some
678 # specific operation as part of the overall job. Data is typically
679 # passed from one step to another as part of the job.
680 #
681 # Here's an example of a sequence of steps which together implement a
682 # Map-Reduce job:
683 #
684 # * Read a collection of data from some source, parsing the
685 # collection's elements.
686 #
687 # * Validate the elements.
688 #
689 # * Apply a user-defined function to map each element to some value
690 # and extract an element-specific key value.
691 #
692 # * Group elements with the same key into a single element with
693 # that key, transforming a multiply-keyed collection into a
694 # uniquely-keyed collection.
695 #
696 # * Write the elements out to some data sink.
697 #
698 # Note that the Cloud Dataflow service may be used to run many different
699 # types of jobs, not just Map-Reduce.
700 "kind": "A String", # The kind of step in the Cloud Dataflow job.
Dan O'Mearadd494642020-05-01 07:42:23 -0700701 "name": "A String", # The name that identifies the step. This must be unique for each
702 # step with respect to all other steps in the Cloud Dataflow job.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -0700703 "properties": { # Named properties associated with the step. Each kind of
704 # predefined step has its own required set of properties.
705 # Must be provided on Create. Only retrieved with JOB_VIEW_ALL.
706 "a_key": "", # Properties of the object.
707 },
Bu Sun Kim715bd7f2019-06-14 16:50:42 -0700708 },
709 ],
710 "replaceJobId": "A String", # If this job is an update of an existing job, this field is the job ID
711 # of the job it replaced.
712 #
713 # When sending a `CreateJobRequest`, you can update a job by specifying it
714 # here. The job named here is stopped, and its intermediate state is
715 # transferred to this job.
716 "currentState": "A String", # The current state of the job.
717 #
718 # Jobs are created in the `JOB_STATE_STOPPED` state unless otherwise
719 # specified.
720 #
721 # A job in the `JOB_STATE_RUNNING` state may asynchronously enter a
722 # terminal state. After a job has reached a terminal state, no
723 # further state updates may be made.
724 #
725 # This field may be mutated by the Cloud Dataflow service;
726 # callers cannot mutate it.
727 "executionInfo": { # Additional information about how a Cloud Dataflow job will be executed that # Deprecated.
728 # isn't contained in the submitted job.
729 "stages": { # A mapping from each stage to the information about that stage.
730 "a_key": { # Contains information about how a particular
731 # google.dataflow.v1beta3.Step will be executed.
732 "stepName": [ # The steps associated with the execution stage.
733 # Note that stages may have several steps, and that a given step
734 # might be run by more than one stage.
735 "A String",
736 ],
737 },
738 },
739 },
740 },
741 ],
742 }</pre>
743</div>
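<p>Example (not part of the generated reference): a minimal sketch of calling <code>aggregated()</code> with the google-api-python-client library. It assumes Application Default Credentials are configured; the project ID is a placeholder.</p>
<pre>
from googleapiclient.discovery import build

# Build a Dataflow API client for version v1b3.
# With no explicit credentials, Application Default Credentials are used.
dataflow = build('dataflow', 'v1b3')

# 'my-project' is a placeholder project ID.
response = dataflow.projects().jobs().aggregated(
    projectId='my-project',
    view='JOB_VIEW_SUMMARY',   # summary view is the documented default
    pageSize=50,
).execute()

# The response dict follows the ListJobsResponse schema shown above.
for job in response.get('jobs', []):
    print(job['id'], job.get('name'), job.get('currentState'))
</pre>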

<div class="method">
    <code class="details" id="aggregated_next">aggregated_next(previous_request, previous_response)</code>
  <pre>Retrieves the next page of results.

Args:
  previous_request: The request for the previous page. (required)
  previous_response: The response from the request for the previous page. (required)

Returns:
  A request object that you can call 'execute()' on to request the next
  page. Returns None if there are no more items in the collection.
    </pre>
</div>
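<p>Example (not part of the generated reference): a sketch of the usual paging idiom, combining <code>aggregated()</code> with <code>aggregated_next()</code> until no further pages remain. The project ID is a placeholder.</p>
<pre>
from googleapiclient.discovery import build

dataflow = build('dataflow', 'v1b3')
jobs = dataflow.projects().jobs()

# Page through all jobs across regions; aggregated_next() returns None
# once there are no more results to fetch.
request = jobs.aggregated(projectId='my-project', pageSize=100)
while request is not None:
    response = request.execute()
    for job in response.get('jobs', []):
        print(job['id'], job.get('currentState'))
    request = jobs.aggregated_next(previous_request=request,
                                   previous_response=response)
</pre>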

<div class="method">
    <code class="details" id="create">create(projectId, body=None, location=None, x__xgafv=None, replaceJobId=None, view=None)</code>
  <pre>Creates a Cloud Dataflow job.

To create a job, we recommend using `projects.locations.jobs.create` with a
[regional endpoint]
(https://cloud.google.com/dataflow/docs/concepts/regional-endpoints). Using
`projects.jobs.create` is not recommended, as your job will always start
in `us-central1`.

Args:
  projectId: string, The ID of the Cloud Platform project that the job belongs to. (required)
  body: object, The request body.
    The object takes the form of:

Sai Cheemalapatic30d2b52017-03-13 12:12:03 -0400774{ # Defines a job to be run by the Cloud Dataflow service.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -0700775 "labels": { # User-defined labels for this job.
776 #
777 # The labels map can contain no more than 64 entries. Entries of the labels
778 # map are UTF8 strings that comply with the following restrictions:
779 #
780 # * Keys must conform to regexp: \p{Ll}\p{Lo}{0,62}
781 # * Values must conform to regexp: [\p{Ll}\p{Lo}\p{N}_-]{0,63}
Dan O'Mearadd494642020-05-01 07:42:23 -0700782 # * Both keys and values are additionally constrained to be &lt;= 128 bytes in
Bu Sun Kim715bd7f2019-06-14 16:50:42 -0700783 # size.
784 "a_key": "A String",
785 },
786 "jobMetadata": { # Metadata available primarily for filtering jobs. Will be included in the # This field is populated by the Dataflow service to support filtering jobs
787 # by the metadata values provided here. Populated for ListJobs and all GetJob
788 # views SUMMARY and higher.
789 # ListJob response and Job SUMMARY view.
790 "sdkVersion": { # The version of the SDK used to run the job. # The SDK version used to run the job.
791 "versionDisplayName": "A String", # A readable string describing the version of the SDK.
792 "version": "A String", # The version of the SDK used to run the job.
793 "sdkSupportStatus": "A String", # The support status for this SDK version.
794 },
795 "pubsubDetails": [ # Identification of a PubSub source used in the Dataflow job.
796 { # Metadata for a PubSub connector used by the job.
797 "topic": "A String", # Topic accessed in the connection.
798 "subscription": "A String", # Subscription used in the connection.
799 },
800 ],
801 "datastoreDetails": [ # Identification of a Datastore source used in the Dataflow job.
802 { # Metadata for a Datastore connector used by the job.
803 "projectId": "A String", # ProjectId accessed in the connection.
804 "namespace": "A String", # Namespace used in the connection.
805 },
806 ],
807 "fileDetails": [ # Identification of a File source used in the Dataflow job.
808 { # Metadata for a File connector used by the job.
809 "filePattern": "A String", # File Pattern used to access files by the connector.
810 },
811 ],
812 "spannerDetails": [ # Identification of a Spanner source used in the Dataflow job.
813 { # Metadata for a Spanner connector used by the job.
814 "instanceId": "A String", # InstanceId accessed in the connection.
815 "projectId": "A String", # ProjectId accessed in the connection.
816 "databaseId": "A String", # DatabaseId accessed in the connection.
817 },
818 ],
819 "bigTableDetails": [ # Identification of a BigTable source used in the Dataflow job.
820 { # Metadata for a BigTable connector used by the job.
821 "instanceId": "A String", # InstanceId accessed in the connection.
822 "projectId": "A String", # ProjectId accessed in the connection.
823 "tableId": "A String", # TableId accessed in the connection.
824 },
825 ],
826 "bigqueryDetails": [ # Identification of a BigQuery source used in the Dataflow job.
827 { # Metadata for a BigQuery connector used by the job.
828 "projectId": "A String", # Project accessed in the connection.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -0700829 "query": "A String", # Query used to access data in the connection.
Dan O'Mearadd494642020-05-01 07:42:23 -0700830 "table": "A String", # Table accessed in the connection.
831 "dataset": "A String", # Dataset accessed in the connection.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -0700832 },
833 ],
834 },
835 "pipelineDescription": { # A descriptive representation of submitted pipeline as well as the executed # Preliminary field: The format of this data may change at any time.
836 # A description of the user pipeline and stages through which it is executed.
837 # Created by Cloud Dataflow service. Only retrieved with
838 # JOB_VIEW_DESCRIPTION or JOB_VIEW_ALL.
839 # form. This data is provided by the Dataflow service for ease of visualizing
840 # the pipeline and interpreting Dataflow provided metrics.
841 "originalPipelineTransform": [ # Description of each transform in the pipeline and collections between them.
842 { # Description of the type, names/ids, and input/outputs for a transform.
843 "kind": "A String", # Type of transform.
844 "name": "A String", # User provided name for this transform instance.
845 "inputCollectionName": [ # User names for all collection inputs to this transform.
846 "A String",
847 ],
848 "displayData": [ # Transform-specific display data.
849 { # Data provided with a pipeline or transform to provide descriptive info.
Dan O'Mearadd494642020-05-01 07:42:23 -0700850 "key": "A String", # The key identifying the display data.
851 # This is intended to be used as a label for the display data
852 # when viewed in a dax monitoring system.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -0700853 "shortStrValue": "A String", # A possible additional shorter value to display.
854 # For example a java_class_name_value of com.mypackage.MyDoFn
855 # will be stored with MyDoFn as the short_str_value and
856 # com.mypackage.MyDoFn as the java_class_name value.
857 # short_str_value can be displayed and java_class_name_value
858 # will be displayed as a tooltip.
Dan O'Mearadd494642020-05-01 07:42:23 -0700859 "timestampValue": "A String", # Contains value if the data is of timestamp type.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -0700860 "url": "A String", # An optional full URL.
861 "floatValue": 3.14, # Contains value if the data is of float type.
862 "namespace": "A String", # The namespace for the key. This is usually a class name or programming
863 # language namespace (i.e. python module) which defines the display data.
864 # This allows a dax monitoring system to specially handle the data
865 # and perform custom rendering.
866 "javaClassValue": "A String", # Contains value if the data is of java class type.
867 "label": "A String", # An optional label to display in a dax UI for the element.
868 "boolValue": True or False, # Contains value if the data is of a boolean type.
869 "strValue": "A String", # Contains value if the data is of string type.
Dan O'Mearadd494642020-05-01 07:42:23 -0700870 "durationValue": "A String", # Contains value if the data is of duration type.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -0700871 "int64Value": "A String", # Contains value if the data is of int64 type.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -0700872 },
873 ],
874 "outputCollectionName": [ # User names for all collection outputs to this transform.
875 "A String",
876 ],
877 "id": "A String", # SDK generated id of this transform instance.
878 },
879 ],
880 "executionPipelineStage": [ # Description of each stage of execution of the pipeline.
881 { # Description of the composing transforms, names/ids, and input/outputs of a
882 # stage of execution. Some composing transforms and sources may have been
883 # generated by the Dataflow service during execution planning.
884 "componentSource": [ # Collections produced and consumed by component transforms of this stage.
885 { # Description of an interstitial value between transforms in an execution
886 # stage.
887 "userName": "A String", # Human-readable name for this transform; may be user or system generated.
888 "originalTransformOrCollection": "A String", # User name for the original user transform or collection with which this
889 # source is most closely associated.
890 "name": "A String", # Dataflow service generated name for this source.
891 },
892 ],
893 "kind": "A String", # Type of transform this stage is executing.
894 "name": "A String", # Dataflow service generated name for this stage.
895 "outputSource": [ # Output sources for this stage.
896 { # Description of an input or output of an execution stage.
897 "userName": "A String", # Human-readable name for this source; may be user or system generated.
898 "sizeBytes": "A String", # Size of the source, if measurable.
899 "name": "A String", # Dataflow service generated name for this source.
900 "originalTransformOrCollection": "A String", # User name for the original user transform or collection with which this
901 # source is most closely associated.
902 },
903 ],
904 "inputSource": [ # Input sources for this stage.
905 { # Description of an input or output of an execution stage.
906 "userName": "A String", # Human-readable name for this source; may be user or system generated.
907 "sizeBytes": "A String", # Size of the source, if measurable.
908 "name": "A String", # Dataflow service generated name for this source.
909 "originalTransformOrCollection": "A String", # User name for the original user transform or collection with which this
910 # source is most closely associated.
911 },
912 ],
913 "componentTransform": [ # Transforms that comprise this execution stage.
914 { # Description of a transform executed as part of an execution stage.
915 "userName": "A String", # Human-readable name for this transform; may be user or system generated.
916 "originalTransform": "A String", # User name for the original user transform with which this transform is
917 # most closely associated.
918 "name": "A String", # Dataflow service generated name for this source.
919 },
920 ],
921 "id": "A String", # Dataflow service generated id for this stage.
922 },
923 ],
924 "displayData": [ # Pipeline level display data.
925 { # Data provided with a pipeline or transform to provide descriptive info.
Dan O'Mearadd494642020-05-01 07:42:23 -0700926 "key": "A String", # The key identifying the display data.
927 # This is intended to be used as a label for the display data
928 # when viewed in a dax monitoring system.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -0700929 "shortStrValue": "A String", # A possible additional shorter value to display.
930 # For example a java_class_name_value of com.mypackage.MyDoFn
931 # will be stored with MyDoFn as the short_str_value and
932 # com.mypackage.MyDoFn as the java_class_name value.
933 # short_str_value can be displayed and java_class_name_value
934 # will be displayed as a tooltip.
Dan O'Mearadd494642020-05-01 07:42:23 -0700935 "timestampValue": "A String", # Contains value if the data is of timestamp type.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -0700936 "url": "A String", # An optional full URL.
937 "floatValue": 3.14, # Contains value if the data is of float type.
938 "namespace": "A String", # The namespace for the key. This is usually a class name or programming
939 # language namespace (i.e. python module) which defines the display data.
940 # This allows a dax monitoring system to specially handle the data
941 # and perform custom rendering.
942 "javaClassValue": "A String", # Contains value if the data is of java class type.
943 "label": "A String", # An optional label to display in a dax UI for the element.
944 "boolValue": True or False, # Contains value if the data is of a boolean type.
945 "strValue": "A String", # Contains value if the data is of string type.
Dan O'Mearadd494642020-05-01 07:42:23 -0700946 "durationValue": "A String", # Contains value if the data is of duration type.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -0700947 "int64Value": "A String", # Contains value if the data is of int64 type.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -0700948 },
949 ],
950 },
951 "stageStates": [ # This field may be mutated by the Cloud Dataflow service;
952 # callers cannot mutate it.
953 { # A message describing the state of a particular execution stage.
954 "executionStageName": "A String", # The name of the execution stage.
955 "executionStageState": "A String", # Execution stage states allow the same set of values as JobState.
956 "currentStateTime": "A String", # The time at which the stage transitioned to this state.
957 },
958 ],
959 "id": "A String", # The unique ID of this job.
960 #
961 # This field is set by the Cloud Dataflow service when the Job is
962 # created, and is immutable for the life of the job.
963 "replacedByJobId": "A String", # If another job is an update of this job (and thus, this job is in
964 # `JOB_STATE_UPDATED`), this field contains the ID of that job.
965 "projectId": "A String", # The ID of the Cloud Platform project that the job belongs to.
966 "transformNameMapping": { # The map of transform name prefixes of the job to be replaced to the
967 # corresponding name prefixes of the new job.
968 "a_key": "A String",
969 },
970 "environment": { # Describes the environment in which a Dataflow Job runs. # The environment for the job.
Dan O'Mearadd494642020-05-01 07:42:23 -0700971 "workerRegion": "A String", # The Compute Engine region
972 # (https://cloud.google.com/compute/docs/regions-zones/regions-zones) in
973 # which worker processing should occur, e.g. "us-west1". Mutually exclusive
974 # with worker_zone. If neither worker_region nor worker_zone is specified,
975 # default to the control plane's region.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -0700976 "version": { # A structure describing which components and their versions of the service
977 # are required in order to run the job.
978 "a_key": "", # Properties of the object.
979 },
980 "flexResourceSchedulingGoal": "A String", # Which Flexible Resource Scheduling mode to run in.
981 "serviceKmsKeyName": "A String", # If set, contains the Cloud KMS key identifier used to encrypt data
982 # at rest, AKA a Customer Managed Encryption Key (CMEK).
983 #
984 # Format:
985 # projects/PROJECT_ID/locations/LOCATION/keyRings/KEY_RING/cryptoKeys/KEY
986 "internalExperiments": { # Experimental settings.
987 "a_key": "", # Properties of the object. Contains field @type with type URL.
988 },
989 "dataset": "A String", # The dataset for the current project where various workflow
990 # related tables are stored.
991 #
992 # The supported resource type is:
993 #
994 # Google BigQuery:
995 # bigquery.googleapis.com/{dataset}
996 "experiments": [ # The list of experiments to enable.
997 "A String",
998 ],
999 "serviceAccountEmail": "A String", # Identity to run virtual machines as. Defaults to the default account.
1000 "sdkPipelineOptions": { # The Cloud Dataflow SDK pipeline options specified by the user. These
1001 # options are passed through the service and are used to recreate the
1002 # SDK pipeline options on the worker in a language agnostic and platform
1003 # independent way.
1004 "a_key": "", # Properties of the object.
1005 },
1006 "userAgent": { # A description of the process that generated the request.
1007 "a_key": "", # Properties of the object.
1008 },
Dan O'Mearadd494642020-05-01 07:42:23 -07001009 "workerZone": "A String", # The Compute Engine zone
1010 # (https://cloud.google.com/compute/docs/regions-zones/regions-zones) in
1011 # which worker processing should occur, e.g. "us-west1-a". Mutually exclusive
1012 # with worker_region. If neither worker_region nor worker_zone is specified,
1013 # a zone in the control plane's region is chosen based on available capacity.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07001014 "workerPools": [ # The worker pools. At least one "harness" worker pool must be
1015 # specified in order for the job to have workers.
1016 { # Describes one particular pool of Cloud Dataflow workers to be
1017 # instantiated by the Cloud Dataflow service in order to perform the
1018 # computations required by a job. Note that a workflow job may use
1019 # multiple pools, in order to match the various computational
1020 # requirements of the various stages of the job.
Dan O'Mearadd494642020-05-01 07:42:23 -07001021 "workerHarnessContainerImage": "A String", # Required. Docker container image that executes the Cloud Dataflow worker
1022 # harness, residing in Google Container Registry.
1023 #
1024 # Deprecated for the Fn API path. Use sdk_harness_container_images instead.
1025 "ipConfiguration": "A String", # Configuration for VM IPs.
1026 "autoscalingSettings": { # Settings for WorkerPool autoscaling. # Settings for autoscaling of this WorkerPool.
1027 "maxNumWorkers": 42, # The maximum number of workers to cap scaling at.
1028 "algorithm": "A String", # The algorithm to use for autoscaling.
1029 },
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07001030 "diskSourceImage": "A String", # Fully qualified source image for disks.
Dan O'Mearadd494642020-05-01 07:42:23 -07001031 "network": "A String", # Network to which VMs will be assigned. If empty or unspecified,
1032 # the service will use the network "default".
1033 "zone": "A String", # Zone to run the worker pools in. If empty or unspecified, the service
1034 # will attempt to choose a reasonable default.
1035 "metadata": { # Metadata to set on the Google Compute Engine VMs.
1036 "a_key": "A String",
1037 },
1038 "machineType": "A String", # Machine type (e.g. "n1-standard-1"). If empty or unspecified, the
1039 # service will attempt to choose a reasonable default.
1040 "onHostMaintenance": "A String", # The action to take on host maintenance, as defined by the Google
1041 # Compute Engine API.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07001042 "taskrunnerSettings": { # Taskrunner configuration settings. # Settings passed through to Google Compute Engine workers when
1043 # using the standard Dataflow task runner. Users should ignore
1044 # this field.
1045 "workflowFileName": "A String", # The file to store the workflow in.
1046 "logUploadLocation": "A String", # Indicates where to put logs. If this is not specified, the logs
1047 # will not be uploaded.
1048 #
1049 # The supported resource type is:
1050 #
1051 # Google Cloud Storage:
1052 # storage.googleapis.com/{bucket}/{object}
1053 # bucket.storage.googleapis.com/{object}
1054 "commandlinesFileName": "A String", # The file to store preprocessing commands in.
Dan O'Mearadd494642020-05-01 07:42:23 -07001055 "alsologtostderr": True or False, # Whether to also send taskrunner log info to stderr.
1056 "continueOnException": True or False, # Whether to continue taskrunner if an exception is hit.
1057 "baseTaskDir": "A String", # The location on the worker for task-specific subdirectories.
1058 "vmId": "A String", # The ID string of the VM.
1059 "taskGroup": "A String", # The UNIX group ID on the worker VM to use for tasks launched by
1060 # taskrunner; e.g. "wheel".
1061 "taskUser": "A String", # The UNIX user ID on the worker VM to use for tasks launched by
1062 # taskrunner; e.g. "root".
1063 "oauthScopes": [ # The OAuth2 scopes to be requested by the taskrunner in order to
1064 # access the Cloud Dataflow API.
1065 "A String",
1066 ],
1067 "languageHint": "A String", # The suggested backend language.
1068 "logToSerialconsole": True or False, # Whether to send taskrunner log info to Google Compute Engine VM serial
1069 # console.
1070 "streamingWorkerMainClass": "A String", # The streaming worker main class name.
1071 "logDir": "A String", # The directory on the VM to store logs.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07001072 "parallelWorkerSettings": { # Provides data to pass through to the worker harness. # The settings to pass to the parallel worker harness.
1073 "reportingEnabled": True or False, # Whether to send work progress updates to the service.
1074 "shuffleServicePath": "A String", # The Shuffle service path relative to the root URL, for example,
1075 # "shuffle/v1beta1".
1076 "workerId": "A String", # The ID of the worker running this pipeline.
1077 "baseUrl": "A String", # The base URL for accessing Google Cloud APIs.
1078 #
1079 # When workers access Google Cloud APIs, they logically do so via
1080 # relative URLs. If this field is specified, it supplies the base
1081 # URL to use for resolving these relative URLs. The normative
1082 # algorithm used is defined by RFC 1808, "Relative Uniform Resource
1083 # Locators".
1084 #
1085 # If not specified, the default value is "http://www.googleapis.com/"
1086 "servicePath": "A String", # The Cloud Dataflow service path relative to the root URL, for example,
1087 # "dataflow/v1b3/projects".
1088 "tempStoragePrefix": "A String", # The prefix of the resources the system should use for temporary
1089 # storage.
1090 #
1091 # The supported resource type is:
1092 #
1093 # Google Cloud Storage:
1094 #
1095 # storage.googleapis.com/{bucket}/{object}
1096 # bucket.storage.googleapis.com/{object}
1097 },
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07001098       "dataflowApiVersion": "A String", # The API version of the endpoint, e.g. "v1b3"
1099 "harnessCommand": "A String", # The command to launch the worker harness.
1100 "tempStoragePrefix": "A String", # The prefix of the resources the taskrunner should use for
1101 # temporary storage.
1102 #
1103 # The supported resource type is:
1104 #
1105 # Google Cloud Storage:
1106 # storage.googleapis.com/{bucket}/{object}
1107 # bucket.storage.googleapis.com/{object}
Dan O'Mearadd494642020-05-01 07:42:23 -07001108 "baseUrl": "A String", # The base URL for the taskrunner to use when accessing Google Cloud APIs.
1109 #
1110 # When workers access Google Cloud APIs, they logically do so via
1111 # relative URLs. If this field is specified, it supplies the base
1112 # URL to use for resolving these relative URLs. The normative
1113 # algorithm used is defined by RFC 1808, "Relative Uniform Resource
1114 # Locators".
1115 #
1116 # If not specified, the default value is "http://www.googleapis.com/"
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07001117 },
Dan O'Mearadd494642020-05-01 07:42:23 -07001118 "numThreadsPerWorker": 42, # The number of threads per worker harness. If empty or unspecified, the
1119 # service will choose a number of threads (according to the number of cores
1120 # on the selected machine type for batch, or 1 by convention for streaming).
1121 "poolArgs": { # Extra arguments for this worker pool.
1122 "a_key": "", # Properties of the object. Contains field @type with type URL.
1123 },
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07001124 "packages": [ # Packages to be installed on workers.
1125 { # The packages that must be installed in order for a worker to run the
1126 # steps of the Cloud Dataflow job that will be assigned to its worker
1127 # pool.
1128 #
1129 # This is the mechanism by which the Cloud Dataflow SDK causes code to
1130 # be loaded onto the workers. For example, the Cloud Dataflow Java SDK
1131 # might use this to install jars containing the user's code and all of the
1132 # various dependencies (libraries, data files, etc.) required in order
1133 # for that code to run.
1134 "location": "A String", # The resource to read the package from. The supported resource type is:
1135 #
1136 # Google Cloud Storage:
1137 #
1138 # storage.googleapis.com/{bucket}
1139 # bucket.storage.googleapis.com/
1140 "name": "A String", # The name of the package.
1141 },
1142 ],
Dan O'Mearadd494642020-05-01 07:42:23 -07001143 "defaultPackageSet": "A String", # The default package set to install. This allows the service to
1144 # select a default set of packages which are useful to worker
1145 # harnesses written in a particular language.
1146 "kind": "A String", # The kind of the worker pool; currently only `harness` and `shuffle`
1147 # are supported.
1148 "diskType": "A String", # Type of root disk for VMs. If empty or unspecified, the service will
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07001149 # attempt to choose a reasonable default.
1150         "teardownPolicy": "A String", # Sets the policy for determining when to turn down the worker pool.
1151 # Allowed values are: `TEARDOWN_ALWAYS`, `TEARDOWN_ON_SUCCESS`, and
1152 # `TEARDOWN_NEVER`.
1153 # `TEARDOWN_ALWAYS` means workers are always torn down regardless of whether
1154 # the job succeeds. `TEARDOWN_ON_SUCCESS` means workers are torn down
1155 # if the job succeeds. `TEARDOWN_NEVER` means the workers are never torn
1156 # down.
1157 #
1158 # If the workers are not torn down by the service, they will
1159 # continue to run and use Google Compute Engine VM resources in the
1160 # user's project until they are explicitly terminated by the user.
1161 # Because of this, Google recommends using the `TEARDOWN_ALWAYS`
1162 # policy except for small, manually supervised test jobs.
1163 #
1164 # If unknown or unspecified, the service will attempt to choose a reasonable
1165 # default.
Dan O'Mearadd494642020-05-01 07:42:23 -07001166 "diskSizeGb": 42, # Size of root disk for VMs, in GB. If zero or unspecified, the service will
1167 # attempt to choose a reasonable default.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07001168 "numWorkers": 42, # Number of Google Compute Engine workers in this pool needed to
1169 # execute the job. If zero or unspecified, the service will
1170 # attempt to choose a reasonable default.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07001171 "subnetwork": "A String", # Subnetwork to which VMs will be assigned, if desired. Expected to be of
1172 # the form "regions/REGION/subnetworks/SUBNETWORK".
1173 "dataDisks": [ # Data disks that are used by a VM in this workflow.
1174 { # Describes the data disk used by a workflow job.
1175 "mountPoint": "A String", # Directory in a VM where disk is mounted.
1176 "sizeGb": 42, # Size of disk in GB. If zero or unspecified, the service will
1177 # attempt to choose a reasonable default.
1178 "diskType": "A String", # Disk storage type, as defined by Google Compute Engine. This
1179 # must be a disk type appropriate to the project and zone in which
1180 # the workers will run. If unknown or unspecified, the service
1181 # will attempt to choose a reasonable default.
1182 #
1183 # For example, the standard persistent disk type is a resource name
1184 # typically ending in "pd-standard". If SSD persistent disks are
1185 # available, the resource name typically ends with "pd-ssd". The
1186             # actual valid values are defined by the Google Compute Engine API,
1187 # not by the Cloud Dataflow API; consult the Google Compute Engine
1188 # documentation for more information about determining the set of
1189 # available disk types for a particular project and zone.
1190 #
1191 # Google Compute Engine Disk types are local to a particular
1192 # project in a particular zone, and so the resource name will
1193 # typically look something like this:
1194 #
1195 # compute.googleapis.com/projects/project-id/zones/zone/diskTypes/pd-standard
1196 },
1197 ],
Dan O'Mearadd494642020-05-01 07:42:23 -07001198 "sdkHarnessContainerImages": [ # Set of SDK harness containers needed to execute this pipeline. This will
1199 # only be set in the Fn API path. For non-cross-language pipelines this
1200 # should have only one entry. Cross-language pipelines will have two or more
1201 # entries.
1202       { # Defines an SDK harness container for executing Dataflow pipelines.
1203         "containerImage": "A String", # A Docker container image that resides in Google Container Registry.
1204         "useSingleCorePerContainer": True or False, # If true, recommends that the Dataflow service use only one core per SDK
1205             # container instance with this image. If false (or unset), recommends using
1206             # more than one core per SDK container instance with this image for
1207             # efficiency. Note that the Dataflow service may choose to override this property
1208             # if needed.
1209 },
1210 ],
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07001211 },
1212 ],
Dan O'Mearadd494642020-05-01 07:42:23 -07001213 "clusterManagerApiService": "A String", # The type of cluster manager API to use. If unknown or
1214 # unspecified, the service will attempt to choose a reasonable
1215 # default. This should be in the form of the API service name,
1216 # e.g. "compute.googleapis.com".
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07001217 "tempStoragePrefix": "A String", # The prefix of the resources the system should use for temporary
1218     # storage. The system will append the suffix "/temp-{JOBNAME}" to
1219 # this resource prefix, where {JOBNAME} is the value of the
1220 # job_name field. The resulting bucket and object prefix is used
1221 # as the prefix of the resources used to store temporary data
1222 # needed during the job execution. NOTE: This will override the
1223 # value in taskrunner_settings.
1224 # The supported resource type is:
1225 #
1226 # Google Cloud Storage:
1227 #
1228 # storage.googleapis.com/{bucket}/{object}
1229 # bucket.storage.googleapis.com/{object}
1230 },
1231 "location": "A String", # The [regional endpoint]
1232 # (https://cloud.google.com/dataflow/docs/concepts/regional-endpoints) that
1233 # contains this job.
1234 "tempFiles": [ # A set of files the system should be aware of that are used
1235 # for temporary storage. These temporary files will be
1236 # removed on job completion.
1237 # No duplicates are allowed.
1238 # No file patterns are supported.
1239 #
1240 # The supported files are:
1241 #
1242 # Google Cloud Storage:
1243 #
1244 # storage.googleapis.com/{bucket}/{object}
1245 # bucket.storage.googleapis.com/{object}
1246 "A String",
1247 ],
1248 "type": "A String", # The type of Cloud Dataflow job.
1249 "clientRequestId": "A String", # The client's unique identifier of the job, re-used across retried attempts.
1250 # If this field is set, the service will ensure its uniqueness.
1251 # The request to create a job will fail if the service has knowledge of a
1252 # previously submitted job with the same client's ID and job name.
1253 # The caller may use this field to ensure idempotence of job
1254 # creation across retried attempts to create a job.
1255 # By default, the field is empty and, in that case, the service ignores it.
1256 "createdFromSnapshotId": "A String", # If this is specified, the job's initial state is populated from the given
1257 # snapshot.
1258 "stepsLocation": "A String", # The GCS location where the steps are stored.
1259 "currentStateTime": "A String", # The timestamp associated with the current state.
1260 "startTime": "A String", # The timestamp when the job was started (transitioned to JOB_STATE_PENDING).
1261 # Flexible resource scheduling jobs are started with some delay after job
1262 # creation, so start_time is unset before start and is updated when the
1263 # job is started by the Cloud Dataflow service. For other jobs, start_time
1264   # always equals create_time and is immutable and set by the Cloud Dataflow
1265 # service.
1266 "createTime": "A String", # The timestamp when the job was initially created. Immutable and set by the
1267 # Cloud Dataflow service.
1268 "requestedState": "A String", # The job's requested state.
1269 #
1270 # `UpdateJob` may be used to switch between the `JOB_STATE_STOPPED` and
1271 # `JOB_STATE_RUNNING` states, by setting requested_state. `UpdateJob` may
1272 # also be used to directly set a job's requested state to
1273 # `JOB_STATE_CANCELLED` or `JOB_STATE_DONE`, irrevocably terminating the
1274 # job if it has not already reached a terminal state.
1275 "name": "A String", # The user-specified Cloud Dataflow job name.
1276 #
1277 # Only one Job with a given name may exist in a project at any
1278 # given time. If a caller attempts to create a Job with the same
1279 # name as an already-existing Job, the attempt returns the
1280 # existing Job.
1281 #
1282 # The name must match the regular expression
1283 # `[a-z]([-a-z0-9]{0,38}[a-z0-9])?`
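     #   For example, the name "my-wordcount-job" matches this pattern.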
1284 "steps": [ # Exactly one of step or steps_location should be specified.
1285 #
1286 # The top-level steps that constitute the entire job.
1287 { # Defines a particular step within a Cloud Dataflow job.
1288 #
1289 # A job consists of multiple steps, each of which performs some
1290 # specific operation as part of the overall job. Data is typically
1291 # passed from one step to another as part of the job.
1292 #
1293 # Here's an example of a sequence of steps which together implement a
1294 # Map-Reduce job:
1295 #
1296 # * Read a collection of data from some source, parsing the
1297 # collection's elements.
1298 #
1299 # * Validate the elements.
1300 #
1301 # * Apply a user-defined function to map each element to some value
1302 # and extract an element-specific key value.
1303 #
1304 # * Group elements with the same key into a single element with
1305 # that key, transforming a multiply-keyed collection into a
1306 # uniquely-keyed collection.
1307 #
1308 # * Write the elements out to some data sink.
1309 #
1310 # Note that the Cloud Dataflow service may be used to run many different
1311 # types of jobs, not just Map-Reduce.
1312 "kind": "A String", # The kind of step in the Cloud Dataflow job.
Dan O'Mearadd494642020-05-01 07:42:23 -07001313 "name": "A String", # The name that identifies the step. This must be unique for each
1314 # step with respect to all other steps in the Cloud Dataflow job.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07001315 "properties": { # Named properties associated with the step. Each kind of
1316 # predefined step has its own required set of properties.
1317 # Must be provided on Create. Only retrieved with JOB_VIEW_ALL.
1318 "a_key": "", # Properties of the object.
1319 },
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07001320 },
1321 ],
1322 "replaceJobId": "A String", # If this job is an update of an existing job, this field is the job ID
1323 # of the job it replaced.
1324 #
1325 # When sending a `CreateJobRequest`, you can update a job by specifying it
1326 # here. The job named here is stopped, and its intermediate state is
1327 # transferred to this job.
1328 "currentState": "A String", # The current state of the job.
1329 #
1330 # Jobs are created in the `JOB_STATE_STOPPED` state unless otherwise
1331 # specified.
1332 #
1333 # A job in the `JOB_STATE_RUNNING` state may asynchronously enter a
1334 # terminal state. After a job has reached a terminal state, no
1335 # further state updates may be made.
1336 #
1337 # This field may be mutated by the Cloud Dataflow service;
1338 # callers cannot mutate it.
1339 "executionInfo": { # Additional information about how a Cloud Dataflow job will be executed that # Deprecated.
1340 # isn't contained in the submitted job.
1341 "stages": { # A mapping from each stage to the information about that stage.
1342 "a_key": { # Contains information about how a particular
1343 # google.dataflow.v1beta3.Step will be executed.
1344 "stepName": [ # The steps associated with the execution stage.
1345 # Note that stages may have several steps, and that a given step
1346 # might be run by more than one stage.
1347 "A String",
1348 ],
1349 },
1350 },
1351 },
1352}
1353
1354 location: string, The [regional endpoint]
1355(https://cloud.google.com/dataflow/docs/concepts/regional-endpoints) that
1356contains this job.
1357 x__xgafv: string, V1 error format.
1358 Allowed values
1359 1 - v1 error format
1360 2 - v2 error format
1361 replaceJobId: string, Deprecated. This field is now in the Job message.
1362 view: string, The level of information requested in response.
1363
1364Returns:
1365 An object of the form:
1366
1367 { # Defines a job to be run by the Cloud Dataflow service.
Sai Cheemalapatic30d2b52017-03-13 12:12:03 -04001368 "labels": { # User-defined labels for this job.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07001369 #
Sai Cheemalapatic30d2b52017-03-13 12:12:03 -04001370 # The labels map can contain no more than 64 entries. Entries of the labels
1371 # map are UTF8 strings that comply with the following restrictions:
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07001372 #
Sai Cheemalapatic30d2b52017-03-13 12:12:03 -04001373 # * Keys must conform to regexp: \p{Ll}\p{Lo}{0,62}
1374 # * Values must conform to regexp: [\p{Ll}\p{Lo}\p{N}_-]{0,63}
Dan O'Mearadd494642020-05-01 07:42:23 -07001375 # * Both keys and values are additionally constrained to be &lt;= 128 bytes in
Sai Cheemalapatic30d2b52017-03-13 12:12:03 -04001376 # size.
Jon Wayne Parrott7d5badb2016-08-16 12:44:29 -07001377 "a_key": "A String",
1378 },
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07001379 "jobMetadata": { # Metadata available primarily for filtering jobs. Will be included in the # This field is populated by the Dataflow service to support filtering jobs
1380 # by the metadata values provided here. Populated for ListJobs and all GetJob
1381 # views SUMMARY and higher.
1382 # ListJob response and Job SUMMARY view.
1383 "sdkVersion": { # The version of the SDK used to run the job. # The SDK version used to run the job.
1384 "versionDisplayName": "A String", # A readable string describing the version of the SDK.
1385 "version": "A String", # The version of the SDK used to run the job.
1386 "sdkSupportStatus": "A String", # The support status for this SDK version.
1387 },
1388 "pubsubDetails": [ # Identification of a PubSub source used in the Dataflow job.
1389 { # Metadata for a PubSub connector used by the job.
1390 "topic": "A String", # Topic accessed in the connection.
1391 "subscription": "A String", # Subscription used in the connection.
1392 },
1393 ],
1394 "datastoreDetails": [ # Identification of a Datastore source used in the Dataflow job.
1395 { # Metadata for a Datastore connector used by the job.
1396 "projectId": "A String", # ProjectId accessed in the connection.
1397 "namespace": "A String", # Namespace used in the connection.
1398 },
1399 ],
1400 "fileDetails": [ # Identification of a File source used in the Dataflow job.
1401 { # Metadata for a File connector used by the job.
1402 "filePattern": "A String", # File Pattern used to access files by the connector.
1403 },
1404 ],
1405 "spannerDetails": [ # Identification of a Spanner source used in the Dataflow job.
1406 { # Metadata for a Spanner connector used by the job.
1407 "instanceId": "A String", # InstanceId accessed in the connection.
1408 "projectId": "A String", # ProjectId accessed in the connection.
1409 "databaseId": "A String", # DatabaseId accessed in the connection.
1410 },
1411 ],
1412 "bigTableDetails": [ # Identification of a BigTable source used in the Dataflow job.
1413 { # Metadata for a BigTable connector used by the job.
1414 "instanceId": "A String", # InstanceId accessed in the connection.
1415 "projectId": "A String", # ProjectId accessed in the connection.
1416 "tableId": "A String", # TableId accessed in the connection.
1417 },
1418 ],
1419 "bigqueryDetails": [ # Identification of a BigQuery source used in the Dataflow job.
1420 { # Metadata for a BigQuery connector used by the job.
1421 "projectId": "A String", # Project accessed in the connection.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07001422 "query": "A String", # Query used to access data in the connection.
Dan O'Mearadd494642020-05-01 07:42:23 -07001423 "table": "A String", # Table accessed in the connection.
1424 "dataset": "A String", # Dataset accessed in the connection.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07001425 },
1426 ],
1427 },
1428 "pipelineDescription": { # A descriptive representation of submitted pipeline as well as the executed # Preliminary field: The format of this data may change at any time.
1429 # A description of the user pipeline and stages through which it is executed.
1430 # Created by Cloud Dataflow service. Only retrieved with
1431 # JOB_VIEW_DESCRIPTION or JOB_VIEW_ALL.
1432 # form. This data is provided by the Dataflow service for ease of visualizing
1433 # the pipeline and interpreting Dataflow provided metrics.
1434 "originalPipelineTransform": [ # Description of each transform in the pipeline and collections between them.
1435 { # Description of the type, names/ids, and input/outputs for a transform.
1436 "kind": "A String", # Type of transform.
1437 "name": "A String", # User provided name for this transform instance.
1438 "inputCollectionName": [ # User names for all collection inputs to this transform.
1439 "A String",
1440 ],
1441 "displayData": [ # Transform-specific display data.
1442 { # Data provided with a pipeline or transform to provide descriptive info.
Dan O'Mearadd494642020-05-01 07:42:23 -07001443 "key": "A String", # The key identifying the display data.
1444 # This is intended to be used as a label for the display data
1445 # when viewed in a dax monitoring system.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07001446 "shortStrValue": "A String", # A possible additional shorter value to display.
1447 # For example a java_class_name_value of com.mypackage.MyDoFn
1448 # will be stored with MyDoFn as the short_str_value and
1449 # com.mypackage.MyDoFn as the java_class_name value.
1450 # short_str_value can be displayed and java_class_name_value
1451 # will be displayed as a tooltip.
Dan O'Mearadd494642020-05-01 07:42:23 -07001452 "timestampValue": "A String", # Contains value if the data is of timestamp type.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07001453 "url": "A String", # An optional full URL.
1454 "floatValue": 3.14, # Contains value if the data is of float type.
1455 "namespace": "A String", # The namespace for the key. This is usually a class name or programming
1456 # language namespace (i.e. python module) which defines the display data.
1457 # This allows a dax monitoring system to specially handle the data
1458 # and perform custom rendering.
1459 "javaClassValue": "A String", # Contains value if the data is of java class type.
1460 "label": "A String", # An optional label to display in a dax UI for the element.
1461 "boolValue": True or False, # Contains value if the data is of a boolean type.
1462 "strValue": "A String", # Contains value if the data is of string type.
Dan O'Mearadd494642020-05-01 07:42:23 -07001463 "durationValue": "A String", # Contains value if the data is of duration type.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07001464 "int64Value": "A String", # Contains value if the data is of int64 type.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07001465 },
1466 ],
1467 "outputCollectionName": [ # User names for all collection outputs to this transform.
1468 "A String",
1469 ],
1470 "id": "A String", # SDK generated id of this transform instance.
1471 },
1472 ],
1473 "executionPipelineStage": [ # Description of each stage of execution of the pipeline.
1474 { # Description of the composing transforms, names/ids, and input/outputs of a
1475 # stage of execution. Some composing transforms and sources may have been
1476 # generated by the Dataflow service during execution planning.
1477 "componentSource": [ # Collections produced and consumed by component transforms of this stage.
1478 { # Description of an interstitial value between transforms in an execution
1479 # stage.
1480 "userName": "A String", # Human-readable name for this transform; may be user or system generated.
1481 "originalTransformOrCollection": "A String", # User name for the original user transform or collection with which this
1482 # source is most closely associated.
1483 "name": "A String", # Dataflow service generated name for this source.
1484 },
1485 ],
1486       "kind": "A String", # Type of transform this stage is executing.
1487 "name": "A String", # Dataflow service generated name for this stage.
1488 "outputSource": [ # Output sources for this stage.
1489 { # Description of an input or output of an execution stage.
1490 "userName": "A String", # Human-readable name for this source; may be user or system generated.
1491 "sizeBytes": "A String", # Size of the source, if measurable.
1492 "name": "A String", # Dataflow service generated name for this source.
1493 "originalTransformOrCollection": "A String", # User name for the original user transform or collection with which this
1494 # source is most closely associated.
1495 },
1496 ],
1497 "inputSource": [ # Input sources for this stage.
1498 { # Description of an input or output of an execution stage.
1499 "userName": "A String", # Human-readable name for this source; may be user or system generated.
1500 "sizeBytes": "A String", # Size of the source, if measurable.
1501 "name": "A String", # Dataflow service generated name for this source.
1502 "originalTransformOrCollection": "A String", # User name for the original user transform or collection with which this
1503 # source is most closely associated.
1504 },
1505 ],
1506 "componentTransform": [ # Transforms that comprise this execution stage.
1507 { # Description of a transform executed as part of an execution stage.
1508 "userName": "A String", # Human-readable name for this transform; may be user or system generated.
1509 "originalTransform": "A String", # User name for the original user transform with which this transform is
1510 # most closely associated.
1511 "name": "A String", # Dataflow service generated name for this source.
1512 },
1513 ],
1514 "id": "A String", # Dataflow service generated id for this stage.
1515 },
1516 ],
1517 "displayData": [ # Pipeline level display data.
1518 { # Data provided with a pipeline or transform to provide descriptive info.
Dan O'Mearadd494642020-05-01 07:42:23 -07001519 "key": "A String", # The key identifying the display data.
1520 # This is intended to be used as a label for the display data
1521 # when viewed in a dax monitoring system.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07001522 "shortStrValue": "A String", # A possible additional shorter value to display.
1523 # For example a java_class_name_value of com.mypackage.MyDoFn
1524 # will be stored with MyDoFn as the short_str_value and
1525 # com.mypackage.MyDoFn as the java_class_name value.
1526 # short_str_value can be displayed and java_class_name_value
1527 # will be displayed as a tooltip.
Dan O'Mearadd494642020-05-01 07:42:23 -07001528 "timestampValue": "A String", # Contains value if the data is of timestamp type.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07001529 "url": "A String", # An optional full URL.
1530 "floatValue": 3.14, # Contains value if the data is of float type.
1531 "namespace": "A String", # The namespace for the key. This is usually a class name or programming
1532 # language namespace (i.e. python module) which defines the display data.
1533 # This allows a dax monitoring system to specially handle the data
1534 # and perform custom rendering.
1535 "javaClassValue": "A String", # Contains value if the data is of java class type.
1536 "label": "A String", # An optional label to display in a dax UI for the element.
1537 "boolValue": True or False, # Contains value if the data is of a boolean type.
1538 "strValue": "A String", # Contains value if the data is of string type.
Dan O'Mearadd494642020-05-01 07:42:23 -07001539 "durationValue": "A String", # Contains value if the data is of duration type.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07001540 "int64Value": "A String", # Contains value if the data is of int64 type.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07001541 },
1542 ],
1543 },
1544 "stageStates": [ # This field may be mutated by the Cloud Dataflow service;
1545 # callers cannot mutate it.
1546 { # A message describing the state of a particular execution stage.
1547 "executionStageName": "A String", # The name of the execution stage.
1548       "executionStageState": "A String", # Execution stage states allow the same set of values as JobState.
1549 "currentStateTime": "A String", # The time at which the stage transitioned to this state.
1550 },
1551 ],
1552 "id": "A String", # The unique ID of this job.
1553 #
1554 # This field is set by the Cloud Dataflow service when the Job is
1555 # created, and is immutable for the life of the job.
1556 "replacedByJobId": "A String", # If another job is an update of this job (and thus, this job is in
1557 # `JOB_STATE_UPDATED`), this field contains the ID of that job.
1558 "projectId": "A String", # The ID of the Cloud Platform project that the job belongs to.
Sai Cheemalapatic30d2b52017-03-13 12:12:03 -04001559 "transformNameMapping": { # The map of transform name prefixes of the job to be replaced to the
1560 # corresponding name prefixes of the new job.
Takashi Matsuo06694102015-09-11 13:55:40 -07001561 "a_key": "A String",
Nathaniel Manista4f877e52015-06-15 16:44:50 +00001562 },
Sai Cheemalapatic30d2b52017-03-13 12:12:03 -04001563 "environment": { # Describes the environment in which a Dataflow Job runs. # The environment for the job.
Dan O'Mearadd494642020-05-01 07:42:23 -07001564 "workerRegion": "A String", # The Compute Engine region
1565 # (https://cloud.google.com/compute/docs/regions-zones/regions-zones) in
1566 # which worker processing should occur, e.g. "us-west1". Mutually exclusive
1567 # with worker_zone. If neither worker_region nor worker_zone is specified,
1568 # default to the control plane's region.
Sai Cheemalapatic30d2b52017-03-13 12:12:03 -04001569 "version": { # A structure describing which components and their versions of the service
1570 # are required in order to run the job.
Takashi Matsuo06694102015-09-11 13:55:40 -07001571 "a_key": "", # Properties of the object.
1572 },
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07001573 "flexResourceSchedulingGoal": "A String", # Which Flexible Resource Scheduling mode to run in.
1574 "serviceKmsKeyName": "A String", # If set, contains the Cloud KMS key identifier used to encrypt data
1575 # at rest, AKA a Customer Managed Encryption Key (CMEK).
Sai Cheemalapatic30d2b52017-03-13 12:12:03 -04001576 #
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07001577 # Format:
1578 # projects/PROJECT_ID/locations/LOCATION/keyRings/KEY_RING/cryptoKeys/KEY
Takashi Matsuo06694102015-09-11 13:55:40 -07001579 "internalExperiments": { # Experimental settings.
Jon Wayne Parrott7d5badb2016-08-16 12:44:29 -07001580 "a_key": "", # Properties of the object. Contains field @type with type URL.
Takashi Matsuo06694102015-09-11 13:55:40 -07001581 },
Sai Cheemalapatic30d2b52017-03-13 12:12:03 -04001582 "dataset": "A String", # The dataset for the current project where various workflow
1583 # related tables are stored.
1584 #
1585 # The supported resource type is:
1586 #
1587 # Google BigQuery:
1588 # bigquery.googleapis.com/{dataset}
Takashi Matsuo06694102015-09-11 13:55:40 -07001589 "experiments": [ # The list of experiments to enable.
1590 "A String",
1591 ],
Sai Cheemalapatiea3a5e12016-10-12 14:05:53 -07001592 "serviceAccountEmail": "A String", # Identity to run virtual machines as. Defaults to the default account.
Sai Cheemalapatic30d2b52017-03-13 12:12:03 -04001593 "sdkPipelineOptions": { # The Cloud Dataflow SDK pipeline options specified by the user. These
1594 # options are passed through the service and are used to recreate the
1595 # SDK pipeline options on the worker in a language agnostic and platform
1596 # independent way.
Takashi Matsuo06694102015-09-11 13:55:40 -07001597 "a_key": "", # Properties of the object.
1598 },
1599 "userAgent": { # A description of the process that generated the request.
1600 "a_key": "", # Properties of the object.
1601 },
Dan O'Mearadd494642020-05-01 07:42:23 -07001602 "workerZone": "A String", # The Compute Engine zone
1603 # (https://cloud.google.com/compute/docs/regions-zones/regions-zones) in
1604 # which worker processing should occur, e.g. "us-west1-a". Mutually exclusive
1605 # with worker_region. If neither worker_region nor worker_zone is specified,
1606 # a zone in the control plane's region is chosen based on available capacity.
Sai Cheemalapatic30d2b52017-03-13 12:12:03 -04001607 "workerPools": [ # The worker pools. At least one "harness" worker pool must be
1608 # specified in order for the job to have workers.
1609 { # Describes one particular pool of Cloud Dataflow workers to be
1610 # instantiated by the Cloud Dataflow service in order to perform the
1611 # computations required by a job. Note that a workflow job may use
1612 # multiple pools, in order to match the various computational
1613 # requirements of the various stages of the job.
Dan O'Mearadd494642020-05-01 07:42:23 -07001614 "workerHarnessContainerImage": "A String", # Required. Docker container image that executes the Cloud Dataflow worker
1615 # harness, residing in Google Container Registry.
1616 #
1617 # Deprecated for the Fn API path. Use sdk_harness_container_images instead.
1618 "ipConfiguration": "A String", # Configuration for VM IPs.
1619 "autoscalingSettings": { # Settings for WorkerPool autoscaling. # Settings for autoscaling of this WorkerPool.
1620 "maxNumWorkers": 42, # The maximum number of workers to cap scaling at.
1621 "algorithm": "A String", # The algorithm to use for autoscaling.
1622 },
Takashi Matsuo06694102015-09-11 13:55:40 -07001623 "diskSourceImage": "A String", # Fully qualified source image for disks.
Dan O'Mearadd494642020-05-01 07:42:23 -07001624 "network": "A String", # Network to which VMs will be assigned. If empty or unspecified,
1625 # the service will use the network "default".
1626 "zone": "A String", # Zone to run the worker pools in. If empty or unspecified, the service
1627 # will attempt to choose a reasonable default.
1628 "metadata": { # Metadata to set on the Google Compute Engine VMs.
1629 "a_key": "A String",
1630 },
1631 "machineType": "A String", # Machine type (e.g. "n1-standard-1"). If empty or unspecified, the
1632 # service will attempt to choose a reasonable default.
1633 "onHostMaintenance": "A String", # The action to take on host maintenance, as defined by the Google
1634 # Compute Engine API.
Sai Cheemalapatic30d2b52017-03-13 12:12:03 -04001635 "taskrunnerSettings": { # Taskrunner configuration settings. # Settings passed through to Google Compute Engine workers when
1636 # using the standard Dataflow task runner. Users should ignore
1637 # this field.
1638 "workflowFileName": "A String", # The file to store the workflow in.
1639 "logUploadLocation": "A String", # Indicates where to put logs. If this is not specified, the logs
1640 # will not be uploaded.
1641 #
1642 # The supported resource type is:
1643 #
1644 # Google Cloud Storage:
1645 # storage.googleapis.com/{bucket}/{object}
1646 # bucket.storage.googleapis.com/{object}
Sai Cheemalapatie833b792017-03-24 15:06:46 -07001647 "commandlinesFileName": "A String", # The file to store preprocessing commands in.
Dan O'Mearadd494642020-05-01 07:42:23 -07001648 "alsologtostderr": True or False, # Whether to also send taskrunner log info to stderr.
1649 "continueOnException": True or False, # Whether to continue taskrunner if an exception is hit.
1650 "baseTaskDir": "A String", # The location on the worker for task-specific subdirectories.
1651 "vmId": "A String", # The ID string of the VM.
1652 "taskGroup": "A String", # The UNIX group ID on the worker VM to use for tasks launched by
1653 # taskrunner; e.g. "wheel".
1654 "taskUser": "A String", # The UNIX user ID on the worker VM to use for tasks launched by
1655 # taskrunner; e.g. "root".
1656 "oauthScopes": [ # The OAuth2 scopes to be requested by the taskrunner in order to
1657 # access the Cloud Dataflow API.
1658 "A String",
1659 ],
1660 "languageHint": "A String", # The suggested backend language.
1661 "logToSerialconsole": True or False, # Whether to send taskrunner log info to Google Compute Engine VM serial
1662 # console.
1663 "streamingWorkerMainClass": "A String", # The streaming worker main class name.
1664 "logDir": "A String", # The directory on the VM to store logs.
Sai Cheemalapatic30d2b52017-03-13 12:12:03 -04001665 "parallelWorkerSettings": { # Provides data to pass through to the worker harness. # The settings to pass to the parallel worker harness.
1666 "reportingEnabled": True or False, # Whether to send work progress updates to the service.
1667 "shuffleServicePath": "A String", # The Shuffle service path relative to the root URL, for example,
1668 # "shuffle/v1beta1".
1669 "workerId": "A String", # The ID of the worker running this pipeline.
1670 "baseUrl": "A String", # The base URL for accessing Google Cloud APIs.
1671 #
1672 # When workers access Google Cloud APIs, they logically do so via
1673 # relative URLs. If this field is specified, it supplies the base
1674 # URL to use for resolving these relative URLs. The normative
1675 # algorithm used is defined by RFC 1808, "Relative Uniform Resource
1676 # Locators".
1677 #
1678 # If not specified, the default value is "http://www.googleapis.com/"
1679 "servicePath": "A String", # The Cloud Dataflow service path relative to the root URL, for example,
1680 # "dataflow/v1b3/projects".
1681 "tempStoragePrefix": "A String", # The prefix of the resources the system should use for temporary
1682 # storage.
1683 #
1684 # The supported resource type is:
1685 #
1686 # Google Cloud Storage:
1687 #
1688 # storage.googleapis.com/{bucket}/{object}
1689 # bucket.storage.googleapis.com/{object}
1690 },
Sai Cheemalapati4ba8c232017-06-06 18:46:08 -04001691       "dataflowApiVersion": "A String", # The API version of the endpoint, e.g. "v1b3"
Sai Cheemalapatie833b792017-03-24 15:06:46 -07001692 "harnessCommand": "A String", # The command to launch the worker harness.
1693 "tempStoragePrefix": "A String", # The prefix of the resources the taskrunner should use for
1694 # temporary storage.
Sai Cheemalapatic30d2b52017-03-13 12:12:03 -04001695 #
Sai Cheemalapatie833b792017-03-24 15:06:46 -07001696 # The supported resource type is:
1697 #
1698 # Google Cloud Storage:
1699 # storage.googleapis.com/{bucket}/{object}
1700 # bucket.storage.googleapis.com/{object}
Dan O'Mearadd494642020-05-01 07:42:23 -07001701 "baseUrl": "A String", # The base URL for the taskrunner to use when accessing Google Cloud APIs.
1702 #
1703 # When workers access Google Cloud APIs, they logically do so via
1704 # relative URLs. If this field is specified, it supplies the base
1705 # URL to use for resolving these relative URLs. The normative
1706 # algorithm used is defined by RFC 1808, "Relative Uniform Resource
1707 # Locators".
1708 #
1709 # If not specified, the default value is "http://www.googleapis.com/"
Takashi Matsuo06694102015-09-11 13:55:40 -07001710 },
Dan O'Mearadd494642020-05-01 07:42:23 -07001711 "numThreadsPerWorker": 42, # The number of threads per worker harness. If empty or unspecified, the
1712 # service will choose a number of threads (according to the number of cores
1713 # on the selected machine type for batch, or 1 by convention for streaming).
1714 "poolArgs": { # Extra arguments for this worker pool.
1715 "a_key": "", # Properties of the object. Contains field @type with type URL.
1716 },
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07001717 "packages": [ # Packages to be installed on workers.
1718 { # The packages that must be installed in order for a worker to run the
1719 # steps of the Cloud Dataflow job that will be assigned to its worker
1720 # pool.
1721 #
1722 # This is the mechanism by which the Cloud Dataflow SDK causes code to
1723 # be loaded onto the workers. For example, the Cloud Dataflow Java SDK
1724 # might use this to install jars containing the user's code and all of the
1725 # various dependencies (libraries, data files, etc.) required in order
1726 # for that code to run.
1727 "location": "A String", # The resource to read the package from. The supported resource type is:
1728 #
1729 # Google Cloud Storage:
1730 #
1731 # storage.googleapis.com/{bucket}
1732 # bucket.storage.googleapis.com/
1733 "name": "A String", # The name of the package.
1734 },
1735 ],
Dan O'Mearadd494642020-05-01 07:42:23 -07001736 "defaultPackageSet": "A String", # The default package set to install. This allows the service to
1737 # select a default set of packages which are useful to worker
1738 # harnesses written in a particular language.
1739 "kind": "A String", # The kind of the worker pool; currently only `harness` and `shuffle`
1740 # are supported.
1741 "diskType": "A String", # Type of root disk for VMs. If empty or unspecified, the service will
Sai Cheemalapati4ba8c232017-06-06 18:46:08 -04001742 # attempt to choose a reasonable default.
Sai Cheemalapati4ba8c232017-06-06 18:46:08 -04001743       "teardownPolicy": "A String", # Sets the policy for determining when to turn down the worker pool.
1744 # Allowed values are: `TEARDOWN_ALWAYS`, `TEARDOWN_ON_SUCCESS`, and
1745 # `TEARDOWN_NEVER`.
1746 # `TEARDOWN_ALWAYS` means workers are always torn down regardless of whether
1747 # the job succeeds. `TEARDOWN_ON_SUCCESS` means workers are torn down
1748 # if the job succeeds. `TEARDOWN_NEVER` means the workers are never torn
1749 # down.
1750 #
1751 # If the workers are not torn down by the service, they will
1752 # continue to run and use Google Compute Engine VM resources in the
1753 # user's project until they are explicitly terminated by the user.
1754 # Because of this, Google recommends using the `TEARDOWN_ALWAYS`
1755 # policy except for small, manually supervised test jobs.
1756 #
1757 # If unknown or unspecified, the service will attempt to choose a reasonable
1758 # default.
Dan O'Mearadd494642020-05-01 07:42:23 -07001759 "diskSizeGb": 42, # Size of root disk for VMs, in GB. If zero or unspecified, the service will
1760 # attempt to choose a reasonable default.
Sai Cheemalapati4ba8c232017-06-06 18:46:08 -04001761 "numWorkers": 42, # Number of Google Compute Engine workers in this pool needed to
1762 # execute the job. If zero or unspecified, the service will
Sai Cheemalapatic30d2b52017-03-13 12:12:03 -04001763 # attempt to choose a reasonable default.
Sai Cheemalapati4ba8c232017-06-06 18:46:08 -04001764 "subnetwork": "A String", # Subnetwork to which VMs will be assigned, if desired. Expected to be of
1765 # the form "regions/REGION/subnetworks/SUBNETWORK".
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07001766 "dataDisks": [ # Data disks that are used by a VM in this workflow.
1767 { # Describes the data disk used by a workflow job.
1768 "mountPoint": "A String", # Directory in a VM where disk is mounted.
1769 "sizeGb": 42, # Size of disk in GB. If zero or unspecified, the service will
1770 # attempt to choose a reasonable default.
1771 "diskType": "A String", # Disk storage type, as defined by Google Compute Engine. This
1772 # must be a disk type appropriate to the project and zone in which
1773 # the workers will run. If unknown or unspecified, the service
1774 # will attempt to choose a reasonable default.
Sai Cheemalapati4ba8c232017-06-06 18:46:08 -04001775 #
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07001776 # For example, the standard persistent disk type is a resource name
1777 # typically ending in "pd-standard". If SSD persistent disks are
1778 # available, the resource name typically ends with "pd-ssd". The
1779             # actual valid values are defined by the Google Compute Engine API,
1780 # not by the Cloud Dataflow API; consult the Google Compute Engine
1781 # documentation for more information about determining the set of
1782 # available disk types for a particular project and zone.
Sai Cheemalapati4ba8c232017-06-06 18:46:08 -04001783 #
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07001784 # Google Compute Engine Disk types are local to a particular
1785 # project in a particular zone, and so the resource name will
1786 # typically look something like this:
1787 #
1788 # compute.googleapis.com/projects/project-id/zones/zone/diskTypes/pd-standard
Sai Cheemalapati4ba8c232017-06-06 18:46:08 -04001789 },
1790 ],
Dan O'Mearadd494642020-05-01 07:42:23 -07001791 "sdkHarnessContainerImages": [ # Set of SDK harness containers needed to execute this pipeline. This will
1792 # only be set in the Fn API path. For non-cross-language pipelines this
1793 # should have only one entry. Cross-language pipelines will have two or more
1794 # entries.
1795       { # Defines an SDK harness container for executing Dataflow pipelines.
1796         "containerImage": "A String", # A Docker container image that resides in Google Container Registry.
1797         "useSingleCorePerContainer": True or False, # If true, recommends that the Dataflow service use only one core per SDK
1798             # container instance with this image. If false (or unset), recommends using
1799             # more than one core per SDK container instance with this image for
1800             # efficiency. Note that the Dataflow service may choose to override this property
1801             # if needed.
1802 },
1803 ],
Takashi Matsuo06694102015-09-11 13:55:40 -07001804 },
1805 ],
Dan O'Mearadd494642020-05-01 07:42:23 -07001806 "clusterManagerApiService": "A String", # The type of cluster manager API to use. If unknown or
1807 # unspecified, the service will attempt to choose a reasonable
1808 # default. This should be in the form of the API service name,
1809 # e.g. "compute.googleapis.com".
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07001810 "tempStoragePrefix": "A String", # The prefix of the resources the system should use for temporary
1811   # storage. The system will append the suffix "/temp-{JOBNAME}" to
1812 # this resource prefix, where {JOBNAME} is the value of the
1813 # job_name field. The resulting bucket and object prefix is used
1814 # as the prefix of the resources used to store temporary data
1815 # needed during the job execution. NOTE: This will override the
1816 # value in taskrunner_settings.
1817 # The supported resource type is:
1818 #
1819 # Google Cloud Storage:
1820 #
1821 # storage.googleapis.com/{bucket}/{object}
1822 # bucket.storage.googleapis.com/{object}
1823 },
1824 "location": "A String", # The [regional endpoint]
1825 # (https://cloud.google.com/dataflow/docs/concepts/regional-endpoints) that
1826 # contains this job.
1827 "tempFiles": [ # A set of files the system should be aware of that are used
1828 # for temporary storage. These temporary files will be
1829 # removed on job completion.
1830 # No duplicates are allowed.
1831 # No file patterns are supported.
1832 #
1833 # The supported files are:
1834 #
1835 # Google Cloud Storage:
1836 #
1837 # storage.googleapis.com/{bucket}/{object}
1838 # bucket.storage.googleapis.com/{object}
1839 "A String",
1840 ],
1841 "type": "A String", # The type of Cloud Dataflow job.
1842 "clientRequestId": "A String", # The client's unique identifier of the job, re-used across retried attempts.
1843 # If this field is set, the service will ensure its uniqueness.
1844 # The request to create a job will fail if the service has knowledge of a
1845 # previously submitted job with the same client's ID and job name.
1846 # The caller may use this field to ensure idempotence of job
1847 # creation across retried attempts to create a job.
1848 # By default, the field is empty and, in that case, the service ignores it.
1849 "createdFromSnapshotId": "A String", # If this is specified, the job's initial state is populated from the given
1850 # snapshot.
1851 "stepsLocation": "A String", # The GCS location where the steps are stored.
1852 "currentStateTime": "A String", # The timestamp associated with the current state.
1853 "startTime": "A String", # The timestamp when the job was started (transitioned to JOB_STATE_PENDING).
1854 # Flexible resource scheduling jobs are started with some delay after job
1855 # creation, so start_time is unset before start and is updated when the
1856 # job is started by the Cloud Dataflow service. For other jobs, start_time
1857   # always equals create_time and is immutable and set by the Cloud Dataflow
1858 # service.
1859 "createTime": "A String", # The timestamp when the job was initially created. Immutable and set by the
1860 # Cloud Dataflow service.
1861 "requestedState": "A String", # The job's requested state.
1862 #
1863 # `UpdateJob` may be used to switch between the `JOB_STATE_STOPPED` and
1864 # `JOB_STATE_RUNNING` states, by setting requested_state. `UpdateJob` may
1865 # also be used to directly set a job's requested state to
1866 # `JOB_STATE_CANCELLED` or `JOB_STATE_DONE`, irrevocably terminating the
1867 # job if it has not already reached a terminal state.
1868 "name": "A String", # The user-specified Cloud Dataflow job name.
1869 #
1870 # Only one Job with a given name may exist in a project at any
1871 # given time. If a caller attempts to create a Job with the same
1872 # name as an already-existing Job, the attempt returns the
1873 # existing Job.
1874 #
1875 # The name must match the regular expression
1876 # `[a-z]([-a-z0-9]{0,38}[a-z0-9])?`
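     #   For example, the name "my-wordcount-job" matches this pattern.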
1877 "steps": [ # Exactly one of step or steps_location should be specified.
1878 #
1879 # The top-level steps that constitute the entire job.
1880 { # Defines a particular step within a Cloud Dataflow job.
1881 #
1882 # A job consists of multiple steps, each of which performs some
1883 # specific operation as part of the overall job. Data is typically
1884 # passed from one step to another as part of the job.
1885 #
1886 # Here's an example of a sequence of steps which together implement a
1887 # Map-Reduce job:
1888 #
1889 # * Read a collection of data from some source, parsing the
1890 # collection's elements.
1891 #
1892 # * Validate the elements.
1893 #
1894 # * Apply a user-defined function to map each element to some value
1895 # and extract an element-specific key value.
1896 #
1897 # * Group elements with the same key into a single element with
1898 # that key, transforming a multiply-keyed collection into a
1899 # uniquely-keyed collection.
1900 #
1901 # * Write the elements out to some data sink.
1902 #
1903 # Note that the Cloud Dataflow service may be used to run many different
1904 # types of jobs, not just Map-Reduce.
1905 "kind": "A String", # The kind of step in the Cloud Dataflow job.
Dan O'Mearadd494642020-05-01 07:42:23 -07001906 "name": "A String", # The name that identifies the step. This must be unique for each
1907 # step with respect to all other steps in the Cloud Dataflow job.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07001908 "properties": { # Named properties associated with the step. Each kind of
1909 # predefined step has its own required set of properties.
1910 # Must be provided on Create. Only retrieved with JOB_VIEW_ALL.
1911 "a_key": "", # Properties of the object.
1912 },
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07001913 },
1914 ],
1915 "replaceJobId": "A String", # If this job is an update of an existing job, this field is the job ID
1916 # of the job it replaced.
1917 #
1918 # When sending a `CreateJobRequest`, you can update a job by specifying it
1919 # here. The job named here is stopped, and its intermediate state is
1920 # transferred to this job.
1921 "currentState": "A String", # The current state of the job.
1922 #
1923 # Jobs are created in the `JOB_STATE_STOPPED` state unless otherwise
1924 # specified.
1925 #
1926 # A job in the `JOB_STATE_RUNNING` state may asynchronously enter a
1927 # terminal state. After a job has reached a terminal state, no
1928 # further state updates may be made.
1929 #
1930 # This field may be mutated by the Cloud Dataflow service;
1931 # callers cannot mutate it.
1932 "executionInfo": { # Additional information about how a Cloud Dataflow job will be executed that # Deprecated.
1933 # isn't contained in the submitted job.
1934 "stages": { # A mapping from each stage to the information about that stage.
1935 "a_key": { # Contains information about how a particular
1936 # google.dataflow.v1beta3.Step will be executed.
1937 "stepName": [ # The steps associated with the execution stage.
1938 # Note that stages may have several steps, and that a given step
1939 # might be run by more than one stage.
1940 "A String",
1941 ],
1942 },
1943 },
1944 },
1945 }</pre>
1946</div>
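<p>A minimal usage sketch of the create call documented above, using the
google-api-python-client discovery interface. The project ID, bucket, job name, and
region below are placeholder values, and Application Default Credentials are assumed
to be available in the environment.</p>
<pre>
from googleapiclient.discovery import build

# Build a Dataflow v1b3 client; authentication falls back to
# Application Default Credentials.
dataflow = build("dataflow", "v1b3")

# A small batch Job body; "name" must match [a-z]([-a-z0-9]{0,38}[a-z0-9])?
# and tempStoragePrefix uses the storage.googleapis.com/{bucket}/{object} form.
job_body = {
    "name": "my-wordcount-job",
    "type": "JOB_TYPE_BATCH",
    "environment": {
        "tempStoragePrefix": "storage.googleapis.com/my-bucket/temp",
    },
}

request = dataflow.projects().jobs().create(
    projectId="my-project",
    location="us-central1",   # regional endpoint that will contain the job
    body=job_body,
)
response = request.execute()
print(response["id"], response.get("currentState"))
</pre>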
1947
1948<div class="method">
1949 <code class="details" id="get">get(projectId, jobId, location=None, x__xgafv=None, view=None)</code>
1950 <pre>Gets the state of the specified Cloud Dataflow job.
1951
1952To get the state of a job, we recommend using `projects.locations.jobs.get`
1953with a [regional endpoint]
1954(https://cloud.google.com/dataflow/docs/concepts/regional-endpoints). Using
1955`projects.jobs.get` is not recommended, as you can only get the state of
1956jobs that are running in `us-central1`.
1957
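A minimal usage sketch (the project ID, job ID, and region below are placeholders,
and Application Default Credentials are assumed):

  from googleapiclient.discovery import build

  dataflow = build("dataflow", "v1b3")
  job = dataflow.projects().jobs().get(
      projectId="my-project",
      jobId="2020-05-01_12_34_56-1234567890123456789",
      location="us-central1",
      view="JOB_VIEW_SUMMARY",
  ).execute()
  print(job["currentState"])
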
1958Args:
1959 projectId: string, The ID of the Cloud Platform project that the job belongs to. (required)
1960 jobId: string, The job ID. (required)
1961 location: string, The [regional endpoint]
1962(https://cloud.google.com/dataflow/docs/concepts/regional-endpoints) that
1963contains this job.
1964 x__xgafv: string, V1 error format.
1965 Allowed values
1966 1 - v1 error format
1967 2 - v2 error format
1968 view: string, The level of information requested in response.
1969
1970Returns:
1971 An object of the form:
1972
1973 { # Defines a job to be run by the Cloud Dataflow service.
1974 "labels": { # User-defined labels for this job.
1975 #
1976 # The labels map can contain no more than 64 entries. Entries of the labels
1977 # map are UTF8 strings that comply with the following restrictions:
1978 #
1979 # * Keys must conform to regexp: \p{Ll}\p{Lo}{0,62}
1980 # * Values must conform to regexp: [\p{Ll}\p{Lo}\p{N}_-]{0,63}
Dan O'Mearadd494642020-05-01 07:42:23 -07001981 # * Both keys and values are additionally constrained to be &lt;= 128 bytes in
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07001982 # size.
1983 "a_key": "A String",
1984 },
1985 "jobMetadata": { # Metadata available primarily for filtering jobs. Will be included in the # This field is populated by the Dataflow service to support filtering jobs
1986 # by the metadata values provided here. Populated for ListJobs and all GetJob
1987 # views SUMMARY and higher.
1988 # ListJob response and Job SUMMARY view.
1989 "sdkVersion": { # The version of the SDK used to run the job. # The SDK version used to run the job.
1990 "versionDisplayName": "A String", # A readable string describing the version of the SDK.
1991 "version": "A String", # The version of the SDK used to run the job.
1992 "sdkSupportStatus": "A String", # The support status for this SDK version.
1993 },
1994 "pubsubDetails": [ # Identification of a PubSub source used in the Dataflow job.
1995 { # Metadata for a PubSub connector used by the job.
1996 "topic": "A String", # Topic accessed in the connection.
1997 "subscription": "A String", # Subscription used in the connection.
1998 },
1999 ],
2000 "datastoreDetails": [ # Identification of a Datastore source used in the Dataflow job.
2001 { # Metadata for a Datastore connector used by the job.
2002 "projectId": "A String", # ProjectId accessed in the connection.
2003 "namespace": "A String", # Namespace used in the connection.
2004 },
2005 ],
2006 "fileDetails": [ # Identification of a File source used in the Dataflow job.
2007 { # Metadata for a File connector used by the job.
2008 "filePattern": "A String", # File Pattern used to access files by the connector.
2009 },
2010 ],
2011 "spannerDetails": [ # Identification of a Spanner source used in the Dataflow job.
2012 { # Metadata for a Spanner connector used by the job.
2013 "instanceId": "A String", # InstanceId accessed in the connection.
2014 "projectId": "A String", # ProjectId accessed in the connection.
2015 "databaseId": "A String", # DatabaseId accessed in the connection.
2016 },
2017 ],
2018 "bigTableDetails": [ # Identification of a BigTable source used in the Dataflow job.
2019 { # Metadata for a BigTable connector used by the job.
2020 "instanceId": "A String", # InstanceId accessed in the connection.
2021 "projectId": "A String", # ProjectId accessed in the connection.
2022 "tableId": "A String", # TableId accessed in the connection.
2023 },
2024 ],
2025 "bigqueryDetails": [ # Identification of a BigQuery source used in the Dataflow job.
2026 { # Metadata for a BigQuery connector used by the job.
2027 "projectId": "A String", # Project accessed in the connection.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07002028 "query": "A String", # Query used to access data in the connection.
Dan O'Mearadd494642020-05-01 07:42:23 -07002029 "table": "A String", # Table accessed in the connection.
2030 "dataset": "A String", # Dataset accessed in the connection.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07002031 },
2032 ],
Takashi Matsuo06694102015-09-11 13:55:40 -07002033 },
Sai Cheemalapatic30d2b52017-03-13 12:12:03 -04002034 "pipelineDescription": { # A descriptive representation of submitted pipeline as well as the executed # Preliminary field: The format of this data may change at any time.
2035 # A description of the user pipeline and stages through which it is executed.
2036 # Created by Cloud Dataflow service. Only retrieved with
2037 # JOB_VIEW_DESCRIPTION or JOB_VIEW_ALL.
2038 # form. This data is provided by the Dataflow service for ease of visualizing
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07002039 # the pipeline and interpreting Dataflow provided metrics.
Sai Cheemalapatic30d2b52017-03-13 12:12:03 -04002040 "originalPipelineTransform": [ # Description of each transform in the pipeline and collections between them.
2041 { # Description of the type, names/ids, and input/outputs for a transform.
2042 "kind": "A String", # Type of transform.
2043 "name": "A String", # User provided name for this transform instance.
2044 "inputCollectionName": [ # User names for all collection inputs to this transform.
2045 "A String",
2046 ],
2047 "displayData": [ # Transform-specific display data.
2048 { # Data provided with a pipeline or transform to provide descriptive info.
Dan O'Mearadd494642020-05-01 07:42:23 -07002049 "key": "A String", # The key identifying the display data.
2050 # This is intended to be used as a label for the display data
2051 # when viewed in a dax monitoring system.
Sai Cheemalapatic30d2b52017-03-13 12:12:03 -04002052 "shortStrValue": "A String", # A possible additional shorter value to display.
2053 # For example a java_class_name_value of com.mypackage.MyDoFn
2054 # will be stored with MyDoFn as the short_str_value and
2055 # com.mypackage.MyDoFn as the java_class_name value.
2056 # short_str_value can be displayed and java_class_name_value
2057 # will be displayed as a tooltip.
Dan O'Mearadd494642020-05-01 07:42:23 -07002058 "timestampValue": "A String", # Contains value if the data is of timestamp type.
Sai Cheemalapatic30d2b52017-03-13 12:12:03 -04002059 "url": "A String", # An optional full URL.
2060 "floatValue": 3.14, # Contains value if the data is of float type.
2061 "namespace": "A String", # The namespace for the key. This is usually a class name or programming
2062 # language namespace (i.e. python module) which defines the display data.
2063 # This allows a dax monitoring system to specially handle the data
2064 # and perform custom rendering.
2065 "javaClassValue": "A String", # Contains value if the data is of java class type.
2066 "label": "A String", # An optional label to display in a dax UI for the element.
2067 "boolValue": True or False, # Contains value if the data is of a boolean type.
2068 "strValue": "A String", # Contains value if the data is of string type.
Dan O'Mearadd494642020-05-01 07:42:23 -07002069 "durationValue": "A String", # Contains value if the data is of duration type.
Sai Cheemalapatic30d2b52017-03-13 12:12:03 -04002070 "int64Value": "A String", # Contains value if the data is of int64 type.
2071 },
2072 ],
2073 "outputCollectionName": [ # User names for all collection outputs to this transform.
2074 "A String",
2075 ],
2076 "id": "A String", # SDK generated id of this transform instance.
2077 },
2078 ],
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07002079 "executionPipelineStage": [ # Description of each stage of execution of the pipeline.
2080 { # Description of the composing transforms, names/ids, and input/outputs of a
2081 # stage of execution. Some composing transforms and sources may have been
2082 # generated by the Dataflow service during execution planning.
2083 "componentSource": [ # Collections produced and consumed by component transforms of this stage.
2084 { # Description of an interstitial value between transforms in an execution
2085 # stage.
2086 "userName": "A String", # Human-readable name for this transform; may be user or system generated.
2087 "originalTransformOrCollection": "A String", # User name for the original user transform or collection with which this
2088 # source is most closely associated.
2089 "name": "A String", # Dataflow service generated name for this source.
2090 },
2091 ],
2092 "kind": "A String", # Type of tranform this stage is executing.
2093 "name": "A String", # Dataflow service generated name for this stage.
2094 "outputSource": [ # Output sources for this stage.
2095 { # Description of an input or output of an execution stage.
2096 "userName": "A String", # Human-readable name for this source; may be user or system generated.
2097 "sizeBytes": "A String", # Size of the source, if measurable.
2098 "name": "A String", # Dataflow service generated name for this source.
2099 "originalTransformOrCollection": "A String", # User name for the original user transform or collection with which this
2100 # source is most closely associated.
2101 },
2102 ],
2103 "inputSource": [ # Input sources for this stage.
2104 { # Description of an input or output of an execution stage.
2105 "userName": "A String", # Human-readable name for this source; may be user or system generated.
2106 "sizeBytes": "A String", # Size of the source, if measurable.
2107 "name": "A String", # Dataflow service generated name for this source.
2108 "originalTransformOrCollection": "A String", # User name for the original user transform or collection with which this
2109 # source is most closely associated.
2110 },
2111 ],
2112 "componentTransform": [ # Transforms that comprise this execution stage.
2113 { # Description of a transform executed as part of an execution stage.
2114 "userName": "A String", # Human-readable name for this transform; may be user or system generated.
2115 "originalTransform": "A String", # User name for the original user transform with which this transform is
2116 # most closely associated.
2117 "name": "A String", # Dataflow service generated name for this source.
2118 },
2119 ],
2120 "id": "A String", # Dataflow service generated id for this stage.
2121 },
2122 ],
Sai Cheemalapatic30d2b52017-03-13 12:12:03 -04002123 "displayData": [ # Pipeline level display data.
2124 { # Data provided with a pipeline or transform to provide descriptive info.
Dan O'Mearadd494642020-05-01 07:42:23 -07002125 "key": "A String", # The key identifying the display data.
2126 # This is intended to be used as a label for the display data
2127 # when viewed in a dax monitoring system.
Sai Cheemalapatic30d2b52017-03-13 12:12:03 -04002128 "shortStrValue": "A String", # A possible additional shorter value to display.
2129 # For example a java_class_name_value of com.mypackage.MyDoFn
2130 # will be stored with MyDoFn as the short_str_value and
2131 # com.mypackage.MyDoFn as the java_class_name value.
2132 # short_str_value can be displayed and java_class_name_value
2133 # will be displayed as a tooltip.
Dan O'Mearadd494642020-05-01 07:42:23 -07002134 "timestampValue": "A String", # Contains value if the data is of timestamp type.
Sai Cheemalapatic30d2b52017-03-13 12:12:03 -04002135 "url": "A String", # An optional full URL.
2136 "floatValue": 3.14, # Contains value if the data is of float type.
2137 "namespace": "A String", # The namespace for the key. This is usually a class name or programming
2138 # language namespace (i.e. python module) which defines the display data.
2139 # This allows a dax monitoring system to specially handle the data
2140 # and perform custom rendering.
2141 "javaClassValue": "A String", # Contains value if the data is of java class type.
2142 "label": "A String", # An optional label to display in a dax UI for the element.
2143 "boolValue": True or False, # Contains value if the data is of a boolean type.
2144 "strValue": "A String", # Contains value if the data is of string type.
Dan O'Mearadd494642020-05-01 07:42:23 -07002145 "durationValue": "A String", # Contains value if the data is of duration type.
Sai Cheemalapatic30d2b52017-03-13 12:12:03 -04002146 "int64Value": "A String", # Contains value if the data is of int64 type.
2147 },
2148 ],
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07002149 },
2150 "stageStates": [ # This field may be mutated by the Cloud Dataflow service;
2151 # callers cannot mutate it.
2152 { # A message describing the state of a particular execution stage.
2153 "executionStageName": "A String", # The name of the execution stage.
2154 "executionStageState": "A String", # Executions stage states allow the same set of values as JobState.
2155 "currentStateTime": "A String", # The time at which the stage transitioned to this state.
2156 },
2157 ],
2158 "id": "A String", # The unique ID of this job.
2159 #
2160 # This field is set by the Cloud Dataflow service when the Job is
2161 # created, and is immutable for the life of the job.
2162 "replacedByJobId": "A String", # If another job is an update of this job (and thus, this job is in
2163 # `JOB_STATE_UPDATED`), this field contains the ID of that job.
2164 "projectId": "A String", # The ID of the Cloud Platform project that the job belongs to.
2165 "transformNameMapping": { # The map of transform name prefixes of the job to be replaced to the
2166 # corresponding name prefixes of the new job.
2167 "a_key": "A String",
2168 },
2169 "environment": { # Describes the environment in which a Dataflow Job runs. # The environment for the job.
Dan O'Mearadd494642020-05-01 07:42:23 -07002170 "workerRegion": "A String", # The Compute Engine region
2171 # (https://cloud.google.com/compute/docs/regions-zones/regions-zones) in
2172 # which worker processing should occur, e.g. "us-west1". Mutually exclusive
2173 # with worker_zone. If neither worker_region nor worker_zone is specified,
2174 # default to the control plane's region.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07002175 "version": { # A structure describing which components and their versions of the service
2176 # are required in order to run the job.
2177 "a_key": "", # Properties of the object.
2178 },
2179 "flexResourceSchedulingGoal": "A String", # Which Flexible Resource Scheduling mode to run in.
2180 "serviceKmsKeyName": "A String", # If set, contains the Cloud KMS key identifier used to encrypt data
2181 # at rest, AKA a Customer Managed Encryption Key (CMEK).
2182 #
2183 # Format:
2184 # projects/PROJECT_ID/locations/LOCATION/keyRings/KEY_RING/cryptoKeys/KEY
2185 "internalExperiments": { # Experimental settings.
2186 "a_key": "", # Properties of the object. Contains field @type with type URL.
2187 },
2188 "dataset": "A String", # The dataset for the current project where various workflow
2189 # related tables are stored.
2190 #
2191 # The supported resource type is:
2192 #
2193 # Google BigQuery:
2194 # bigquery.googleapis.com/{dataset}
2195 "experiments": [ # The list of experiments to enable.
2196 "A String",
2197 ],
2198 "serviceAccountEmail": "A String", # Identity to run virtual machines as. Defaults to the default account.
2199 "sdkPipelineOptions": { # The Cloud Dataflow SDK pipeline options specified by the user. These
2200 # options are passed through the service and are used to recreate the
2201 # SDK pipeline options on the worker in a language agnostic and platform
2202 # independent way.
2203 "a_key": "", # Properties of the object.
2204 },
2205 "userAgent": { # A description of the process that generated the request.
2206 "a_key": "", # Properties of the object.
2207 },
Dan O'Mearadd494642020-05-01 07:42:23 -07002208 "workerZone": "A String", # The Compute Engine zone
2209 # (https://cloud.google.com/compute/docs/regions-zones/regions-zones) in
2210 # which worker processing should occur, e.g. "us-west1-a". Mutually exclusive
2211 # with worker_region. If neither worker_region nor worker_zone is specified,
2212 # a zone in the control plane's region is chosen based on available capacity.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07002213 "workerPools": [ # The worker pools. At least one "harness" worker pool must be
2214 # specified in order for the job to have workers.
2215 { # Describes one particular pool of Cloud Dataflow workers to be
2216 # instantiated by the Cloud Dataflow service in order to perform the
2217 # computations required by a job. Note that a workflow job may use
2218 # multiple pools, in order to match the various computational
2219 # requirements of the various stages of the job.
Dan O'Mearadd494642020-05-01 07:42:23 -07002220 "workerHarnessContainerImage": "A String", # Required. Docker container image that executes the Cloud Dataflow worker
2221 # harness, residing in Google Container Registry.
2222 #
2223 # Deprecated for the Fn API path. Use sdk_harness_container_images instead.
2224 "ipConfiguration": "A String", # Configuration for VM IPs.
2225 "autoscalingSettings": { # Settings for WorkerPool autoscaling. # Settings for autoscaling of this WorkerPool.
2226 "maxNumWorkers": 42, # The maximum number of workers to cap scaling at.
2227 "algorithm": "A String", # The algorithm to use for autoscaling.
2228 },
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07002229 "diskSourceImage": "A String", # Fully qualified source image for disks.
Dan O'Mearadd494642020-05-01 07:42:23 -07002230 "network": "A String", # Network to which VMs will be assigned. If empty or unspecified,
2231 # the service will use the network "default".
2232 "zone": "A String", # Zone to run the worker pools in. If empty or unspecified, the service
2233 # will attempt to choose a reasonable default.
2234 "metadata": { # Metadata to set on the Google Compute Engine VMs.
2235 "a_key": "A String",
2236 },
2237 "machineType": "A String", # Machine type (e.g. "n1-standard-1"). If empty or unspecified, the
2238 # service will attempt to choose a reasonable default.
2239 "onHostMaintenance": "A String", # The action to take on host maintenance, as defined by the Google
2240 # Compute Engine API.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07002241 "taskrunnerSettings": { # Taskrunner configuration settings. # Settings passed through to Google Compute Engine workers when
2242 # using the standard Dataflow task runner. Users should ignore
2243 # this field.
2244 "workflowFileName": "A String", # The file to store the workflow in.
2245 "logUploadLocation": "A String", # Indicates where to put logs. If this is not specified, the logs
2246 # will not be uploaded.
2247 #
2248 # The supported resource type is:
2249 #
2250 # Google Cloud Storage:
2251 # storage.googleapis.com/{bucket}/{object}
2252 # bucket.storage.googleapis.com/{object}
2253 "commandlinesFileName": "A String", # The file to store preprocessing commands in.
Dan O'Mearadd494642020-05-01 07:42:23 -07002254 "alsologtostderr": True or False, # Whether to also send taskrunner log info to stderr.
2255 "continueOnException": True or False, # Whether to continue taskrunner if an exception is hit.
2256 "baseTaskDir": "A String", # The location on the worker for task-specific subdirectories.
2257 "vmId": "A String", # The ID string of the VM.
2258 "taskGroup": "A String", # The UNIX group ID on the worker VM to use for tasks launched by
2259 # taskrunner; e.g. "wheel".
2260 "taskUser": "A String", # The UNIX user ID on the worker VM to use for tasks launched by
2261 # taskrunner; e.g. "root".
2262 "oauthScopes": [ # The OAuth2 scopes to be requested by the taskrunner in order to
2263 # access the Cloud Dataflow API.
2264 "A String",
2265 ],
2266 "languageHint": "A String", # The suggested backend language.
2267 "logToSerialconsole": True or False, # Whether to send taskrunner log info to Google Compute Engine VM serial
2268 # console.
2269 "streamingWorkerMainClass": "A String", # The streaming worker main class name.
2270 "logDir": "A String", # The directory on the VM to store logs.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07002271 "parallelWorkerSettings": { # Provides data to pass through to the worker harness. # The settings to pass to the parallel worker harness.
2272 "reportingEnabled": True or False, # Whether to send work progress updates to the service.
2273 "shuffleServicePath": "A String", # The Shuffle service path relative to the root URL, for example,
2274 # "shuffle/v1beta1".
2275 "workerId": "A String", # The ID of the worker running this pipeline.
2276 "baseUrl": "A String", # The base URL for accessing Google Cloud APIs.
2277 #
2278 # When workers access Google Cloud APIs, they logically do so via
2279 # relative URLs. If this field is specified, it supplies the base
2280 # URL to use for resolving these relative URLs. The normative
2281 # algorithm used is defined by RFC 1808, "Relative Uniform Resource
2282 # Locators".
2283 #
2284 # If not specified, the default value is "http://www.googleapis.com/"
2285 "servicePath": "A String", # The Cloud Dataflow service path relative to the root URL, for example,
2286 # "dataflow/v1b3/projects".
2287 "tempStoragePrefix": "A String", # The prefix of the resources the system should use for temporary
2288 # storage.
2289 #
2290 # The supported resource type is:
2291 #
2292 # Google Cloud Storage:
2293 #
2294 # storage.googleapis.com/{bucket}/{object}
2295 # bucket.storage.googleapis.com/{object}
2296 },
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07002297 "dataflowApiVersion": "A String", # The API version of endpoint, e.g. "v1b3"
2298 "harnessCommand": "A String", # The command to launch the worker harness.
2299 "tempStoragePrefix": "A String", # The prefix of the resources the taskrunner should use for
2300 # temporary storage.
2301 #
2302 # The supported resource type is:
2303 #
2304 # Google Cloud Storage:
2305 # storage.googleapis.com/{bucket}/{object}
2306 # bucket.storage.googleapis.com/{object}
Dan O'Mearadd494642020-05-01 07:42:23 -07002307 "baseUrl": "A String", # The base URL for the taskrunner to use when accessing Google Cloud APIs.
2308 #
2309 # When workers access Google Cloud APIs, they logically do so via
2310 # relative URLs. If this field is specified, it supplies the base
2311 # URL to use for resolving these relative URLs. The normative
2312 # algorithm used is defined by RFC 1808, "Relative Uniform Resource
2313 # Locators".
2314 #
2315 # If not specified, the default value is "http://www.googleapis.com/"
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07002316 },
Dan O'Mearadd494642020-05-01 07:42:23 -07002317 "numThreadsPerWorker": 42, # The number of threads per worker harness. If empty or unspecified, the
2318 # service will choose a number of threads (according to the number of cores
2319 # on the selected machine type for batch, or 1 by convention for streaming).
2320 "poolArgs": { # Extra arguments for this worker pool.
2321 "a_key": "", # Properties of the object. Contains field @type with type URL.
2322 },
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07002323 "packages": [ # Packages to be installed on workers.
2324 { # The packages that must be installed in order for a worker to run the
2325 # steps of the Cloud Dataflow job that will be assigned to its worker
2326 # pool.
2327 #
2328 # This is the mechanism by which the Cloud Dataflow SDK causes code to
2329 # be loaded onto the workers. For example, the Cloud Dataflow Java SDK
2330 # might use this to install jars containing the user's code and all of the
2331 # various dependencies (libraries, data files, etc.) required in order
2332 # for that code to run.
2333 "location": "A String", # The resource to read the package from. The supported resource type is:
2334 #
2335 # Google Cloud Storage:
2336 #
2337 # storage.googleapis.com/{bucket}
2338 # bucket.storage.googleapis.com/
2339 "name": "A String", # The name of the package.
Sai Cheemalapatic30d2b52017-03-13 12:12:03 -04002340 },
2341 ],
Dan O'Mearadd494642020-05-01 07:42:23 -07002342 "defaultPackageSet": "A String", # The default package set to install. This allows the service to
2343 # select a default set of packages which are useful to worker
2344 # harnesses written in a particular language.
2345 "kind": "A String", # The kind of the worker pool; currently only `harness` and `shuffle`
2346 # are supported.
2347 "diskType": "A String", # Type of root disk for VMs. If empty or unspecified, the service will
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07002348 # attempt to choose a reasonable default.
2349 "teardownPolicy": "A String", # Sets the policy for determining when to turndown worker pool.
2350 # Allowed values are: `TEARDOWN_ALWAYS`, `TEARDOWN_ON_SUCCESS`, and
2351 # `TEARDOWN_NEVER`.
2352 # `TEARDOWN_ALWAYS` means workers are always torn down regardless of whether
2353 # the job succeeds. `TEARDOWN_ON_SUCCESS` means workers are torn down
2354 # if the job succeeds. `TEARDOWN_NEVER` means the workers are never torn
2355 # down.
2356 #
2357 # If the workers are not torn down by the service, they will
2358 # continue to run and use Google Compute Engine VM resources in the
2359 # user's project until they are explicitly terminated by the user.
2360 # Because of this, Google recommends using the `TEARDOWN_ALWAYS`
2361 # policy except for small, manually supervised test jobs.
2362 #
2363 # If unknown or unspecified, the service will attempt to choose a reasonable
2364 # default.
Dan O'Mearadd494642020-05-01 07:42:23 -07002365 "diskSizeGb": 42, # Size of root disk for VMs, in GB. If zero or unspecified, the service will
2366 # attempt to choose a reasonable default.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07002367 "numWorkers": 42, # Number of Google Compute Engine workers in this pool needed to
2368 # execute the job. If zero or unspecified, the service will
2369 # attempt to choose a reasonable default.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07002370 "subnetwork": "A String", # Subnetwork to which VMs will be assigned, if desired. Expected to be of
2371 # the form "regions/REGION/subnetworks/SUBNETWORK".
2372 "dataDisks": [ # Data disks that are used by a VM in this workflow.
2373 { # Describes the data disk used by a workflow job.
2374 "mountPoint": "A String", # Directory in a VM where disk is mounted.
2375 "sizeGb": 42, # Size of disk in GB. If zero or unspecified, the service will
2376 # attempt to choose a reasonable default.
2377 "diskType": "A String", # Disk storage type, as defined by Google Compute Engine. This
2378 # must be a disk type appropriate to the project and zone in which
2379 # the workers will run. If unknown or unspecified, the service
2380 # will attempt to choose a reasonable default.
2381 #
2382 # For example, the standard persistent disk type is a resource name
2383 # typically ending in "pd-standard". If SSD persistent disks are
2384 # available, the resource name typically ends with "pd-ssd". The
2385            # actual valid values are defined by the Google Compute Engine API,
2386 # not by the Cloud Dataflow API; consult the Google Compute Engine
2387 # documentation for more information about determining the set of
2388 # available disk types for a particular project and zone.
2389 #
2390 # Google Compute Engine Disk types are local to a particular
2391 # project in a particular zone, and so the resource name will
2392 # typically look something like this:
2393 #
2394 # compute.googleapis.com/projects/project-id/zones/zone/diskTypes/pd-standard
Sai Cheemalapatic30d2b52017-03-13 12:12:03 -04002395 },
2396 ],
Dan O'Mearadd494642020-05-01 07:42:23 -07002397 "sdkHarnessContainerImages": [ # Set of SDK harness containers needed to execute this pipeline. This will
2398 # only be set in the Fn API path. For non-cross-language pipelines this
2399 # should have only one entry. Cross-language pipelines will have two or more
2400 # entries.
2401          { # Defines an SDK harness container for executing Dataflow pipelines.
2402 "containerImage": "A String", # A docker container image that resides in Google Container Registry.
2403 "useSingleCorePerContainer": True or False, # If true, recommends the Dataflow service to use only one core per SDK
2404 # container instance with this image. If false (or unset) recommends using
2405 # more than one core per SDK container instance with this image for
2406 # efficiency. Note that Dataflow service may choose to override this property
2407 # if needed.
2408 },
2409 ],
Sai Cheemalapatic30d2b52017-03-13 12:12:03 -04002410 },
2411 ],
Dan O'Mearadd494642020-05-01 07:42:23 -07002412 "clusterManagerApiService": "A String", # The type of cluster manager API to use. If unknown or
2413 # unspecified, the service will attempt to choose a reasonable
2414 # default. This should be in the form of the API service name,
2415 # e.g. "compute.googleapis.com".
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07002416 "tempStoragePrefix": "A String", # The prefix of the resources the system should use for temporary
2417        # storage. The system will append the suffix "/temp-{JOBNAME}" to
2418 # this resource prefix, where {JOBNAME} is the value of the
2419 # job_name field. The resulting bucket and object prefix is used
2420 # as the prefix of the resources used to store temporary data
2421 # needed during the job execution. NOTE: This will override the
2422 # value in taskrunner_settings.
2423 # The supported resource type is:
2424 #
2425 # Google Cloud Storage:
2426 #
2427 # storage.googleapis.com/{bucket}/{object}
2428 # bucket.storage.googleapis.com/{object}
Sai Cheemalapatic30d2b52017-03-13 12:12:03 -04002429 },
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07002430 "location": "A String", # The [regional endpoint]
2431 # (https://cloud.google.com/dataflow/docs/concepts/regional-endpoints) that
2432 # contains this job.
2433 "tempFiles": [ # A set of files the system should be aware of that are used
2434 # for temporary storage. These temporary files will be
2435 # removed on job completion.
2436 # No duplicates are allowed.
2437 # No file patterns are supported.
2438 #
2439 # The supported files are:
2440 #
2441 # Google Cloud Storage:
2442 #
2443 # storage.googleapis.com/{bucket}/{object}
2444 # bucket.storage.googleapis.com/{object}
2445 "A String",
2446 ],
2447 "type": "A String", # The type of Cloud Dataflow job.
2448 "clientRequestId": "A String", # The client's unique identifier of the job, re-used across retried attempts.
2449 # If this field is set, the service will ensure its uniqueness.
2450 # The request to create a job will fail if the service has knowledge of a
2451 # previously submitted job with the same client's ID and job name.
2452 # The caller may use this field to ensure idempotence of job
2453 # creation across retried attempts to create a job.
2454 # By default, the field is empty and, in that case, the service ignores it.
2455 "createdFromSnapshotId": "A String", # If this is specified, the job's initial state is populated from the given
2456 # snapshot.
2457 "stepsLocation": "A String", # The GCS location where the steps are stored.
2458 "currentStateTime": "A String", # The timestamp associated with the current state.
2459 "startTime": "A String", # The timestamp when the job was started (transitioned to JOB_STATE_PENDING).
2460 # Flexible resource scheduling jobs are started with some delay after job
2461 # creation, so start_time is unset before start and is updated when the
2462 # job is started by the Cloud Dataflow service. For other jobs, start_time
2463        # always equals create_time and is immutable and set by the Cloud Dataflow
2464 # service.
2465 "createTime": "A String", # The timestamp when the job was initially created. Immutable and set by the
2466 # Cloud Dataflow service.
2467 "requestedState": "A String", # The job's requested state.
2468 #
2469 # `UpdateJob` may be used to switch between the `JOB_STATE_STOPPED` and
2470 # `JOB_STATE_RUNNING` states, by setting requested_state. `UpdateJob` may
2471 # also be used to directly set a job's requested state to
2472 # `JOB_STATE_CANCELLED` or `JOB_STATE_DONE`, irrevocably terminating the
2473 # job if it has not already reached a terminal state.
2474 "name": "A String", # The user-specified Cloud Dataflow job name.
2475 #
2476 # Only one Job with a given name may exist in a project at any
2477 # given time. If a caller attempts to create a Job with the same
2478 # name as an already-existing Job, the attempt returns the
2479 # existing Job.
2480 #
2481 # The name must match the regular expression
2482 # `[a-z]([-a-z0-9]{0,38}[a-z0-9])?`
2483 "steps": [ # Exactly one of step or steps_location should be specified.
2484 #
2485 # The top-level steps that constitute the entire job.
Sai Cheemalapatic30d2b52017-03-13 12:12:03 -04002486 { # Defines a particular step within a Cloud Dataflow job.
2487 #
2488 # A job consists of multiple steps, each of which performs some
2489 # specific operation as part of the overall job. Data is typically
2490 # passed from one step to another as part of the job.
2491 #
2492 # Here's an example of a sequence of steps which together implement a
2493 # Map-Reduce job:
2494 #
2495 # * Read a collection of data from some source, parsing the
2496 # collection's elements.
2497 #
2498 # * Validate the elements.
2499 #
2500 # * Apply a user-defined function to map each element to some value
2501 # and extract an element-specific key value.
2502 #
2503 # * Group elements with the same key into a single element with
2504 # that key, transforming a multiply-keyed collection into a
2505 # uniquely-keyed collection.
2506 #
2507 # * Write the elements out to some data sink.
2508 #
2509 # Note that the Cloud Dataflow service may be used to run many different
2510 # types of jobs, not just Map-Reduce.
2511 "kind": "A String", # The kind of step in the Cloud Dataflow job.
Dan O'Mearadd494642020-05-01 07:42:23 -07002512 "name": "A String", # The name that identifies the step. This must be unique for each
2513 # step with respect to all other steps in the Cloud Dataflow job.
Sai Cheemalapatic30d2b52017-03-13 12:12:03 -04002514 "properties": { # Named properties associated with the step. Each kind of
2515 # predefined step has its own required set of properties.
2516 # Must be provided on Create. Only retrieved with JOB_VIEW_ALL.
Takashi Matsuo06694102015-09-11 13:55:40 -07002517 "a_key": "", # Properties of the object.
2518 },
2519 },
2520 ],
Thomas Coffee2f245372017-03-27 10:39:26 -07002521 "replaceJobId": "A String", # If this job is an update of an existing job, this field is the job ID
2522 # of the job it replaced.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07002523 #
Thomas Coffee2f245372017-03-27 10:39:26 -07002524 # When sending a `CreateJobRequest`, you can update a job by specifying it
2525 # here. The job named here is stopped, and its intermediate state is
2526 # transferred to this job.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07002527 "currentState": "A String", # The current state of the job.
2528 #
2529 # Jobs are created in the `JOB_STATE_STOPPED` state unless otherwise
2530 # specified.
2531 #
2532 # A job in the `JOB_STATE_RUNNING` state may asynchronously enter a
2533 # terminal state. After a job has reached a terminal state, no
2534 # further state updates may be made.
2535 #
2536 # This field may be mutated by the Cloud Dataflow service;
2537 # callers cannot mutate it.
Sai Cheemalapatic30d2b52017-03-13 12:12:03 -04002538 "executionInfo": { # Additional information about how a Cloud Dataflow job will be executed that # Deprecated.
2539 # isn't contained in the submitted job.
Takashi Matsuo06694102015-09-11 13:55:40 -07002540 "stages": { # A mapping from each stage to the information about that stage.
Sai Cheemalapatic30d2b52017-03-13 12:12:03 -04002541 "a_key": { # Contains information about how a particular
2542 # google.dataflow.v1beta3.Step will be executed.
2543 "stepName": [ # The steps associated with the execution stage.
2544 # Note that stages may have several steps, and that a given step
2545 # might be run by more than one stage.
Nathaniel Manista4f877e52015-06-15 16:44:50 +00002546 "A String",
2547 ],
Nathaniel Manista4f877e52015-06-15 16:44:50 +00002548 },
Nathaniel Manista4f877e52015-06-15 16:44:50 +00002549 },
2550 },
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07002551 }</pre>
Nathaniel Manista4f877e52015-06-15 16:44:50 +00002552</div>
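<p>As a rough usage sketch, not part of the generated reference, the <code>get</code> call above might be issued with the google-api-python-client library as follows. The project and job IDs are placeholders, and Application Default Credentials are assumed.</p>
<pre>
from googleapiclient.discovery import build

dataflow = build('dataflow', 'v1b3')  # assumes Application Default Credentials

job = dataflow.projects().jobs().get(
    projectId='my-project-id',                     # hypothetical project ID
    jobId='2020-05-01_12_00_00-1234567890123456',  # hypothetical job ID
    view='JOB_VIEW_SUMMARY',
).execute()

# currentState is one of the JOB_STATE_* values described above.
print(job.get('name'), job.get('currentState'))
</pre>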
2553
2554<div class="method">
Jon Wayne Parrott692617a2017-01-06 09:58:29 -08002555 <code class="details" id="getMetrics">getMetrics(projectId, jobId, startTime=None, location=None, x__xgafv=None)</code>
Nathaniel Manista4f877e52015-06-15 16:44:50 +00002556 <pre>Request the job status.
2557
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07002558To request the status of a job, we recommend using
2559`projects.locations.jobs.getMetrics` with a [regional endpoint]
2560(https://cloud.google.com/dataflow/docs/concepts/regional-endpoints). Using
2561`projects.jobs.getMetrics` is not recommended, as you can only request the
2562status of jobs that are running in `us-central1`.
2563
Nathaniel Manista4f877e52015-06-15 16:44:50 +00002564Args:
Takashi Matsuo06694102015-09-11 13:55:40 -07002565 projectId: string, A project id. (required)
2566 jobId: string, The job to get messages for. (required)
Sai Cheemalapatic30d2b52017-03-13 12:12:03 -04002567 startTime: string, Return only metric data that has changed since this time.
2568Default is to return all information about all metrics for the job.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07002569 location: string, The [regional endpoint]
2570(https://cloud.google.com/dataflow/docs/concepts/regional-endpoints) that
2571contains the job specified by job_id.
Takashi Matsuo06694102015-09-11 13:55:40 -07002572 x__xgafv: string, V1 error format.
Sai Cheemalapatic30d2b52017-03-13 12:12:03 -04002573 Allowed values
2574 1 - v1 error format
2575 2 - v2 error format
Nathaniel Manista4f877e52015-06-15 16:44:50 +00002576
2577Returns:
2578 An object of the form:
2579
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07002580 { # JobMetrics contains a collection of metrics describing the detailed progress
Sai Cheemalapatic30d2b52017-03-13 12:12:03 -04002581 # of a Dataflow job. Metrics correspond to user-defined and system-defined
2582 # metrics in the job.
2583 #
2584 # This resource captures only the most recent values of each metric;
2585 # time-series data can be queried for them (under the same metric names)
2586 # from Cloud Monitoring.
Takashi Matsuo06694102015-09-11 13:55:40 -07002587 "metrics": [ # All metrics for this job.
2588 { # Describes the state of a metric.
Sai Cheemalapatic30d2b52017-03-13 12:12:03 -04002589 "meanCount": "", # Worker-computed aggregate value for the "Mean" aggregation kind.
2590 # This holds the count of the aggregated values and is used in combination
2591 # with mean_sum above to obtain the actual mean aggregate value.
2592 # The only possible value type is Long.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07002593 "kind": "A String", # Metric aggregation kind. The possible metric aggregation kinds are
2594 # "Sum", "Max", "Min", "Mean", "Set", "And", "Or", and "Distribution".
2595 # The specified aggregation kind is case-insensitive.
2596 #
2597 # If omitted, this is not an aggregated value but instead
2598 # a single metric sample value.
Sai Cheemalapatic30d2b52017-03-13 12:12:03 -04002599 "set": "", # Worker-computed aggregate value for the "Set" aggregation kind. The only
2600 # possible value type is a list of Values whose type can be Long, Double,
2601 # or String, according to the metric's type. All Values in the list must
2602 # be of the same type.
2603 "name": { # Identifies a metric, by describing the source which generated the # Name of the metric.
2604 # metric.
2605 "origin": "A String", # Origin (namespace) of metric name. May be blank for user-define metrics;
2606 # will be "dataflow" for metrics defined by the Dataflow service or SDK.
Takashi Matsuo06694102015-09-11 13:55:40 -07002607 "name": "A String", # Worker-defined metric name.
Sai Cheemalapatic30d2b52017-03-13 12:12:03 -04002608 "context": { # Zero or more labeled fields which identify the part of the job this
2609 # metric is associated with, such as the name of a step or collection.
2610 #
2611 # For example, built-in counters associated with steps will have
Dan O'Mearadd494642020-05-01 07:42:23 -07002612 # context['step'] = &lt;step-name&gt;. Counters associated with PCollections
2613 # in the SDK will have context['pcollection'] = &lt;pcollection-name&gt;.
Nathaniel Manista4f877e52015-06-15 16:44:50 +00002614 "a_key": "A String",
2615 },
2616 },
Sai Cheemalapati4ba8c232017-06-06 18:46:08 -04002617 "meanSum": "", # Worker-computed aggregate value for the "Mean" aggregation kind.
2618 # This holds the sum of the aggregated values and is used in combination
2619 # with mean_count below to obtain the actual mean aggregate value.
2620 # The only possible value types are Long and Double.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07002621 "cumulative": True or False, # True if this metric is reported as the total cumulative aggregate
2622 # value accumulated since the worker started working on this WorkItem.
2623 # By default this is false, indicating that this metric is reported
2624 # as a delta that is not associated with any WorkItem.
2625 "updateTime": "A String", # Timestamp associated with the metric value. Optional when workers are
2626 # reporting work progress; it will be filled in responses from the
2627 # metrics API.
2628 "scalar": "", # Worker-computed aggregate value for aggregation kinds "Sum", "Max", "Min",
2629 # "And", and "Or". The possible value types are Long, Double, and Boolean.
Sai Cheemalapatic30d2b52017-03-13 12:12:03 -04002630 "internal": "", # Worker-computed aggregate value for internal use by the Dataflow
2631 # service.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07002632 "gauge": "", # A struct value describing properties of a Gauge.
2633          # Metrics of gauge type show the value of a metric across time, and are
2634 # aggregated based on the newest value.
2635 "distribution": "", # A struct value describing properties of a distribution of numeric values.
Nathaniel Manista4f877e52015-06-15 16:44:50 +00002636 },
2637 ],
Takashi Matsuo06694102015-09-11 13:55:40 -07002638 "metricTime": "A String", # Timestamp as of which metric values are current.
Nathaniel Manista4f877e52015-06-15 16:44:50 +00002639 }</pre>
2640</div>
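<p>A short illustrative sketch, not part of the generated reference, of calling <code>getMetrics</code> with the google-api-python-client library. The IDs and timestamp are placeholders and Application Default Credentials are assumed; passing <code>startTime</code> limits the response to metrics that changed since that time.</p>
<pre>
from googleapiclient.discovery import build

dataflow = build('dataflow', 'v1b3')  # assumes Application Default Credentials

metrics = dataflow.projects().jobs().getMetrics(
    projectId='my-project-id',                     # hypothetical project ID
    jobId='2020-05-01_12_00_00-1234567890123456',  # hypothetical job ID
    startTime='2020-05-01T12:00:00Z',              # only metrics changed since this time
).execute()

for metric in metrics.get('metrics', []):
    name = metric.get('name', {})
    print(name.get('origin'), name.get('name'), metric.get('scalar'))
</pre>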
2641
2642<div class="method">
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07002643 <code class="details" id="list">list(projectId, pageSize=None, pageToken=None, x__xgafv=None, location=None, filter=None, view=None)</code>
Sai Cheemalapatic30d2b52017-03-13 12:12:03 -04002644 <pre>List the jobs of a project.
Nathaniel Manista4f877e52015-06-15 16:44:50 +00002645
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07002646To list the jobs of a project in a region, we recommend using
2647`projects.locations.jobs.list` with a [regional endpoint]
2648(https://cloud.google.com/dataflow/docs/concepts/regional-endpoints). To
2649list all jobs across all regions, use `projects.jobs.aggregated`. Using
2650`projects.jobs.list` is not recommended, as you can only get the list of
2651jobs that are running in `us-central1`.
2652
Nathaniel Manista4f877e52015-06-15 16:44:50 +00002653Args:
Takashi Matsuo06694102015-09-11 13:55:40 -07002654 projectId: string, The project which owns the jobs. (required)
Sai Cheemalapatic30d2b52017-03-13 12:12:03 -04002655 pageSize: integer, If there are many jobs, limit response to at most this many.
2656The actual number of jobs returned will be the lesser of max_responses
2657and an unspecified server-defined limit.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07002658 pageToken: string, Set this to the 'next_page_token' field of a previous response
2659to request additional results in a long list.
Takashi Matsuo06694102015-09-11 13:55:40 -07002660 x__xgafv: string, V1 error format.
Sai Cheemalapatic30d2b52017-03-13 12:12:03 -04002661 Allowed values
2662 1 - v1 error format
2663 2 - v2 error format
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07002664 location: string, The [regional endpoint]
2665(https://cloud.google.com/dataflow/docs/concepts/regional-endpoints) that
2666contains this job.
Jon Wayne Parrott692617a2017-01-06 09:58:29 -08002667 filter: string, The kind of filter to use.
Sai Cheemalapatic30d2b52017-03-13 12:12:03 -04002668 view: string, Level of information requested in response. Default is `JOB_VIEW_SUMMARY`.
Nathaniel Manista4f877e52015-06-15 16:44:50 +00002669
2670Returns:
2671 An object of the form:
2672
Dan O'Mearadd494642020-05-01 07:42:23 -07002673 { # Response to a request to list Cloud Dataflow jobs in a project. This might
2674 # be a partial response, depending on the page size in the ListJobsRequest.
2675 # However, if the project does not have any jobs, an instance of
2676    # ListJobsResponse is not returned and the request's response
2677 # body is empty {}.
Takashi Matsuo06694102015-09-11 13:55:40 -07002678 "nextPageToken": "A String", # Set if there may be more results than fit in this response.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07002679 "failedLocation": [ # Zero or more messages describing the [regional endpoints]
2680 # (https://cloud.google.com/dataflow/docs/concepts/regional-endpoints) that
2681 # failed to respond.
2682 { # Indicates which [regional endpoint]
2683 # (https://cloud.google.com/dataflow/docs/concepts/regional-endpoints) failed
2684 # to respond to a request for data.
2685 "name": "A String", # The name of the [regional endpoint]
2686 # (https://cloud.google.com/dataflow/docs/concepts/regional-endpoints) that
2687 # failed to respond.
Sai Cheemalapati4ba8c232017-06-06 18:46:08 -04002688 },
2689 ],
Takashi Matsuo06694102015-09-11 13:55:40 -07002690 "jobs": [ # A subset of the requested job information.
Sai Cheemalapatic30d2b52017-03-13 12:12:03 -04002691 { # Defines a job to be run by the Cloud Dataflow service.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07002692 "labels": { # User-defined labels for this job.
2693 #
2694 # The labels map can contain no more than 64 entries. Entries of the labels
2695 # map are UTF8 strings that comply with the following restrictions:
2696 #
2697 # * Keys must conform to regexp: \p{Ll}\p{Lo}{0,62}
2698 # * Values must conform to regexp: [\p{Ll}\p{Lo}\p{N}_-]{0,63}
Dan O'Mearadd494642020-05-01 07:42:23 -07002699 # * Both keys and values are additionally constrained to be &lt;= 128 bytes in
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07002700 # size.
2701 "a_key": "A String",
2702 },
2703 "jobMetadata": { # Metadata available primarily for filtering jobs. Will be included in the # This field is populated by the Dataflow service to support filtering jobs
2704 # by the metadata values provided here. Populated for ListJobs and all GetJob
2705 # views SUMMARY and higher.
2706 # ListJob response and Job SUMMARY view.
2707 "sdkVersion": { # The version of the SDK used to run the job. # The SDK version used to run the job.
2708 "versionDisplayName": "A String", # A readable string describing the version of the SDK.
2709 "version": "A String", # The version of the SDK used to run the job.
2710 "sdkSupportStatus": "A String", # The support status for this SDK version.
Jon Wayne Parrott7d5badb2016-08-16 12:44:29 -07002711 },
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07002712 "pubsubDetails": [ # Identification of a PubSub source used in the Dataflow job.
2713 { # Metadata for a PubSub connector used by the job.
2714 "topic": "A String", # Topic accessed in the connection.
2715 "subscription": "A String", # Subscription used in the connection.
2716 },
2717 ],
2718 "datastoreDetails": [ # Identification of a Datastore source used in the Dataflow job.
2719 { # Metadata for a Datastore connector used by the job.
2720 "projectId": "A String", # ProjectId accessed in the connection.
2721 "namespace": "A String", # Namespace used in the connection.
2722 },
2723 ],
2724 "fileDetails": [ # Identification of a File source used in the Dataflow job.
2725 { # Metadata for a File connector used by the job.
2726 "filePattern": "A String", # File Pattern used to access files by the connector.
2727 },
2728 ],
2729 "spannerDetails": [ # Identification of a Spanner source used in the Dataflow job.
2730 { # Metadata for a Spanner connector used by the job.
2731 "instanceId": "A String", # InstanceId accessed in the connection.
2732 "projectId": "A String", # ProjectId accessed in the connection.
2733 "databaseId": "A String", # DatabaseId accessed in the connection.
2734 },
2735 ],
2736 "bigTableDetails": [ # Identification of a BigTable source used in the Dataflow job.
2737 { # Metadata for a BigTable connector used by the job.
2738 "instanceId": "A String", # InstanceId accessed in the connection.
2739 "projectId": "A String", # ProjectId accessed in the connection.
2740 "tableId": "A String", # TableId accessed in the connection.
2741 },
2742 ],
2743 "bigqueryDetails": [ # Identification of a BigQuery source used in the Dataflow job.
2744 { # Metadata for a BigQuery connector used by the job.
2745 "projectId": "A String", # Project accessed in the connection.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07002746 "query": "A String", # Query used to access data in the connection.
Dan O'Mearadd494642020-05-01 07:42:23 -07002747 "table": "A String", # Table accessed in the connection.
2748 "dataset": "A String", # Dataset accessed in the connection.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07002749 },
2750 ],
2751 },
2752 "pipelineDescription": { # A descriptive representation of submitted pipeline as well as the executed # Preliminary field: The format of this data may change at any time.
2753 # A description of the user pipeline and stages through which it is executed.
2754 # Created by Cloud Dataflow service. Only retrieved with
2755 # JOB_VIEW_DESCRIPTION or JOB_VIEW_ALL.
2756 # form. This data is provided by the Dataflow service for ease of visualizing
2757 # the pipeline and interpreting Dataflow provided metrics.
2758 "originalPipelineTransform": [ # Description of each transform in the pipeline and collections between them.
2759 { # Description of the type, names/ids, and input/outputs for a transform.
2760 "kind": "A String", # Type of transform.
2761 "name": "A String", # User provided name for this transform instance.
2762 "inputCollectionName": [ # User names for all collection inputs to this transform.
2763 "A String",
2764 ],
2765 "displayData": [ # Transform-specific display data.
2766 { # Data provided with a pipeline or transform to provide descriptive info.
Dan O'Mearadd494642020-05-01 07:42:23 -07002767 "key": "A String", # The key identifying the display data.
2768 # This is intended to be used as a label for the display data
2769 # when viewed in a dax monitoring system.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07002770 "shortStrValue": "A String", # A possible additional shorter value to display.
2771 # For example a java_class_name_value of com.mypackage.MyDoFn
2772 # will be stored with MyDoFn as the short_str_value and
2773 # com.mypackage.MyDoFn as the java_class_name value.
2774 # short_str_value can be displayed and java_class_name_value
2775 # will be displayed as a tooltip.
Dan O'Mearadd494642020-05-01 07:42:23 -07002776 "timestampValue": "A String", # Contains value if the data is of timestamp type.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07002777 "url": "A String", # An optional full URL.
2778 "floatValue": 3.14, # Contains value if the data is of float type.
2779 "namespace": "A String", # The namespace for the key. This is usually a class name or programming
2780 # language namespace (i.e. python module) which defines the display data.
2781 # This allows a dax monitoring system to specially handle the data
2782 # and perform custom rendering.
2783 "javaClassValue": "A String", # Contains value if the data is of java class type.
2784 "label": "A String", # An optional label to display in a dax UI for the element.
2785 "boolValue": True or False, # Contains value if the data is of a boolean type.
2786 "strValue": "A String", # Contains value if the data is of string type.
Dan O'Mearadd494642020-05-01 07:42:23 -07002787 "durationValue": "A String", # Contains value if the data is of duration type.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07002788 "int64Value": "A String", # Contains value if the data is of int64 type.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07002789 },
2790 ],
2791 "outputCollectionName": [ # User names for all collection outputs to this transform.
2792 "A String",
2793 ],
2794 "id": "A String", # SDK generated id of this transform instance.
2795 },
2796 ],
2797 "executionPipelineStage": [ # Description of each stage of execution of the pipeline.
2798 { # Description of the composing transforms, names/ids, and input/outputs of a
2799 # stage of execution. Some composing transforms and sources may have been
2800 # generated by the Dataflow service during execution planning.
2801 "componentSource": [ # Collections produced and consumed by component transforms of this stage.
2802 { # Description of an interstitial value between transforms in an execution
2803 # stage.
2804 "userName": "A String", # Human-readable name for this transform; may be user or system generated.
2805 "originalTransformOrCollection": "A String", # User name for the original user transform or collection with which this
2806 # source is most closely associated.
2807 "name": "A String", # Dataflow service generated name for this source.
2808 },
2809 ],
2810 "kind": "A String", # Type of tranform this stage is executing.
2811 "name": "A String", # Dataflow service generated name for this stage.
2812 "outputSource": [ # Output sources for this stage.
2813 { # Description of an input or output of an execution stage.
2814 "userName": "A String", # Human-readable name for this source; may be user or system generated.
2815 "sizeBytes": "A String", # Size of the source, if measurable.
2816 "name": "A String", # Dataflow service generated name for this source.
2817 "originalTransformOrCollection": "A String", # User name for the original user transform or collection with which this
2818 # source is most closely associated.
2819 },
2820 ],
2821 "inputSource": [ # Input sources for this stage.
2822 { # Description of an input or output of an execution stage.
2823 "userName": "A String", # Human-readable name for this source; may be user or system generated.
2824 "sizeBytes": "A String", # Size of the source, if measurable.
2825 "name": "A String", # Dataflow service generated name for this source.
2826 "originalTransformOrCollection": "A String", # User name for the original user transform or collection with which this
2827 # source is most closely associated.
2828 },
2829 ],
2830 "componentTransform": [ # Transforms that comprise this execution stage.
2831 { # Description of a transform executed as part of an execution stage.
2832 "userName": "A String", # Human-readable name for this transform; may be user or system generated.
2833 "originalTransform": "A String", # User name for the original user transform with which this transform is
2834 # most closely associated.
2835 "name": "A String", # Dataflow service generated name for this source.
2836 },
2837 ],
2838 "id": "A String", # Dataflow service generated id for this stage.
2839 },
2840 ],
2841 "displayData": [ # Pipeline level display data.
2842 { # Data provided with a pipeline or transform to provide descriptive info.
Dan O'Mearadd494642020-05-01 07:42:23 -07002843 "key": "A String", # The key identifying the display data.
2844 # This is intended to be used as a label for the display data
2845 # when viewed in a dax monitoring system.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07002846 "shortStrValue": "A String", # A possible additional shorter value to display.
2847 # For example a java_class_name_value of com.mypackage.MyDoFn
2848 # will be stored with MyDoFn as the short_str_value and
2849 # com.mypackage.MyDoFn as the java_class_name value.
2850 # short_str_value can be displayed and java_class_name_value
2851 # will be displayed as a tooltip.
Dan O'Mearadd494642020-05-01 07:42:23 -07002852 "timestampValue": "A String", # Contains value if the data is of timestamp type.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07002853 "url": "A String", # An optional full URL.
2854 "floatValue": 3.14, # Contains value if the data is of float type.
2855 "namespace": "A String", # The namespace for the key. This is usually a class name or programming
2856 # language namespace (i.e. python module) which defines the display data.
2857 # This allows a dax monitoring system to specially handle the data
2858 # and perform custom rendering.
2859 "javaClassValue": "A String", # Contains value if the data is of java class type.
2860 "label": "A String", # An optional label to display in a dax UI for the element.
2861 "boolValue": True or False, # Contains value if the data is of a boolean type.
2862 "strValue": "A String", # Contains value if the data is of string type.
Dan O'Mearadd494642020-05-01 07:42:23 -07002863 "durationValue": "A String", # Contains value if the data is of duration type.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07002864 "int64Value": "A String", # Contains value if the data is of int64 type.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07002865 },
2866 ],
2867 },
2868 "stageStates": [ # This field may be mutated by the Cloud Dataflow service;
2869 # callers cannot mutate it.
2870 { # A message describing the state of a particular execution stage.
2871 "executionStageName": "A String", # The name of the execution stage.
2872       "executionStageState": "A String", # Execution stage states allow the same set of values as JobState.
2873 "currentStateTime": "A String", # The time at which the stage transitioned to this state.
Nathaniel Manista4f877e52015-06-15 16:44:50 +00002874 },
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07002875 ],
2876 "id": "A String", # The unique ID of this job.
2877 #
2878 # This field is set by the Cloud Dataflow service when the Job is
2879 # created, and is immutable for the life of the job.
2880 "replacedByJobId": "A String", # If another job is an update of this job (and thus, this job is in
2881 # `JOB_STATE_UPDATED`), this field contains the ID of that job.
2882 "projectId": "A String", # The ID of the Cloud Platform project that the job belongs to.
2883 "transformNameMapping": { # The map of transform name prefixes of the job to be replaced to the
2884 # corresponding name prefixes of the new job.
2885 "a_key": "A String",
2886 },
2887 "environment": { # Describes the environment in which a Dataflow Job runs. # The environment for the job.
Dan O'Mearadd494642020-05-01 07:42:23 -07002888 "workerRegion": "A String", # The Compute Engine region
2889 # (https://cloud.google.com/compute/docs/regions-zones/regions-zones) in
2890 # which worker processing should occur, e.g. "us-west1". Mutually exclusive
2891 # with worker_zone. If neither worker_region nor worker_zone is specified,
2892 # default to the control plane's region.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07002893 "version": { # A structure describing which components and their versions of the service
2894 # are required in order to run the job.
2895 "a_key": "", # Properties of the object.
2896 },
2897 "flexResourceSchedulingGoal": "A String", # Which Flexible Resource Scheduling mode to run in.
2898 "serviceKmsKeyName": "A String", # If set, contains the Cloud KMS key identifier used to encrypt data
2899 # at rest, AKA a Customer Managed Encryption Key (CMEK).
Sai Cheemalapati4ba8c232017-06-06 18:46:08 -04002900 #
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07002901 # Format:
2902 # projects/PROJECT_ID/locations/LOCATION/keyRings/KEY_RING/cryptoKeys/KEY
2903 "internalExperiments": { # Experimental settings.
2904 "a_key": "", # Properties of the object. Contains field @type with type URL.
2905 },
2906 "dataset": "A String", # The dataset for the current project where various workflow
2907 # related tables are stored.
2908 #
2909 # The supported resource type is:
2910 #
2911 # Google BigQuery:
2912 # bigquery.googleapis.com/{dataset}
2913 "experiments": [ # The list of experiments to enable.
2914 "A String",
2915 ],
2916 "serviceAccountEmail": "A String", # Identity to run virtual machines as. Defaults to the default account.
2917 "sdkPipelineOptions": { # The Cloud Dataflow SDK pipeline options specified by the user. These
2918 # options are passed through the service and are used to recreate the
2919 # SDK pipeline options on the worker in a language agnostic and platform
2920 # independent way.
2921 "a_key": "", # Properties of the object.
2922 },
2923 "userAgent": { # A description of the process that generated the request.
2924 "a_key": "", # Properties of the object.
2925 },
Dan O'Mearadd494642020-05-01 07:42:23 -07002926 "workerZone": "A String", # The Compute Engine zone
2927 # (https://cloud.google.com/compute/docs/regions-zones/regions-zones) in
2928 # which worker processing should occur, e.g. "us-west1-a". Mutually exclusive
2929 # with worker_region. If neither worker_region nor worker_zone is specified,
2930 # a zone in the control plane's region is chosen based on available capacity.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07002931 "workerPools": [ # The worker pools. At least one "harness" worker pool must be
2932 # specified in order for the job to have workers.
2933 { # Describes one particular pool of Cloud Dataflow workers to be
2934 # instantiated by the Cloud Dataflow service in order to perform the
2935 # computations required by a job. Note that a workflow job may use
2936 # multiple pools, in order to match the various computational
2937 # requirements of the various stages of the job.
Dan O'Mearadd494642020-05-01 07:42:23 -07002938 "workerHarnessContainerImage": "A String", # Required. Docker container image that executes the Cloud Dataflow worker
2939 # harness, residing in Google Container Registry.
2940 #
2941 # Deprecated for the Fn API path. Use sdk_harness_container_images instead.
2942 "ipConfiguration": "A String", # Configuration for VM IPs.
2943 "autoscalingSettings": { # Settings for WorkerPool autoscaling. # Settings for autoscaling of this WorkerPool.
2944 "maxNumWorkers": 42, # The maximum number of workers to cap scaling at.
2945 "algorithm": "A String", # The algorithm to use for autoscaling.
2946 },
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07002947 "diskSourceImage": "A String", # Fully qualified source image for disks.
Dan O'Mearadd494642020-05-01 07:42:23 -07002948 "network": "A String", # Network to which VMs will be assigned. If empty or unspecified,
2949 # the service will use the network "default".
2950 "zone": "A String", # Zone to run the worker pools in. If empty or unspecified, the service
2951 # will attempt to choose a reasonable default.
2952 "metadata": { # Metadata to set on the Google Compute Engine VMs.
2953 "a_key": "A String",
2954 },
2955 "machineType": "A String", # Machine type (e.g. "n1-standard-1"). If empty or unspecified, the
2956 # service will attempt to choose a reasonable default.
2957 "onHostMaintenance": "A String", # The action to take on host maintenance, as defined by the Google
2958 # Compute Engine API.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07002959 "taskrunnerSettings": { # Taskrunner configuration settings. # Settings passed through to Google Compute Engine workers when
2960 # using the standard Dataflow task runner. Users should ignore
2961 # this field.
2962 "workflowFileName": "A String", # The file to store the workflow in.
2963 "logUploadLocation": "A String", # Indicates where to put logs. If this is not specified, the logs
2964 # will not be uploaded.
2965 #
2966 # The supported resource type is:
2967 #
2968 # Google Cloud Storage:
2969 # storage.googleapis.com/{bucket}/{object}
2970 # bucket.storage.googleapis.com/{object}
2971 "commandlinesFileName": "A String", # The file to store preprocessing commands in.
Dan O'Mearadd494642020-05-01 07:42:23 -07002972 "alsologtostderr": True or False, # Whether to also send taskrunner log info to stderr.
2973 "continueOnException": True or False, # Whether to continue taskrunner if an exception is hit.
2974 "baseTaskDir": "A String", # The location on the worker for task-specific subdirectories.
2975 "vmId": "A String", # The ID string of the VM.
2976 "taskGroup": "A String", # The UNIX group ID on the worker VM to use for tasks launched by
2977 # taskrunner; e.g. "wheel".
2978 "taskUser": "A String", # The UNIX user ID on the worker VM to use for tasks launched by
2979 # taskrunner; e.g. "root".
2980 "oauthScopes": [ # The OAuth2 scopes to be requested by the taskrunner in order to
2981 # access the Cloud Dataflow API.
2982 "A String",
2983 ],
2984 "languageHint": "A String", # The suggested backend language.
2985 "logToSerialconsole": True or False, # Whether to send taskrunner log info to Google Compute Engine VM serial
2986 # console.
2987 "streamingWorkerMainClass": "A String", # The streaming worker main class name.
2988 "logDir": "A String", # The directory on the VM to store logs.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07002989 "parallelWorkerSettings": { # Provides data to pass through to the worker harness. # The settings to pass to the parallel worker harness.
2990 "reportingEnabled": True or False, # Whether to send work progress updates to the service.
2991 "shuffleServicePath": "A String", # The Shuffle service path relative to the root URL, for example,
2992 # "shuffle/v1beta1".
2993 "workerId": "A String", # The ID of the worker running this pipeline.
2994 "baseUrl": "A String", # The base URL for accessing Google Cloud APIs.
Sai Cheemalapatic30d2b52017-03-13 12:12:03 -04002995 #
2996 # When workers access Google Cloud APIs, they logically do so via
2997 # relative URLs. If this field is specified, it supplies the base
2998 # URL to use for resolving these relative URLs. The normative
2999 # algorithm used is defined by RFC 1808, "Relative Uniform Resource
3000 # Locators".
3001 #
3002 # If not specified, the default value is "http://www.googleapis.com/"
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07003003 "servicePath": "A String", # The Cloud Dataflow service path relative to the root URL, for example,
3004 # "dataflow/v1b3/projects".
3005 "tempStoragePrefix": "A String", # The prefix of the resources the system should use for temporary
3006 # storage.
Sai Cheemalapatic30d2b52017-03-13 12:12:03 -04003007 #
Sai Cheemalapatie833b792017-03-24 15:06:46 -07003008 # The supported resource type is:
3009 #
3010 # Google Cloud Storage:
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07003011 #
Sai Cheemalapatie833b792017-03-24 15:06:46 -07003012 # storage.googleapis.com/{bucket}/{object}
3013 # bucket.storage.googleapis.com/{object}
Takashi Matsuo06694102015-09-11 13:55:40 -07003014 },
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07003015         "dataflowApiVersion": "A String", # The API version of the endpoint, e.g. "v1b3"
3016 "harnessCommand": "A String", # The command to launch the worker harness.
3017 "tempStoragePrefix": "A String", # The prefix of the resources the taskrunner should use for
3018 # temporary storage.
3019 #
3020 # The supported resource type is:
3021 #
3022 # Google Cloud Storage:
3023 # storage.googleapis.com/{bucket}/{object}
3024 # bucket.storage.googleapis.com/{object}
Dan O'Mearadd494642020-05-01 07:42:23 -07003025 "baseUrl": "A String", # The base URL for the taskrunner to use when accessing Google Cloud APIs.
3026 #
3027 # When workers access Google Cloud APIs, they logically do so via
3028 # relative URLs. If this field is specified, it supplies the base
3029 # URL to use for resolving these relative URLs. The normative
3030 # algorithm used is defined by RFC 1808, "Relative Uniform Resource
3031 # Locators".
3032 #
3033 # If not specified, the default value is "http://www.googleapis.com/"
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07003034 },
Dan O'Mearadd494642020-05-01 07:42:23 -07003035 "numThreadsPerWorker": 42, # The number of threads per worker harness. If empty or unspecified, the
3036 # service will choose a number of threads (according to the number of cores
3037 # on the selected machine type for batch, or 1 by convention for streaming).
3038 "poolArgs": { # Extra arguments for this worker pool.
3039 "a_key": "", # Properties of the object. Contains field @type with type URL.
3040 },
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07003041 "packages": [ # Packages to be installed on workers.
3042 { # The packages that must be installed in order for a worker to run the
3043 # steps of the Cloud Dataflow job that will be assigned to its worker
3044 # pool.
3045 #
3046 # This is the mechanism by which the Cloud Dataflow SDK causes code to
3047 # be loaded onto the workers. For example, the Cloud Dataflow Java SDK
3048 # might use this to install jars containing the user's code and all of the
3049 # various dependencies (libraries, data files, etc.) required in order
3050 # for that code to run.
3051 "location": "A String", # The resource to read the package from. The supported resource type is:
Sai Cheemalapati4ba8c232017-06-06 18:46:08 -04003052 #
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07003053 # Google Cloud Storage:
3054 #
3055 # storage.googleapis.com/{bucket}
3056 # bucket.storage.googleapis.com/
3057 "name": "A String", # The name of the package.
Sai Cheemalapati4ba8c232017-06-06 18:46:08 -04003058 },
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07003059 ],
Dan O'Mearadd494642020-05-01 07:42:23 -07003060 "defaultPackageSet": "A String", # The default package set to install. This allows the service to
3061 # select a default set of packages which are useful to worker
3062 # harnesses written in a particular language.
3063 "kind": "A String", # The kind of the worker pool; currently only `harness` and `shuffle`
3064 # are supported.
3065 "diskType": "A String", # Type of root disk for VMs. If empty or unspecified, the service will
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07003066 # attempt to choose a reasonable default.
3067         "teardownPolicy": "A String", # Sets the policy for determining when to turn down the worker pool.
3068 # Allowed values are: `TEARDOWN_ALWAYS`, `TEARDOWN_ON_SUCCESS`, and
3069 # `TEARDOWN_NEVER`.
3070 # `TEARDOWN_ALWAYS` means workers are always torn down regardless of whether
3071 # the job succeeds. `TEARDOWN_ON_SUCCESS` means workers are torn down
3072 # if the job succeeds. `TEARDOWN_NEVER` means the workers are never torn
3073 # down.
3074 #
3075 # If the workers are not torn down by the service, they will
3076 # continue to run and use Google Compute Engine VM resources in the
3077 # user's project until they are explicitly terminated by the user.
3078 # Because of this, Google recommends using the `TEARDOWN_ALWAYS`
3079 # policy except for small, manually supervised test jobs.
3080 #
3081 # If unknown or unspecified, the service will attempt to choose a reasonable
3082 # default.
Dan O'Mearadd494642020-05-01 07:42:23 -07003083 "diskSizeGb": 42, # Size of root disk for VMs, in GB. If zero or unspecified, the service will
3084 # attempt to choose a reasonable default.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07003085 "numWorkers": 42, # Number of Google Compute Engine workers in this pool needed to
3086 # execute the job. If zero or unspecified, the service will
3087 # attempt to choose a reasonable default.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07003088 "subnetwork": "A String", # Subnetwork to which VMs will be assigned, if desired. Expected to be of
3089 # the form "regions/REGION/subnetworks/SUBNETWORK".
3090 "dataDisks": [ # Data disks that are used by a VM in this workflow.
3091 { # Describes the data disk used by a workflow job.
3092 "mountPoint": "A String", # Directory in a VM where disk is mounted.
3093 "sizeGb": 42, # Size of disk in GB. If zero or unspecified, the service will
3094 # attempt to choose a reasonable default.
3095 "diskType": "A String", # Disk storage type, as defined by Google Compute Engine. This
3096 # must be a disk type appropriate to the project and zone in which
3097 # the workers will run. If unknown or unspecified, the service
3098 # will attempt to choose a reasonable default.
3099 #
3100 # For example, the standard persistent disk type is a resource name
3101 # typically ending in "pd-standard". If SSD persistent disks are
3102 # available, the resource name typically ends with "pd-ssd". The
3103             # actual valid values are defined by the Google Compute Engine API,
3104 # not by the Cloud Dataflow API; consult the Google Compute Engine
3105 # documentation for more information about determining the set of
3106 # available disk types for a particular project and zone.
3107 #
3108 # Google Compute Engine Disk types are local to a particular
3109 # project in a particular zone, and so the resource name will
3110 # typically look something like this:
3111 #
3112 # compute.googleapis.com/projects/project-id/zones/zone/diskTypes/pd-standard
Sai Cheemalapati4ba8c232017-06-06 18:46:08 -04003113 },
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07003114 ],
Dan O'Mearadd494642020-05-01 07:42:23 -07003115 "sdkHarnessContainerImages": [ # Set of SDK harness containers needed to execute this pipeline. This will
3116 # only be set in the Fn API path. For non-cross-language pipelines this
3117 # should have only one entry. Cross-language pipelines will have two or more
3118 # entries.
3119           { # Defines an SDK harness container for executing Dataflow pipelines.
3120 "containerImage": "A String", # A docker container image that resides in Google Container Registry.
3121             "useSingleCorePerContainer": True or False, # If true, recommends that the Dataflow service use only one core per SDK
3122                 # container instance with this image. If false (or unset), recommends using
3123                 # more than one core per SDK container instance with this image for
3124                 # efficiency. Note that the Dataflow service may choose to override this
3125                 # property if needed.
3126 },
3127 ],
Takashi Matsuo06694102015-09-11 13:55:40 -07003128 },
3129 ],
Dan O'Mearadd494642020-05-01 07:42:23 -07003130 "clusterManagerApiService": "A String", # The type of cluster manager API to use. If unknown or
3131 # unspecified, the service will attempt to choose a reasonable
3132 # default. This should be in the form of the API service name,
3133 # e.g. "compute.googleapis.com".
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07003134 "tempStoragePrefix": "A String", # The prefix of the resources the system should use for temporary
3135         # storage. The system will append the suffix "/temp-{JOBNAME}" to
3136 # this resource prefix, where {JOBNAME} is the value of the
3137 # job_name field. The resulting bucket and object prefix is used
3138 # as the prefix of the resources used to store temporary data
3139 # needed during the job execution. NOTE: This will override the
3140 # value in taskrunner_settings.
3141 # The supported resource type is:
Sai Cheemalapatic30d2b52017-03-13 12:12:03 -04003142 #
3143 # Google Cloud Storage:
3144 #
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07003145 # storage.googleapis.com/{bucket}/{object}
3146 # bucket.storage.googleapis.com/{object}
3147 },
3148 "location": "A String", # The [regional endpoint]
3149 # (https://cloud.google.com/dataflow/docs/concepts/regional-endpoints) that
3150 # contains this job.
3151 "tempFiles": [ # A set of files the system should be aware of that are used
3152 # for temporary storage. These temporary files will be
3153 # removed on job completion.
3154 # No duplicates are allowed.
3155 # No file patterns are supported.
3156 #
3157 # The supported files are:
3158 #
3159 # Google Cloud Storage:
3160 #
3161 # storage.googleapis.com/{bucket}/{object}
3162 # bucket.storage.googleapis.com/{object}
3163 "A String",
3164 ],
3165 "type": "A String", # The type of Cloud Dataflow job.
3166 "clientRequestId": "A String", # The client's unique identifier of the job, re-used across retried attempts.
3167 # If this field is set, the service will ensure its uniqueness.
3168 # The request to create a job will fail if the service has knowledge of a
3169 # previously submitted job with the same client's ID and job name.
3170 # The caller may use this field to ensure idempotence of job
3171 # creation across retried attempts to create a job.
3172 # By default, the field is empty and, in that case, the service ignores it.
3173 "createdFromSnapshotId": "A String", # If this is specified, the job's initial state is populated from the given
3174 # snapshot.
3175 "stepsLocation": "A String", # The GCS location where the steps are stored.
3176 "currentStateTime": "A String", # The timestamp associated with the current state.
3177 "startTime": "A String", # The timestamp when the job was started (transitioned to JOB_STATE_PENDING).
3178 # Flexible resource scheduling jobs are started with some delay after job
3179 # creation, so start_time is unset before start and is updated when the
3180 # job is started by the Cloud Dataflow service. For other jobs, start_time
3181       # always equals create_time and is immutable and set by the Cloud Dataflow
3182 # service.
3183 "createTime": "A String", # The timestamp when the job was initially created. Immutable and set by the
3184 # Cloud Dataflow service.
3185 "requestedState": "A String", # The job's requested state.
3186 #
3187 # `UpdateJob` may be used to switch between the `JOB_STATE_STOPPED` and
3188 # `JOB_STATE_RUNNING` states, by setting requested_state. `UpdateJob` may
3189 # also be used to directly set a job's requested state to
3190 # `JOB_STATE_CANCELLED` or `JOB_STATE_DONE`, irrevocably terminating the
3191 # job if it has not already reached a terminal state.
3192 "name": "A String", # The user-specified Cloud Dataflow job name.
3193 #
3194 # Only one Job with a given name may exist in a project at any
3195 # given time. If a caller attempts to create a Job with the same
3196 # name as an already-existing Job, the attempt returns the
3197 # existing Job.
3198 #
3199 # The name must match the regular expression
3200 # `[a-z]([-a-z0-9]{0,38}[a-z0-9])?`
3201 "steps": [ # Exactly one of step or steps_location should be specified.
3202 #
3203 # The top-level steps that constitute the entire job.
3204 { # Defines a particular step within a Cloud Dataflow job.
Sai Cheemalapatic30d2b52017-03-13 12:12:03 -04003205 #
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07003206 # A job consists of multiple steps, each of which performs some
3207 # specific operation as part of the overall job. Data is typically
3208 # passed from one step to another as part of the job.
3209 #
3210 # Here's an example of a sequence of steps which together implement a
3211 # Map-Reduce job:
3212 #
3213 # * Read a collection of data from some source, parsing the
3214 # collection's elements.
3215 #
3216 # * Validate the elements.
3217 #
3218 # * Apply a user-defined function to map each element to some value
3219 # and extract an element-specific key value.
3220 #
3221 # * Group elements with the same key into a single element with
3222 # that key, transforming a multiply-keyed collection into a
3223 # uniquely-keyed collection.
3224 #
3225 # * Write the elements out to some data sink.
3226 #
3227 # Note that the Cloud Dataflow service may be used to run many different
3228 # types of jobs, not just Map-Reduce.
3229 "kind": "A String", # The kind of step in the Cloud Dataflow job.
Dan O'Mearadd494642020-05-01 07:42:23 -07003230 "name": "A String", # The name that identifies the step. This must be unique for each
3231 # step with respect to all other steps in the Cloud Dataflow job.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07003232 "properties": { # Named properties associated with the step. Each kind of
3233 # predefined step has its own required set of properties.
3234 # Must be provided on Create. Only retrieved with JOB_VIEW_ALL.
3235 "a_key": "", # Properties of the object.
3236 },
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07003237 },
3238 ],
3239 "replaceJobId": "A String", # If this job is an update of an existing job, this field is the job ID
3240 # of the job it replaced.
3241 #
3242 # When sending a `CreateJobRequest`, you can update a job by specifying it
3243 # here. The job named here is stopped, and its intermediate state is
3244 # transferred to this job.
3245 "currentState": "A String", # The current state of the job.
3246 #
3247 # Jobs are created in the `JOB_STATE_STOPPED` state unless otherwise
3248 # specified.
3249 #
3250 # A job in the `JOB_STATE_RUNNING` state may asynchronously enter a
3251 # terminal state. After a job has reached a terminal state, no
3252 # further state updates may be made.
3253 #
3254 # This field may be mutated by the Cloud Dataflow service;
3255 # callers cannot mutate it.
3256     "executionInfo": { # Deprecated. Additional information about how a Cloud Dataflow job will be
3257         # executed that isn't contained in the submitted job.
3258 "stages": { # A mapping from each stage to the information about that stage.
3259 "a_key": { # Contains information about how a particular
3260 # google.dataflow.v1beta3.Step will be executed.
3261 "stepName": [ # The steps associated with the execution stage.
3262 # Note that stages may have several steps, and that a given step
3263 # might be run by more than one stage.
3264 "A String",
3265 ],
Nathaniel Manista4f877e52015-06-15 16:44:50 +00003266 },
3267 },
3268 },
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07003269 },
Nathaniel Manista4f877e52015-06-15 16:44:50 +00003270 ],
3271 }</pre>
3272</div>
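<p>A minimal sketch of reading a few of the Job fields described above, assuming an authorized <code>service</code> object built with <code>googleapiclient.discovery.build('dataflow', 'v1b3')</code>, a placeholder project ID, and that the response carries its jobs under a <code>jobs</code> key:</p>
<pre>
response = service.projects().jobs().list(projectId='my-project').execute()
for job in response.get('jobs', []):
    # Each entry follows the Job schema shown above.
    print(job.get('id'), job.get('name'), job.get('currentState'))
    # environment.workerPools is only populated for the fuller job views.
    for pool in job.get('environment', {}).get('workerPools', []):
        print('  pool:', pool.get('machineType'), pool.get('numWorkers'))
</pre>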
3273
3274<div class="method">
3275 <code class="details" id="list_next">list_next(previous_request, previous_response)</code>
3276 <pre>Retrieves the next page of results.
3277
3278Args:
3279 previous_request: The request for the previous page. (required)
3280 previous_response: The response from the request for the previous page. (required)
3281
3282Returns:
3283 A request object that you can call 'execute()' on to request the next
3284 page. Returns None if there are no more items in the collection.
3285 </pre>
3286</div>
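<p>Because job listings are paginated, <code>list_next</code> can be chained with the previous request and response to walk every page. A minimal sketch, assuming the same authorized <code>service</code> object and a placeholder project ID:</p>
<pre>
request = service.projects().jobs().list(projectId='my-project')
while request is not None:
    response = request.execute()
    for job in response.get('jobs', []):
        print(job.get('id'), job.get('currentState'))
    # list_next returns None once there are no more pages.
    request = service.projects().jobs().list_next(
        previous_request=request, previous_response=response)
</pre>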
3287
3288<div class="method">
Dan O'Mearadd494642020-05-01 07:42:23 -07003289 <code class="details" id="snapshot">snapshot(projectId, jobId, body=None, x__xgafv=None)</code>
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07003290 <pre>Snapshot the state of a streaming job.
3291
3292Args:
3293 projectId: string, The project which owns the job to be snapshotted. (required)
3294 jobId: string, The job to be snapshotted. (required)
Dan O'Mearadd494642020-05-01 07:42:23 -07003295 body: object, The request body.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07003296 The object takes the form of:
3297
3298{ # Request to create a snapshot of a job.
3299 "location": "A String", # The location that contains this job.
3300 "ttl": "A String", # TTL for the snapshot.
Dan O'Mearadd494642020-05-01 07:42:23 -07003301     "description": "A String", # User-specified description of the snapshot. May be empty.
3302 "snapshotSources": True or False, # If true, perform snapshots for sources which support this.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07003303 }
3304
3305 x__xgafv: string, V1 error format.
3306 Allowed values
3307 1 - v1 error format
3308 2 - v2 error format
3309
3310Returns:
3311 An object of the form:
3312
3313 { # Represents a snapshot of a job.
3314 "sourceJobId": "A String", # The job this snapshot was created from.
Dan O'Mearadd494642020-05-01 07:42:23 -07003315 "diskSizeBytes": "A String", # The disk byte size of the snapshot. Only available for snapshots in READY
3316 # state.
3317     "description": "A String", # User-specified description of the snapshot. May be empty.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07003318 "projectId": "A String", # The project this snapshot belongs to.
3319 "creationTime": "A String", # The time this snapshot was created.
3320 "state": "A String", # State of the snapshot.
3321 "ttl": "A String", # The time after which this snapshot will be automatically deleted.
Dan O'Mearadd494642020-05-01 07:42:23 -07003322 "pubsubMetadata": [ # PubSub snapshot metadata.
3323 { # Represents a Pubsub snapshot.
3324 "expireTime": "A String", # The expire time of the Pubsub snapshot.
3325 "snapshotName": "A String", # The name of the Pubsub snapshot.
3326 "topicName": "A String", # The name of the Pubsub topic.
3327 },
3328 ],
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07003329 "id": "A String", # The unique ID of this snapshot.
3330 }</pre>
3331</div>
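<p>A minimal sketch of requesting a snapshot of a streaming job, assuming the same authorized <code>service</code> object; the project ID, job ID, region, and TTL below are placeholders:</p>
<pre>
body = {
    'location': 'us-central1',          # The location that contains the job.
    'ttl': '604800s',                   # TTL for the snapshot (seven days).
    'description': 'example snapshot',  # May be empty.
    'snapshotSources': True,            # Snapshot sources that support it.
}
snapshot = service.projects().jobs().snapshot(
    projectId='my-project', jobId='my-job-id', body=body).execute()
print(snapshot.get('id'), snapshot.get('state'), snapshot.get('creationTime'))
</pre>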
3332
3333<div class="method">
Dan O'Mearadd494642020-05-01 07:42:23 -07003334 <code class="details" id="update">update(projectId, jobId, body=None, location=None, x__xgafv=None)</code>
Sai Cheemalapatic30d2b52017-03-13 12:12:03 -04003335 <pre>Updates the state of an existing Cloud Dataflow job.
Nathaniel Manista4f877e52015-06-15 16:44:50 +00003336
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07003337To update the state of an existing job, we recommend using
3338`projects.locations.jobs.update` with a [regional endpoint]
3339(https://cloud.google.com/dataflow/docs/concepts/regional-endpoints). Using
3340`projects.jobs.update` is not recommended, as you can only update the state
3341of jobs that are running in `us-central1`.
3342
Nathaniel Manista4f877e52015-06-15 16:44:50 +00003343Args:
Sai Cheemalapatic30d2b52017-03-13 12:12:03 -04003344 projectId: string, The ID of the Cloud Platform project that the job belongs to. (required)
3345 jobId: string, The job ID. (required)
Dan O'Mearadd494642020-05-01 07:42:23 -07003346 body: object, The request body.
Nathaniel Manista4f877e52015-06-15 16:44:50 +00003347 The object takes the form of:
3348
Sai Cheemalapatic30d2b52017-03-13 12:12:03 -04003349{ # Defines a job to be run by the Cloud Dataflow service.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07003350 "labels": { # User-defined labels for this job.
3351 #
3352 # The labels map can contain no more than 64 entries. Entries of the labels
3353 # map are UTF8 strings that comply with the following restrictions:
3354 #
3355 # * Keys must conform to regexp: \p{Ll}\p{Lo}{0,62}
3356 # * Values must conform to regexp: [\p{Ll}\p{Lo}\p{N}_-]{0,63}
Dan O'Mearadd494642020-05-01 07:42:23 -07003357 # * Both keys and values are additionally constrained to be &lt;= 128 bytes in
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07003358 # size.
3359 "a_key": "A String",
3360 },
3361   "jobMetadata": { # Metadata available primarily for filtering jobs. Will be included in the
3362       # ListJob response and Job SUMMARY view. This field is populated by the
3363       # Dataflow service to support filtering jobs by the metadata values provided
3364       # here. Populated for ListJobs and all GetJob views SUMMARY and higher.
3365 "sdkVersion": { # The version of the SDK used to run the job. # The SDK version used to run the job.
3366 "versionDisplayName": "A String", # A readable string describing the version of the SDK.
3367 "version": "A String", # The version of the SDK used to run the job.
3368 "sdkSupportStatus": "A String", # The support status for this SDK version.
3369 },
3370 "pubsubDetails": [ # Identification of a PubSub source used in the Dataflow job.
3371 { # Metadata for a PubSub connector used by the job.
3372 "topic": "A String", # Topic accessed in the connection.
3373 "subscription": "A String", # Subscription used in the connection.
3374 },
3375 ],
3376 "datastoreDetails": [ # Identification of a Datastore source used in the Dataflow job.
3377 { # Metadata for a Datastore connector used by the job.
3378 "projectId": "A String", # ProjectId accessed in the connection.
3379 "namespace": "A String", # Namespace used in the connection.
3380 },
3381 ],
3382 "fileDetails": [ # Identification of a File source used in the Dataflow job.
3383 { # Metadata for a File connector used by the job.
3384 "filePattern": "A String", # File Pattern used to access files by the connector.
3385 },
3386 ],
3387 "spannerDetails": [ # Identification of a Spanner source used in the Dataflow job.
3388 { # Metadata for a Spanner connector used by the job.
3389 "instanceId": "A String", # InstanceId accessed in the connection.
3390 "projectId": "A String", # ProjectId accessed in the connection.
3391 "databaseId": "A String", # DatabaseId accessed in the connection.
3392 },
3393 ],
3394 "bigTableDetails": [ # Identification of a BigTable source used in the Dataflow job.
3395 { # Metadata for a BigTable connector used by the job.
3396 "instanceId": "A String", # InstanceId accessed in the connection.
3397 "projectId": "A String", # ProjectId accessed in the connection.
3398 "tableId": "A String", # TableId accessed in the connection.
3399 },
3400 ],
3401 "bigqueryDetails": [ # Identification of a BigQuery source used in the Dataflow job.
3402 { # Metadata for a BigQuery connector used by the job.
3403 "projectId": "A String", # Project accessed in the connection.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07003404 "query": "A String", # Query used to access data in the connection.
Dan O'Mearadd494642020-05-01 07:42:23 -07003405 "table": "A String", # Table accessed in the connection.
3406 "dataset": "A String", # Dataset accessed in the connection.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07003407 },
3408 ],
3409 },
3410   "pipelineDescription": { # A descriptive representation of submitted pipeline as well as the executed
3411       # form. This data is provided by the Dataflow service for ease of visualizing
3412       # the pipeline and interpreting Dataflow provided metrics. Preliminary field:
3413       # The format of this data may change at any time. A description of the user
3414       # pipeline and stages through which it is executed. Created by Cloud Dataflow
3415       # service. Only retrieved with JOB_VIEW_DESCRIPTION or JOB_VIEW_ALL.
3416 "originalPipelineTransform": [ # Description of each transform in the pipeline and collections between them.
3417 { # Description of the type, names/ids, and input/outputs for a transform.
3418 "kind": "A String", # Type of transform.
3419 "name": "A String", # User provided name for this transform instance.
3420 "inputCollectionName": [ # User names for all collection inputs to this transform.
3421 "A String",
3422 ],
3423 "displayData": [ # Transform-specific display data.
3424 { # Data provided with a pipeline or transform to provide descriptive info.
Dan O'Mearadd494642020-05-01 07:42:23 -07003425 "key": "A String", # The key identifying the display data.
3426 # This is intended to be used as a label for the display data
3427 # when viewed in a dax monitoring system.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07003428 "shortStrValue": "A String", # A possible additional shorter value to display.
3429 # For example a java_class_name_value of com.mypackage.MyDoFn
3430 # will be stored with MyDoFn as the short_str_value and
3431 # com.mypackage.MyDoFn as the java_class_name value.
3432 # short_str_value can be displayed and java_class_name_value
3433 # will be displayed as a tooltip.
Dan O'Mearadd494642020-05-01 07:42:23 -07003434 "timestampValue": "A String", # Contains value if the data is of timestamp type.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07003435 "url": "A String", # An optional full URL.
3436 "floatValue": 3.14, # Contains value if the data is of float type.
3437 "namespace": "A String", # The namespace for the key. This is usually a class name or programming
3438 # language namespace (i.e. python module) which defines the display data.
3439 # This allows a dax monitoring system to specially handle the data
3440 # and perform custom rendering.
3441 "javaClassValue": "A String", # Contains value if the data is of java class type.
3442 "label": "A String", # An optional label to display in a dax UI for the element.
3443 "boolValue": True or False, # Contains value if the data is of a boolean type.
3444 "strValue": "A String", # Contains value if the data is of string type.
Dan O'Mearadd494642020-05-01 07:42:23 -07003445 "durationValue": "A String", # Contains value if the data is of duration type.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07003446 "int64Value": "A String", # Contains value if the data is of int64 type.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07003447 },
3448 ],
3449 "outputCollectionName": [ # User names for all collection outputs to this transform.
3450 "A String",
3451 ],
3452 "id": "A String", # SDK generated id of this transform instance.
3453 },
3454 ],
3455 "executionPipelineStage": [ # Description of each stage of execution of the pipeline.
3456 { # Description of the composing transforms, names/ids, and input/outputs of a
3457 # stage of execution. Some composing transforms and sources may have been
3458 # generated by the Dataflow service during execution planning.
3459 "componentSource": [ # Collections produced and consumed by component transforms of this stage.
3460 { # Description of an interstitial value between transforms in an execution
3461 # stage.
3462 "userName": "A String", # Human-readable name for this transform; may be user or system generated.
3463 "originalTransformOrCollection": "A String", # User name for the original user transform or collection with which this
3464 # source is most closely associated.
3465 "name": "A String", # Dataflow service generated name for this source.
3466 },
3467 ],
3468         "kind": "A String", # Type of transform this stage is executing.
3469 "name": "A String", # Dataflow service generated name for this stage.
3470 "outputSource": [ # Output sources for this stage.
3471 { # Description of an input or output of an execution stage.
3472 "userName": "A String", # Human-readable name for this source; may be user or system generated.
3473 "sizeBytes": "A String", # Size of the source, if measurable.
3474 "name": "A String", # Dataflow service generated name for this source.
3475 "originalTransformOrCollection": "A String", # User name for the original user transform or collection with which this
3476 # source is most closely associated.
3477 },
3478 ],
3479 "inputSource": [ # Input sources for this stage.
3480 { # Description of an input or output of an execution stage.
3481 "userName": "A String", # Human-readable name for this source; may be user or system generated.
3482 "sizeBytes": "A String", # Size of the source, if measurable.
3483 "name": "A String", # Dataflow service generated name for this source.
3484 "originalTransformOrCollection": "A String", # User name for the original user transform or collection with which this
3485 # source is most closely associated.
3486 },
3487 ],
3488 "componentTransform": [ # Transforms that comprise this execution stage.
3489 { # Description of a transform executed as part of an execution stage.
3490 "userName": "A String", # Human-readable name for this transform; may be user or system generated.
3491 "originalTransform": "A String", # User name for the original user transform with which this transform is
3492 # most closely associated.
3493             "name": "A String", # Dataflow service generated name for this transform.
3494 },
3495 ],
3496 "id": "A String", # Dataflow service generated id for this stage.
3497 },
3498 ],
3499 "displayData": [ # Pipeline level display data.
3500 { # Data provided with a pipeline or transform to provide descriptive info.
Dan O'Mearadd494642020-05-01 07:42:23 -07003501 "key": "A String", # The key identifying the display data.
3502 # This is intended to be used as a label for the display data
3503 # when viewed in a dax monitoring system.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07003504 "shortStrValue": "A String", # A possible additional shorter value to display.
3505 # For example a java_class_name_value of com.mypackage.MyDoFn
3506 # will be stored with MyDoFn as the short_str_value and
3507 # com.mypackage.MyDoFn as the java_class_name value.
3508 # short_str_value can be displayed and java_class_name_value
3509 # will be displayed as a tooltip.
Dan O'Mearadd494642020-05-01 07:42:23 -07003510 "timestampValue": "A String", # Contains value if the data is of timestamp type.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07003511 "url": "A String", # An optional full URL.
3512 "floatValue": 3.14, # Contains value if the data is of float type.
3513 "namespace": "A String", # The namespace for the key. This is usually a class name or programming
3514 # language namespace (i.e. python module) which defines the display data.
3515 # This allows a dax monitoring system to specially handle the data
3516 # and perform custom rendering.
3517 "javaClassValue": "A String", # Contains value if the data is of java class type.
3518 "label": "A String", # An optional label to display in a dax UI for the element.
3519 "boolValue": True or False, # Contains value if the data is of a boolean type.
3520 "strValue": "A String", # Contains value if the data is of string type.
Dan O'Mearadd494642020-05-01 07:42:23 -07003521 "durationValue": "A String", # Contains value if the data is of duration type.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07003522 "int64Value": "A String", # Contains value if the data is of int64 type.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07003523 },
3524 ],
3525 },
3526 "stageStates": [ # This field may be mutated by the Cloud Dataflow service;
3527 # callers cannot mutate it.
3528 { # A message describing the state of a particular execution stage.
3529 "executionStageName": "A String", # The name of the execution stage.
3530       "executionStageState": "A String", # Execution stage states allow the same set of values as JobState.
3531 "currentStateTime": "A String", # The time at which the stage transitioned to this state.
3532 },
3533 ],
3534 "id": "A String", # The unique ID of this job.
3535 #
3536 # This field is set by the Cloud Dataflow service when the Job is
3537 # created, and is immutable for the life of the job.
3538 "replacedByJobId": "A String", # If another job is an update of this job (and thus, this job is in
3539 # `JOB_STATE_UPDATED`), this field contains the ID of that job.
3540 "projectId": "A String", # The ID of the Cloud Platform project that the job belongs to.
3541 "transformNameMapping": { # The map of transform name prefixes of the job to be replaced to the
3542 # corresponding name prefixes of the new job.
3543 "a_key": "A String",
3544 },
3545 "environment": { # Describes the environment in which a Dataflow Job runs. # The environment for the job.
Dan O'Mearadd494642020-05-01 07:42:23 -07003546 "workerRegion": "A String", # The Compute Engine region
3547 # (https://cloud.google.com/compute/docs/regions-zones/regions-zones) in
3548 # which worker processing should occur, e.g. "us-west1". Mutually exclusive
3549 # with worker_zone. If neither worker_region nor worker_zone is specified,
3550 # default to the control plane's region.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07003551 "version": { # A structure describing which components and their versions of the service
3552 # are required in order to run the job.
3553 "a_key": "", # Properties of the object.
3554 },
3555 "flexResourceSchedulingGoal": "A String", # Which Flexible Resource Scheduling mode to run in.
3556 "serviceKmsKeyName": "A String", # If set, contains the Cloud KMS key identifier used to encrypt data
3557 # at rest, AKA a Customer Managed Encryption Key (CMEK).
3558 #
3559 # Format:
3560 # projects/PROJECT_ID/locations/LOCATION/keyRings/KEY_RING/cryptoKeys/KEY
3561 "internalExperiments": { # Experimental settings.
3562 "a_key": "", # Properties of the object. Contains field @type with type URL.
3563 },
3564 "dataset": "A String", # The dataset for the current project where various workflow
3565 # related tables are stored.
3566 #
3567 # The supported resource type is:
3568 #
3569 # Google BigQuery:
3570 # bigquery.googleapis.com/{dataset}
3571 "experiments": [ # The list of experiments to enable.
3572 "A String",
3573 ],
3574 "serviceAccountEmail": "A String", # Identity to run virtual machines as. Defaults to the default account.
3575 "sdkPipelineOptions": { # The Cloud Dataflow SDK pipeline options specified by the user. These
3576 # options are passed through the service and are used to recreate the
3577 # SDK pipeline options on the worker in a language agnostic and platform
3578 # independent way.
3579 "a_key": "", # Properties of the object.
3580 },
3581 "userAgent": { # A description of the process that generated the request.
3582 "a_key": "", # Properties of the object.
3583 },
Dan O'Mearadd494642020-05-01 07:42:23 -07003584 "workerZone": "A String", # The Compute Engine zone
3585 # (https://cloud.google.com/compute/docs/regions-zones/regions-zones) in
3586 # which worker processing should occur, e.g. "us-west1-a". Mutually exclusive
3587 # with worker_region. If neither worker_region nor worker_zone is specified,
3588 # a zone in the control plane's region is chosen based on available capacity.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07003589 "workerPools": [ # The worker pools. At least one "harness" worker pool must be
3590 # specified in order for the job to have workers.
3591 { # Describes one particular pool of Cloud Dataflow workers to be
3592 # instantiated by the Cloud Dataflow service in order to perform the
3593 # computations required by a job. Note that a workflow job may use
3594 # multiple pools, in order to match the various computational
3595 # requirements of the various stages of the job.
Dan O'Mearadd494642020-05-01 07:42:23 -07003596 "workerHarnessContainerImage": "A String", # Required. Docker container image that executes the Cloud Dataflow worker
3597 # harness, residing in Google Container Registry.
3598 #
3599 # Deprecated for the Fn API path. Use sdk_harness_container_images instead.
3600 "ipConfiguration": "A String", # Configuration for VM IPs.
3601 "autoscalingSettings": { # Settings for WorkerPool autoscaling. # Settings for autoscaling of this WorkerPool.
3602 "maxNumWorkers": 42, # The maximum number of workers to cap scaling at.
3603 "algorithm": "A String", # The algorithm to use for autoscaling.
3604 },
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07003605 "diskSourceImage": "A String", # Fully qualified source image for disks.
Dan O'Mearadd494642020-05-01 07:42:23 -07003606 "network": "A String", # Network to which VMs will be assigned. If empty or unspecified,
3607 # the service will use the network "default".
3608 "zone": "A String", # Zone to run the worker pools in. If empty or unspecified, the service
3609 # will attempt to choose a reasonable default.
3610 "metadata": { # Metadata to set on the Google Compute Engine VMs.
3611 "a_key": "A String",
3612 },
3613 "machineType": "A String", # Machine type (e.g. "n1-standard-1"). If empty or unspecified, the
3614 # service will attempt to choose a reasonable default.
3615 "onHostMaintenance": "A String", # The action to take on host maintenance, as defined by the Google
3616 # Compute Engine API.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07003617 "taskrunnerSettings": { # Taskrunner configuration settings. # Settings passed through to Google Compute Engine workers when
3618 # using the standard Dataflow task runner. Users should ignore
3619 # this field.
3620 "workflowFileName": "A String", # The file to store the workflow in.
3621 "logUploadLocation": "A String", # Indicates where to put logs. If this is not specified, the logs
3622 # will not be uploaded.
3623 #
3624 # The supported resource type is:
3625 #
3626 # Google Cloud Storage:
3627 # storage.googleapis.com/{bucket}/{object}
3628 # bucket.storage.googleapis.com/{object}
3629 "commandlinesFileName": "A String", # The file to store preprocessing commands in.
Dan O'Mearadd494642020-05-01 07:42:23 -07003630 "alsologtostderr": True or False, # Whether to also send taskrunner log info to stderr.
3631 "continueOnException": True or False, # Whether to continue taskrunner if an exception is hit.
3632 "baseTaskDir": "A String", # The location on the worker for task-specific subdirectories.
3633 "vmId": "A String", # The ID string of the VM.
3634 "taskGroup": "A String", # The UNIX group ID on the worker VM to use for tasks launched by
3635 # taskrunner; e.g. "wheel".
3636 "taskUser": "A String", # The UNIX user ID on the worker VM to use for tasks launched by
3637 # taskrunner; e.g. "root".
3638 "oauthScopes": [ # The OAuth2 scopes to be requested by the taskrunner in order to
3639 # access the Cloud Dataflow API.
3640 "A String",
3641 ],
3642 "languageHint": "A String", # The suggested backend language.
3643 "logToSerialconsole": True or False, # Whether to send taskrunner log info to Google Compute Engine VM serial
3644 # console.
3645 "streamingWorkerMainClass": "A String", # The streaming worker main class name.
3646 "logDir": "A String", # The directory on the VM to store logs.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07003647 "parallelWorkerSettings": { # Provides data to pass through to the worker harness. # The settings to pass to the parallel worker harness.
3648 "reportingEnabled": True or False, # Whether to send work progress updates to the service.
3649 "shuffleServicePath": "A String", # The Shuffle service path relative to the root URL, for example,
3650 # "shuffle/v1beta1".
3651 "workerId": "A String", # The ID of the worker running this pipeline.
3652 "baseUrl": "A String", # The base URL for accessing Google Cloud APIs.
3653 #
3654 # When workers access Google Cloud APIs, they logically do so via
3655 # relative URLs. If this field is specified, it supplies the base
3656 # URL to use for resolving these relative URLs. The normative
3657 # algorithm used is defined by RFC 1808, "Relative Uniform Resource
3658 # Locators".
3659 #
3660 # If not specified, the default value is "http://www.googleapis.com/"
3661 "servicePath": "A String", # The Cloud Dataflow service path relative to the root URL, for example,
3662 # "dataflow/v1b3/projects".
3663 "tempStoragePrefix": "A String", # The prefix of the resources the system should use for temporary
3664 # storage.
3665 #
3666 # The supported resource type is:
3667 #
3668 # Google Cloud Storage:
3669 #
3670 # storage.googleapis.com/{bucket}/{object}
3671 # bucket.storage.googleapis.com/{object}
3672 },
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07003673         "dataflowApiVersion": "A String", # The API version of the endpoint, e.g. "v1b3"
3674 "harnessCommand": "A String", # The command to launch the worker harness.
3675 "tempStoragePrefix": "A String", # The prefix of the resources the taskrunner should use for
3676 # temporary storage.
3677 #
3678 # The supported resource type is:
3679 #
3680 # Google Cloud Storage:
3681 # storage.googleapis.com/{bucket}/{object}
3682 # bucket.storage.googleapis.com/{object}
Dan O'Mearadd494642020-05-01 07:42:23 -07003683 "baseUrl": "A String", # The base URL for the taskrunner to use when accessing Google Cloud APIs.
3684 #
3685 # When workers access Google Cloud APIs, they logically do so via
3686 # relative URLs. If this field is specified, it supplies the base
3687 # URL to use for resolving these relative URLs. The normative
3688 # algorithm used is defined by RFC 1808, "Relative Uniform Resource
3689 # Locators".
3690 #
3691 # If not specified, the default value is "http://www.googleapis.com/"
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07003692 },
Dan O'Mearadd494642020-05-01 07:42:23 -07003693 "numThreadsPerWorker": 42, # The number of threads per worker harness. If empty or unspecified, the
3694 # service will choose a number of threads (according to the number of cores
3695 # on the selected machine type for batch, or 1 by convention for streaming).
3696 "poolArgs": { # Extra arguments for this worker pool.
3697 "a_key": "", # Properties of the object. Contains field @type with type URL.
3698 },
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07003699 "packages": [ # Packages to be installed on workers.
3700 { # The packages that must be installed in order for a worker to run the
3701 # steps of the Cloud Dataflow job that will be assigned to its worker
3702 # pool.
3703 #
3704 # This is the mechanism by which the Cloud Dataflow SDK causes code to
3705 # be loaded onto the workers. For example, the Cloud Dataflow Java SDK
3706 # might use this to install jars containing the user's code and all of the
3707 # various dependencies (libraries, data files, etc.) required in order
3708 # for that code to run.
3709 "location": "A String", # The resource to read the package from. The supported resource type is:
3710 #
3711 # Google Cloud Storage:
3712 #
3713 # storage.googleapis.com/{bucket}
3714 # bucket.storage.googleapis.com/
3715 "name": "A String", # The name of the package.
3716 },
3717 ],
Dan O'Mearadd494642020-05-01 07:42:23 -07003718 "defaultPackageSet": "A String", # The default package set to install. This allows the service to
3719 # select a default set of packages which are useful to worker
3720 # harnesses written in a particular language.
3721 "kind": "A String", # The kind of the worker pool; currently only `harness` and `shuffle`
3722 # are supported.
3723 "diskType": "A String", # Type of root disk for VMs. If empty or unspecified, the service will
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07003724 # attempt to choose a reasonable default.
3725         "teardownPolicy": "A String", # Sets the policy for determining when to turn down the worker pool.
3726 # Allowed values are: `TEARDOWN_ALWAYS`, `TEARDOWN_ON_SUCCESS`, and
3727 # `TEARDOWN_NEVER`.
3728 # `TEARDOWN_ALWAYS` means workers are always torn down regardless of whether
3729 # the job succeeds. `TEARDOWN_ON_SUCCESS` means workers are torn down
3730 # if the job succeeds. `TEARDOWN_NEVER` means the workers are never torn
3731 # down.
3732 #
3733 # If the workers are not torn down by the service, they will
3734 # continue to run and use Google Compute Engine VM resources in the
3735 # user's project until they are explicitly terminated by the user.
3736 # Because of this, Google recommends using the `TEARDOWN_ALWAYS`
3737 # policy except for small, manually supervised test jobs.
3738 #
3739 # If unknown or unspecified, the service will attempt to choose a reasonable
3740 # default.
Dan O'Mearadd494642020-05-01 07:42:23 -07003741 "diskSizeGb": 42, # Size of root disk for VMs, in GB. If zero or unspecified, the service will
3742 # attempt to choose a reasonable default.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07003743 "numWorkers": 42, # Number of Google Compute Engine workers in this pool needed to
3744 # execute the job. If zero or unspecified, the service will
3745 # attempt to choose a reasonable default.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07003746 "subnetwork": "A String", # Subnetwork to which VMs will be assigned, if desired. Expected to be of
3747 # the form "regions/REGION/subnetworks/SUBNETWORK".
3748 "dataDisks": [ # Data disks that are used by a VM in this workflow.
3749 { # Describes the data disk used by a workflow job.
3750 "mountPoint": "A String", # Directory in a VM where disk is mounted.
3751 "sizeGb": 42, # Size of disk in GB. If zero or unspecified, the service will
3752 # attempt to choose a reasonable default.
3753 "diskType": "A String", # Disk storage type, as defined by Google Compute Engine. This
3754 # must be a disk type appropriate to the project and zone in which
3755 # the workers will run. If unknown or unspecified, the service
3756 # will attempt to choose a reasonable default.
3757 #
3758 # For example, the standard persistent disk type is a resource name
3759 # typically ending in "pd-standard". If SSD persistent disks are
3760 # available, the resource name typically ends with "pd-ssd". The
3761 # actual valid values are defined the Google Compute Engine API,
3762             # actual valid values are defined by the Google Compute Engine API,
3763 # documentation for more information about determining the set of
3764 # available disk types for a particular project and zone.
3765 #
3766 # Google Compute Engine Disk types are local to a particular
3767 # project in a particular zone, and so the resource name will
3768 # typically look something like this:
3769 #
3770 # compute.googleapis.com/projects/project-id/zones/zone/diskTypes/pd-standard
3771 },
3772 ],
Dan O'Mearadd494642020-05-01 07:42:23 -07003773 "sdkHarnessContainerImages": [ # Set of SDK harness containers needed to execute this pipeline. This will
3774 # only be set in the Fn API path. For non-cross-language pipelines this
3775 # should have only one entry. Cross-language pipelines will have two or more
3776 # entries.
3777           { # Defines an SDK harness container for executing Dataflow pipelines.
3778 "containerImage": "A String", # A docker container image that resides in Google Container Registry.
3779             "useSingleCorePerContainer": True or False, # If true, recommends that the Dataflow service use only one core per SDK
3780                 # container instance with this image. If false (or unset), recommends using
3781                 # more than one core per SDK container instance with this image for
3782                 # efficiency. Note that the Dataflow service may choose to override this
3783                 # property if needed.
3784 },
3785 ],
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07003786 },
3787 ],
Dan O'Mearadd494642020-05-01 07:42:23 -07003788 "clusterManagerApiService": "A String", # The type of cluster manager API to use. If unknown or
3789 # unspecified, the service will attempt to choose a reasonable
3790 # default. This should be in the form of the API service name,
3791 # e.g. "compute.googleapis.com".
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07003792 "tempStoragePrefix": "A String", # The prefix of the resources the system should use for temporary
3793         # storage. The system will append the suffix "/temp-{JOBNAME}" to
3794 # this resource prefix, where {JOBNAME} is the value of the
3795 # job_name field. The resulting bucket and object prefix is used
3796 # as the prefix of the resources used to store temporary data
3797 # needed during the job execution. NOTE: This will override the
3798 # value in taskrunner_settings.
3799 # The supported resource type is:
3800 #
3801 # Google Cloud Storage:
3802 #
3803 # storage.googleapis.com/{bucket}/{object}
3804 # bucket.storage.googleapis.com/{object}
3805 },
3806 "location": "A String", # The [regional endpoint]
3807 # (https://cloud.google.com/dataflow/docs/concepts/regional-endpoints) that
3808 # contains this job.
3809 "tempFiles": [ # A set of files the system should be aware of that are used
3810 # for temporary storage. These temporary files will be
3811 # removed on job completion.
3812 # No duplicates are allowed.
3813 # No file patterns are supported.
3814 #
3815 # The supported files are:
3816 #
3817 # Google Cloud Storage:
3818 #
3819 # storage.googleapis.com/{bucket}/{object}
3820 # bucket.storage.googleapis.com/{object}
3821 "A String",
3822 ],
3823 "type": "A String", # The type of Cloud Dataflow job.
3824 "clientRequestId": "A String", # The client's unique identifier of the job, re-used across retried attempts.
3825 # If this field is set, the service will ensure its uniqueness.
3826 # The request to create a job will fail if the service has knowledge of a
3827 # previously submitted job with the same client's ID and job name.
3828 # The caller may use this field to ensure idempotence of job
3829 # creation across retried attempts to create a job.
3830 # By default, the field is empty and, in that case, the service ignores it.
3831 "createdFromSnapshotId": "A String", # If this is specified, the job's initial state is populated from the given
3832 # snapshot.
3833 "stepsLocation": "A String", # The GCS location where the steps are stored.
3834 "currentStateTime": "A String", # The timestamp associated with the current state.
3835 "startTime": "A String", # The timestamp when the job was started (transitioned to JOB_STATE_PENDING).
3836 # Flexible resource scheduling jobs are started with some delay after job
3837 # creation, so start_time is unset before start and is updated when the
3838 # job is started by the Cloud Dataflow service. For other jobs, start_time
3839 # always equals create_time and is immutable and set by the Cloud Dataflow
3840 # service.
3841 "createTime": "A String", # The timestamp when the job was initially created. Immutable and set by the
3842 # Cloud Dataflow service.
3843 "requestedState": "A String", # The job's requested state.
3844 #
3845 # `UpdateJob` may be used to switch between the `JOB_STATE_STOPPED` and
3846 # `JOB_STATE_RUNNING` states, by setting requested_state. `UpdateJob` may
3847 # also be used to directly set a job's requested state to
3848 # `JOB_STATE_CANCELLED` or `JOB_STATE_DONE`, irrevocably terminating the
3849 # job if it has not already reached a terminal state.
3850 "name": "A String", # The user-specified Cloud Dataflow job name.
3851 #
3852 # Only one Job with a given name may exist in a project at any
3853 # given time. If a caller attempts to create a Job with the same
3854 # name as an already-existing Job, the attempt returns the
3855 # existing Job.
3856 #
3857 # The name must match the regular expression
3858 # `[a-z]([-a-z0-9]{0,38}[a-z0-9])?`
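      # A hypothetical name matching this pattern would be
      # "wordcount-example-01".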
3859 "steps": [ # Exactly one of step or steps_location should be specified.
3860 #
3861 # The top-level steps that constitute the entire job.
3862 { # Defines a particular step within a Cloud Dataflow job.
3863 #
3864 # A job consists of multiple steps, each of which performs some
3865 # specific operation as part of the overall job. Data is typically
3866 # passed from one step to another as part of the job.
3867 #
3868 # Here's an example of a sequence of steps which together implement a
3869 # Map-Reduce job:
3870 #
3871 # * Read a collection of data from some source, parsing the
3872 # collection's elements.
3873 #
3874 # * Validate the elements.
3875 #
3876 # * Apply a user-defined function to map each element to some value
3877 # and extract an element-specific key value.
3878 #
3879 # * Group elements with the same key into a single element with
3880 # that key, transforming a multiply-keyed collection into a
3881 # uniquely-keyed collection.
3882 #
3883 # * Write the elements out to some data sink.
3884 #
3885 # Note that the Cloud Dataflow service may be used to run many different
3886 # types of jobs, not just Map-Reduce.
3887 "kind": "A String", # The kind of step in the Cloud Dataflow job.
3888 "name": "A String", # The name that identifies the step. This must be unique for each
3889 # step with respect to all other steps in the Cloud Dataflow job.
3890 "properties": { # Named properties associated with the step. Each kind of
3891 # predefined step has its own required set of properties.
3892 # Must be provided on Create. Only retrieved with JOB_VIEW_ALL.
3893 "a_key": "", # Properties of the object.
3894 },
3895 },
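      # A purely hypothetical entry might look like
      #   { "kind": "ParallelDo", "name": "s2", "properties": { ... } }
      # where the valid kinds and per-kind properties are defined by the
      # service and the SDK rather than enumerated in this reference.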
3896 ],
3897 "replaceJobId": "A String", # If this job is an update of an existing job, this field is the job ID
3898 # of the job it replaced.
3899 #
3900 # When sending a `CreateJobRequest`, you can update a job by specifying it
3901 # here. The job named here is stopped, and its intermediate state is
3902 # transferred to this job.
3903 "currentState": "A String", # The current state of the job.
3904 #
3905 # Jobs are created in the `JOB_STATE_STOPPED` state unless otherwise
3906 # specified.
3907 #
3908 # A job in the `JOB_STATE_RUNNING` state may asynchronously enter a
3909 # terminal state. After a job has reached a terminal state, no
3910 # further state updates may be made.
3911 #
3912 # This field may be mutated by the Cloud Dataflow service;
3913 # callers cannot mutate it.
3914 "executionInfo": { # Additional information about how a Cloud Dataflow job will be executed that # Deprecated.
3915 # isn't contained in the submitted job.
3916 "stages": { # A mapping from each stage to the information about that stage.
3917 "a_key": { # Contains information about how a particular
3918 # google.dataflow.v1beta3.Step will be executed.
3919 "stepName": [ # The steps associated with the execution stage.
3920 # Note that stages may have several steps, and that a given step
3921 # might be run by more than one stage.
3922 "A String",
3923 ],
3924 },
3925 },
3926 },
3927}
3928
3929 location: string, The [regional endpoint]
3930(https://cloud.google.com/dataflow/docs/concepts/regional-endpoints) that
3931contains this job.
3932 x__xgafv: string, V1 error format.
3933 Allowed values
3934 1 - v1 error format
3935 2 - v2 error format
3936
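  A minimal, purely illustrative sketch of calling this method with the
  google-api-python-client; the method name (update), project ID, job ID and
  body field shown here are assumptions for the example, not values defined
  by this reference:

    from googleapiclient.discovery import build

    dataflow = build('dataflow', 'v1b3')
    body = {
        'requestedState': 'JOB_STATE_RUNNING',  # hypothetical requested state
    }
    job = dataflow.projects().jobs().update(
        projectId='my-project',                    # hypothetical project ID
        jobId='2020-05-01_12_00_00-1234567890',    # hypothetical job ID
        body=body,
    ).execute()
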
3937Returns:
3938 An object of the form:
3939
3940 { # Defines a job to be run by the Cloud Dataflow service.
3941 "labels": { # User-defined labels for this job.
3942 #
3943 # The labels map can contain no more than 64 entries. Entries of the labels
3944 # map are UTF8 strings that comply with the following restrictions:
3945 #
3946 # * Keys must conform to regexp: \p{Ll}\p{Lo}{0,62}
3947 # * Values must conform to regexp: [\p{Ll}\p{Lo}\p{N}_-]{0,63}
3948 # * Both keys and values are additionally constrained to be &lt;= 128 bytes in
3949 # size.
3950 "a_key": "A String",
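          # For example (hypothetical values): "team": "analytics", "env": "test"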
3951 },
3952 "jobMetadata": { # Metadata available primarily for filtering jobs. Will be included in the # This field is populated by the Dataflow service to support filtering jobs
3953 # by the metadata values provided here. Populated for ListJobs and all GetJob
3954 # views SUMMARY and higher.
3955 # ListJob response and Job SUMMARY view.
3956 "sdkVersion": { # The version of the SDK used to run the job. # The SDK version used to run the job.
3957 "versionDisplayName": "A String", # A readable string describing the version of the SDK.
3958 "version": "A String", # The version of the SDK used to run the job.
3959 "sdkSupportStatus": "A String", # The support status for this SDK version.
3960 },
3961 "pubsubDetails": [ # Identification of a PubSub source used in the Dataflow job.
3962 { # Metadata for a PubSub connector used by the job.
3963 "topic": "A String", # Topic accessed in the connection.
3964 "subscription": "A String", # Subscription used in the connection.
3965 },
3966 ],
3967 "datastoreDetails": [ # Identification of a Datastore source used in the Dataflow job.
3968 { # Metadata for a Datastore connector used by the job.
3969 "projectId": "A String", # ProjectId accessed in the connection.
3970 "namespace": "A String", # Namespace used in the connection.
3971 },
3972 ],
3973 "fileDetails": [ # Identification of a File source used in the Dataflow job.
3974 { # Metadata for a File connector used by the job.
3975 "filePattern": "A String", # File Pattern used to access files by the connector.
3976 },
3977 ],
3978 "spannerDetails": [ # Identification of a Spanner source used in the Dataflow job.
3979 { # Metadata for a Spanner connector used by the job.
3980 "instanceId": "A String", # InstanceId accessed in the connection.
3981 "projectId": "A String", # ProjectId accessed in the connection.
3982 "databaseId": "A String", # DatabaseId accessed in the connection.
3983 },
3984 ],
3985 "bigTableDetails": [ # Identification of a BigTable source used in the Dataflow job.
3986 { # Metadata for a BigTable connector used by the job.
3987 "instanceId": "A String", # InstanceId accessed in the connection.
3988 "projectId": "A String", # ProjectId accessed in the connection.
3989 "tableId": "A String", # TableId accessed in the connection.
3990 },
3991 ],
3992 "bigqueryDetails": [ # Identification of a BigQuery source used in the Dataflow job.
3993 { # Metadata for a BigQuery connector used by the job.
3994 "projectId": "A String", # Project accessed in the connection.
3995 "query": "A String", # Query used to access data in the connection.
3996 "table": "A String", # Table accessed in the connection.
3997 "dataset": "A String", # Dataset accessed in the connection.
3998 },
3999 ],
4000 },
4001 "pipelineDescription": { # A descriptive representation of submitted pipeline as well as the executed # Preliminary field: The format of this data may change at any time.
4002 # A description of the user pipeline and stages through which it is executed.
4003 # Created by Cloud Dataflow service. Only retrieved with
4004 # JOB_VIEW_DESCRIPTION or JOB_VIEW_ALL.
4005 # form. This data is provided by the Dataflow service for ease of visualizing
4006 # the pipeline and interpreting Dataflow provided metrics.
4007 "originalPipelineTransform": [ # Description of each transform in the pipeline and collections between them.
4008 { # Description of the type, names/ids, and input/outputs for a transform.
4009 "kind": "A String", # Type of transform.
4010 "name": "A String", # User provided name for this transform instance.
4011 "inputCollectionName": [ # User names for all collection inputs to this transform.
4012 "A String",
4013 ],
4014 "displayData": [ # Transform-specific display data.
4015 { # Data provided with a pipeline or transform to provide descriptive info.
4016 "key": "A String", # The key identifying the display data.
4017 # This is intended to be used as a label for the display data
4018 # when viewed in a dax monitoring system.
4019 "shortStrValue": "A String", # A possible additional shorter value to display.
4020 # For example a java_class_name_value of com.mypackage.MyDoFn
4021 # will be stored with MyDoFn as the short_str_value and
4022 # com.mypackage.MyDoFn as the java_class_name value.
4023 # short_str_value can be displayed and java_class_name_value
4024 # will be displayed as a tooltip.
4025 "timestampValue": "A String", # Contains value if the data is of timestamp type.
4026 "url": "A String", # An optional full URL.
4027 "floatValue": 3.14, # Contains value if the data is of float type.
4028 "namespace": "A String", # The namespace for the key. This is usually a class name or programming
4029 # language namespace (i.e. python module) which defines the display data.
4030 # This allows a dax monitoring system to specially handle the data
4031 # and perform custom rendering.
4032 "javaClassValue": "A String", # Contains value if the data is of java class type.
4033 "label": "A String", # An optional label to display in a dax UI for the element.
4034 "boolValue": True or False, # Contains value if the data is of a boolean type.
4035 "strValue": "A String", # Contains value if the data is of string type.
4036 "durationValue": "A String", # Contains value if the data is of duration type.
4037 "int64Value": "A String", # Contains value if the data is of int64 type.
4038 },
4039 ],
4040 "outputCollectionName": [ # User names for all collection outputs to this transform.
4041 "A String",
4042 ],
4043 "id": "A String", # SDK generated id of this transform instance.
4044 },
4045 ],
4046 "executionPipelineStage": [ # Description of each stage of execution of the pipeline.
4047 { # Description of the composing transforms, names/ids, and input/outputs of a
4048 # stage of execution. Some composing transforms and sources may have been
4049 # generated by the Dataflow service during execution planning.
4050 "componentSource": [ # Collections produced and consumed by component transforms of this stage.
4051 { # Description of an interstitial value between transforms in an execution
4052 # stage.
4053 "userName": "A String", # Human-readable name for this transform; may be user or system generated.
4054 "originalTransformOrCollection": "A String", # User name for the original user transform or collection with which this
4055 # source is most closely associated.
4056 "name": "A String", # Dataflow service generated name for this source.
4057 },
4058 ],
4059 "kind": "A String", # Type of tranform this stage is executing.
4060 "name": "A String", # Dataflow service generated name for this stage.
4061 "outputSource": [ # Output sources for this stage.
4062 { # Description of an input or output of an execution stage.
4063 "userName": "A String", # Human-readable name for this source; may be user or system generated.
4064 "sizeBytes": "A String", # Size of the source, if measurable.
4065 "name": "A String", # Dataflow service generated name for this source.
4066 "originalTransformOrCollection": "A String", # User name for the original user transform or collection with which this
4067 # source is most closely associated.
4068 },
4069 ],
4070 "inputSource": [ # Input sources for this stage.
4071 { # Description of an input or output of an execution stage.
4072 "userName": "A String", # Human-readable name for this source; may be user or system generated.
4073 "sizeBytes": "A String", # Size of the source, if measurable.
4074 "name": "A String", # Dataflow service generated name for this source.
4075 "originalTransformOrCollection": "A String", # User name for the original user transform or collection with which this
4076 # source is most closely associated.
4077 },
4078 ],
4079 "componentTransform": [ # Transforms that comprise this execution stage.
4080 { # Description of a transform executed as part of an execution stage.
4081 "userName": "A String", # Human-readable name for this transform; may be user or system generated.
4082 "originalTransform": "A String", # User name for the original user transform with which this transform is
4083 # most closely associated.
4084 "name": "A String", # Dataflow service generated name for this source.
4085 },
4086 ],
4087 "id": "A String", # Dataflow service generated id for this stage.
4088 },
4089 ],
4090 "displayData": [ # Pipeline level display data.
4091 { # Data provided with a pipeline or transform to provide descriptive info.
4092 "key": "A String", # The key identifying the display data.
4093 # This is intended to be used as a label for the display data
4094 # when viewed in a dax monitoring system.
4095 "shortStrValue": "A String", # A possible additional shorter value to display.
4096 # For example a java_class_name_value of com.mypackage.MyDoFn
4097 # will be stored with MyDoFn as the short_str_value and
4098 # com.mypackage.MyDoFn as the java_class_name value.
4099 # short_str_value can be displayed and java_class_name_value
4100 # will be displayed as a tooltip.
4101 "timestampValue": "A String", # Contains value if the data is of timestamp type.
4102 "url": "A String", # An optional full URL.
4103 "floatValue": 3.14, # Contains value if the data is of float type.
4104 "namespace": "A String", # The namespace for the key. This is usually a class name or programming
4105 # language namespace (i.e. python module) which defines the display data.
4106 # This allows a dax monitoring system to specially handle the data
4107 # and perform custom rendering.
4108 "javaClassValue": "A String", # Contains value if the data is of java class type.
4109 "label": "A String", # An optional label to display in a dax UI for the element.
4110 "boolValue": True or False, # Contains value if the data is of a boolean type.
4111 "strValue": "A String", # Contains value if the data is of string type.
4112 "durationValue": "A String", # Contains value if the data is of duration type.
4113 "int64Value": "A String", # Contains value if the data is of int64 type.
4114 },
4115 ],
4116 },
4117 "stageStates": [ # This field may be mutated by the Cloud Dataflow service;
4118 # callers cannot mutate it.
4119 { # A message describing the state of a particular execution stage.
4120 "executionStageName": "A String", # The name of the execution stage.
4121 "executionStageState": "A String", # Executions stage states allow the same set of values as JobState.
4122 "currentStateTime": "A String", # The time at which the stage transitioned to this state.
4123 },
4124 ],
4125 "id": "A String", # The unique ID of this job.
4126 #
4127 # This field is set by the Cloud Dataflow service when the Job is
4128 # created, and is immutable for the life of the job.
4129 "replacedByJobId": "A String", # If another job is an update of this job (and thus, this job is in
4130 # `JOB_STATE_UPDATED`), this field contains the ID of that job.
4131 "projectId": "A String", # The ID of the Cloud Platform project that the job belongs to.
4132 "transformNameMapping": { # The map of transform name prefixes of the job to be replaced to the
4133 # corresponding name prefixes of the new job.
4134 "a_key": "A String",
4135 },
4136 "environment": { # Describes the environment in which a Dataflow Job runs. # The environment for the job.
4137 "workerRegion": "A String", # The Compute Engine region
4138 # (https://cloud.google.com/compute/docs/regions-zones/regions-zones) in
4139 # which worker processing should occur, e.g. "us-west1". Mutually exclusive
4140 # with worker_zone. If neither worker_region nor worker_zone is specified,
4141 # default to the control plane's region.
4142 "version": { # A structure describing which components and their versions of the service
4143 # are required in order to run the job.
4144 "a_key": "", # Properties of the object.
4145 },
4146 "flexResourceSchedulingGoal": "A String", # Which Flexible Resource Scheduling mode to run in.
4147 "serviceKmsKeyName": "A String", # If set, contains the Cloud KMS key identifier used to encrypt data
4148 # at rest, AKA a Customer Managed Encryption Key (CMEK).
4149 #
4150 # Format:
4151 # projects/PROJECT_ID/locations/LOCATION/keyRings/KEY_RING/cryptoKeys/KEY
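      # e.g. (hypothetical):
      # projects/my-project/locations/us-central1/keyRings/my-ring/cryptoKeys/my-key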
4152 "internalExperiments": { # Experimental settings.
4153 "a_key": "", # Properties of the object. Contains field @type with type URL.
4154 },
4155 "dataset": "A String", # The dataset for the current project where various workflow
4156 # related tables are stored.
4157 #
4158 # The supported resource type is:
4159 #
4160 # Google BigQuery:
4161 # bigquery.googleapis.com/{dataset}
4162 "experiments": [ # The list of experiments to enable.
4163 "A String",
4164 ],
4165 "serviceAccountEmail": "A String", # Identity to run virtual machines as. Defaults to the default account.
4166 "sdkPipelineOptions": { # The Cloud Dataflow SDK pipeline options specified by the user. These
4167 # options are passed through the service and are used to recreate the
4168 # SDK pipeline options on the worker in a language agnostic and platform
4169 # independent way.
4170 "a_key": "", # Properties of the object.
4171 },
4172 "userAgent": { # A description of the process that generated the request.
4173 "a_key": "", # Properties of the object.
4174 },
4175 "workerZone": "A String", # The Compute Engine zone
4176 # (https://cloud.google.com/compute/docs/regions-zones/regions-zones) in
4177 # which worker processing should occur, e.g. "us-west1-a". Mutually exclusive
4178 # with worker_region. If neither worker_region nor worker_zone is specified,
4179 # a zone in the control plane's region is chosen based on available capacity.
4180 "workerPools": [ # The worker pools. At least one "harness" worker pool must be
4181 # specified in order for the job to have workers.
4182 { # Describes one particular pool of Cloud Dataflow workers to be
4183 # instantiated by the Cloud Dataflow service in order to perform the
4184 # computations required by a job. Note that a workflow job may use
4185 # multiple pools, in order to match the various computational
4186 # requirements of the various stages of the job.
4187 "workerHarnessContainerImage": "A String", # Required. Docker container image that executes the Cloud Dataflow worker
4188 # harness, residing in Google Container Registry.
4189 #
4190 # Deprecated for the Fn API path. Use sdk_harness_container_images instead.
4191 "ipConfiguration": "A String", # Configuration for VM IPs.
4192 "autoscalingSettings": { # Settings for WorkerPool autoscaling. # Settings for autoscaling of this WorkerPool.
4193 "maxNumWorkers": 42, # The maximum number of workers to cap scaling at.
4194 "algorithm": "A String", # The algorithm to use for autoscaling.
4195 },
4196 "diskSourceImage": "A String", # Fully qualified source image for disks.
4197 "network": "A String", # Network to which VMs will be assigned. If empty or unspecified,
4198 # the service will use the network "default".
4199 "zone": "A String", # Zone to run the worker pools in. If empty or unspecified, the service
4200 # will attempt to choose a reasonable default.
4201 "metadata": { # Metadata to set on the Google Compute Engine VMs.
4202 "a_key": "A String",
4203 },
4204 "machineType": "A String", # Machine type (e.g. "n1-standard-1"). If empty or unspecified, the
4205 # service will attempt to choose a reasonable default.
4206 "onHostMaintenance": "A String", # The action to take on host maintenance, as defined by the Google
4207 # Compute Engine API.
4208 "taskrunnerSettings": { # Taskrunner configuration settings. # Settings passed through to Google Compute Engine workers when
4209 # using the standard Dataflow task runner. Users should ignore
4210 # this field.
4211 "workflowFileName": "A String", # The file to store the workflow in.
4212 "logUploadLocation": "A String", # Indicates where to put logs. If this is not specified, the logs
4213 # will not be uploaded.
4214 #
4215 # The supported resource type is:
4216 #
4217 # Google Cloud Storage:
4218 # storage.googleapis.com/{bucket}/{object}
4219 # bucket.storage.googleapis.com/{object}
4220 "commandlinesFileName": "A String", # The file to store preprocessing commands in.
4221 "alsologtostderr": True or False, # Whether to also send taskrunner log info to stderr.
4222 "continueOnException": True or False, # Whether to continue taskrunner if an exception is hit.
4223 "baseTaskDir": "A String", # The location on the worker for task-specific subdirectories.
4224 "vmId": "A String", # The ID string of the VM.
4225 "taskGroup": "A String", # The UNIX group ID on the worker VM to use for tasks launched by
4226 # taskrunner; e.g. "wheel".
4227 "taskUser": "A String", # The UNIX user ID on the worker VM to use for tasks launched by
4228 # taskrunner; e.g. "root".
4229 "oauthScopes": [ # The OAuth2 scopes to be requested by the taskrunner in order to
4230 # access the Cloud Dataflow API.
4231 "A String",
4232 ],
4233 "languageHint": "A String", # The suggested backend language.
4234 "logToSerialconsole": True or False, # Whether to send taskrunner log info to Google Compute Engine VM serial
4235 # console.
4236 "streamingWorkerMainClass": "A String", # The streaming worker main class name.
4237 "logDir": "A String", # The directory on the VM to store logs.
4238 "parallelWorkerSettings": { # Provides data to pass through to the worker harness. # The settings to pass to the parallel worker harness.
4239 "reportingEnabled": True or False, # Whether to send work progress updates to the service.
4240 "shuffleServicePath": "A String", # The Shuffle service path relative to the root URL, for example,
4241 # "shuffle/v1beta1".
4242 "workerId": "A String", # The ID of the worker running this pipeline.
4243 "baseUrl": "A String", # The base URL for accessing Google Cloud APIs.
4244 #
4245 # When workers access Google Cloud APIs, they logically do so via
4246 # relative URLs. If this field is specified, it supplies the base
4247 # URL to use for resolving these relative URLs. The normative
4248 # algorithm used is defined by RFC 1808, "Relative Uniform Resource
4249 # Locators".
4250 #
4251 # If not specified, the default value is "http://www.googleapis.com/"
4252 "servicePath": "A String", # The Cloud Dataflow service path relative to the root URL, for example,
4253 # "dataflow/v1b3/projects".
4254 "tempStoragePrefix": "A String", # The prefix of the resources the system should use for temporary
4255 # storage.
4256 #
4257 # The supported resource type is:
4258 #
4259 # Google Cloud Storage:
4260 #
4261 # storage.googleapis.com/{bucket}/{object}
4262 # bucket.storage.googleapis.com/{object}
4263 },
4264 "dataflowApiVersion": "A String", # The API version of endpoint, e.g. "v1b3"
4265 "harnessCommand": "A String", # The command to launch the worker harness.
4266 "tempStoragePrefix": "A String", # The prefix of the resources the taskrunner should use for
4267 # temporary storage.
4268 #
4269 # The supported resource type is:
4270 #
4271 # Google Cloud Storage:
4272 # storage.googleapis.com/{bucket}/{object}
4273 # bucket.storage.googleapis.com/{object}
4274 "baseUrl": "A String", # The base URL for the taskrunner to use when accessing Google Cloud APIs.
4275 #
4276 # When workers access Google Cloud APIs, they logically do so via
4277 # relative URLs. If this field is specified, it supplies the base
4278 # URL to use for resolving these relative URLs. The normative
4279 # algorithm used is defined by RFC 1808, "Relative Uniform Resource
4280 # Locators".
4281 #
4282 # If not specified, the default value is "http://www.googleapis.com/"
4283 },
4284 "numThreadsPerWorker": 42, # The number of threads per worker harness. If empty or unspecified, the
4285 # service will choose a number of threads (according to the number of cores
4286 # on the selected machine type for batch, or 1 by convention for streaming).
4287 "poolArgs": { # Extra arguments for this worker pool.
4288 "a_key": "", # Properties of the object. Contains field @type with type URL.
4289 },
4290 "packages": [ # Packages to be installed on workers.
4291 { # The packages that must be installed in order for a worker to run the
4292 # steps of the Cloud Dataflow job that will be assigned to its worker
4293 # pool.
4294 #
4295 # This is the mechanism by which the Cloud Dataflow SDK causes code to
4296 # be loaded onto the workers. For example, the Cloud Dataflow Java SDK
4297 # might use this to install jars containing the user's code and all of the
4298 # various dependencies (libraries, data files, etc.) required in order
4299 # for that code to run.
4300 "location": "A String", # The resource to read the package from. The supported resource type is:
4301 #
4302 # Google Cloud Storage:
4303 #
4304 # storage.googleapis.com/{bucket}
4305 # bucket.storage.googleapis.com/
4306 "name": "A String", # The name of the package.
4307 },
4308 ],
4309 "defaultPackageSet": "A String", # The default package set to install. This allows the service to
4310 # select a default set of packages which are useful to worker
4311 # harnesses written in a particular language.
4312 "kind": "A String", # The kind of the worker pool; currently only `harness` and `shuffle`
4313 # are supported.
4314 "diskType": "A String", # Type of root disk for VMs. If empty or unspecified, the service will
Sai Cheemalapati4ba8c232017-06-06 18:46:08 -04004315 # attempt to choose a reasonable default.
Sai Cheemalapati4ba8c232017-06-06 18:46:08 -04004316 "teardownPolicy": "A String", # Sets the policy for determining when to turndown worker pool.
4317 # Allowed values are: `TEARDOWN_ALWAYS`, `TEARDOWN_ON_SUCCESS`, and
4318 # `TEARDOWN_NEVER`.
4319 # `TEARDOWN_ALWAYS` means workers are always torn down regardless of whether
4320 # the job succeeds. `TEARDOWN_ON_SUCCESS` means workers are torn down
4321 # if the job succeeds. `TEARDOWN_NEVER` means the workers are never torn
4322 # down.
4323 #
4324 # If the workers are not torn down by the service, they will
4325 # continue to run and use Google Compute Engine VM resources in the
4326 # user's project until they are explicitly terminated by the user.
4327 # Because of this, Google recommends using the `TEARDOWN_ALWAYS`
4328 # policy except for small, manually supervised test jobs.
4329 #
4330 # If unknown or unspecified, the service will attempt to choose a reasonable
4331 # default.
4332 "diskSizeGb": 42, # Size of root disk for VMs, in GB. If zero or unspecified, the service will
4333 # attempt to choose a reasonable default.
4334 "numWorkers": 42, # Number of Google Compute Engine workers in this pool needed to
4335 # execute the job. If zero or unspecified, the service will
4336 # attempt to choose a reasonable default.
4337 "subnetwork": "A String", # Subnetwork to which VMs will be assigned, if desired. Expected to be of
4338 # the form "regions/REGION/subnetworks/SUBNETWORK".
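        # e.g. (hypothetical): "regions/us-central1/subnetworks/my-subnetwork"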
4339 "dataDisks": [ # Data disks that are used by a VM in this workflow.
4340 { # Describes the data disk used by a workflow job.
4341 "mountPoint": "A String", # Directory in a VM where disk is mounted.
4342 "sizeGb": 42, # Size of disk in GB. If zero or unspecified, the service will
4343 # attempt to choose a reasonable default.
4344 "diskType": "A String", # Disk storage type, as defined by Google Compute Engine. This
4345 # must be a disk type appropriate to the project and zone in which
4346 # the workers will run. If unknown or unspecified, the service
4347 # will attempt to choose a reasonable default.
4348 #
4349 # For example, the standard persistent disk type is a resource name
4350 # typically ending in "pd-standard". If SSD persistent disks are
4351 # available, the resource name typically ends with "pd-ssd". The
4352 # actual valid values are defined by the Google Compute Engine API,
4353 # not by the Cloud Dataflow API; consult the Google Compute Engine
4354 # documentation for more information about determining the set of
4355 # available disk types for a particular project and zone.
4356 #
4357 # Google Compute Engine Disk types are local to a particular
4358 # project in a particular zone, and so the resource name will
4359 # typically look something like this:
4360 #
4361 # compute.googleapis.com/projects/project-id/zones/zone/diskTypes/pd-standard
4362 },
4363 ],
4364 "sdkHarnessContainerImages": [ # Set of SDK harness containers needed to execute this pipeline. This will
4365 # only be set in the Fn API path. For non-cross-language pipelines this
4366 # should have only one entry. Cross-language pipelines will have two or more
4367 # entries.
4368 { # Defines an SDK harness container for executing Dataflow pipelines.
4369 "containerImage": "A String", # A docker container image that resides in Google Container Registry.
4370 "useSingleCorePerContainer": True or False, # If true, recommends the Dataflow service to use only one core per SDK
4371 # container instance with this image. If false (or unset) recommends using
4372 # more than one core per SDK container instance with this image for
4373 # efficiency. Note that Dataflow service may choose to override this property
4374 # if needed.
4375 },
4376 ],
4377 },
4378 ],
4379 "clusterManagerApiService": "A String", # The type of cluster manager API to use. If unknown or
4380 # unspecified, the service will attempt to choose a reasonable
4381 # default. This should be in the form of the API service name,
4382 # e.g. "compute.googleapis.com".
4383 "tempStoragePrefix": "A String", # The prefix of the resources the system should use for temporary
4384 # storage. The system will append the suffix "/temp-{JOBNAME}" to
4385 # this resource prefix, where {JOBNAME} is the value of the
4386 # job_name field. The resulting bucket and object prefix is used
4387 # as the prefix of the resources used to store temporary data
4388 # needed during the job execution. NOTE: This will override the
4389 # value in taskrunner_settings.
4390 # The supported resource type is:
4391 #
4392 # Google Cloud Storage:
4393 #
4394 # storage.googleapis.com/{bucket}/{object}
4395 # bucket.storage.googleapis.com/{object}
4396 },
4397 "location": "A String", # The [regional endpoint]
4398 # (https://cloud.google.com/dataflow/docs/concepts/regional-endpoints) that
4399 # contains this job.
4400 "tempFiles": [ # A set of files the system should be aware of that are used
4401 # for temporary storage. These temporary files will be
4402 # removed on job completion.
4403 # No duplicates are allowed.
4404 # No file patterns are supported.
4405 #
4406 # The supported files are:
4407 #
4408 # Google Cloud Storage:
4409 #
4410 # storage.googleapis.com/{bucket}/{object}
4411 # bucket.storage.googleapis.com/{object}
4412 "A String",
4413 ],
4414 "type": "A String", # The type of Cloud Dataflow job.
4415 "clientRequestId": "A String", # The client's unique identifier of the job, re-used across retried attempts.
4416 # If this field is set, the service will ensure its uniqueness.
4417 # The request to create a job will fail if the service has knowledge of a
4418 # previously submitted job with the same client's ID and job name.
4419 # The caller may use this field to ensure idempotence of job
4420 # creation across retried attempts to create a job.
4421 # By default, the field is empty and, in that case, the service ignores it.
4422 "createdFromSnapshotId": "A String", # If this is specified, the job's initial state is populated from the given
4423 # snapshot.
4424 "stepsLocation": "A String", # The GCS location where the steps are stored.
4425 "currentStateTime": "A String", # The timestamp associated with the current state.
4426 "startTime": "A String", # The timestamp when the job was started (transitioned to JOB_STATE_PENDING).
4427 # Flexible resource scheduling jobs are started with some delay after job
4428 # creation, so start_time is unset before start and is updated when the
4429 # job is started by the Cloud Dataflow service. For other jobs, start_time
4430 # always equals create_time and is immutable and set by the Cloud Dataflow
4431 # service.
4432 "createTime": "A String", # The timestamp when the job was initially created. Immutable and set by the
4433 # Cloud Dataflow service.
4434 "requestedState": "A String", # The job's requested state.
4435 #
4436 # `UpdateJob` may be used to switch between the `JOB_STATE_STOPPED` and
4437 # `JOB_STATE_RUNNING` states, by setting requested_state. `UpdateJob` may
4438 # also be used to directly set a job's requested state to
4439 # `JOB_STATE_CANCELLED` or `JOB_STATE_DONE`, irrevocably terminating the
4440 # job if it has not already reached a terminal state.
4441 "name": "A String", # The user-specified Cloud Dataflow job name.
4442 #
4443 # Only one Job with a given name may exist in a project at any
4444 # given time. If a caller attempts to create a Job with the same
4445 # name as an already-existing Job, the attempt returns the
4446 # existing Job.
4447 #
4448 # The name must match the regular expression
4449 # `[a-z]([-a-z0-9]{0,38}[a-z0-9])?`
4450 "steps": [ # Exactly one of step or steps_location should be specified.
4451 #
4452 # The top-level steps that constitute the entire job.
4453 { # Defines a particular step within a Cloud Dataflow job.
4454 #
4455 # A job consists of multiple steps, each of which performs some
4456 # specific operation as part of the overall job. Data is typically
4457 # passed from one step to another as part of the job.
4458 #
4459 # Here's an example of a sequence of steps which together implement a
4460 # Map-Reduce job:
4461 #
4462 # * Read a collection of data from some source, parsing the
4463 # collection's elements.
4464 #
4465 # * Validate the elements.
4466 #
4467 # * Apply a user-defined function to map each element to some value
4468 # and extract an element-specific key value.
4469 #
4470 # * Group elements with the same key into a single element with
4471 # that key, transforming a multiply-keyed collection into a
4472 # uniquely-keyed collection.
4473 #
4474 # * Write the elements out to some data sink.
4475 #
4476 # Note that the Cloud Dataflow service may be used to run many different
4477 # types of jobs, not just Map-Reduce.
4478 "kind": "A String", # The kind of step in the Cloud Dataflow job.
4479 "name": "A String", # The name that identifies the step. This must be unique for each
4480 # step with respect to all other steps in the Cloud Dataflow job.
4481 "properties": { # Named properties associated with the step. Each kind of
4482 # predefined step has its own required set of properties.
4483 # Must be provided on Create. Only retrieved with JOB_VIEW_ALL.
4484 "a_key": "", # Properties of the object.
4485 },
4486 },
4487 ],
4488 "replaceJobId": "A String", # If this job is an update of an existing job, this field is the job ID
4489 # of the job it replaced.
4490 #
4491 # When sending a `CreateJobRequest`, you can update a job by specifying it
4492 # here. The job named here is stopped, and its intermediate state is
4493 # transferred to this job.
4494 "currentState": "A String", # The current state of the job.
4495 #
4496 # Jobs are created in the `JOB_STATE_STOPPED` state unless otherwise
4497 # specified.
4498 #
4499 # A job in the `JOB_STATE_RUNNING` state may asynchronously enter a
4500 # terminal state. After a job has reached a terminal state, no
4501 # further state updates may be made.
4502 #
4503 # This field may be mutated by the Cloud Dataflow service;
4504 # callers cannot mutate it.
4505 "executionInfo": { # Additional information about how a Cloud Dataflow job will be executed that # Deprecated.
4506 # isn't contained in the submitted job.
4507 "stages": { # A mapping from each stage to the information about that stage.
4508 "a_key": { # Contains information about how a particular
4509 # google.dataflow.v1beta3.Step will be executed.
4510 "stepName": [ # The steps associated with the execution stage.
4511 # Note that stages may have several steps, and that a given step
4512 # might be run by more than one stage.
4513 "A String",
4514 ],
4515 },
4516 },
4517 },
4518 }</pre>
4519</div>
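
<p>A purely illustrative sketch (not part of the generated reference) of reading
a few of the documented fields from a returned Job object; the field values
below are hypothetical:</p>
<pre>
  # Hypothetical response, trimmed to a few of the fields documented above.
  job = {
      "name": "wordcount-example-01",
      "currentState": "JOB_STATE_RUNNING",
      "createTime": "2020-05-01T12:00:00Z",
      "labels": {"team": "analytics"},
      "stageStates": [
          {"executionStageName": "F12", "executionStageState": "JOB_STATE_RUNNING"},
      ],
  }

  print(job.get("name"), job.get("currentState"), job.get("createTime"))
  for stage in job.get("stageStates", []):
      print(stage.get("executionStageName"), stage.get("executionStageState"))
  for key, value in job.get("labels", {}).items():
      print(key, value)
</pre>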
4520
4521</body></html>