blob: 20a94c97ef656ebacbcee45a9ade6840400cb671 [file] [log] [blame]
Nathaniel Manista4f877e52015-06-15 16:44:50 +00001<html><body>
2<style>
3
4body, h1, h2, h3, div, span, p, pre, a {
5 margin: 0;
6 padding: 0;
7 border: 0;
8 font-weight: inherit;
9 font-style: inherit;
10 font-size: 100%;
11 font-family: inherit;
12 vertical-align: baseline;
13}
14
15body {
16 font-size: 13px;
17 padding: 1em;
18}
19
20h1 {
21 font-size: 26px;
22 margin-bottom: 1em;
23}
24
25h2 {
26 font-size: 24px;
27 margin-bottom: 1em;
28}
29
30h3 {
31 font-size: 20px;
32 margin-bottom: 1em;
33 margin-top: 1em;
34}
35
36pre, code {
37 line-height: 1.5;
38 font-family: Monaco, 'DejaVu Sans Mono', 'Bitstream Vera Sans Mono', 'Lucida Console', monospace;
39}
40
41pre {
42 margin-top: 0.5em;
43}
44
45h1, h2, h3, p {
46 font-family: Arial, sans serif;
47}
48
49h1, h2, h3 {
50 border-bottom: solid #CCC 1px;
51}
52
53.toc_element {
54 margin-top: 0.5em;
55}
56
57.firstline {
58 margin-left: 2 em;
59}
60
61.method {
62 margin-top: 1em;
63 border: solid 1px #CCC;
64 padding: 1em;
65 background: #EEE;
66}
67
68.details {
69 font-weight: bold;
70 font-size: 14px;
71}
72
73</style>
74
Bu Sun Kim715bd7f2019-06-14 16:50:42 -070075<h1><a href="dataflow_v1b3.html">Dataflow API</a> . <a href="dataflow_v1b3.projects.html">projects</a> . <a href="dataflow_v1b3.projects.jobs.html">jobs</a></h1>
Nathaniel Manista4f877e52015-06-15 16:44:50 +000076<h2>Instance Methods</h2>
77<p class="toc_element">
Jon Wayne Parrott7d5badb2016-08-16 12:44:29 -070078 <code><a href="dataflow_v1b3.projects.jobs.debug.html">debug()</a></code>
79</p>
80<p class="firstline">Returns the debug Resource.</p>
81
82<p class="toc_element">
Nathaniel Manista4f877e52015-06-15 16:44:50 +000083 <code><a href="dataflow_v1b3.projects.jobs.messages.html">messages()</a></code>
84</p>
85<p class="firstline">Returns the messages Resource.</p>
86
87<p class="toc_element">
88 <code><a href="dataflow_v1b3.projects.jobs.workItems.html">workItems()</a></code>
89</p>
90<p class="firstline">Returns the workItems Resource.</p>
91
92<p class="toc_element">
Bu Sun Kim715bd7f2019-06-14 16:50:42 -070093 <code><a href="#aggregated">aggregated(projectId, pageSize=None, pageToken=None, x__xgafv=None, location=None, filter=None, view=None)</a></code></p>
94<p class="firstline">List the jobs of a project across all regions.</p>
95<p class="toc_element">
96 <code><a href="#aggregated_next">aggregated_next(previous_request, previous_response)</a></code></p>
97<p class="firstline">Retrieves the next page of results.</p>
98<p class="toc_element">
Jon Wayne Parrott692617a2017-01-06 09:58:29 -080099 <code><a href="#create">create(projectId, body, location=None, x__xgafv=None, replaceJobId=None, view=None)</a></code></p>
Sai Cheemalapatic30d2b52017-03-13 12:12:03 -0400100<p class="firstline">Creates a Cloud Dataflow job.</p>
Nathaniel Manista4f877e52015-06-15 16:44:50 +0000101<p class="toc_element">
Jon Wayne Parrott692617a2017-01-06 09:58:29 -0800102 <code><a href="#get">get(projectId, jobId, location=None, x__xgafv=None, view=None)</a></code></p>
Sai Cheemalapatic30d2b52017-03-13 12:12:03 -0400103<p class="firstline">Gets the state of the specified Cloud Dataflow job.</p>
Nathaniel Manista4f877e52015-06-15 16:44:50 +0000104<p class="toc_element">
Jon Wayne Parrott692617a2017-01-06 09:58:29 -0800105 <code><a href="#getMetrics">getMetrics(projectId, jobId, startTime=None, location=None, x__xgafv=None)</a></code></p>
Nathaniel Manista4f877e52015-06-15 16:44:50 +0000106<p class="firstline">Request the job status.</p>
107<p class="toc_element">
Bu Sun Kim715bd7f2019-06-14 16:50:42 -0700108 <code><a href="#list">list(projectId, pageSize=None, pageToken=None, x__xgafv=None, location=None, filter=None, view=None)</a></code></p>
Sai Cheemalapatic30d2b52017-03-13 12:12:03 -0400109<p class="firstline">List the jobs of a project.</p>
Nathaniel Manista4f877e52015-06-15 16:44:50 +0000110<p class="toc_element">
111 <code><a href="#list_next">list_next(previous_request, previous_response)</a></code></p>
112<p class="firstline">Retrieves the next page of results.</p>
113<p class="toc_element">
Bu Sun Kim715bd7f2019-06-14 16:50:42 -0700114 <code><a href="#snapshot">snapshot(projectId, jobId, body, x__xgafv=None)</a></code></p>
115<p class="firstline">Snapshot the state of a streaming job.</p>
116<p class="toc_element">
Jon Wayne Parrott692617a2017-01-06 09:58:29 -0800117 <code><a href="#update">update(projectId, jobId, body, location=None, x__xgafv=None)</a></code></p>
Sai Cheemalapatic30d2b52017-03-13 12:12:03 -0400118<p class="firstline">Updates the state of an existing Cloud Dataflow job.</p>
Nathaniel Manista4f877e52015-06-15 16:44:50 +0000119<h3>Method Details</h3>
120<div class="method">
Bu Sun Kim715bd7f2019-06-14 16:50:42 -0700121 <code class="details" id="aggregated">aggregated(projectId, pageSize=None, pageToken=None, x__xgafv=None, location=None, filter=None, view=None)</code>
122 <pre>List the jobs of a project across all regions.
123
124Args:
125 projectId: string, The project which owns the jobs. (required)
126 pageSize: integer, If there are many jobs, limit response to at most this many.
127The actual number of jobs returned will be the lesser of max_responses
128and an unspecified server-defined limit.
129 pageToken: string, Set this to the 'next_page_token' field of a previous response
130to request additional results in a long list.
131 x__xgafv: string, V1 error format.
132 Allowed values
133 1 - v1 error format
134 2 - v2 error format
135 location: string, The [regional endpoint]
136(https://cloud.google.com/dataflow/docs/concepts/regional-endpoints) that
137contains this job.
138 filter: string, The kind of filter to use.
139 view: string, Level of information requested in response. Default is `JOB_VIEW_SUMMARY`.
140
141Returns:
142 An object of the form:
143
144 { # Response to a request to list Cloud Dataflow jobs. This may be a partial
145 # response, depending on the page size in the ListJobsRequest.
146 "nextPageToken": "A String", # Set if there may be more results than fit in this response.
147 "failedLocation": [ # Zero or more messages describing the [regional endpoints]
148 # (https://cloud.google.com/dataflow/docs/concepts/regional-endpoints) that
149 # failed to respond.
150 { # Indicates which [regional endpoint]
151 # (https://cloud.google.com/dataflow/docs/concepts/regional-endpoints) failed
152 # to respond to a request for data.
153 "name": "A String", # The name of the [regional endpoint]
154 # (https://cloud.google.com/dataflow/docs/concepts/regional-endpoints) that
155 # failed to respond.
156 },
157 ],
158 "jobs": [ # A subset of the requested job information.
159 { # Defines a job to be run by the Cloud Dataflow service.
160 "labels": { # User-defined labels for this job.
161 #
162 # The labels map can contain no more than 64 entries. Entries of the labels
163 # map are UTF8 strings that comply with the following restrictions:
164 #
165 # * Keys must conform to regexp: \p{Ll}\p{Lo}{0,62}
166 # * Values must conform to regexp: [\p{Ll}\p{Lo}\p{N}_-]{0,63}
167 # * Both keys and values are additionally constrained to be <= 128 bytes in
168 # size.
169 "a_key": "A String",
170 },
171 "jobMetadata": { # Metadata available primarily for filtering jobs. Will be included in the # This field is populated by the Dataflow service to support filtering jobs
172 # by the metadata values provided here. Populated for ListJobs and all GetJob
173 # views SUMMARY and higher.
174 # ListJob response and Job SUMMARY view.
175 "sdkVersion": { # The version of the SDK used to run the job. # The SDK version used to run the job.
176 "versionDisplayName": "A String", # A readable string describing the version of the SDK.
177 "version": "A String", # The version of the SDK used to run the job.
178 "sdkSupportStatus": "A String", # The support status for this SDK version.
179 },
180 "pubsubDetails": [ # Identification of a PubSub source used in the Dataflow job.
181 { # Metadata for a PubSub connector used by the job.
182 "topic": "A String", # Topic accessed in the connection.
183 "subscription": "A String", # Subscription used in the connection.
184 },
185 ],
186 "datastoreDetails": [ # Identification of a Datastore source used in the Dataflow job.
187 { # Metadata for a Datastore connector used by the job.
188 "projectId": "A String", # ProjectId accessed in the connection.
189 "namespace": "A String", # Namespace used in the connection.
190 },
191 ],
192 "fileDetails": [ # Identification of a File source used in the Dataflow job.
193 { # Metadata for a File connector used by the job.
194 "filePattern": "A String", # File Pattern used to access files by the connector.
195 },
196 ],
197 "spannerDetails": [ # Identification of a Spanner source used in the Dataflow job.
198 { # Metadata for a Spanner connector used by the job.
199 "instanceId": "A String", # InstanceId accessed in the connection.
200 "projectId": "A String", # ProjectId accessed in the connection.
201 "databaseId": "A String", # DatabaseId accessed in the connection.
202 },
203 ],
204 "bigTableDetails": [ # Identification of a BigTable source used in the Dataflow job.
205 { # Metadata for a BigTable connector used by the job.
206 "instanceId": "A String", # InstanceId accessed in the connection.
207 "projectId": "A String", # ProjectId accessed in the connection.
208 "tableId": "A String", # TableId accessed in the connection.
209 },
210 ],
211 "bigqueryDetails": [ # Identification of a BigQuery source used in the Dataflow job.
212 { # Metadata for a BigQuery connector used by the job.
213 "projectId": "A String", # Project accessed in the connection.
214 "dataset": "A String", # Dataset accessed in the connection.
215 "table": "A String", # Table accessed in the connection.
216 "query": "A String", # Query used to access data in the connection.
217 },
218 ],
219 },
220 "pipelineDescription": { # A descriptive representation of submitted pipeline as well as the executed # Preliminary field: The format of this data may change at any time.
221 # A description of the user pipeline and stages through which it is executed.
222 # Created by Cloud Dataflow service. Only retrieved with
223 # JOB_VIEW_DESCRIPTION or JOB_VIEW_ALL.
224 # form. This data is provided by the Dataflow service for ease of visualizing
225 # the pipeline and interpreting Dataflow provided metrics.
226 "originalPipelineTransform": [ # Description of each transform in the pipeline and collections between them.
227 { # Description of the type, names/ids, and input/outputs for a transform.
228 "kind": "A String", # Type of transform.
229 "name": "A String", # User provided name for this transform instance.
230 "inputCollectionName": [ # User names for all collection inputs to this transform.
231 "A String",
232 ],
233 "displayData": [ # Transform-specific display data.
234 { # Data provided with a pipeline or transform to provide descriptive info.
235 "shortStrValue": "A String", # A possible additional shorter value to display.
236 # For example a java_class_name_value of com.mypackage.MyDoFn
237 # will be stored with MyDoFn as the short_str_value and
238 # com.mypackage.MyDoFn as the java_class_name value.
239 # short_str_value can be displayed and java_class_name_value
240 # will be displayed as a tooltip.
241 "durationValue": "A String", # Contains value if the data is of duration type.
242 "url": "A String", # An optional full URL.
243 "floatValue": 3.14, # Contains value if the data is of float type.
244 "namespace": "A String", # The namespace for the key. This is usually a class name or programming
245 # language namespace (i.e. python module) which defines the display data.
246 # This allows a dax monitoring system to specially handle the data
247 # and perform custom rendering.
248 "javaClassValue": "A String", # Contains value if the data is of java class type.
249 "label": "A String", # An optional label to display in a dax UI for the element.
250 "boolValue": True or False, # Contains value if the data is of a boolean type.
251 "strValue": "A String", # Contains value if the data is of string type.
252 "key": "A String", # The key identifying the display data.
253 # This is intended to be used as a label for the display data
254 # when viewed in a dax monitoring system.
255 "int64Value": "A String", # Contains value if the data is of int64 type.
256 "timestampValue": "A String", # Contains value if the data is of timestamp type.
257 },
258 ],
259 "outputCollectionName": [ # User names for all collection outputs to this transform.
260 "A String",
261 ],
262 "id": "A String", # SDK generated id of this transform instance.
263 },
264 ],
265 "executionPipelineStage": [ # Description of each stage of execution of the pipeline.
266 { # Description of the composing transforms, names/ids, and input/outputs of a
267 # stage of execution. Some composing transforms and sources may have been
268 # generated by the Dataflow service during execution planning.
269 "componentSource": [ # Collections produced and consumed by component transforms of this stage.
270 { # Description of an interstitial value between transforms in an execution
271 # stage.
272 "userName": "A String", # Human-readable name for this transform; may be user or system generated.
273 "originalTransformOrCollection": "A String", # User name for the original user transform or collection with which this
274 # source is most closely associated.
275 "name": "A String", # Dataflow service generated name for this source.
276 },
277 ],
278 "kind": "A String", # Type of tranform this stage is executing.
279 "name": "A String", # Dataflow service generated name for this stage.
280 "outputSource": [ # Output sources for this stage.
281 { # Description of an input or output of an execution stage.
282 "userName": "A String", # Human-readable name for this source; may be user or system generated.
283 "sizeBytes": "A String", # Size of the source, if measurable.
284 "name": "A String", # Dataflow service generated name for this source.
285 "originalTransformOrCollection": "A String", # User name for the original user transform or collection with which this
286 # source is most closely associated.
287 },
288 ],
289 "inputSource": [ # Input sources for this stage.
290 { # Description of an input or output of an execution stage.
291 "userName": "A String", # Human-readable name for this source; may be user or system generated.
292 "sizeBytes": "A String", # Size of the source, if measurable.
293 "name": "A String", # Dataflow service generated name for this source.
294 "originalTransformOrCollection": "A String", # User name for the original user transform or collection with which this
295 # source is most closely associated.
296 },
297 ],
298 "componentTransform": [ # Transforms that comprise this execution stage.
299 { # Description of a transform executed as part of an execution stage.
300 "userName": "A String", # Human-readable name for this transform; may be user or system generated.
301 "originalTransform": "A String", # User name for the original user transform with which this transform is
302 # most closely associated.
303 "name": "A String", # Dataflow service generated name for this source.
304 },
305 ],
306 "id": "A String", # Dataflow service generated id for this stage.
307 },
308 ],
309 "displayData": [ # Pipeline level display data.
310 { # Data provided with a pipeline or transform to provide descriptive info.
311 "shortStrValue": "A String", # A possible additional shorter value to display.
312 # For example a java_class_name_value of com.mypackage.MyDoFn
313 # will be stored with MyDoFn as the short_str_value and
314 # com.mypackage.MyDoFn as the java_class_name value.
315 # short_str_value can be displayed and java_class_name_value
316 # will be displayed as a tooltip.
317 "durationValue": "A String", # Contains value if the data is of duration type.
318 "url": "A String", # An optional full URL.
319 "floatValue": 3.14, # Contains value if the data is of float type.
320 "namespace": "A String", # The namespace for the key. This is usually a class name or programming
321 # language namespace (i.e. python module) which defines the display data.
322 # This allows a dax monitoring system to specially handle the data
323 # and perform custom rendering.
324 "javaClassValue": "A String", # Contains value if the data is of java class type.
325 "label": "A String", # An optional label to display in a dax UI for the element.
326 "boolValue": True or False, # Contains value if the data is of a boolean type.
327 "strValue": "A String", # Contains value if the data is of string type.
328 "key": "A String", # The key identifying the display data.
329 # This is intended to be used as a label for the display data
330 # when viewed in a dax monitoring system.
331 "int64Value": "A String", # Contains value if the data is of int64 type.
332 "timestampValue": "A String", # Contains value if the data is of timestamp type.
333 },
334 ],
335 },
336 "stageStates": [ # This field may be mutated by the Cloud Dataflow service;
337 # callers cannot mutate it.
338 { # A message describing the state of a particular execution stage.
339 "executionStageName": "A String", # The name of the execution stage.
340 "executionStageState": "A String", # Executions stage states allow the same set of values as JobState.
341 "currentStateTime": "A String", # The time at which the stage transitioned to this state.
342 },
343 ],
344 "id": "A String", # The unique ID of this job.
345 #
346 # This field is set by the Cloud Dataflow service when the Job is
347 # created, and is immutable for the life of the job.
348 "replacedByJobId": "A String", # If another job is an update of this job (and thus, this job is in
349 # `JOB_STATE_UPDATED`), this field contains the ID of that job.
350 "projectId": "A String", # The ID of the Cloud Platform project that the job belongs to.
351 "transformNameMapping": { # The map of transform name prefixes of the job to be replaced to the
352 # corresponding name prefixes of the new job.
353 "a_key": "A String",
354 },
355 "environment": { # Describes the environment in which a Dataflow Job runs. # The environment for the job.
356 "version": { # A structure describing which components and their versions of the service
357 # are required in order to run the job.
358 "a_key": "", # Properties of the object.
359 },
360 "flexResourceSchedulingGoal": "A String", # Which Flexible Resource Scheduling mode to run in.
361 "serviceKmsKeyName": "A String", # If set, contains the Cloud KMS key identifier used to encrypt data
362 # at rest, AKA a Customer Managed Encryption Key (CMEK).
363 #
364 # Format:
365 # projects/PROJECT_ID/locations/LOCATION/keyRings/KEY_RING/cryptoKeys/KEY
366 "internalExperiments": { # Experimental settings.
367 "a_key": "", # Properties of the object. Contains field @type with type URL.
368 },
369 "dataset": "A String", # The dataset for the current project where various workflow
370 # related tables are stored.
371 #
372 # The supported resource type is:
373 #
374 # Google BigQuery:
375 # bigquery.googleapis.com/{dataset}
376 "experiments": [ # The list of experiments to enable.
377 "A String",
378 ],
379 "serviceAccountEmail": "A String", # Identity to run virtual machines as. Defaults to the default account.
380 "sdkPipelineOptions": { # The Cloud Dataflow SDK pipeline options specified by the user. These
381 # options are passed through the service and are used to recreate the
382 # SDK pipeline options on the worker in a language agnostic and platform
383 # independent way.
384 "a_key": "", # Properties of the object.
385 },
386 "userAgent": { # A description of the process that generated the request.
387 "a_key": "", # Properties of the object.
388 },
389 "clusterManagerApiService": "A String", # The type of cluster manager API to use. If unknown or
390 # unspecified, the service will attempt to choose a reasonable
391 # default. This should be in the form of the API service name,
392 # e.g. "compute.googleapis.com".
393 "workerPools": [ # The worker pools. At least one "harness" worker pool must be
394 # specified in order for the job to have workers.
395 { # Describes one particular pool of Cloud Dataflow workers to be
396 # instantiated by the Cloud Dataflow service in order to perform the
397 # computations required by a job. Note that a workflow job may use
398 # multiple pools, in order to match the various computational
399 # requirements of the various stages of the job.
400 "diskSourceImage": "A String", # Fully qualified source image for disks.
401 "taskrunnerSettings": { # Taskrunner configuration settings. # Settings passed through to Google Compute Engine workers when
402 # using the standard Dataflow task runner. Users should ignore
403 # this field.
404 "workflowFileName": "A String", # The file to store the workflow in.
405 "logUploadLocation": "A String", # Indicates where to put logs. If this is not specified, the logs
406 # will not be uploaded.
407 #
408 # The supported resource type is:
409 #
410 # Google Cloud Storage:
411 # storage.googleapis.com/{bucket}/{object}
412 # bucket.storage.googleapis.com/{object}
413 "commandlinesFileName": "A String", # The file to store preprocessing commands in.
414 "parallelWorkerSettings": { # Provides data to pass through to the worker harness. # The settings to pass to the parallel worker harness.
415 "reportingEnabled": True or False, # Whether to send work progress updates to the service.
416 "shuffleServicePath": "A String", # The Shuffle service path relative to the root URL, for example,
417 # "shuffle/v1beta1".
418 "workerId": "A String", # The ID of the worker running this pipeline.
419 "baseUrl": "A String", # The base URL for accessing Google Cloud APIs.
420 #
421 # When workers access Google Cloud APIs, they logically do so via
422 # relative URLs. If this field is specified, it supplies the base
423 # URL to use for resolving these relative URLs. The normative
424 # algorithm used is defined by RFC 1808, "Relative Uniform Resource
425 # Locators".
426 #
427 # If not specified, the default value is "http://www.googleapis.com/"
428 "servicePath": "A String", # The Cloud Dataflow service path relative to the root URL, for example,
429 # "dataflow/v1b3/projects".
430 "tempStoragePrefix": "A String", # The prefix of the resources the system should use for temporary
431 # storage.
432 #
433 # The supported resource type is:
434 #
435 # Google Cloud Storage:
436 #
437 # storage.googleapis.com/{bucket}/{object}
438 # bucket.storage.googleapis.com/{object}
439 },
440 "vmId": "A String", # The ID string of the VM.
441 "baseTaskDir": "A String", # The location on the worker for task-specific subdirectories.
442 "continueOnException": True or False, # Whether to continue taskrunner if an exception is hit.
443 "oauthScopes": [ # The OAuth2 scopes to be requested by the taskrunner in order to
444 # access the Cloud Dataflow API.
445 "A String",
446 ],
447 "taskUser": "A String", # The UNIX user ID on the worker VM to use for tasks launched by
448 # taskrunner; e.g. "root".
449 "baseUrl": "A String", # The base URL for the taskrunner to use when accessing Google Cloud APIs.
450 #
451 # When workers access Google Cloud APIs, they logically do so via
452 # relative URLs. If this field is specified, it supplies the base
453 # URL to use for resolving these relative URLs. The normative
454 # algorithm used is defined by RFC 1808, "Relative Uniform Resource
455 # Locators".
456 #
457 # If not specified, the default value is "http://www.googleapis.com/"
458 "taskGroup": "A String", # The UNIX group ID on the worker VM to use for tasks launched by
459 # taskrunner; e.g. "wheel".
460 "languageHint": "A String", # The suggested backend language.
461 "logToSerialconsole": True or False, # Whether to send taskrunner log info to Google Compute Engine VM serial
462 # console.
463 "streamingWorkerMainClass": "A String", # The streaming worker main class name.
464 "logDir": "A String", # The directory on the VM to store logs.
465 "dataflowApiVersion": "A String", # The API version of endpoint, e.g. "v1b3"
466 "harnessCommand": "A String", # The command to launch the worker harness.
467 "tempStoragePrefix": "A String", # The prefix of the resources the taskrunner should use for
468 # temporary storage.
469 #
470 # The supported resource type is:
471 #
472 # Google Cloud Storage:
473 # storage.googleapis.com/{bucket}/{object}
474 # bucket.storage.googleapis.com/{object}
475 "alsologtostderr": True or False, # Whether to also send taskrunner log info to stderr.
476 },
477 "kind": "A String", # The kind of the worker pool; currently only `harness` and `shuffle`
478 # are supported.
479 "packages": [ # Packages to be installed on workers.
480 { # The packages that must be installed in order for a worker to run the
481 # steps of the Cloud Dataflow job that will be assigned to its worker
482 # pool.
483 #
484 # This is the mechanism by which the Cloud Dataflow SDK causes code to
485 # be loaded onto the workers. For example, the Cloud Dataflow Java SDK
486 # might use this to install jars containing the user's code and all of the
487 # various dependencies (libraries, data files, etc.) required in order
488 # for that code to run.
489 "location": "A String", # The resource to read the package from. The supported resource type is:
490 #
491 # Google Cloud Storage:
492 #
493 # storage.googleapis.com/{bucket}
494 # bucket.storage.googleapis.com/
495 "name": "A String", # The name of the package.
496 },
497 ],
498 "machineType": "A String", # Machine type (e.g. "n1-standard-1"). If empty or unspecified, the
499 # service will attempt to choose a reasonable default.
500 "network": "A String", # Network to which VMs will be assigned. If empty or unspecified,
501 # the service will use the network "default".
502 "zone": "A String", # Zone to run the worker pools in. If empty or unspecified, the service
503 # will attempt to choose a reasonable default.
504 "diskSizeGb": 42, # Size of root disk for VMs, in GB. If zero or unspecified, the service will
505 # attempt to choose a reasonable default.
506 "teardownPolicy": "A String", # Sets the policy for determining when to turndown worker pool.
507 # Allowed values are: `TEARDOWN_ALWAYS`, `TEARDOWN_ON_SUCCESS`, and
508 # `TEARDOWN_NEVER`.
509 # `TEARDOWN_ALWAYS` means workers are always torn down regardless of whether
510 # the job succeeds. `TEARDOWN_ON_SUCCESS` means workers are torn down
511 # if the job succeeds. `TEARDOWN_NEVER` means the workers are never torn
512 # down.
513 #
514 # If the workers are not torn down by the service, they will
515 # continue to run and use Google Compute Engine VM resources in the
516 # user's project until they are explicitly terminated by the user.
517 # Because of this, Google recommends using the `TEARDOWN_ALWAYS`
518 # policy except for small, manually supervised test jobs.
519 #
520 # If unknown or unspecified, the service will attempt to choose a reasonable
521 # default.
522 "onHostMaintenance": "A String", # The action to take on host maintenance, as defined by the Google
523 # Compute Engine API.
524 "ipConfiguration": "A String", # Configuration for VM IPs.
525 "numThreadsPerWorker": 42, # The number of threads per worker harness. If empty or unspecified, the
526 # service will choose a number of threads (according to the number of cores
527 # on the selected machine type for batch, or 1 by convention for streaming).
528 "poolArgs": { # Extra arguments for this worker pool.
529 "a_key": "", # Properties of the object. Contains field @type with type URL.
530 },
531 "numWorkers": 42, # Number of Google Compute Engine workers in this pool needed to
532 # execute the job. If zero or unspecified, the service will
533 # attempt to choose a reasonable default.
534 "workerHarnessContainerImage": "A String", # Required. Docker container image that executes the Cloud Dataflow worker
535 # harness, residing in Google Container Registry.
536 "subnetwork": "A String", # Subnetwork to which VMs will be assigned, if desired. Expected to be of
537 # the form "regions/REGION/subnetworks/SUBNETWORK".
538 "dataDisks": [ # Data disks that are used by a VM in this workflow.
539 { # Describes the data disk used by a workflow job.
540 "mountPoint": "A String", # Directory in a VM where disk is mounted.
541 "sizeGb": 42, # Size of disk in GB. If zero or unspecified, the service will
542 # attempt to choose a reasonable default.
543 "diskType": "A String", # Disk storage type, as defined by Google Compute Engine. This
544 # must be a disk type appropriate to the project and zone in which
545 # the workers will run. If unknown or unspecified, the service
546 # will attempt to choose a reasonable default.
547 #
548 # For example, the standard persistent disk type is a resource name
549 # typically ending in "pd-standard". If SSD persistent disks are
550 # available, the resource name typically ends with "pd-ssd". The
551 # actual valid values are defined the Google Compute Engine API,
552 # not by the Cloud Dataflow API; consult the Google Compute Engine
553 # documentation for more information about determining the set of
554 # available disk types for a particular project and zone.
555 #
556 # Google Compute Engine Disk types are local to a particular
557 # project in a particular zone, and so the resource name will
558 # typically look something like this:
559 #
560 # compute.googleapis.com/projects/project-id/zones/zone/diskTypes/pd-standard
561 },
562 ],
563 "autoscalingSettings": { # Settings for WorkerPool autoscaling. # Settings for autoscaling of this WorkerPool.
564 "maxNumWorkers": 42, # The maximum number of workers to cap scaling at.
565 "algorithm": "A String", # The algorithm to use for autoscaling.
566 },
567 "defaultPackageSet": "A String", # The default package set to install. This allows the service to
568 # select a default set of packages which are useful to worker
569 # harnesses written in a particular language.
570 "diskType": "A String", # Type of root disk for VMs. If empty or unspecified, the service will
571 # attempt to choose a reasonable default.
572 "metadata": { # Metadata to set on the Google Compute Engine VMs.
573 "a_key": "A String",
574 },
575 },
576 ],
577 "tempStoragePrefix": "A String", # The prefix of the resources the system should use for temporary
578 # storage. The system will append the suffix "/temp-{JOBNAME} to
579 # this resource prefix, where {JOBNAME} is the value of the
580 # job_name field. The resulting bucket and object prefix is used
581 # as the prefix of the resources used to store temporary data
582 # needed during the job execution. NOTE: This will override the
583 # value in taskrunner_settings.
584 # The supported resource type is:
585 #
586 # Google Cloud Storage:
587 #
588 # storage.googleapis.com/{bucket}/{object}
589 # bucket.storage.googleapis.com/{object}
590 },
591 "location": "A String", # The [regional endpoint]
592 # (https://cloud.google.com/dataflow/docs/concepts/regional-endpoints) that
593 # contains this job.
594 "tempFiles": [ # A set of files the system should be aware of that are used
595 # for temporary storage. These temporary files will be
596 # removed on job completion.
597 # No duplicates are allowed.
598 # No file patterns are supported.
599 #
600 # The supported files are:
601 #
602 # Google Cloud Storage:
603 #
604 # storage.googleapis.com/{bucket}/{object}
605 # bucket.storage.googleapis.com/{object}
606 "A String",
607 ],
608 "type": "A String", # The type of Cloud Dataflow job.
609 "clientRequestId": "A String", # The client's unique identifier of the job, re-used across retried attempts.
610 # If this field is set, the service will ensure its uniqueness.
611 # The request to create a job will fail if the service has knowledge of a
612 # previously submitted job with the same client's ID and job name.
613 # The caller may use this field to ensure idempotence of job
614 # creation across retried attempts to create a job.
615 # By default, the field is empty and, in that case, the service ignores it.
616 "createdFromSnapshotId": "A String", # If this is specified, the job's initial state is populated from the given
617 # snapshot.
618 "stepsLocation": "A String", # The GCS location where the steps are stored.
619 "currentStateTime": "A String", # The timestamp associated with the current state.
620 "startTime": "A String", # The timestamp when the job was started (transitioned to JOB_STATE_PENDING).
621 # Flexible resource scheduling jobs are started with some delay after job
622 # creation, so start_time is unset before start and is updated when the
623 # job is started by the Cloud Dataflow service. For other jobs, start_time
624 # always equals to create_time and is immutable and set by the Cloud Dataflow
625 # service.
626 "createTime": "A String", # The timestamp when the job was initially created. Immutable and set by the
627 # Cloud Dataflow service.
628 "requestedState": "A String", # The job's requested state.
629 #
630 # `UpdateJob` may be used to switch between the `JOB_STATE_STOPPED` and
631 # `JOB_STATE_RUNNING` states, by setting requested_state. `UpdateJob` may
632 # also be used to directly set a job's requested state to
633 # `JOB_STATE_CANCELLED` or `JOB_STATE_DONE`, irrevocably terminating the
634 # job if it has not already reached a terminal state.
635 "name": "A String", # The user-specified Cloud Dataflow job name.
636 #
637 # Only one Job with a given name may exist in a project at any
638 # given time. If a caller attempts to create a Job with the same
639 # name as an already-existing Job, the attempt returns the
640 # existing Job.
641 #
642 # The name must match the regular expression
643 # `[a-z]([-a-z0-9]{0,38}[a-z0-9])?`
644 "steps": [ # Exactly one of step or steps_location should be specified.
645 #
646 # The top-level steps that constitute the entire job.
647 { # Defines a particular step within a Cloud Dataflow job.
648 #
649 # A job consists of multiple steps, each of which performs some
650 # specific operation as part of the overall job. Data is typically
651 # passed from one step to another as part of the job.
652 #
653 # Here's an example of a sequence of steps which together implement a
654 # Map-Reduce job:
655 #
656 # * Read a collection of data from some source, parsing the
657 # collection's elements.
658 #
659 # * Validate the elements.
660 #
661 # * Apply a user-defined function to map each element to some value
662 # and extract an element-specific key value.
663 #
664 # * Group elements with the same key into a single element with
665 # that key, transforming a multiply-keyed collection into a
666 # uniquely-keyed collection.
667 #
668 # * Write the elements out to some data sink.
669 #
670 # Note that the Cloud Dataflow service may be used to run many different
671 # types of jobs, not just Map-Reduce.
672 "kind": "A String", # The kind of step in the Cloud Dataflow job.
673 "properties": { # Named properties associated with the step. Each kind of
674 # predefined step has its own required set of properties.
675 # Must be provided on Create. Only retrieved with JOB_VIEW_ALL.
676 "a_key": "", # Properties of the object.
677 },
678 "name": "A String", # The name that identifies the step. This must be unique for each
679 # step with respect to all other steps in the Cloud Dataflow job.
680 },
681 ],
682 "replaceJobId": "A String", # If this job is an update of an existing job, this field is the job ID
683 # of the job it replaced.
684 #
685 # When sending a `CreateJobRequest`, you can update a job by specifying it
686 # here. The job named here is stopped, and its intermediate state is
687 # transferred to this job.
688 "currentState": "A String", # The current state of the job.
689 #
690 # Jobs are created in the `JOB_STATE_STOPPED` state unless otherwise
691 # specified.
692 #
693 # A job in the `JOB_STATE_RUNNING` state may asynchronously enter a
694 # terminal state. After a job has reached a terminal state, no
695 # further state updates may be made.
696 #
697 # This field may be mutated by the Cloud Dataflow service;
698 # callers cannot mutate it.
699 "executionInfo": { # Additional information about how a Cloud Dataflow job will be executed that # Deprecated.
700 # isn't contained in the submitted job.
701 "stages": { # A mapping from each stage to the information about that stage.
702 "a_key": { # Contains information about how a particular
703 # google.dataflow.v1beta3.Step will be executed.
704 "stepName": [ # The steps associated with the execution stage.
705 # Note that stages may have several steps, and that a given step
706 # might be run by more than one stage.
707 "A String",
708 ],
709 },
710 },
711 },
712 },
713 ],
714 }</pre>
715</div>
716
717<div class="method">
718 <code class="details" id="aggregated_next">aggregated_next(previous_request, previous_response)</code>
719 <pre>Retrieves the next page of results.
720
721Args:
722 previous_request: The request for the previous page. (required)
723 previous_response: The response from the request for the previous page. (required)
724
725Returns:
726 A request object that you can call 'execute()' on to request the next
727 page. Returns None if there are no more items in the collection.
728 </pre>
729</div>
730
731<div class="method">
Jon Wayne Parrott692617a2017-01-06 09:58:29 -0800732 <code class="details" id="create">create(projectId, body, location=None, x__xgafv=None, replaceJobId=None, view=None)</code>
Sai Cheemalapatic30d2b52017-03-13 12:12:03 -0400733 <pre>Creates a Cloud Dataflow job.
Nathaniel Manista4f877e52015-06-15 16:44:50 +0000734
Bu Sun Kim715bd7f2019-06-14 16:50:42 -0700735To create a job, we recommend using `projects.locations.jobs.create` with a
736[regional endpoint]
737(https://cloud.google.com/dataflow/docs/concepts/regional-endpoints). Using
738`projects.jobs.create` is not recommended, as your job will always start
739in `us-central1`.
740
Nathaniel Manista4f877e52015-06-15 16:44:50 +0000741Args:
Sai Cheemalapatic30d2b52017-03-13 12:12:03 -0400742 projectId: string, The ID of the Cloud Platform project that the job belongs to. (required)
Nathaniel Manista4f877e52015-06-15 16:44:50 +0000743 body: object, The request body. (required)
744 The object takes the form of:
745
Sai Cheemalapatic30d2b52017-03-13 12:12:03 -0400746{ # Defines a job to be run by the Cloud Dataflow service.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -0700747 "labels": { # User-defined labels for this job.
748 #
749 # The labels map can contain no more than 64 entries. Entries of the labels
750 # map are UTF8 strings that comply with the following restrictions:
751 #
752 # * Keys must conform to regexp: \p{Ll}\p{Lo}{0,62}
753 # * Values must conform to regexp: [\p{Ll}\p{Lo}\p{N}_-]{0,63}
754 # * Both keys and values are additionally constrained to be <= 128 bytes in
755 # size.
756 "a_key": "A String",
757 },
758 "jobMetadata": { # Metadata available primarily for filtering jobs. Will be included in the # This field is populated by the Dataflow service to support filtering jobs
759 # by the metadata values provided here. Populated for ListJobs and all GetJob
760 # views SUMMARY and higher.
761 # ListJob response and Job SUMMARY view.
762 "sdkVersion": { # The version of the SDK used to run the job. # The SDK version used to run the job.
763 "versionDisplayName": "A String", # A readable string describing the version of the SDK.
764 "version": "A String", # The version of the SDK used to run the job.
765 "sdkSupportStatus": "A String", # The support status for this SDK version.
766 },
767 "pubsubDetails": [ # Identification of a PubSub source used in the Dataflow job.
768 { # Metadata for a PubSub connector used by the job.
769 "topic": "A String", # Topic accessed in the connection.
770 "subscription": "A String", # Subscription used in the connection.
771 },
772 ],
773 "datastoreDetails": [ # Identification of a Datastore source used in the Dataflow job.
774 { # Metadata for a Datastore connector used by the job.
775 "projectId": "A String", # ProjectId accessed in the connection.
776 "namespace": "A String", # Namespace used in the connection.
777 },
778 ],
779 "fileDetails": [ # Identification of a File source used in the Dataflow job.
780 { # Metadata for a File connector used by the job.
781 "filePattern": "A String", # File Pattern used to access files by the connector.
782 },
783 ],
784 "spannerDetails": [ # Identification of a Spanner source used in the Dataflow job.
785 { # Metadata for a Spanner connector used by the job.
786 "instanceId": "A String", # InstanceId accessed in the connection.
787 "projectId": "A String", # ProjectId accessed in the connection.
788 "databaseId": "A String", # DatabaseId accessed in the connection.
789 },
790 ],
791 "bigTableDetails": [ # Identification of a BigTable source used in the Dataflow job.
792 { # Metadata for a BigTable connector used by the job.
793 "instanceId": "A String", # InstanceId accessed in the connection.
794 "projectId": "A String", # ProjectId accessed in the connection.
795 "tableId": "A String", # TableId accessed in the connection.
796 },
797 ],
798 "bigqueryDetails": [ # Identification of a BigQuery source used in the Dataflow job.
799 { # Metadata for a BigQuery connector used by the job.
800 "projectId": "A String", # Project accessed in the connection.
801 "dataset": "A String", # Dataset accessed in the connection.
802 "table": "A String", # Table accessed in the connection.
803 "query": "A String", # Query used to access data in the connection.
804 },
805 ],
806 },
807 "pipelineDescription": { # A descriptive representation of submitted pipeline as well as the executed # Preliminary field: The format of this data may change at any time.
808 # A description of the user pipeline and stages through which it is executed.
809 # Created by Cloud Dataflow service. Only retrieved with
810 # JOB_VIEW_DESCRIPTION or JOB_VIEW_ALL.
811 # form. This data is provided by the Dataflow service for ease of visualizing
812 # the pipeline and interpreting Dataflow provided metrics.
813 "originalPipelineTransform": [ # Description of each transform in the pipeline and collections between them.
814 { # Description of the type, names/ids, and input/outputs for a transform.
815 "kind": "A String", # Type of transform.
816 "name": "A String", # User provided name for this transform instance.
817 "inputCollectionName": [ # User names for all collection inputs to this transform.
818 "A String",
819 ],
820 "displayData": [ # Transform-specific display data.
821 { # Data provided with a pipeline or transform to provide descriptive info.
822 "shortStrValue": "A String", # A possible additional shorter value to display.
823 # For example a java_class_name_value of com.mypackage.MyDoFn
824 # will be stored with MyDoFn as the short_str_value and
825 # com.mypackage.MyDoFn as the java_class_name value.
826 # short_str_value can be displayed and java_class_name_value
827 # will be displayed as a tooltip.
828 "durationValue": "A String", # Contains value if the data is of duration type.
829 "url": "A String", # An optional full URL.
830 "floatValue": 3.14, # Contains value if the data is of float type.
831 "namespace": "A String", # The namespace for the key. This is usually a class name or programming
832 # language namespace (i.e. python module) which defines the display data.
833 # This allows a dax monitoring system to specially handle the data
834 # and perform custom rendering.
835 "javaClassValue": "A String", # Contains value if the data is of java class type.
836 "label": "A String", # An optional label to display in a dax UI for the element.
837 "boolValue": True or False, # Contains value if the data is of a boolean type.
838 "strValue": "A String", # Contains value if the data is of string type.
839 "key": "A String", # The key identifying the display data.
840 # This is intended to be used as a label for the display data
841 # when viewed in a dax monitoring system.
842 "int64Value": "A String", # Contains value if the data is of int64 type.
843 "timestampValue": "A String", # Contains value if the data is of timestamp type.
844 },
845 ],
846 "outputCollectionName": [ # User names for all collection outputs to this transform.
847 "A String",
848 ],
849 "id": "A String", # SDK generated id of this transform instance.
850 },
851 ],
852 "executionPipelineStage": [ # Description of each stage of execution of the pipeline.
853 { # Description of the composing transforms, names/ids, and input/outputs of a
854 # stage of execution. Some composing transforms and sources may have been
855 # generated by the Dataflow service during execution planning.
856 "componentSource": [ # Collections produced and consumed by component transforms of this stage.
857 { # Description of an interstitial value between transforms in an execution
858 # stage.
859 "userName": "A String", # Human-readable name for this transform; may be user or system generated.
860 "originalTransformOrCollection": "A String", # User name for the original user transform or collection with which this
861 # source is most closely associated.
862 "name": "A String", # Dataflow service generated name for this source.
863 },
864 ],
865 "kind": "A String", # Type of tranform this stage is executing.
866 "name": "A String", # Dataflow service generated name for this stage.
867 "outputSource": [ # Output sources for this stage.
868 { # Description of an input or output of an execution stage.
869 "userName": "A String", # Human-readable name for this source; may be user or system generated.
870 "sizeBytes": "A String", # Size of the source, if measurable.
871 "name": "A String", # Dataflow service generated name for this source.
872 "originalTransformOrCollection": "A String", # User name for the original user transform or collection with which this
873 # source is most closely associated.
874 },
875 ],
876 "inputSource": [ # Input sources for this stage.
877 { # Description of an input or output of an execution stage.
878 "userName": "A String", # Human-readable name for this source; may be user or system generated.
879 "sizeBytes": "A String", # Size of the source, if measurable.
880 "name": "A String", # Dataflow service generated name for this source.
881 "originalTransformOrCollection": "A String", # User name for the original user transform or collection with which this
882 # source is most closely associated.
883 },
884 ],
885 "componentTransform": [ # Transforms that comprise this execution stage.
886 { # Description of a transform executed as part of an execution stage.
887 "userName": "A String", # Human-readable name for this transform; may be user or system generated.
888 "originalTransform": "A String", # User name for the original user transform with which this transform is
889 # most closely associated.
890 "name": "A String", # Dataflow service generated name for this source.
891 },
892 ],
893 "id": "A String", # Dataflow service generated id for this stage.
894 },
895 ],
896 "displayData": [ # Pipeline level display data.
897 { # Data provided with a pipeline or transform to provide descriptive info.
898 "shortStrValue": "A String", # A possible additional shorter value to display.
899 # For example a java_class_name_value of com.mypackage.MyDoFn
900 # will be stored with MyDoFn as the short_str_value and
901 # com.mypackage.MyDoFn as the java_class_name value.
902 # short_str_value can be displayed and java_class_name_value
903 # will be displayed as a tooltip.
904 "durationValue": "A String", # Contains value if the data is of duration type.
905 "url": "A String", # An optional full URL.
906 "floatValue": 3.14, # Contains value if the data is of float type.
907 "namespace": "A String", # The namespace for the key. This is usually a class name or programming
908 # language namespace (i.e. python module) which defines the display data.
909 # This allows a dax monitoring system to specially handle the data
910 # and perform custom rendering.
911 "javaClassValue": "A String", # Contains value if the data is of java class type.
912 "label": "A String", # An optional label to display in a dax UI for the element.
913 "boolValue": True or False, # Contains value if the data is of a boolean type.
914 "strValue": "A String", # Contains value if the data is of string type.
915 "key": "A String", # The key identifying the display data.
916 # This is intended to be used as a label for the display data
917 # when viewed in a dax monitoring system.
918 "int64Value": "A String", # Contains value if the data is of int64 type.
919 "timestampValue": "A String", # Contains value if the data is of timestamp type.
920 },
921 ],
922 },
923 "stageStates": [ # This field may be mutated by the Cloud Dataflow service;
924 # callers cannot mutate it.
925 { # A message describing the state of a particular execution stage.
926 "executionStageName": "A String", # The name of the execution stage.
927 "executionStageState": "A String", # Executions stage states allow the same set of values as JobState.
928 "currentStateTime": "A String", # The time at which the stage transitioned to this state.
929 },
930 ],
931 "id": "A String", # The unique ID of this job.
932 #
933 # This field is set by the Cloud Dataflow service when the Job is
934 # created, and is immutable for the life of the job.
935 "replacedByJobId": "A String", # If another job is an update of this job (and thus, this job is in
936 # `JOB_STATE_UPDATED`), this field contains the ID of that job.
937 "projectId": "A String", # The ID of the Cloud Platform project that the job belongs to.
938 "transformNameMapping": { # The map of transform name prefixes of the job to be replaced to the
939 # corresponding name prefixes of the new job.
940 "a_key": "A String",
941 },
942 "environment": { # Describes the environment in which a Dataflow Job runs. # The environment for the job.
943 "version": { # A structure describing which components and their versions of the service
944 # are required in order to run the job.
945 "a_key": "", # Properties of the object.
946 },
947 "flexResourceSchedulingGoal": "A String", # Which Flexible Resource Scheduling mode to run in.
948 "serviceKmsKeyName": "A String", # If set, contains the Cloud KMS key identifier used to encrypt data
949 # at rest, AKA a Customer Managed Encryption Key (CMEK).
950 #
951 # Format:
952 # projects/PROJECT_ID/locations/LOCATION/keyRings/KEY_RING/cryptoKeys/KEY
953 "internalExperiments": { # Experimental settings.
954 "a_key": "", # Properties of the object. Contains field @type with type URL.
955 },
956 "dataset": "A String", # The dataset for the current project where various workflow
957 # related tables are stored.
958 #
959 # The supported resource type is:
960 #
961 # Google BigQuery:
962 # bigquery.googleapis.com/{dataset}
963 "experiments": [ # The list of experiments to enable.
964 "A String",
965 ],
966 "serviceAccountEmail": "A String", # Identity to run virtual machines as. Defaults to the default account.
967 "sdkPipelineOptions": { # The Cloud Dataflow SDK pipeline options specified by the user. These
968 # options are passed through the service and are used to recreate the
969 # SDK pipeline options on the worker in a language agnostic and platform
970 # independent way.
971 "a_key": "", # Properties of the object.
972 },
973 "userAgent": { # A description of the process that generated the request.
974 "a_key": "", # Properties of the object.
975 },
976 "clusterManagerApiService": "A String", # The type of cluster manager API to use. If unknown or
977 # unspecified, the service will attempt to choose a reasonable
978 # default. This should be in the form of the API service name,
979 # e.g. "compute.googleapis.com".
980 "workerPools": [ # The worker pools. At least one "harness" worker pool must be
981 # specified in order for the job to have workers.
982 { # Describes one particular pool of Cloud Dataflow workers to be
983 # instantiated by the Cloud Dataflow service in order to perform the
984 # computations required by a job. Note that a workflow job may use
985 # multiple pools, in order to match the various computational
986 # requirements of the various stages of the job.
987 "diskSourceImage": "A String", # Fully qualified source image for disks.
988 "taskrunnerSettings": { # Taskrunner configuration settings. # Settings passed through to Google Compute Engine workers when
989 # using the standard Dataflow task runner. Users should ignore
990 # this field.
991 "workflowFileName": "A String", # The file to store the workflow in.
992 "logUploadLocation": "A String", # Indicates where to put logs. If this is not specified, the logs
993 # will not be uploaded.
994 #
995 # The supported resource type is:
996 #
997 # Google Cloud Storage:
998 # storage.googleapis.com/{bucket}/{object}
999 # bucket.storage.googleapis.com/{object}
1000 "commandlinesFileName": "A String", # The file to store preprocessing commands in.
1001 "parallelWorkerSettings": { # Provides data to pass through to the worker harness. # The settings to pass to the parallel worker harness.
1002 "reportingEnabled": True or False, # Whether to send work progress updates to the service.
1003 "shuffleServicePath": "A String", # The Shuffle service path relative to the root URL, for example,
1004 # "shuffle/v1beta1".
1005 "workerId": "A String", # The ID of the worker running this pipeline.
1006 "baseUrl": "A String", # The base URL for accessing Google Cloud APIs.
1007 #
1008 # When workers access Google Cloud APIs, they logically do so via
1009 # relative URLs. If this field is specified, it supplies the base
1010 # URL to use for resolving these relative URLs. The normative
1011 # algorithm used is defined by RFC 1808, "Relative Uniform Resource
1012 # Locators".
1013 #
1014 # If not specified, the default value is "http://www.googleapis.com/"
1015 "servicePath": "A String", # The Cloud Dataflow service path relative to the root URL, for example,
1016 # "dataflow/v1b3/projects".
1017 "tempStoragePrefix": "A String", # The prefix of the resources the system should use for temporary
1018 # storage.
1019 #
1020 # The supported resource type is:
1021 #
1022 # Google Cloud Storage:
1023 #
1024 # storage.googleapis.com/{bucket}/{object}
1025 # bucket.storage.googleapis.com/{object}
1026 },
1027 "vmId": "A String", # The ID string of the VM.
1028 "baseTaskDir": "A String", # The location on the worker for task-specific subdirectories.
1029 "continueOnException": True or False, # Whether to continue taskrunner if an exception is hit.
1030 "oauthScopes": [ # The OAuth2 scopes to be requested by the taskrunner in order to
1031 # access the Cloud Dataflow API.
1032 "A String",
1033 ],
1034 "taskUser": "A String", # The UNIX user ID on the worker VM to use for tasks launched by
1035 # taskrunner; e.g. "root".
1036 "baseUrl": "A String", # The base URL for the taskrunner to use when accessing Google Cloud APIs.
1037 #
1038 # When workers access Google Cloud APIs, they logically do so via
1039 # relative URLs. If this field is specified, it supplies the base
1040 # URL to use for resolving these relative URLs. The normative
1041 # algorithm used is defined by RFC 1808, "Relative Uniform Resource
1042 # Locators".
1043 #
1044 # If not specified, the default value is "http://www.googleapis.com/"
1045 "taskGroup": "A String", # The UNIX group ID on the worker VM to use for tasks launched by
1046 # taskrunner; e.g. "wheel".
1047 "languageHint": "A String", # The suggested backend language.
1048 "logToSerialconsole": True or False, # Whether to send taskrunner log info to Google Compute Engine VM serial
1049 # console.
1050 "streamingWorkerMainClass": "A String", # The streaming worker main class name.
1051 "logDir": "A String", # The directory on the VM to store logs.
1052 "dataflowApiVersion": "A String", # The API version of endpoint, e.g. "v1b3"
1053 "harnessCommand": "A String", # The command to launch the worker harness.
1054 "tempStoragePrefix": "A String", # The prefix of the resources the taskrunner should use for
1055 # temporary storage.
1056 #
1057 # The supported resource type is:
1058 #
1059 # Google Cloud Storage:
1060 # storage.googleapis.com/{bucket}/{object}
1061 # bucket.storage.googleapis.com/{object}
1062 "alsologtostderr": True or False, # Whether to also send taskrunner log info to stderr.
1063 },
1064 "kind": "A String", # The kind of the worker pool; currently only `harness` and `shuffle`
1065 # are supported.
1066 "packages": [ # Packages to be installed on workers.
1067 { # The packages that must be installed in order for a worker to run the
1068 # steps of the Cloud Dataflow job that will be assigned to its worker
1069 # pool.
1070 #
1071 # This is the mechanism by which the Cloud Dataflow SDK causes code to
1072 # be loaded onto the workers. For example, the Cloud Dataflow Java SDK
1073 # might use this to install jars containing the user's code and all of the
1074 # various dependencies (libraries, data files, etc.) required in order
1075 # for that code to run.
1076 "location": "A String", # The resource to read the package from. The supported resource type is:
1077 #
1078 # Google Cloud Storage:
1079 #
1080 # storage.googleapis.com/{bucket}
1081 # bucket.storage.googleapis.com/
1082 "name": "A String", # The name of the package.
1083 },
1084 ],
1085 "machineType": "A String", # Machine type (e.g. "n1-standard-1"). If empty or unspecified, the
1086 # service will attempt to choose a reasonable default.
1087 "network": "A String", # Network to which VMs will be assigned. If empty or unspecified,
1088 # the service will use the network "default".
1089 "zone": "A String", # Zone to run the worker pools in. If empty or unspecified, the service
1090 # will attempt to choose a reasonable default.
1091 "diskSizeGb": 42, # Size of root disk for VMs, in GB. If zero or unspecified, the service will
1092 # attempt to choose a reasonable default.
1093 "teardownPolicy": "A String", # Sets the policy for determining when to turndown worker pool.
1094 # Allowed values are: `TEARDOWN_ALWAYS`, `TEARDOWN_ON_SUCCESS`, and
1095 # `TEARDOWN_NEVER`.
1096 # `TEARDOWN_ALWAYS` means workers are always torn down regardless of whether
1097 # the job succeeds. `TEARDOWN_ON_SUCCESS` means workers are torn down
1098 # if the job succeeds. `TEARDOWN_NEVER` means the workers are never torn
1099 # down.
1100 #
1101 # If the workers are not torn down by the service, they will
1102 # continue to run and use Google Compute Engine VM resources in the
1103 # user's project until they are explicitly terminated by the user.
1104 # Because of this, Google recommends using the `TEARDOWN_ALWAYS`
1105 # policy except for small, manually supervised test jobs.
1106 #
1107 # If unknown or unspecified, the service will attempt to choose a reasonable
1108 # default.
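 # Illustrative example (not part of the generated schema): a small,
 # short-lived test pool would typically set "teardownPolicy": "TEARDOWN_ALWAYS"
 # so that its VMs never outlive the job, in line with the recommendation above.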
1109 "onHostMaintenance": "A String", # The action to take on host maintenance, as defined by the Google
1110 # Compute Engine API.
1111 "ipConfiguration": "A String", # Configuration for VM IPs.
1112 "numThreadsPerWorker": 42, # The number of threads per worker harness. If empty or unspecified, the
1113 # service will choose a number of threads (according to the number of cores
1114 # on the selected machine type for batch, or 1 by convention for streaming).
1115 "poolArgs": { # Extra arguments for this worker pool.
1116 "a_key": "", # Properties of the object. Contains field @type with type URL.
1117 },
1118 "numWorkers": 42, # Number of Google Compute Engine workers in this pool needed to
1119 # execute the job. If zero or unspecified, the service will
1120 # attempt to choose a reasonable default.
1121 "workerHarnessContainerImage": "A String", # Required. Docker container image that executes the Cloud Dataflow worker
1122 # harness, residing in Google Container Registry.
1123 "subnetwork": "A String", # Subnetwork to which VMs will be assigned, if desired. Expected to be of
1124 # the form "regions/REGION/subnetworks/SUBNETWORK".
1125 "dataDisks": [ # Data disks that are used by a VM in this workflow.
1126 { # Describes the data disk used by a workflow job.
1127 "mountPoint": "A String", # Directory in a VM where disk is mounted.
1128 "sizeGb": 42, # Size of disk in GB. If zero or unspecified, the service will
1129 # attempt to choose a reasonable default.
1130 "diskType": "A String", # Disk storage type, as defined by Google Compute Engine. This
1131 # must be a disk type appropriate to the project and zone in which
1132 # the workers will run. If unknown or unspecified, the service
1133 # will attempt to choose a reasonable default.
1134 #
1135 # For example, the standard persistent disk type is a resource name
1136 # typically ending in "pd-standard". If SSD persistent disks are
1137 # available, the resource name typically ends with "pd-ssd". The
1138 # actual valid values are defined by the Google Compute Engine API,
1139 # not by the Cloud Dataflow API; consult the Google Compute Engine
1140 # documentation for more information about determining the set of
1141 # available disk types for a particular project and zone.
1142 #
1143 # Google Compute Engine Disk types are local to a particular
1144 # project in a particular zone, and so the resource name will
1145 # typically look something like this:
1146 #
1147 # compute.googleapis.com/projects/project-id/zones/zone/diskTypes/pd-standard
1148 },
1149 ],
1150 "autoscalingSettings": { # Settings for WorkerPool autoscaling. # Settings for autoscaling of this WorkerPool.
1151 "maxNumWorkers": 42, # The maximum number of workers to cap scaling at.
1152 "algorithm": "A String", # The algorithm to use for autoscaling.
1153 },
1154 "defaultPackageSet": "A String", # The default package set to install. This allows the service to
1155 # select a default set of packages which are useful to worker
1156 # harnesses written in a particular language.
1157 "diskType": "A String", # Type of root disk for VMs. If empty or unspecified, the service will
1158 # attempt to choose a reasonable default.
1159 "metadata": { # Metadata to set on the Google Compute Engine VMs.
1160 "a_key": "A String",
1161 },
1162 },
1163 ],
1164 "tempStoragePrefix": "A String", # The prefix of the resources the system should use for temporary
1165 # storage. The system will append the suffix "/temp-{JOBNAME}" to
1166 # this resource prefix, where {JOBNAME} is the value of the
1167 # job_name field. The resulting bucket and object prefix is used
1168 # as the prefix of the resources used to store temporary data
1169 # needed during the job execution. NOTE: This will override the
1170 # value in taskrunner_settings.
1171 # The supported resource type is:
1172 #
1173 # Google Cloud Storage:
1174 #
1175 # storage.googleapis.com/{bucket}/{object}
1176 # bucket.storage.googleapis.com/{object}
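 # Illustrative example (not part of the generated schema): a value following
 # the Google Cloud Storage form above might look like
 # "storage.googleapis.com/my-bucket/dataflow/tmp", where "my-bucket" is a
 # placeholder bucket name.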
1177 },
1178 "location": "A String", # The [regional endpoint]
1179 # (https://cloud.google.com/dataflow/docs/concepts/regional-endpoints) that
1180 # contains this job.
1181 "tempFiles": [ # A set of files the system should be aware of that are used
1182 # for temporary storage. These temporary files will be
1183 # removed on job completion.
1184 # No duplicates are allowed.
1185 # No file patterns are supported.
1186 #
1187 # The supported files are:
1188 #
1189 # Google Cloud Storage:
1190 #
1191 # storage.googleapis.com/{bucket}/{object}
1192 # bucket.storage.googleapis.com/{object}
1193 "A String",
1194 ],
1195 "type": "A String", # The type of Cloud Dataflow job.
1196 "clientRequestId": "A String", # The client's unique identifier of the job, re-used across retried attempts.
1197 # If this field is set, the service will ensure its uniqueness.
1198 # The request to create a job will fail if the service has knowledge of a
1199 # previously submitted job with the same client's ID and job name.
1200 # The caller may use this field to ensure idempotence of job
1201 # creation across retried attempts to create a job.
1202 # By default, the field is empty and, in that case, the service ignores it.
1203 "createdFromSnapshotId": "A String", # If this is specified, the job's initial state is populated from the given
1204 # snapshot.
1205 "stepsLocation": "A String", # The GCS location where the steps are stored.
1206 "currentStateTime": "A String", # The timestamp associated with the current state.
1207 "startTime": "A String", # The timestamp when the job was started (transitioned to JOB_STATE_PENDING).
1208 # Flexible resource scheduling jobs are started with some delay after job
1209 # creation, so start_time is unset before start and is updated when the
1210 # job is started by the Cloud Dataflow service. For other jobs, start_time
1211 # always equals create_time and is immutable and set by the Cloud Dataflow
1212 # service.
1213 "createTime": "A String", # The timestamp when the job was initially created. Immutable and set by the
1214 # Cloud Dataflow service.
1215 "requestedState": "A String", # The job's requested state.
1216 #
1217 # `UpdateJob` may be used to switch between the `JOB_STATE_STOPPED` and
1218 # `JOB_STATE_RUNNING` states, by setting requested_state. `UpdateJob` may
1219 # also be used to directly set a job's requested state to
1220 # `JOB_STATE_CANCELLED` or `JOB_STATE_DONE`, irrevocably terminating the
1221 # job if it has not already reached a terminal state.
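 # Illustrative sketch (not part of the generated schema): assuming an
 # authorized `dataflow` client, a cancellation request is typically issued
 # through the separate projects.jobs.update method with a body that sets only
 # this field, e.g.
 #
 #   dataflow.projects().jobs().update(
 #       projectId='my-project', jobId='my-job-id',   # placeholder IDs
 #       body={'requestedState': 'JOB_STATE_CANCELLED'}).execute()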
1222 "name": "A String", # The user-specified Cloud Dataflow job name.
1223 #
1224 # Only one Job with a given name may exist in a project at any
1225 # given time. If a caller attempts to create a Job with the same
1226 # name as an already-existing Job, the attempt returns the
1227 # existing Job.
1228 #
1229 # The name must match the regular expression
1230 # `[a-z]([-a-z0-9]{0,38}[a-z0-9])?`
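 # Illustrative example (not part of the generated schema): "wordcount-2019"
 # matches the expression above, while names that start with a digit or
 # contain uppercase letters (e.g. "2019job", "WordCount") do not.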
1231 "steps": [ # Exactly one of step or steps_location should be specified.
1232 #
1233 # The top-level steps that constitute the entire job.
1234 { # Defines a particular step within a Cloud Dataflow job.
1235 #
1236 # A job consists of multiple steps, each of which performs some
1237 # specific operation as part of the overall job. Data is typically
1238 # passed from one step to another as part of the job.
1239 #
1240 # Here's an example of a sequence of steps which together implement a
1241 # Map-Reduce job:
1242 #
1243 # * Read a collection of data from some source, parsing the
1244 # collection's elements.
1245 #
1246 # * Validate the elements.
1247 #
1248 # * Apply a user-defined function to map each element to some value
1249 # and extract an element-specific key value.
1250 #
1251 # * Group elements with the same key into a single element with
1252 # that key, transforming a multiply-keyed collection into a
1253 # uniquely-keyed collection.
1254 #
1255 # * Write the elements out to some data sink.
1256 #
1257 # Note that the Cloud Dataflow service may be used to run many different
1258 # types of jobs, not just Map-Reduce.
1259 "kind": "A String", # The kind of step in the Cloud Dataflow job.
1260 "properties": { # Named properties associated with the step. Each kind of
1261 # predefined step has its own required set of properties.
1262 # Must be provided on Create. Only retrieved with JOB_VIEW_ALL.
1263 "a_key": "", # Properties of the object.
1264 },
1265 "name": "A String", # The name that identifies the step. This must be unique for each
1266 # step with respect to all other steps in the Cloud Dataflow job.
1267 },
1268 ],
1269 "replaceJobId": "A String", # If this job is an update of an existing job, this field is the job ID
1270 # of the job it replaced.
1271 #
1272 # When sending a `CreateJobRequest`, you can update a job by specifying it
1273 # here. The job named here is stopped, and its intermediate state is
1274 # transferred to this job.
1275 "currentState": "A String", # The current state of the job.
1276 #
1277 # Jobs are created in the `JOB_STATE_STOPPED` state unless otherwise
1278 # specified.
1279 #
1280 # A job in the `JOB_STATE_RUNNING` state may asynchronously enter a
1281 # terminal state. After a job has reached a terminal state, no
1282 # further state updates may be made.
1283 #
1284 # This field may be mutated by the Cloud Dataflow service;
1285 # callers cannot mutate it.
1286 "executionInfo": { # Additional information about how a Cloud Dataflow job will be executed that # Deprecated.
1287 # isn't contained in the submitted job.
1288 "stages": { # A mapping from each stage to the information about that stage.
1289 "a_key": { # Contains information about how a particular
1290 # google.dataflow.v1beta3.Step will be executed.
1291 "stepName": [ # The steps associated with the execution stage.
1292 # Note that stages may have several steps, and that a given step
1293 # might be run by more than one stage.
1294 "A String",
1295 ],
1296 },
1297 },
1298 },
1299}
1300
1301 location: string, The [regional endpoint]
1302(https://cloud.google.com/dataflow/docs/concepts/regional-endpoints) that
1303contains this job.
1304 x__xgafv: string, V1 error format.
1305 Allowed values
1306 1 - v1 error format
1307 2 - v2 error format
1308 replaceJobId: string, Deprecated. This field is now in the Job message.
1309 view: string, The level of information requested in response.
1310
1311Returns:
1312 An object of the form:
1313
1314 { # Defines a job to be run by the Cloud Dataflow service.
1315 "labels": { # User-defined labels for this job.
1316 #
1317 # The labels map can contain no more than 64 entries. Entries of the labels
1318 # map are UTF8 strings that comply with the following restrictions:
1319 #
1320 # * Keys must conform to regexp: \p{Ll}\p{Lo}{0,62}
1321 # * Values must conform to regexp: [\p{Ll}\p{Lo}\p{N}_-]{0,63}
1322 # * Both keys and values are additionally constrained to be <= 128 bytes in
1323 # size.
1324 "a_key": "A String",
1325 },
1326 "jobMetadata": { # Metadata available primarily for filtering jobs. Will be included in the # This field is populated by the Dataflow service to support filtering jobs
1327 # by the metadata values provided here. Populated for ListJobs and all GetJob
1328 # views SUMMARY and higher.
1329 # ListJob response and Job SUMMARY view.
1330 "sdkVersion": { # The version of the SDK used to run the job. # The SDK version used to run the job.
1331 "versionDisplayName": "A String", # A readable string describing the version of the SDK.
1332 "version": "A String", # The version of the SDK used to run the job.
1333 "sdkSupportStatus": "A String", # The support status for this SDK version.
1334 },
1335 "pubsubDetails": [ # Identification of a PubSub source used in the Dataflow job.
1336 { # Metadata for a PubSub connector used by the job.
1337 "topic": "A String", # Topic accessed in the connection.
1338 "subscription": "A String", # Subscription used in the connection.
1339 },
1340 ],
1341 "datastoreDetails": [ # Identification of a Datastore source used in the Dataflow job.
1342 { # Metadata for a Datastore connector used by the job.
1343 "projectId": "A String", # ProjectId accessed in the connection.
1344 "namespace": "A String", # Namespace used in the connection.
1345 },
1346 ],
1347 "fileDetails": [ # Identification of a File source used in the Dataflow job.
1348 { # Metadata for a File connector used by the job.
1349 "filePattern": "A String", # File Pattern used to access files by the connector.
1350 },
1351 ],
1352 "spannerDetails": [ # Identification of a Spanner source used in the Dataflow job.
1353 { # Metadata for a Spanner connector used by the job.
1354 "instanceId": "A String", # InstanceId accessed in the connection.
1355 "projectId": "A String", # ProjectId accessed in the connection.
1356 "databaseId": "A String", # DatabaseId accessed in the connection.
1357 },
1358 ],
1359 "bigTableDetails": [ # Identification of a BigTable source used in the Dataflow job.
1360 { # Metadata for a BigTable connector used by the job.
1361 "instanceId": "A String", # InstanceId accessed in the connection.
1362 "projectId": "A String", # ProjectId accessed in the connection.
1363 "tableId": "A String", # TableId accessed in the connection.
1364 },
1365 ],
1366 "bigqueryDetails": [ # Identification of a BigQuery source used in the Dataflow job.
1367 { # Metadata for a BigQuery connector used by the job.
1368 "projectId": "A String", # Project accessed in the connection.
1369 "dataset": "A String", # Dataset accessed in the connection.
1370 "table": "A String", # Table accessed in the connection.
1371 "query": "A String", # Query used to access data in the connection.
1372 },
1373 ],
1374 },
1375 "pipelineDescription": { # A descriptive representation of submitted pipeline as well as the executed # Preliminary field: The format of this data may change at any time.
1376 # A description of the user pipeline and stages through which it is executed.
1377 # Created by Cloud Dataflow service. Only retrieved with
1378 # JOB_VIEW_DESCRIPTION or JOB_VIEW_ALL.
1379 # form. This data is provided by the Dataflow service for ease of visualizing
1380 # the pipeline and interpreting Dataflow provided metrics.
1381 "originalPipelineTransform": [ # Description of each transform in the pipeline and collections between them.
1382 { # Description of the type, names/ids, and input/outputs for a transform.
1383 "kind": "A String", # Type of transform.
1384 "name": "A String", # User provided name for this transform instance.
1385 "inputCollectionName": [ # User names for all collection inputs to this transform.
1386 "A String",
1387 ],
1388 "displayData": [ # Transform-specific display data.
1389 { # Data provided with a pipeline or transform to provide descriptive info.
1390 "shortStrValue": "A String", # A possible additional shorter value to display.
1391 # For example a java_class_name_value of com.mypackage.MyDoFn
1392 # will be stored with MyDoFn as the short_str_value and
1393 # com.mypackage.MyDoFn as the java_class_name value.
1394 # short_str_value can be displayed and java_class_name_value
1395 # will be displayed as a tooltip.
1396 "durationValue": "A String", # Contains value if the data is of duration type.
1397 "url": "A String", # An optional full URL.
1398 "floatValue": 3.14, # Contains value if the data is of float type.
1399 "namespace": "A String", # The namespace for the key. This is usually a class name or programming
1400 # language namespace (i.e. python module) which defines the display data.
1401 # This allows a dax monitoring system to specially handle the data
1402 # and perform custom rendering.
1403 "javaClassValue": "A String", # Contains value if the data is of java class type.
1404 "label": "A String", # An optional label to display in a dax UI for the element.
1405 "boolValue": True or False, # Contains value if the data is of a boolean type.
1406 "strValue": "A String", # Contains value if the data is of string type.
1407 "key": "A String", # The key identifying the display data.
1408 # This is intended to be used as a label for the display data
1409 # when viewed in a dax monitoring system.
1410 "int64Value": "A String", # Contains value if the data is of int64 type.
1411 "timestampValue": "A String", # Contains value if the data is of timestamp type.
1412 },
1413 ],
1414 "outputCollectionName": [ # User names for all collection outputs to this transform.
1415 "A String",
1416 ],
1417 "id": "A String", # SDK generated id of this transform instance.
1418 },
1419 ],
1420 "executionPipelineStage": [ # Description of each stage of execution of the pipeline.
1421 { # Description of the composing transforms, names/ids, and input/outputs of a
1422 # stage of execution. Some composing transforms and sources may have been
1423 # generated by the Dataflow service during execution planning.
1424 "componentSource": [ # Collections produced and consumed by component transforms of this stage.
1425 { # Description of an interstitial value between transforms in an execution
1426 # stage.
1427 "userName": "A String", # Human-readable name for this transform; may be user or system generated.
1428 "originalTransformOrCollection": "A String", # User name for the original user transform or collection with which this
1429 # source is most closely associated.
1430 "name": "A String", # Dataflow service generated name for this source.
1431 },
1432 ],
1433 "kind": "A String", # Type of tranform this stage is executing.
1434 "name": "A String", # Dataflow service generated name for this stage.
1435 "outputSource": [ # Output sources for this stage.
1436 { # Description of an input or output of an execution stage.
1437 "userName": "A String", # Human-readable name for this source; may be user or system generated.
1438 "sizeBytes": "A String", # Size of the source, if measurable.
1439 "name": "A String", # Dataflow service generated name for this source.
1440 "originalTransformOrCollection": "A String", # User name for the original user transform or collection with which this
1441 # source is most closely associated.
1442 },
1443 ],
1444 "inputSource": [ # Input sources for this stage.
1445 { # Description of an input or output of an execution stage.
1446 "userName": "A String", # Human-readable name for this source; may be user or system generated.
1447 "sizeBytes": "A String", # Size of the source, if measurable.
1448 "name": "A String", # Dataflow service generated name for this source.
1449 "originalTransformOrCollection": "A String", # User name for the original user transform or collection with which this
1450 # source is most closely associated.
1451 },
1452 ],
1453 "componentTransform": [ # Transforms that comprise this execution stage.
1454 { # Description of a transform executed as part of an execution stage.
1455 "userName": "A String", # Human-readable name for this transform; may be user or system generated.
1456 "originalTransform": "A String", # User name for the original user transform with which this transform is
1457 # most closely associated.
1458 "name": "A String", # Dataflow service generated name for this source.
1459 },
1460 ],
1461 "id": "A String", # Dataflow service generated id for this stage.
1462 },
1463 ],
1464 "displayData": [ # Pipeline level display data.
1465 { # Data provided with a pipeline or transform to provide descriptive info.
1466 "shortStrValue": "A String", # A possible additional shorter value to display.
1467 # For example a java_class_name_value of com.mypackage.MyDoFn
1468 # will be stored with MyDoFn as the short_str_value and
1469 # com.mypackage.MyDoFn as the java_class_name value.
1470 # short_str_value can be displayed and java_class_name_value
1471 # will be displayed as a tooltip.
1472 "durationValue": "A String", # Contains value if the data is of duration type.
1473 "url": "A String", # An optional full URL.
1474 "floatValue": 3.14, # Contains value if the data is of float type.
1475 "namespace": "A String", # The namespace for the key. This is usually a class name or programming
1476 # language namespace (i.e. python module) which defines the display data.
1477 # This allows a dax monitoring system to specially handle the data
1478 # and perform custom rendering.
1479 "javaClassValue": "A String", # Contains value if the data is of java class type.
1480 "label": "A String", # An optional label to display in a dax UI for the element.
1481 "boolValue": True or False, # Contains value if the data is of a boolean type.
1482 "strValue": "A String", # Contains value if the data is of string type.
1483 "key": "A String", # The key identifying the display data.
1484 # This is intended to be used as a label for the display data
1485 # when viewed in a dax monitoring system.
1486 "int64Value": "A String", # Contains value if the data is of int64 type.
1487 "timestampValue": "A String", # Contains value if the data is of timestamp type.
1488 },
1489 ],
1490 },
1491 "stageStates": [ # This field may be mutated by the Cloud Dataflow service;
1492 # callers cannot mutate it.
1493 { # A message describing the state of a particular execution stage.
1494 "executionStageName": "A String", # The name of the execution stage.
1495 "executionStageState": "A String", # Executions stage states allow the same set of values as JobState.
1496 "currentStateTime": "A String", # The time at which the stage transitioned to this state.
1497 },
1498 ],
1499 "id": "A String", # The unique ID of this job.
1500 #
1501 # This field is set by the Cloud Dataflow service when the Job is
1502 # created, and is immutable for the life of the job.
1503 "replacedByJobId": "A String", # If another job is an update of this job (and thus, this job is in
1504 # `JOB_STATE_UPDATED`), this field contains the ID of that job.
1505 "projectId": "A String", # The ID of the Cloud Platform project that the job belongs to.
1506 "transformNameMapping": { # The map of transform name prefixes of the job to be replaced to the
1507 # corresponding name prefixes of the new job.
1508 "a_key": "A String",
1509 },
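 # Illustrative example (not part of the generated schema): when replacing a
 # job, a mapping such as {"oldReadStep": "newReadStep"} indicates that the
 # transform name prefix "oldReadStep" in the old job corresponds to
 # "newReadStep" in the new job; both names are placeholders.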
1510 "environment": { # Describes the environment in which a Dataflow Job runs. # The environment for the job.
1511 "version": { # A structure describing which components and their versions of the service
1512 # are required in order to run the job.
1513 "a_key": "", # Properties of the object.
1514 },
1515 "flexResourceSchedulingGoal": "A String", # Which Flexible Resource Scheduling mode to run in.
1516 "serviceKmsKeyName": "A String", # If set, contains the Cloud KMS key identifier used to encrypt data
1517 # at rest, AKA a Customer Managed Encryption Key (CMEK).
1518 #
1519 # Format:
1520 # projects/PROJECT_ID/locations/LOCATION/keyRings/KEY_RING/cryptoKeys/KEY
1521 "internalExperiments": { # Experimental settings.
1522 "a_key": "", # Properties of the object. Contains field @type with type URL.
1523 },
1524 "dataset": "A String", # The dataset for the current project where various workflow
1525 # related tables are stored.
1526 #
1527 # The supported resource type is:
1528 #
1529 # Google BigQuery:
1530 # bigquery.googleapis.com/{dataset}
1531 "experiments": [ # The list of experiments to enable.
1532 "A String",
1533 ],
1534 "serviceAccountEmail": "A String", # Identity to run virtual machines as. Defaults to the default account.
1535 "sdkPipelineOptions": { # The Cloud Dataflow SDK pipeline options specified by the user. These
1536 # options are passed through the service and are used to recreate the
1537 # SDK pipeline options on the worker in a language agnostic and platform
1538 # independent way.
1539 "a_key": "", # Properties of the object.
1540 },
1541 "userAgent": { # A description of the process that generated the request.
1542 "a_key": "", # Properties of the object.
1543 },
1544 "clusterManagerApiService": "A String", # The type of cluster manager API to use. If unknown or
1545 # unspecified, the service will attempt to choose a reasonable
1546 # default. This should be in the form of the API service name,
1547 # e.g. "compute.googleapis.com".
1548 "workerPools": [ # The worker pools. At least one "harness" worker pool must be
1549 # specified in order for the job to have workers.
1550 { # Describes one particular pool of Cloud Dataflow workers to be
1551 # instantiated by the Cloud Dataflow service in order to perform the
1552 # computations required by a job. Note that a workflow job may use
1553 # multiple pools, in order to match the various computational
1554 # requirements of the various stages of the job.
1555 "diskSourceImage": "A String", # Fully qualified source image for disks.
1556 "taskrunnerSettings": { # Taskrunner configuration settings. # Settings passed through to Google Compute Engine workers when
1557 # using the standard Dataflow task runner. Users should ignore
1558 # this field.
1559 "workflowFileName": "A String", # The file to store the workflow in.
1560 "logUploadLocation": "A String", # Indicates where to put logs. If this is not specified, the logs
1561 # will not be uploaded.
1562 #
1563 # The supported resource type is:
1564 #
1565 # Google Cloud Storage:
1566 # storage.googleapis.com/{bucket}/{object}
1567 # bucket.storage.googleapis.com/{object}
1568 "commandlinesFileName": "A String", # The file to store preprocessing commands in.
1569 "parallelWorkerSettings": { # Provides data to pass through to the worker harness. # The settings to pass to the parallel worker harness.
1570 "reportingEnabled": True or False, # Whether to send work progress updates to the service.
1571 "shuffleServicePath": "A String", # The Shuffle service path relative to the root URL, for example,
1572 # "shuffle/v1beta1".
1573 "workerId": "A String", # The ID of the worker running this pipeline.
1574 "baseUrl": "A String", # The base URL for accessing Google Cloud APIs.
1575 #
1576 # When workers access Google Cloud APIs, they logically do so via
1577 # relative URLs. If this field is specified, it supplies the base
1578 # URL to use for resolving these relative URLs. The normative
1579 # algorithm used is defined by RFC 1808, "Relative Uniform Resource
1580 # Locators".
1581 #
1582 # If not specified, the default value is "http://www.googleapis.com/"
1583 "servicePath": "A String", # The Cloud Dataflow service path relative to the root URL, for example,
1584 # "dataflow/v1b3/projects".
1585 "tempStoragePrefix": "A String", # The prefix of the resources the system should use for temporary
1586 # storage.
1587 #
1588 # The supported resource type is:
1589 #
1590 # Google Cloud Storage:
1591 #
1592 # storage.googleapis.com/{bucket}/{object}
1593 # bucket.storage.googleapis.com/{object}
1594 },
1595 "vmId": "A String", # The ID string of the VM.
1596 "baseTaskDir": "A String", # The location on the worker for task-specific subdirectories.
1597 "continueOnException": True or False, # Whether to continue taskrunner if an exception is hit.
1598 "oauthScopes": [ # The OAuth2 scopes to be requested by the taskrunner in order to
1599 # access the Cloud Dataflow API.
1600 "A String",
1601 ],
1602 "taskUser": "A String", # The UNIX user ID on the worker VM to use for tasks launched by
1603 # taskrunner; e.g. "root".
1604 "baseUrl": "A String", # The base URL for the taskrunner to use when accessing Google Cloud APIs.
1605 #
1606 # When workers access Google Cloud APIs, they logically do so via
1607 # relative URLs. If this field is specified, it supplies the base
1608 # URL to use for resolving these relative URLs. The normative
1609 # algorithm used is defined by RFC 1808, "Relative Uniform Resource
1610 # Locators".
1611 #
1612 # If not specified, the default value is "http://www.googleapis.com/"
1613 "taskGroup": "A String", # The UNIX group ID on the worker VM to use for tasks launched by
1614 # taskrunner; e.g. "wheel".
1615 "languageHint": "A String", # The suggested backend language.
1616 "logToSerialconsole": True or False, # Whether to send taskrunner log info to Google Compute Engine VM serial
1617 # console.
1618 "streamingWorkerMainClass": "A String", # The streaming worker main class name.
1619 "logDir": "A String", # The directory on the VM to store logs.
1620 "dataflowApiVersion": "A String", # The API version of endpoint, e.g. "v1b3"
Sai Cheemalapatie833b792017-03-24 15:06:46 -07001621 "harnessCommand": "A String", # The command to launch the worker harness.
1622 "tempStoragePrefix": "A String", # The prefix of the resources the taskrunner should use for
1623 # temporary storage.
Sai Cheemalapatic30d2b52017-03-13 12:12:03 -04001624 #
Sai Cheemalapatie833b792017-03-24 15:06:46 -07001625 # The supported resource type is:
1626 #
1627 # Google Cloud Storage:
1628 # storage.googleapis.com/{bucket}/{object}
1629 # bucket.storage.googleapis.com/{object}
1630 "alsologtostderr": True or False, # Whether to also send taskrunner log info to stderr.
1631 },
1632 "kind": "A String", # The kind of the worker pool; currently only `harness` and `shuffle`
1633 # are supported.
1634 "packages": [ # Packages to be installed on workers.
1635 { # The packages that must be installed in order for a worker to run the
1636 # steps of the Cloud Dataflow job that will be assigned to its worker
1637 # pool.
1638 #
1639 # This is the mechanism by which the Cloud Dataflow SDK causes code to
1640 # be loaded onto the workers. For example, the Cloud Dataflow Java SDK
1641 # might use this to install jars containing the user's code and all of the
1642 # various dependencies (libraries, data files, etc.) required in order
1643 # for that code to run.
1644 "location": "A String", # The resource to read the package from. The supported resource type is:
1645 #
1646 # Google Cloud Storage:
1647 #
1648 # storage.googleapis.com/{bucket}
1649 # bucket.storage.googleapis.com/
1650 "name": "A String", # The name of the package.
1651 },
1652 ],
1653 "machineType": "A String", # Machine type (e.g. "n1-standard-1"). If empty or unspecified, the
1654 # service will attempt to choose a reasonable default.
1655 "network": "A String", # Network to which VMs will be assigned. If empty or unspecified,
1656 # the service will use the network "default".
1657 "zone": "A String", # Zone to run the worker pools in. If empty or unspecified, the service
1658 # will attempt to choose a reasonable default.
1659 "diskSizeGb": 42, # Size of root disk for VMs, in GB. If zero or unspecified, the service will
1660 # attempt to choose a reasonable default.
1661 "teardownPolicy": "A String", # Sets the policy for determining when to turn down the worker pool.
1662 # Allowed values are: `TEARDOWN_ALWAYS`, `TEARDOWN_ON_SUCCESS`, and
1663 # `TEARDOWN_NEVER`.
1664 # `TEARDOWN_ALWAYS` means workers are always torn down regardless of whether
1665 # the job succeeds. `TEARDOWN_ON_SUCCESS` means workers are torn down
1666 # if the job succeeds. `TEARDOWN_NEVER` means the workers are never torn
1667 # down.
1668 #
1669 # If the workers are not torn down by the service, they will
1670 # continue to run and use Google Compute Engine VM resources in the
1671 # user's project until they are explicitly terminated by the user.
1672 # Because of this, Google recommends using the `TEARDOWN_ALWAYS`
1673 # policy except for small, manually supervised test jobs.
1674 #
1675 # If unknown or unspecified, the service will attempt to choose a reasonable
1676 # default.
1677 "onHostMaintenance": "A String", # The action to take on host maintenance, as defined by the Google
1678 # Compute Engine API.
1679 "ipConfiguration": "A String", # Configuration for VM IPs.
1680 "numThreadsPerWorker": 42, # The number of threads per worker harness. If empty or unspecified, the
1681 # service will choose a number of threads (according to the number of cores
1682 # on the selected machine type for batch, or 1 by convention for streaming).
1683 "poolArgs": { # Extra arguments for this worker pool.
1684 "a_key": "", # Properties of the object. Contains field @type with type URL.
1685 },
1686 "numWorkers": 42, # Number of Google Compute Engine workers in this pool needed to
1687 # execute the job. If zero or unspecified, the service will
1688 # attempt to choose a reasonable default.
1689 "workerHarnessContainerImage": "A String", # Required. Docker container image that executes the Cloud Dataflow worker
1690 # harness, residing in Google Container Registry.
1691 "subnetwork": "A String", # Subnetwork to which VMs will be assigned, if desired. Expected to be of
1692 # the form "regions/REGION/subnetworks/SUBNETWORK".
1693 "dataDisks": [ # Data disks that are used by a VM in this workflow.
1694 { # Describes the data disk used by a workflow job.
1695 "mountPoint": "A String", # Directory in a VM where disk is mounted.
1696 "sizeGb": 42, # Size of disk in GB. If zero or unspecified, the service will
1697 # attempt to choose a reasonable default.
1698 "diskType": "A String", # Disk storage type, as defined by Google Compute Engine. This
1699 # must be a disk type appropriate to the project and zone in which
1700 # the workers will run. If unknown or unspecified, the service
1701 # will attempt to choose a reasonable default.
1702 #
1703 # For example, the standard persistent disk type is a resource name
1704 # typically ending in "pd-standard". If SSD persistent disks are
1705 # available, the resource name typically ends with "pd-ssd". The
1706 # actual valid values are defined by the Google Compute Engine API,
1707 # not by the Cloud Dataflow API; consult the Google Compute Engine
1708 # documentation for more information about determining the set of
1709 # available disk types for a particular project and zone.
1710 #
1711 # Google Compute Engine Disk types are local to a particular
1712 # project in a particular zone, and so the resource name will
1713 # typically look something like this:
1714 #
1715 # compute.googleapis.com/projects/project-id/zones/zone/diskTypes/pd-standard
1716 },
1717 ],
1718 "autoscalingSettings": { # Settings for WorkerPool autoscaling. # Settings for autoscaling of this WorkerPool.
1719 "maxNumWorkers": 42, # The maximum number of workers to cap scaling at.
1720 "algorithm": "A String", # The algorithm to use for autoscaling.
1721 },
1722 "defaultPackageSet": "A String", # The default package set to install. This allows the service to
1723 # select a default set of packages which are useful to worker
1724 # harnesses written in a particular language.
1725 "diskType": "A String", # Type of root disk for VMs. If empty or unspecified, the service will
1726 # attempt to choose a reasonable default.
1727 "metadata": { # Metadata to set on the Google Compute Engine VMs.
1728 "a_key": "A String",
1729 },
1730 },
1731 ],
1732 "tempStoragePrefix": "A String", # The prefix of the resources the system should use for temporary
1733 # storage. The system will append the suffix "/temp-{JOBNAME}" to
1734 # this resource prefix, where {JOBNAME} is the value of the
1735 # job_name field. The resulting bucket and object prefix is used
1736 # as the prefix of the resources used to store temporary data
1737 # needed during the job execution. NOTE: This will override the
1738 # value in taskrunner_settings.
1739 # The supported resource type is:
1740 #
1741 # Google Cloud Storage:
1742 #
1743 # storage.googleapis.com/{bucket}/{object}
1744 # bucket.storage.googleapis.com/{object}
1745 },
1746 "location": "A String", # The [regional endpoint]
1747 # (https://cloud.google.com/dataflow/docs/concepts/regional-endpoints) that
1748 # contains this job.
1749 "tempFiles": [ # A set of files the system should be aware of that are used
1750 # for temporary storage. These temporary files will be
1751 # removed on job completion.
1752 # No duplicates are allowed.
1753 # No file patterns are supported.
1754 #
1755 # The supported files are:
1756 #
1757 # Google Cloud Storage:
1758 #
1759 # storage.googleapis.com/{bucket}/{object}
1760 # bucket.storage.googleapis.com/{object}
1761 "A String",
1762 ],
1763 "type": "A String", # The type of Cloud Dataflow job.
1764 "clientRequestId": "A String", # The client's unique identifier of the job, re-used across retried attempts.
1765 # If this field is set, the service will ensure its uniqueness.
1766 # The request to create a job will fail if the service has knowledge of a
1767 # previously submitted job with the same client's ID and job name.
1768 # The caller may use this field to ensure idempotence of job
1769 # creation across retried attempts to create a job.
1770 # By default, the field is empty and, in that case, the service ignores it.
1771 "createdFromSnapshotId": "A String", # If this is specified, the job's initial state is populated from the given
1772 # snapshot.
1773 "stepsLocation": "A String", # The GCS location where the steps are stored.
1774 "currentStateTime": "A String", # The timestamp associated with the current state.
1775 "startTime": "A String", # The timestamp when the job was started (transitioned to JOB_STATE_PENDING).
1776 # Flexible resource scheduling jobs are started with some delay after job
1777 # creation, so start_time is unset before start and is updated when the
1778 # job is started by the Cloud Dataflow service. For other jobs, start_time
1779 # always equals create_time and is immutable and set by the Cloud Dataflow
1780 # service.
1781 "createTime": "A String", # The timestamp when the job was initially created. Immutable and set by the
1782 # Cloud Dataflow service.
1783 "requestedState": "A String", # The job's requested state.
1784 #
1785 # `UpdateJob` may be used to switch between the `JOB_STATE_STOPPED` and
1786 # `JOB_STATE_RUNNING` states, by setting requested_state. `UpdateJob` may
1787 # also be used to directly set a job's requested state to
1788 # `JOB_STATE_CANCELLED` or `JOB_STATE_DONE`, irrevocably terminating the
1789 # job if it has not already reached a terminal state.
1790 "name": "A String", # The user-specified Cloud Dataflow job name.
1791 #
1792 # Only one Job with a given name may exist in a project at any
1793 # given time. If a caller attempts to create a Job with the same
1794 # name as an already-existing Job, the attempt returns the
1795 # existing Job.
1796 #
1797 # The name must match the regular expression
1798 # `[a-z]([-a-z0-9]{0,38}[a-z0-9])?`
1799 "steps": [ # Exactly one of step or steps_location should be specified.
1800 #
1801 # The top-level steps that constitute the entire job.
1802 { # Defines a particular step within a Cloud Dataflow job.
1803 #
1804 # A job consists of multiple steps, each of which performs some
1805 # specific operation as part of the overall job. Data is typically
1806 # passed from one step to another as part of the job.
1807 #
1808 # Here's an example of a sequence of steps which together implement a
1809 # Map-Reduce job:
1810 #
1811 # * Read a collection of data from some source, parsing the
1812 # collection's elements.
1813 #
1814 # * Validate the elements.
1815 #
1816 # * Apply a user-defined function to map each element to some value
1817 # and extract an element-specific key value.
1818 #
1819 # * Group elements with the same key into a single element with
1820 # that key, transforming a multiply-keyed collection into a
1821 # uniquely-keyed collection.
1822 #
1823 # * Write the elements out to some data sink.
1824 #
1825 # Note that the Cloud Dataflow service may be used to run many different
1826 # types of jobs, not just Map-Reduce.
1827 "kind": "A String", # The kind of step in the Cloud Dataflow job.
1828 "properties": { # Named properties associated with the step. Each kind of
1829 # predefined step has its own required set of properties.
1830 # Must be provided on Create. Only retrieved with JOB_VIEW_ALL.
1831 "a_key": "", # Properties of the object.
1832 },
1833 "name": "A String", # The name that identifies the step. This must be unique for each
1834 # step with respect to all other steps in the Cloud Dataflow job.
1835 },
1836 ],
1837 "replaceJobId": "A String", # If this job is an update of an existing job, this field is the job ID
1838 # of the job it replaced.
1839 #
1840 # When sending a `CreateJobRequest`, you can update a job by specifying it
1841 # here. The job named here is stopped, and its intermediate state is
1842 # transferred to this job.
1843 "currentState": "A String", # The current state of the job.
1844 #
1845 # Jobs are created in the `JOB_STATE_STOPPED` state unless otherwise
1846 # specified.
1847 #
1848 # A job in the `JOB_STATE_RUNNING` state may asynchronously enter a
1849 # terminal state. After a job has reached a terminal state, no
1850 # further state updates may be made.
1851 #
1852 # This field may be mutated by the Cloud Dataflow service;
1853 # callers cannot mutate it.
1854 "executionInfo": { # Additional information about how a Cloud Dataflow job will be executed that # Deprecated.
1855 # isn't contained in the submitted job.
1856 "stages": { # A mapping from each stage to the information about that stage.
1857 "a_key": { # Contains information about how a particular
1858 # google.dataflow.v1beta3.Step will be executed.
1859 "stepName": [ # The steps associated with the execution stage.
1860 # Note that stages may have several steps, and that a given step
1861 # might be run by more than one stage.
1862 "A String",
1863 ],
1864 },
1865 },
1866 },
1867 }</pre>
1868</div>
1869
1870<div class="method">
1871 <code class="details" id="get">get(projectId, jobId, location=None, x__xgafv=None, view=None)</code>
1872 <pre>Gets the state of the specified Cloud Dataflow job.
1873
1874To get the state of a job, we recommend using `projects.locations.jobs.get`
1875with a [regional endpoint]
1876(https://cloud.google.com/dataflow/docs/concepts/regional-endpoints). Using
1877`projects.jobs.get` is not recommended, as you can only get the state of
1878jobs that are running in `us-central1`.
1879
1880Args:
1881 projectId: string, The ID of the Cloud Platform project that the job belongs to. (required)
1882 jobId: string, The job ID. (required)
1883 location: string, The [regional endpoint]
1884(https://cloud.google.com/dataflow/docs/concepts/regional-endpoints) that
1885contains this job.
1886 x__xgafv: string, V1 error format.
1887 Allowed values
1888 1 - v1 error format
1889 2 - v2 error format
1890 view: string, The level of information requested in response.
1891
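Example (illustrative sketch): fetching a job with the generated Python client,
assuming an authorized client built with googleapiclient.discovery.build;
the project and job IDs below are placeholders, and JOB_VIEW_ALL is requested
so that steps are included in the response.

  from googleapiclient.discovery import build

  dataflow = build('dataflow', 'v1b3')  # uses application default credentials
  job = dataflow.projects().jobs().get(
      projectId='my-project',                 # placeholder project ID
      jobId='2019-01-01_00_00_00-123456789',  # placeholder job ID
      location='us-central1',
      view='JOB_VIEW_ALL').execute()
  print(job['currentState'])
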
1892Returns:
1893 An object of the form:
1894
1895 { # Defines a job to be run by the Cloud Dataflow service.
1896 "labels": { # User-defined labels for this job.
1897 #
1898 # The labels map can contain no more than 64 entries. Entries of the labels
1899 # map are UTF8 strings that comply with the following restrictions:
1900 #
1901 # * Keys must conform to regexp: \p{Ll}\p{Lo}{0,62}
1902 # * Values must conform to regexp: [\p{Ll}\p{Lo}\p{N}_-]{0,63}
1903 # * Both keys and values are additionally constrained to be <= 128 bytes in
1904 # size.
1905 "a_key": "A String",
1906 },
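 # Illustrative example (not part of the generated schema): a map of short,
 # lowercase entries such as {"env": "staging", "team": "data"} is the kind of
 # labeling these limits are intended to allow; both entries are placeholders.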
1907 "jobMetadata": { # Metadata available primarily for filtering jobs. Will be included in the # This field is populated by the Dataflow service to support filtering jobs
1908 # by the metadata values provided here. Populated for ListJobs and all GetJob
1909 # views SUMMARY and higher.
1910 # ListJob response and Job SUMMARY view.
1911 "sdkVersion": { # The version of the SDK used to run the job. # The SDK version used to run the job.
1912 "versionDisplayName": "A String", # A readable string describing the version of the SDK.
1913 "version": "A String", # The version of the SDK used to run the job.
1914 "sdkSupportStatus": "A String", # The support status for this SDK version.
1915 },
1916 "pubsubDetails": [ # Identification of a PubSub source used in the Dataflow job.
1917 { # Metadata for a PubSub connector used by the job.
1918 "topic": "A String", # Topic accessed in the connection.
1919 "subscription": "A String", # Subscription used in the connection.
1920 },
1921 ],
1922 "datastoreDetails": [ # Identification of a Datastore source used in the Dataflow job.
1923 { # Metadata for a Datastore connector used by the job.
1924 "projectId": "A String", # ProjectId accessed in the connection.
1925 "namespace": "A String", # Namespace used in the connection.
1926 },
1927 ],
1928 "fileDetails": [ # Identification of a File source used in the Dataflow job.
1929 { # Metadata for a File connector used by the job.
1930 "filePattern": "A String", # File Pattern used to access files by the connector.
1931 },
1932 ],
1933 "spannerDetails": [ # Identification of a Spanner source used in the Dataflow job.
1934 { # Metadata for a Spanner connector used by the job.
1935 "instanceId": "A String", # InstanceId accessed in the connection.
1936 "projectId": "A String", # ProjectId accessed in the connection.
1937 "databaseId": "A String", # DatabaseId accessed in the connection.
1938 },
1939 ],
1940 "bigTableDetails": [ # Identification of a BigTable source used in the Dataflow job.
1941 { # Metadata for a BigTable connector used by the job.
1942 "instanceId": "A String", # InstanceId accessed in the connection.
1943 "projectId": "A String", # ProjectId accessed in the connection.
1944 "tableId": "A String", # TableId accessed in the connection.
1945 },
1946 ],
1947 "bigqueryDetails": [ # Identification of a BigQuery source used in the Dataflow job.
1948 { # Metadata for a BigQuery connector used by the job.
1949 "projectId": "A String", # Project accessed in the connection.
1950 "dataset": "A String", # Dataset accessed in the connection.
1951 "table": "A String", # Table accessed in the connection.
1952 "query": "A String", # Query used to access data in the connection.
1953 },
1954 ],
1955 },
1956 "pipelineDescription": { # A descriptive representation of submitted pipeline as well as the executed # Preliminary field: The format of this data may change at any time.
1957 # A description of the user pipeline and stages through which it is executed.
1958 # Created by Cloud Dataflow service. Only retrieved with
1959 # JOB_VIEW_DESCRIPTION or JOB_VIEW_ALL.
1960 # form. This data is provided by the Dataflow service for ease of visualizing
1961 # the pipeline and interpreting Dataflow provided metrics.
1962 "originalPipelineTransform": [ # Description of each transform in the pipeline and collections between them.
1963 { # Description of the type, names/ids, and input/outputs for a transform.
1964 "kind": "A String", # Type of transform.
1965 "name": "A String", # User provided name for this transform instance.
1966 "inputCollectionName": [ # User names for all collection inputs to this transform.
1967 "A String",
1968 ],
1969 "displayData": [ # Transform-specific display data.
1970 { # Data provided with a pipeline or transform to provide descriptive info.
1971 "shortStrValue": "A String", # A possible additional shorter value to display.
1972 # For example a java_class_name_value of com.mypackage.MyDoFn
1973 # will be stored with MyDoFn as the short_str_value and
1974 # com.mypackage.MyDoFn as the java_class_name value.
1975 # short_str_value can be displayed and java_class_name_value
1976 # will be displayed as a tooltip.
1977 "durationValue": "A String", # Contains value if the data is of duration type.
1978 "url": "A String", # An optional full URL.
1979 "floatValue": 3.14, # Contains value if the data is of float type.
1980 "namespace": "A String", # The namespace for the key. This is usually a class name or programming
1981 # language namespace (i.e. python module) which defines the display data.
1982 # This allows a dax monitoring system to specially handle the data
1983 # and perform custom rendering.
1984 "javaClassValue": "A String", # Contains value if the data is of java class type.
1985 "label": "A String", # An optional label to display in a dax UI for the element.
1986 "boolValue": True or False, # Contains value if the data is of a boolean type.
1987 "strValue": "A String", # Contains value if the data is of string type.
1988 "key": "A String", # The key identifying the display data.
1989 # This is intended to be used as a label for the display data
1990 # when viewed in a dax monitoring system.
1991 "int64Value": "A String", # Contains value if the data is of int64 type.
1992 "timestampValue": "A String", # Contains value if the data is of timestamp type.
1993 },
1994 ],
1995 "outputCollectionName": [ # User names for all collection outputs to this transform.
1996 "A String",
1997 ],
1998 "id": "A String", # SDK generated id of this transform instance.
1999 },
2000 ],
2001 "executionPipelineStage": [ # Description of each stage of execution of the pipeline.
2002 { # Description of the composing transforms, names/ids, and input/outputs of a
2003 # stage of execution. Some composing transforms and sources may have been
2004 # generated by the Dataflow service during execution planning.
2005 "componentSource": [ # Collections produced and consumed by component transforms of this stage.
2006 { # Description of an interstitial value between transforms in an execution
2007 # stage.
2008 "userName": "A String", # Human-readable name for this transform; may be user or system generated.
2009 "originalTransformOrCollection": "A String", # User name for the original user transform or collection with which this
2010 # source is most closely associated.
2011 "name": "A String", # Dataflow service generated name for this source.
2012 },
2013 ],
2014 "kind": "A String", # Type of tranform this stage is executing.
2015 "name": "A String", # Dataflow service generated name for this stage.
2016 "outputSource": [ # Output sources for this stage.
2017 { # Description of an input or output of an execution stage.
2018 "userName": "A String", # Human-readable name for this source; may be user or system generated.
2019 "sizeBytes": "A String", # Size of the source, if measurable.
2020 "name": "A String", # Dataflow service generated name for this source.
2021 "originalTransformOrCollection": "A String", # User name for the original user transform or collection with which this
2022 # source is most closely associated.
2023 },
2024 ],
2025 "inputSource": [ # Input sources for this stage.
2026 { # Description of an input or output of an execution stage.
2027 "userName": "A String", # Human-readable name for this source; may be user or system generated.
2028 "sizeBytes": "A String", # Size of the source, if measurable.
2029 "name": "A String", # Dataflow service generated name for this source.
2030 "originalTransformOrCollection": "A String", # User name for the original user transform or collection with which this
2031 # source is most closely associated.
2032 },
2033 ],
2034 "componentTransform": [ # Transforms that comprise this execution stage.
2035 { # Description of a transform executed as part of an execution stage.
2036 "userName": "A String", # Human-readable name for this transform; may be user or system generated.
2037 "originalTransform": "A String", # User name for the original user transform with which this transform is
2038 # most closely associated.
2039 "name": "A String", # Dataflow service generated name for this source.
2040 },
2041 ],
2042 "id": "A String", # Dataflow service generated id for this stage.
2043 },
2044 ],
2045 "displayData": [ # Pipeline level display data.
2046 { # Data provided with a pipeline or transform to provide descriptive info.
2047 "shortStrValue": "A String", # A possible additional shorter value to display.
2048 # For example a java_class_name_value of com.mypackage.MyDoFn
2049 # will be stored with MyDoFn as the short_str_value and
2050 # com.mypackage.MyDoFn as the java_class_name value.
2051 # short_str_value can be displayed and java_class_name_value
2052 # will be displayed as a tooltip.
2053 "durationValue": "A String", # Contains value if the data is of duration type.
2054 "url": "A String", # An optional full URL.
2055 "floatValue": 3.14, # Contains value if the data is of float type.
2056 "namespace": "A String", # The namespace for the key. This is usually a class name or programming
2057           # language namespace (e.g. a Python module) which defines the display data.
2058 # This allows a dax monitoring system to specially handle the data
2059 # and perform custom rendering.
2060 "javaClassValue": "A String", # Contains value if the data is of java class type.
2061 "label": "A String", # An optional label to display in a dax UI for the element.
2062 "boolValue": True or False, # Contains value if the data is of a boolean type.
2063 "strValue": "A String", # Contains value if the data is of string type.
Sai Cheemalapati4ba8c232017-06-06 18:46:08 -04002064 "key": "A String", # The key identifying the display data.
2065 # This is intended to be used as a label for the display data
2066 # when viewed in a dax monitoring system.
Sai Cheemalapatic30d2b52017-03-13 12:12:03 -04002067 "int64Value": "A String", # Contains value if the data is of int64 type.
Sai Cheemalapati4ba8c232017-06-06 18:46:08 -04002068 "timestampValue": "A String", # Contains value if the data is of timestamp type.
Sai Cheemalapatic30d2b52017-03-13 12:12:03 -04002069 },
2070 ],
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07002071 },
2072 "stageStates": [ # This field may be mutated by the Cloud Dataflow service;
2073 # callers cannot mutate it.
2074 { # A message describing the state of a particular execution stage.
2075 "executionStageName": "A String", # The name of the execution stage.
2076 "executionStageState": "A String", # Executions stage states allow the same set of values as JobState.
2077 "currentStateTime": "A String", # The time at which the stage transitioned to this state.
2078 },
2079 ],
2080 "id": "A String", # The unique ID of this job.
2081 #
2082 # This field is set by the Cloud Dataflow service when the Job is
2083 # created, and is immutable for the life of the job.
2084 "replacedByJobId": "A String", # If another job is an update of this job (and thus, this job is in
2085 # `JOB_STATE_UPDATED`), this field contains the ID of that job.
2086 "projectId": "A String", # The ID of the Cloud Platform project that the job belongs to.
2087 "transformNameMapping": { # The map of transform name prefixes of the job to be replaced to the
2088 # corresponding name prefixes of the new job.
2089 "a_key": "A String",
2090 },
2091 "environment": { # Describes the environment in which a Dataflow Job runs. # The environment for the job.
2092 "version": { # A structure describing which components and their versions of the service
2093 # are required in order to run the job.
2094 "a_key": "", # Properties of the object.
2095 },
2096 "flexResourceSchedulingGoal": "A String", # Which Flexible Resource Scheduling mode to run in.
2097 "serviceKmsKeyName": "A String", # If set, contains the Cloud KMS key identifier used to encrypt data
2098 # at rest, AKA a Customer Managed Encryption Key (CMEK).
2099 #
2100 # Format:
2101 # projects/PROJECT_ID/locations/LOCATION/keyRings/KEY_RING/cryptoKeys/KEY
2102 "internalExperiments": { # Experimental settings.
2103 "a_key": "", # Properties of the object. Contains field @type with type URL.
2104 },
2105 "dataset": "A String", # The dataset for the current project where various workflow
2106 # related tables are stored.
2107 #
2108 # The supported resource type is:
2109 #
2110 # Google BigQuery:
2111 # bigquery.googleapis.com/{dataset}
2112 "experiments": [ # The list of experiments to enable.
2113 "A String",
2114 ],
2115 "serviceAccountEmail": "A String", # Identity to run virtual machines as. Defaults to the default account.
2116 "sdkPipelineOptions": { # The Cloud Dataflow SDK pipeline options specified by the user. These
2117 # options are passed through the service and are used to recreate the
2118 # SDK pipeline options on the worker in a language agnostic and platform
2119 # independent way.
2120 "a_key": "", # Properties of the object.
2121 },
2122 "userAgent": { # A description of the process that generated the request.
2123 "a_key": "", # Properties of the object.
2124 },
2125 "clusterManagerApiService": "A String", # The type of cluster manager API to use. If unknown or
2126 # unspecified, the service will attempt to choose a reasonable
2127 # default. This should be in the form of the API service name,
2128 # e.g. "compute.googleapis.com".
2129 "workerPools": [ # The worker pools. At least one "harness" worker pool must be
2130 # specified in order for the job to have workers.
2131 { # Describes one particular pool of Cloud Dataflow workers to be
2132 # instantiated by the Cloud Dataflow service in order to perform the
2133 # computations required by a job. Note that a workflow job may use
2134 # multiple pools, in order to match the various computational
2135 # requirements of the various stages of the job.
2136 "diskSourceImage": "A String", # Fully qualified source image for disks.
2137 "taskrunnerSettings": { # Taskrunner configuration settings. # Settings passed through to Google Compute Engine workers when
2138 # using the standard Dataflow task runner. Users should ignore
2139 # this field.
2140 "workflowFileName": "A String", # The file to store the workflow in.
2141 "logUploadLocation": "A String", # Indicates where to put logs. If this is not specified, the logs
2142 # will not be uploaded.
2143 #
2144 # The supported resource type is:
2145 #
2146 # Google Cloud Storage:
2147 # storage.googleapis.com/{bucket}/{object}
2148 # bucket.storage.googleapis.com/{object}
2149 "commandlinesFileName": "A String", # The file to store preprocessing commands in.
2150 "parallelWorkerSettings": { # Provides data to pass through to the worker harness. # The settings to pass to the parallel worker harness.
2151 "reportingEnabled": True or False, # Whether to send work progress updates to the service.
2152 "shuffleServicePath": "A String", # The Shuffle service path relative to the root URL, for example,
2153 # "shuffle/v1beta1".
2154 "workerId": "A String", # The ID of the worker running this pipeline.
2155 "baseUrl": "A String", # The base URL for accessing Google Cloud APIs.
2156 #
2157 # When workers access Google Cloud APIs, they logically do so via
2158 # relative URLs. If this field is specified, it supplies the base
2159 # URL to use for resolving these relative URLs. The normative
2160 # algorithm used is defined by RFC 1808, "Relative Uniform Resource
2161 # Locators".
2162 #
2163 # If not specified, the default value is "http://www.googleapis.com/"
2164 "servicePath": "A String", # The Cloud Dataflow service path relative to the root URL, for example,
2165 # "dataflow/v1b3/projects".
2166 "tempStoragePrefix": "A String", # The prefix of the resources the system should use for temporary
2167 # storage.
2168 #
2169 # The supported resource type is:
2170 #
2171 # Google Cloud Storage:
2172 #
2173 # storage.googleapis.com/{bucket}/{object}
2174 # bucket.storage.googleapis.com/{object}
2175 },
2176 "vmId": "A String", # The ID string of the VM.
2177 "baseTaskDir": "A String", # The location on the worker for task-specific subdirectories.
2178 "continueOnException": True or False, # Whether to continue taskrunner if an exception is hit.
2179 "oauthScopes": [ # The OAuth2 scopes to be requested by the taskrunner in order to
2180 # access the Cloud Dataflow API.
2181 "A String",
2182 ],
2183 "taskUser": "A String", # The UNIX user ID on the worker VM to use for tasks launched by
2184 # taskrunner; e.g. "root".
2185 "baseUrl": "A String", # The base URL for the taskrunner to use when accessing Google Cloud APIs.
2186 #
2187 # When workers access Google Cloud APIs, they logically do so via
2188 # relative URLs. If this field is specified, it supplies the base
2189 # URL to use for resolving these relative URLs. The normative
2190 # algorithm used is defined by RFC 1808, "Relative Uniform Resource
2191 # Locators".
2192 #
2193 # If not specified, the default value is "http://www.googleapis.com/"
2194 "taskGroup": "A String", # The UNIX group ID on the worker VM to use for tasks launched by
2195 # taskrunner; e.g. "wheel".
2196 "languageHint": "A String", # The suggested backend language.
2197 "logToSerialconsole": True or False, # Whether to send taskrunner log info to Google Compute Engine VM serial
2198 # console.
2199 "streamingWorkerMainClass": "A String", # The streaming worker main class name.
2200 "logDir": "A String", # The directory on the VM to store logs.
2201 "dataflowApiVersion": "A String", # The API version of endpoint, e.g. "v1b3"
2202 "harnessCommand": "A String", # The command to launch the worker harness.
2203 "tempStoragePrefix": "A String", # The prefix of the resources the taskrunner should use for
2204 # temporary storage.
2205 #
2206 # The supported resource type is:
2207 #
2208 # Google Cloud Storage:
2209 # storage.googleapis.com/{bucket}/{object}
2210 # bucket.storage.googleapis.com/{object}
2211 "alsologtostderr": True or False, # Whether to also send taskrunner log info to stderr.
2212 },
2213 "kind": "A String", # The kind of the worker pool; currently only `harness` and `shuffle`
2214 # are supported.
2215 "packages": [ # Packages to be installed on workers.
2216 { # The packages that must be installed in order for a worker to run the
2217 # steps of the Cloud Dataflow job that will be assigned to its worker
2218 # pool.
2219 #
2220 # This is the mechanism by which the Cloud Dataflow SDK causes code to
2221 # be loaded onto the workers. For example, the Cloud Dataflow Java SDK
2222 # might use this to install jars containing the user's code and all of the
2223 # various dependencies (libraries, data files, etc.) required in order
2224 # for that code to run.
2225 "location": "A String", # The resource to read the package from. The supported resource type is:
2226 #
2227 # Google Cloud Storage:
2228 #
2229 # storage.googleapis.com/{bucket}
2230 # bucket.storage.googleapis.com/
2231 "name": "A String", # The name of the package.
Sai Cheemalapatic30d2b52017-03-13 12:12:03 -04002232 },
2233 ],
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07002234 "machineType": "A String", # Machine type (e.g. "n1-standard-1"). If empty or unspecified, the
2235 # service will attempt to choose a reasonable default.
2236 "network": "A String", # Network to which VMs will be assigned. If empty or unspecified,
2237 # the service will use the network "default".
2238 "zone": "A String", # Zone to run the worker pools in. If empty or unspecified, the service
2239 # will attempt to choose a reasonable default.
2240 "diskSizeGb": 42, # Size of root disk for VMs, in GB. If zero or unspecified, the service will
2241 # attempt to choose a reasonable default.
2242 "teardownPolicy": "A String", # Sets the policy for determining when to turndown worker pool.
2243 # Allowed values are: `TEARDOWN_ALWAYS`, `TEARDOWN_ON_SUCCESS`, and
2244 # `TEARDOWN_NEVER`.
2245 # `TEARDOWN_ALWAYS` means workers are always torn down regardless of whether
2246 # the job succeeds. `TEARDOWN_ON_SUCCESS` means workers are torn down
2247 # if the job succeeds. `TEARDOWN_NEVER` means the workers are never torn
2248 # down.
2249 #
2250 # If the workers are not torn down by the service, they will
2251 # continue to run and use Google Compute Engine VM resources in the
2252 # user's project until they are explicitly terminated by the user.
2253 # Because of this, Google recommends using the `TEARDOWN_ALWAYS`
2254 # policy except for small, manually supervised test jobs.
2255 #
2256 # If unknown or unspecified, the service will attempt to choose a reasonable
2257 # default.
2258 "onHostMaintenance": "A String", # The action to take on host maintenance, as defined by the Google
2259 # Compute Engine API.
2260 "ipConfiguration": "A String", # Configuration for VM IPs.
2261 "numThreadsPerWorker": 42, # The number of threads per worker harness. If empty or unspecified, the
2262 # service will choose a number of threads (according to the number of cores
2263 # on the selected machine type for batch, or 1 by convention for streaming).
2264 "poolArgs": { # Extra arguments for this worker pool.
2265 "a_key": "", # Properties of the object. Contains field @type with type URL.
2266 },
2267 "numWorkers": 42, # Number of Google Compute Engine workers in this pool needed to
2268 # execute the job. If zero or unspecified, the service will
2269 # attempt to choose a reasonable default.
2270 "workerHarnessContainerImage": "A String", # Required. Docker container image that executes the Cloud Dataflow worker
2271 # harness, residing in Google Container Registry.
2272 "subnetwork": "A String", # Subnetwork to which VMs will be assigned, if desired. Expected to be of
2273 # the form "regions/REGION/subnetworks/SUBNETWORK".
2274 "dataDisks": [ # Data disks that are used by a VM in this workflow.
2275 { # Describes the data disk used by a workflow job.
2276 "mountPoint": "A String", # Directory in a VM where disk is mounted.
2277 "sizeGb": 42, # Size of disk in GB. If zero or unspecified, the service will
2278 # attempt to choose a reasonable default.
2279 "diskType": "A String", # Disk storage type, as defined by Google Compute Engine. This
2280 # must be a disk type appropriate to the project and zone in which
2281 # the workers will run. If unknown or unspecified, the service
2282 # will attempt to choose a reasonable default.
2283 #
2284 # For example, the standard persistent disk type is a resource name
2285 # typically ending in "pd-standard". If SSD persistent disks are
2286 # available, the resource name typically ends with "pd-ssd". The
2287             # actual valid values are defined by the Google Compute Engine API,
2288 # not by the Cloud Dataflow API; consult the Google Compute Engine
2289 # documentation for more information about determining the set of
2290 # available disk types for a particular project and zone.
2291 #
2292 # Google Compute Engine Disk types are local to a particular
2293 # project in a particular zone, and so the resource name will
2294 # typically look something like this:
2295 #
2296 # compute.googleapis.com/projects/project-id/zones/zone/diskTypes/pd-standard
Sai Cheemalapatic30d2b52017-03-13 12:12:03 -04002297 },
2298 ],
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07002299 "autoscalingSettings": { # Settings for WorkerPool autoscaling. # Settings for autoscaling of this WorkerPool.
2300 "maxNumWorkers": 42, # The maximum number of workers to cap scaling at.
2301 "algorithm": "A String", # The algorithm to use for autoscaling.
2302 },
2303 "defaultPackageSet": "A String", # The default package set to install. This allows the service to
2304 # select a default set of packages which are useful to worker
2305 # harnesses written in a particular language.
2306 "diskType": "A String", # Type of root disk for VMs. If empty or unspecified, the service will
2307 # attempt to choose a reasonable default.
2308 "metadata": { # Metadata to set on the Google Compute Engine VMs.
2309 "a_key": "A String",
2310 },
Sai Cheemalapatic30d2b52017-03-13 12:12:03 -04002311 },
2312 ],
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07002313 "tempStoragePrefix": "A String", # The prefix of the resources the system should use for temporary
2314         # storage. The system will append the suffix "/temp-{JOBNAME}" to
2315 # this resource prefix, where {JOBNAME} is the value of the
2316 # job_name field. The resulting bucket and object prefix is used
2317 # as the prefix of the resources used to store temporary data
2318 # needed during the job execution. NOTE: This will override the
2319 # value in taskrunner_settings.
2320 # The supported resource type is:
2321 #
2322 # Google Cloud Storage:
2323 #
2324 # storage.googleapis.com/{bucket}/{object}
2325 # bucket.storage.googleapis.com/{object}
Sai Cheemalapatic30d2b52017-03-13 12:12:03 -04002326 },
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07002327 "location": "A String", # The [regional endpoint]
2328 # (https://cloud.google.com/dataflow/docs/concepts/regional-endpoints) that
2329 # contains this job.
2330 "tempFiles": [ # A set of files the system should be aware of that are used
2331 # for temporary storage. These temporary files will be
2332 # removed on job completion.
2333 # No duplicates are allowed.
2334 # No file patterns are supported.
2335 #
2336 # The supported files are:
2337 #
2338 # Google Cloud Storage:
2339 #
2340 # storage.googleapis.com/{bucket}/{object}
2341 # bucket.storage.googleapis.com/{object}
2342 "A String",
2343 ],
2344 "type": "A String", # The type of Cloud Dataflow job.
2345 "clientRequestId": "A String", # The client's unique identifier of the job, re-used across retried attempts.
2346 # If this field is set, the service will ensure its uniqueness.
2347 # The request to create a job will fail if the service has knowledge of a
2348 # previously submitted job with the same client's ID and job name.
2349 # The caller may use this field to ensure idempotence of job
2350 # creation across retried attempts to create a job.
2351 # By default, the field is empty and, in that case, the service ignores it.
2352 "createdFromSnapshotId": "A String", # If this is specified, the job's initial state is populated from the given
2353 # snapshot.
2354 "stepsLocation": "A String", # The GCS location where the steps are stored.
2355 "currentStateTime": "A String", # The timestamp associated with the current state.
2356 "startTime": "A String", # The timestamp when the job was started (transitioned to JOB_STATE_PENDING).
2357 # Flexible resource scheduling jobs are started with some delay after job
2358 # creation, so start_time is unset before start and is updated when the
2359 # job is started by the Cloud Dataflow service. For other jobs, start_time
2360       # always equals create_time and is immutable and set by the Cloud Dataflow
2361 # service.
2362 "createTime": "A String", # The timestamp when the job was initially created. Immutable and set by the
2363 # Cloud Dataflow service.
2364 "requestedState": "A String", # The job's requested state.
2365 #
2366 # `UpdateJob` may be used to switch between the `JOB_STATE_STOPPED` and
2367 # `JOB_STATE_RUNNING` states, by setting requested_state. `UpdateJob` may
2368 # also be used to directly set a job's requested state to
2369 # `JOB_STATE_CANCELLED` or `JOB_STATE_DONE`, irrevocably terminating the
2370 # job if it has not already reached a terminal state.
2371 "name": "A String", # The user-specified Cloud Dataflow job name.
2372 #
2373 # Only one Job with a given name may exist in a project at any
2374 # given time. If a caller attempts to create a Job with the same
2375 # name as an already-existing Job, the attempt returns the
2376 # existing Job.
2377 #
2378 # The name must match the regular expression
2379 # `[a-z]([-a-z0-9]{0,38}[a-z0-9])?`
2380 "steps": [ # Exactly one of step or steps_location should be specified.
2381 #
2382 # The top-level steps that constitute the entire job.
Sai Cheemalapatic30d2b52017-03-13 12:12:03 -04002383 { # Defines a particular step within a Cloud Dataflow job.
2384 #
2385 # A job consists of multiple steps, each of which performs some
2386 # specific operation as part of the overall job. Data is typically
2387 # passed from one step to another as part of the job.
2388 #
2389 # Here's an example of a sequence of steps which together implement a
2390 # Map-Reduce job:
2391 #
2392 # * Read a collection of data from some source, parsing the
2393 # collection's elements.
2394 #
2395 # * Validate the elements.
2396 #
2397 # * Apply a user-defined function to map each element to some value
2398 # and extract an element-specific key value.
2399 #
2400 # * Group elements with the same key into a single element with
2401 # that key, transforming a multiply-keyed collection into a
2402 # uniquely-keyed collection.
2403 #
2404 # * Write the elements out to some data sink.
2405 #
2406 # Note that the Cloud Dataflow service may be used to run many different
2407 # types of jobs, not just Map-Reduce.
2408 "kind": "A String", # The kind of step in the Cloud Dataflow job.
2409 "properties": { # Named properties associated with the step. Each kind of
2410 # predefined step has its own required set of properties.
2411 # Must be provided on Create. Only retrieved with JOB_VIEW_ALL.
Takashi Matsuo06694102015-09-11 13:55:40 -07002412 "a_key": "", # Properties of the object.
2413 },
Thomas Coffee2f245372017-03-27 10:39:26 -07002414 "name": "A String", # The name that identifies the step. This must be unique for each
2415 # step with respect to all other steps in the Cloud Dataflow job.
Takashi Matsuo06694102015-09-11 13:55:40 -07002416 },
2417 ],
Thomas Coffee2f245372017-03-27 10:39:26 -07002418 "replaceJobId": "A String", # If this job is an update of an existing job, this field is the job ID
2419 # of the job it replaced.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07002420 #
Thomas Coffee2f245372017-03-27 10:39:26 -07002421 # When sending a `CreateJobRequest`, you can update a job by specifying it
2422 # here. The job named here is stopped, and its intermediate state is
2423 # transferred to this job.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07002424 "currentState": "A String", # The current state of the job.
2425 #
2426 # Jobs are created in the `JOB_STATE_STOPPED` state unless otherwise
2427 # specified.
2428 #
2429 # A job in the `JOB_STATE_RUNNING` state may asynchronously enter a
2430 # terminal state. After a job has reached a terminal state, no
2431 # further state updates may be made.
2432 #
2433 # This field may be mutated by the Cloud Dataflow service;
2434 # callers cannot mutate it.
Sai Cheemalapatic30d2b52017-03-13 12:12:03 -04002435 "executionInfo": { # Additional information about how a Cloud Dataflow job will be executed that # Deprecated.
2436 # isn't contained in the submitted job.
Takashi Matsuo06694102015-09-11 13:55:40 -07002437 "stages": { # A mapping from each stage to the information about that stage.
Sai Cheemalapatic30d2b52017-03-13 12:12:03 -04002438 "a_key": { # Contains information about how a particular
2439 # google.dataflow.v1beta3.Step will be executed.
2440 "stepName": [ # The steps associated with the execution stage.
2441 # Note that stages may have several steps, and that a given step
2442 # might be run by more than one stage.
Nathaniel Manista4f877e52015-06-15 16:44:50 +00002443 "A String",
2444 ],
Nathaniel Manista4f877e52015-06-15 16:44:50 +00002445 },
Nathaniel Manista4f877e52015-06-15 16:44:50 +00002446 },
2447 },
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07002448 }</pre>
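<p>As an illustration (not part of the generated reference), the following minimal sketch retrieves the Job object described above with this Python client. It assumes Application Default Credentials are configured and uses placeholder project and job identifiers.</p>
<pre>
import google.auth
from googleapiclient.discovery import build

# Build an authorized Dataflow API client (assumes Application Default Credentials).
credentials, _ = google.auth.default()
dataflow = build('dataflow', 'v1b3', credentials=credentials)

# Placeholder identifiers; replace with a real project ID and job ID.
job = dataflow.projects().jobs().get(
    projectId='my-project',
    jobId='my-job-id',
    view='JOB_VIEW_ALL',  # also returns steps and the pipeline description
).execute()

print(job['name'], job['currentState'])
</pre>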
Nathaniel Manista4f877e52015-06-15 16:44:50 +00002449</div>
2450
2451<div class="method">
Jon Wayne Parrott692617a2017-01-06 09:58:29 -08002452 <code class="details" id="getMetrics">getMetrics(projectId, jobId, startTime=None, location=None, x__xgafv=None)</code>
Nathaniel Manista4f877e52015-06-15 16:44:50 +00002453 <pre>Request the job status.
2454
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07002455To request the status of a job, we recommend using
2456`projects.locations.jobs.getMetrics` with a [regional endpoint]
2457(https://cloud.google.com/dataflow/docs/concepts/regional-endpoints). Using
2458`projects.jobs.getMetrics` is not recommended, as you can only request the
2459status of jobs that are running in `us-central1`.
2460
Nathaniel Manista4f877e52015-06-15 16:44:50 +00002461Args:
Takashi Matsuo06694102015-09-11 13:55:40 -07002462 projectId: string, A project id. (required)
2463 jobId: string, The job to get messages for. (required)
Sai Cheemalapatic30d2b52017-03-13 12:12:03 -04002464 startTime: string, Return only metric data that has changed since this time.
2465Default is to return all information about all metrics for the job.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07002466 location: string, The [regional endpoint]
2467(https://cloud.google.com/dataflow/docs/concepts/regional-endpoints) that
2468contains the job specified by job_id.
Takashi Matsuo06694102015-09-11 13:55:40 -07002469 x__xgafv: string, V1 error format.
Sai Cheemalapatic30d2b52017-03-13 12:12:03 -04002470 Allowed values
2471 1 - v1 error format
2472 2 - v2 error format
Nathaniel Manista4f877e52015-06-15 16:44:50 +00002473
2474Returns:
2475 An object of the form:
2476
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07002477 { # JobMetrics contains a collection of metrics describing the detailed progress
Sai Cheemalapatic30d2b52017-03-13 12:12:03 -04002478 # of a Dataflow job. Metrics correspond to user-defined and system-defined
2479 # metrics in the job.
2480 #
2481 # This resource captures only the most recent values of each metric;
2482 # time-series data can be queried for them (under the same metric names)
2483 # from Cloud Monitoring.
Takashi Matsuo06694102015-09-11 13:55:40 -07002484 "metrics": [ # All metrics for this job.
2485 { # Describes the state of a metric.
Sai Cheemalapatic30d2b52017-03-13 12:12:03 -04002486 "meanCount": "", # Worker-computed aggregate value for the "Mean" aggregation kind.
2487 # This holds the count of the aggregated values and is used in combination
2488 # with mean_sum above to obtain the actual mean aggregate value.
2489 # The only possible value type is Long.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07002490 "kind": "A String", # Metric aggregation kind. The possible metric aggregation kinds are
2491 # "Sum", "Max", "Min", "Mean", "Set", "And", "Or", and "Distribution".
2492 # The specified aggregation kind is case-insensitive.
2493 #
2494 # If omitted, this is not an aggregated value but instead
2495 # a single metric sample value.
Sai Cheemalapatic30d2b52017-03-13 12:12:03 -04002496 "set": "", # Worker-computed aggregate value for the "Set" aggregation kind. The only
2497 # possible value type is a list of Values whose type can be Long, Double,
2498 # or String, according to the metric's type. All Values in the list must
2499 # be of the same type.
2500 "name": { # Identifies a metric, by describing the source which generated the # Name of the metric.
2501 # metric.
2502 "origin": "A String", # Origin (namespace) of metric name. May be blank for user-define metrics;
2503 # will be "dataflow" for metrics defined by the Dataflow service or SDK.
Takashi Matsuo06694102015-09-11 13:55:40 -07002504 "name": "A String", # Worker-defined metric name.
Sai Cheemalapatic30d2b52017-03-13 12:12:03 -04002505 "context": { # Zero or more labeled fields which identify the part of the job this
2506 # metric is associated with, such as the name of a step or collection.
2507 #
2508 # For example, built-in counters associated with steps will have
2509 # context['step'] = <step-name>. Counters associated with PCollections
2510 # in the SDK will have context['pcollection'] = <pcollection-name>.
Nathaniel Manista4f877e52015-06-15 16:44:50 +00002511 "a_key": "A String",
2512 },
2513 },
Sai Cheemalapati4ba8c232017-06-06 18:46:08 -04002514 "meanSum": "", # Worker-computed aggregate value for the "Mean" aggregation kind.
2515 # This holds the sum of the aggregated values and is used in combination
2516 # with mean_count below to obtain the actual mean aggregate value.
2517 # The only possible value types are Long and Double.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07002518 "cumulative": True or False, # True if this metric is reported as the total cumulative aggregate
2519 # value accumulated since the worker started working on this WorkItem.
2520 # By default this is false, indicating that this metric is reported
2521 # as a delta that is not associated with any WorkItem.
2522 "updateTime": "A String", # Timestamp associated with the metric value. Optional when workers are
2523 # reporting work progress; it will be filled in responses from the
2524 # metrics API.
2525 "scalar": "", # Worker-computed aggregate value for aggregation kinds "Sum", "Max", "Min",
2526 # "And", and "Or". The possible value types are Long, Double, and Boolean.
Sai Cheemalapatic30d2b52017-03-13 12:12:03 -04002527 "internal": "", # Worker-computed aggregate value for internal use by the Dataflow
2528 # service.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07002529 "gauge": "", # A struct value describing properties of a Gauge.
2530 # Metrics of gauge type show the value of a metric across time, and is
2531 # aggregated based on the newest value.
2532 "distribution": "", # A struct value describing properties of a distribution of numeric values.
Nathaniel Manista4f877e52015-06-15 16:44:50 +00002533 },
2534 ],
Takashi Matsuo06694102015-09-11 13:55:40 -07002535 "metricTime": "A String", # Timestamp as of which metric values are current.
Nathaniel Manista4f877e52015-06-15 16:44:50 +00002536 }</pre>
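<p>As an illustration (not part of the generated reference), the following minimal sketch fetches this JobMetrics object and prints the scalar values of user-defined metrics. It assumes Application Default Credentials are configured and uses placeholder identifiers.</p>
<pre>
import google.auth
from googleapiclient.discovery import build

# Build an authorized Dataflow API client (assumes Application Default Credentials).
credentials, _ = google.auth.default()
dataflow = build('dataflow', 'v1b3', credentials=credentials)

# Placeholder identifiers; replace with a real project ID and job ID.
metrics = dataflow.projects().jobs().getMetrics(
    projectId='my-project',
    jobId='my-job-id',
).execute()

# User-defined metrics have an origin other than "dataflow"; "scalar" holds the
# aggregate value for kinds such as "Sum", "Max", "Min", "And", and "Or".
for metric in metrics.get('metrics', []):
    name = metric.get('name', {})
    if name.get('origin') != 'dataflow' and 'scalar' in metric:
        print(name.get('name'), metric['scalar'])
</pre>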
2537</div>
2538
2539<div class="method">
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07002540 <code class="details" id="list">list(projectId, pageSize=None, pageToken=None, x__xgafv=None, location=None, filter=None, view=None)</code>
Sai Cheemalapatic30d2b52017-03-13 12:12:03 -04002541 <pre>List the jobs of a project.
Nathaniel Manista4f877e52015-06-15 16:44:50 +00002542
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07002543To list the jobs of a project in a region, we recommend using
2544`projects.locations.jobs.list` with a [regional endpoint]
2545(https://cloud.google.com/dataflow/docs/concepts/regional-endpoints). To
2546list all jobs across all regions, use `projects.jobs.aggregated`. Using
2547`projects.jobs.list` is not recommended, as you can only get the list of
2548jobs that are running in `us-central1`.
2549
Nathaniel Manista4f877e52015-06-15 16:44:50 +00002550Args:
Takashi Matsuo06694102015-09-11 13:55:40 -07002551 projectId: string, The project which owns the jobs. (required)
Sai Cheemalapatic30d2b52017-03-13 12:12:03 -04002552 pageSize: integer, If there are many jobs, limit response to at most this many.
2553The actual number of jobs returned will be the lesser of max_responses
2554and an unspecified server-defined limit.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07002555 pageToken: string, Set this to the 'next_page_token' field of a previous response
2556to request additional results in a long list.
Takashi Matsuo06694102015-09-11 13:55:40 -07002557 x__xgafv: string, V1 error format.
Sai Cheemalapatic30d2b52017-03-13 12:12:03 -04002558 Allowed values
2559 1 - v1 error format
2560 2 - v2 error format
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07002561 location: string, The [regional endpoint]
2562(https://cloud.google.com/dataflow/docs/concepts/regional-endpoints) that
2563contains this job.
Jon Wayne Parrott692617a2017-01-06 09:58:29 -08002564 filter: string, The kind of filter to use.
Sai Cheemalapatic30d2b52017-03-13 12:12:03 -04002565 view: string, Level of information requested in response. Default is `JOB_VIEW_SUMMARY`.
Nathaniel Manista4f877e52015-06-15 16:44:50 +00002566
2567Returns:
2568 An object of the form:
2569
Sai Cheemalapatic30d2b52017-03-13 12:12:03 -04002570 { # Response to a request to list Cloud Dataflow jobs. This may be a partial
2571 # response, depending on the page size in the ListJobsRequest.
Takashi Matsuo06694102015-09-11 13:55:40 -07002572 "nextPageToken": "A String", # Set if there may be more results than fit in this response.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07002573 "failedLocation": [ # Zero or more messages describing the [regional endpoints]
2574 # (https://cloud.google.com/dataflow/docs/concepts/regional-endpoints) that
2575 # failed to respond.
2576 { # Indicates which [regional endpoint]
2577 # (https://cloud.google.com/dataflow/docs/concepts/regional-endpoints) failed
2578 # to respond to a request for data.
2579 "name": "A String", # The name of the [regional endpoint]
2580 # (https://cloud.google.com/dataflow/docs/concepts/regional-endpoints) that
2581 # failed to respond.
Sai Cheemalapati4ba8c232017-06-06 18:46:08 -04002582 },
2583 ],
Takashi Matsuo06694102015-09-11 13:55:40 -07002584 "jobs": [ # A subset of the requested job information.
Sai Cheemalapatic30d2b52017-03-13 12:12:03 -04002585 { # Defines a job to be run by the Cloud Dataflow service.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07002586 "labels": { # User-defined labels for this job.
2587 #
2588 # The labels map can contain no more than 64 entries. Entries of the labels
2589 # map are UTF8 strings that comply with the following restrictions:
2590 #
2591 # * Keys must conform to regexp: \p{Ll}\p{Lo}{0,62}
2592 # * Values must conform to regexp: [\p{Ll}\p{Lo}\p{N}_-]{0,63}
2593 # * Both keys and values are additionally constrained to be <= 128 bytes in
2594 # size.
2595 "a_key": "A String",
2596 },
2597 "jobMetadata": { # Metadata available primarily for filtering jobs. Will be included in the # This field is populated by the Dataflow service to support filtering jobs
2598 # by the metadata values provided here. Populated for ListJobs and all GetJob
2599 # views SUMMARY and higher.
2600 # ListJob response and Job SUMMARY view.
2601 "sdkVersion": { # The version of the SDK used to run the job. # The SDK version used to run the job.
2602 "versionDisplayName": "A String", # A readable string describing the version of the SDK.
2603 "version": "A String", # The version of the SDK used to run the job.
2604 "sdkSupportStatus": "A String", # The support status for this SDK version.
Jon Wayne Parrott7d5badb2016-08-16 12:44:29 -07002605 },
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07002606 "pubsubDetails": [ # Identification of a PubSub source used in the Dataflow job.
2607 { # Metadata for a PubSub connector used by the job.
2608 "topic": "A String", # Topic accessed in the connection.
2609 "subscription": "A String", # Subscription used in the connection.
2610 },
2611 ],
2612 "datastoreDetails": [ # Identification of a Datastore source used in the Dataflow job.
2613 { # Metadata for a Datastore connector used by the job.
2614 "projectId": "A String", # ProjectId accessed in the connection.
2615 "namespace": "A String", # Namespace used in the connection.
2616 },
2617 ],
2618 "fileDetails": [ # Identification of a File source used in the Dataflow job.
2619 { # Metadata for a File connector used by the job.
2620 "filePattern": "A String", # File Pattern used to access files by the connector.
2621 },
2622 ],
2623 "spannerDetails": [ # Identification of a Spanner source used in the Dataflow job.
2624 { # Metadata for a Spanner connector used by the job.
2625 "instanceId": "A String", # InstanceId accessed in the connection.
2626 "projectId": "A String", # ProjectId accessed in the connection.
2627 "databaseId": "A String", # DatabaseId accessed in the connection.
2628 },
2629 ],
2630 "bigTableDetails": [ # Identification of a BigTable source used in the Dataflow job.
2631 { # Metadata for a BigTable connector used by the job.
2632 "instanceId": "A String", # InstanceId accessed in the connection.
2633 "projectId": "A String", # ProjectId accessed in the connection.
2634 "tableId": "A String", # TableId accessed in the connection.
2635 },
2636 ],
2637 "bigqueryDetails": [ # Identification of a BigQuery source used in the Dataflow job.
2638 { # Metadata for a BigQuery connector used by the job.
2639 "projectId": "A String", # Project accessed in the connection.
2640 "dataset": "A String", # Dataset accessed in the connection.
2641 "table": "A String", # Table accessed in the connection.
2642 "query": "A String", # Query used to access data in the connection.
2643 },
2644 ],
2645 },
2646 "pipelineDescription": { # A descriptive representation of submitted pipeline as well as the executed # Preliminary field: The format of this data may change at any time.
2647 # A description of the user pipeline and stages through which it is executed.
2648 # Created by Cloud Dataflow service. Only retrieved with
2649 # JOB_VIEW_DESCRIPTION or JOB_VIEW_ALL.
2650 # form. This data is provided by the Dataflow service for ease of visualizing
2651 # the pipeline and interpreting Dataflow provided metrics.
2652 "originalPipelineTransform": [ # Description of each transform in the pipeline and collections between them.
2653 { # Description of the type, names/ids, and input/outputs for a transform.
2654 "kind": "A String", # Type of transform.
2655 "name": "A String", # User provided name for this transform instance.
2656 "inputCollectionName": [ # User names for all collection inputs to this transform.
2657 "A String",
2658 ],
2659 "displayData": [ # Transform-specific display data.
2660 { # Data provided with a pipeline or transform to provide descriptive info.
2661 "shortStrValue": "A String", # A possible additional shorter value to display.
2662 # For example a java_class_name_value of com.mypackage.MyDoFn
2663 # will be stored with MyDoFn as the short_str_value and
2664 # com.mypackage.MyDoFn as the java_class_name value.
2665 # short_str_value can be displayed and java_class_name_value
2666 # will be displayed as a tooltip.
2667 "durationValue": "A String", # Contains value if the data is of duration type.
2668 "url": "A String", # An optional full URL.
2669 "floatValue": 3.14, # Contains value if the data is of float type.
2670 "namespace": "A String", # The namespace for the key. This is usually a class name or programming
2671               # language namespace (e.g. a Python module) which defines the display data.
2672 # This allows a dax monitoring system to specially handle the data
2673 # and perform custom rendering.
2674 "javaClassValue": "A String", # Contains value if the data is of java class type.
2675 "label": "A String", # An optional label to display in a dax UI for the element.
2676 "boolValue": True or False, # Contains value if the data is of a boolean type.
2677 "strValue": "A String", # Contains value if the data is of string type.
2678 "key": "A String", # The key identifying the display data.
2679 # This is intended to be used as a label for the display data
2680 # when viewed in a dax monitoring system.
2681 "int64Value": "A String", # Contains value if the data is of int64 type.
2682 "timestampValue": "A String", # Contains value if the data is of timestamp type.
2683 },
2684 ],
2685 "outputCollectionName": [ # User names for all collection outputs to this transform.
2686 "A String",
2687 ],
2688 "id": "A String", # SDK generated id of this transform instance.
2689 },
2690 ],
2691 "executionPipelineStage": [ # Description of each stage of execution of the pipeline.
2692 { # Description of the composing transforms, names/ids, and input/outputs of a
2693 # stage of execution. Some composing transforms and sources may have been
2694 # generated by the Dataflow service during execution planning.
2695 "componentSource": [ # Collections produced and consumed by component transforms of this stage.
2696 { # Description of an interstitial value between transforms in an execution
2697 # stage.
2698 "userName": "A String", # Human-readable name for this transform; may be user or system generated.
2699 "originalTransformOrCollection": "A String", # User name for the original user transform or collection with which this
2700 # source is most closely associated.
2701 "name": "A String", # Dataflow service generated name for this source.
2702 },
2703 ],
2704 "kind": "A String", # Type of tranform this stage is executing.
2705 "name": "A String", # Dataflow service generated name for this stage.
2706 "outputSource": [ # Output sources for this stage.
2707 { # Description of an input or output of an execution stage.
2708 "userName": "A String", # Human-readable name for this source; may be user or system generated.
2709 "sizeBytes": "A String", # Size of the source, if measurable.
2710 "name": "A String", # Dataflow service generated name for this source.
2711 "originalTransformOrCollection": "A String", # User name for the original user transform or collection with which this
2712 # source is most closely associated.
2713 },
2714 ],
2715 "inputSource": [ # Input sources for this stage.
2716 { # Description of an input or output of an execution stage.
2717 "userName": "A String", # Human-readable name for this source; may be user or system generated.
2718 "sizeBytes": "A String", # Size of the source, if measurable.
2719 "name": "A String", # Dataflow service generated name for this source.
2720 "originalTransformOrCollection": "A String", # User name for the original user transform or collection with which this
2721 # source is most closely associated.
2722 },
2723 ],
2724 "componentTransform": [ # Transforms that comprise this execution stage.
2725 { # Description of a transform executed as part of an execution stage.
2726 "userName": "A String", # Human-readable name for this transform; may be user or system generated.
2727 "originalTransform": "A String", # User name for the original user transform with which this transform is
2728 # most closely associated.
2729 "name": "A String", # Dataflow service generated name for this source.
2730 },
2731 ],
2732 "id": "A String", # Dataflow service generated id for this stage.
2733 },
2734 ],
2735 "displayData": [ # Pipeline level display data.
2736 { # Data provided with a pipeline or transform to provide descriptive info.
2737 "shortStrValue": "A String", # A possible additional shorter value to display.
2738 # For example a java_class_name_value of com.mypackage.MyDoFn
2739 # will be stored with MyDoFn as the short_str_value and
2740 # com.mypackage.MyDoFn as the java_class_name value.
2741 # short_str_value can be displayed and java_class_name_value
2742 # will be displayed as a tooltip.
2743 "durationValue": "A String", # Contains value if the data is of duration type.
2744 "url": "A String", # An optional full URL.
2745 "floatValue": 3.14, # Contains value if the data is of float type.
2746 "namespace": "A String", # The namespace for the key. This is usually a class name or programming
2747             # language namespace (e.g. a Python module) which defines the display data.
2748 # This allows a dax monitoring system to specially handle the data
2749 # and perform custom rendering.
2750 "javaClassValue": "A String", # Contains value if the data is of java class type.
2751 "label": "A String", # An optional label to display in a dax UI for the element.
2752 "boolValue": True or False, # Contains value if the data is of a boolean type.
2753 "strValue": "A String", # Contains value if the data is of string type.
2754 "key": "A String", # The key identifying the display data.
2755 # This is intended to be used as a label for the display data
2756 # when viewed in a dax monitoring system.
2757 "int64Value": "A String", # Contains value if the data is of int64 type.
2758 "timestampValue": "A String", # Contains value if the data is of timestamp type.
2759 },
2760 ],
2761 },
2762 "stageStates": [ # This field may be mutated by the Cloud Dataflow service;
2763 # callers cannot mutate it.
2764 { # A message describing the state of a particular execution stage.
2765 "executionStageName": "A String", # The name of the execution stage.
2766 "executionStageState": "A String", # Executions stage states allow the same set of values as JobState.
2767 "currentStateTime": "A String", # The time at which the stage transitioned to this state.
Nathaniel Manista4f877e52015-06-15 16:44:50 +00002768 },
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07002769 ],
2770 "id": "A String", # The unique ID of this job.
2771 #
2772 # This field is set by the Cloud Dataflow service when the Job is
2773 # created, and is immutable for the life of the job.
2774 "replacedByJobId": "A String", # If another job is an update of this job (and thus, this job is in
2775 # `JOB_STATE_UPDATED`), this field contains the ID of that job.
2776 "projectId": "A String", # The ID of the Cloud Platform project that the job belongs to.
2777 "transformNameMapping": { # The map of transform name prefixes of the job to be replaced to the
2778 # corresponding name prefixes of the new job.
2779 "a_key": "A String",
2780 },
2781 "environment": { # Describes the environment in which a Dataflow Job runs. # The environment for the job.
2782 "version": { # A structure describing which components and their versions of the service
2783 # are required in order to run the job.
2784 "a_key": "", # Properties of the object.
2785 },
2786 "flexResourceSchedulingGoal": "A String", # Which Flexible Resource Scheduling mode to run in.
2787 "serviceKmsKeyName": "A String", # If set, contains the Cloud KMS key identifier used to encrypt data
2788 # at rest, AKA a Customer Managed Encryption Key (CMEK).
Sai Cheemalapati4ba8c232017-06-06 18:46:08 -04002789 #
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07002790 # Format:
2791 # projects/PROJECT_ID/locations/LOCATION/keyRings/KEY_RING/cryptoKeys/KEY
2792 "internalExperiments": { # Experimental settings.
2793 "a_key": "", # Properties of the object. Contains field @type with type URL.
2794 },
2795 "dataset": "A String", # The dataset for the current project where various workflow
2796 # related tables are stored.
2797 #
2798 # The supported resource type is:
2799 #
2800 # Google BigQuery:
2801 # bigquery.googleapis.com/{dataset}
2802 "experiments": [ # The list of experiments to enable.
2803 "A String",
2804 ],
2805 "serviceAccountEmail": "A String", # Identity to run virtual machines as. Defaults to the default account.
2806 "sdkPipelineOptions": { # The Cloud Dataflow SDK pipeline options specified by the user. These
2807 # options are passed through the service and are used to recreate the
2808 # SDK pipeline options on the worker in a language agnostic and platform
2809 # independent way.
2810 "a_key": "", # Properties of the object.
2811 },
2812 "userAgent": { # A description of the process that generated the request.
2813 "a_key": "", # Properties of the object.
2814 },
2815 "clusterManagerApiService": "A String", # The type of cluster manager API to use. If unknown or
2816 # unspecified, the service will attempt to choose a reasonable
2817 # default. This should be in the form of the API service name,
2818 # e.g. "compute.googleapis.com".
2819 "workerPools": [ # The worker pools. At least one "harness" worker pool must be
2820 # specified in order for the job to have workers.
2821 { # Describes one particular pool of Cloud Dataflow workers to be
2822 # instantiated by the Cloud Dataflow service in order to perform the
2823 # computations required by a job. Note that a workflow job may use
2824 # multiple pools, in order to match the various computational
2825 # requirements of the various stages of the job.
2826 "diskSourceImage": "A String", # Fully qualified source image for disks.
2827 "taskrunnerSettings": { # Taskrunner configuration settings. # Settings passed through to Google Compute Engine workers when
2828 # using the standard Dataflow task runner. Users should ignore
2829 # this field.
2830 "workflowFileName": "A String", # The file to store the workflow in.
2831 "logUploadLocation": "A String", # Indicates where to put logs. If this is not specified, the logs
2832 # will not be uploaded.
2833 #
2834 # The supported resource type is:
2835 #
2836 # Google Cloud Storage:
2837 # storage.googleapis.com/{bucket}/{object}
2838 # bucket.storage.googleapis.com/{object}
2839 "commandlinesFileName": "A String", # The file to store preprocessing commands in.
2840 "parallelWorkerSettings": { # Provides data to pass through to the worker harness. # The settings to pass to the parallel worker harness.
2841 "reportingEnabled": True or False, # Whether to send work progress updates to the service.
2842 "shuffleServicePath": "A String", # The Shuffle service path relative to the root URL, for example,
2843 # "shuffle/v1beta1".
2844 "workerId": "A String", # The ID of the worker running this pipeline.
2845 "baseUrl": "A String", # The base URL for accessing Google Cloud APIs.
Sai Cheemalapatic30d2b52017-03-13 12:12:03 -04002846 #
2847 # When workers access Google Cloud APIs, they logically do so via
2848 # relative URLs. If this field is specified, it supplies the base
2849 # URL to use for resolving these relative URLs. The normative
2850 # algorithm used is defined by RFC 1808, "Relative Uniform Resource
2851 # Locators".
2852 #
2853 # If not specified, the default value is "http://www.googleapis.com/"
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07002854 "servicePath": "A String", # The Cloud Dataflow service path relative to the root URL, for example,
2855 # "dataflow/v1b3/projects".
2856 "tempStoragePrefix": "A String", # The prefix of the resources the system should use for temporary
2857 # storage.
Sai Cheemalapatic30d2b52017-03-13 12:12:03 -04002858 #
Sai Cheemalapatie833b792017-03-24 15:06:46 -07002859 # The supported resource type is:
2860 #
2861 # Google Cloud Storage:
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07002862 #
Sai Cheemalapatie833b792017-03-24 15:06:46 -07002863 # storage.googleapis.com/{bucket}/{object}
2864 # bucket.storage.googleapis.com/{object}
Takashi Matsuo06694102015-09-11 13:55:40 -07002865 },
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07002866 "vmId": "A String", # The ID string of the VM.
2867 "baseTaskDir": "A String", # The location on the worker for task-specific subdirectories.
2868 "continueOnException": True or False, # Whether to continue taskrunner if an exception is hit.
2869 "oauthScopes": [ # The OAuth2 scopes to be requested by the taskrunner in order to
2870 # access the Cloud Dataflow API.
2871 "A String",
Takashi Matsuo06694102015-09-11 13:55:40 -07002872 ],
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07002873 "taskUser": "A String", # The UNIX user ID on the worker VM to use for tasks launched by
2874 # taskrunner; e.g. "root".
2875 "baseUrl": "A String", # The base URL for the taskrunner to use when accessing Google Cloud APIs.
Sai Cheemalapati4ba8c232017-06-06 18:46:08 -04002876 #
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07002877 # When workers access Google Cloud APIs, they logically do so via
2878 # relative URLs. If this field is specified, it supplies the base
2879 # URL to use for resolving these relative URLs. The normative
2880 # algorithm used is defined by RFC 1808, "Relative Uniform Resource
2881 # Locators".
Sai Cheemalapati4ba8c232017-06-06 18:46:08 -04002882 #
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07002883 # If not specified, the default value is "http://www.googleapis.com/"
2884 "taskGroup": "A String", # The UNIX group ID on the worker VM to use for tasks launched by
2885 # taskrunner; e.g. "wheel".
2886 "languageHint": "A String", # The suggested backend language.
2887 "logToSerialconsole": True or False, # Whether to send taskrunner log info to Google Compute Engine VM serial
2888 # console.
2889 "streamingWorkerMainClass": "A String", # The streaming worker main class name.
2890 "logDir": "A String", # The directory on the VM to store logs.
2891 "dataflowApiVersion": "A String", # The API version of endpoint, e.g. "v1b3"
2892 "harnessCommand": "A String", # The command to launch the worker harness.
2893 "tempStoragePrefix": "A String", # The prefix of the resources the taskrunner should use for
2894 # temporary storage.
2895 #
2896 # The supported resource type is:
2897 #
2898 # Google Cloud Storage:
2899 # storage.googleapis.com/{bucket}/{object}
2900 # bucket.storage.googleapis.com/{object}
2901 "alsologtostderr": True or False, # Whether to also send taskrunner log info to stderr.
2902 },
2903 "kind": "A String", # The kind of the worker pool; currently only `harness` and `shuffle`
2904 # are supported.
2905 "packages": [ # Packages to be installed on workers.
2906 { # The packages that must be installed in order for a worker to run the
2907 # steps of the Cloud Dataflow job that will be assigned to its worker
2908 # pool.
2909 #
2910 # This is the mechanism by which the Cloud Dataflow SDK causes code to
2911 # be loaded onto the workers. For example, the Cloud Dataflow Java SDK
2912 # might use this to install jars containing the user's code and all of the
2913 # various dependencies (libraries, data files, etc.) required in order
2914 # for that code to run.
2915 "location": "A String", # The resource to read the package from. The supported resource type is:
Sai Cheemalapati4ba8c232017-06-06 18:46:08 -04002916 #
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07002917 # Google Cloud Storage:
2918 #
2919 # storage.googleapis.com/{bucket}
2920 # bucket.storage.googleapis.com/
2921 "name": "A String", # The name of the package.
Sai Cheemalapati4ba8c232017-06-06 18:46:08 -04002922 },
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07002923 ],
2924 "machineType": "A String", # Machine type (e.g. "n1-standard-1"). If empty or unspecified, the
2925 # service will attempt to choose a reasonable default.
2926 "network": "A String", # Network to which VMs will be assigned. If empty or unspecified,
2927 # the service will use the network "default".
2928 "zone": "A String", # Zone to run the worker pools in. If empty or unspecified, the service
2929 # will attempt to choose a reasonable default.
2930 "diskSizeGb": 42, # Size of root disk for VMs, in GB. If zero or unspecified, the service will
2931 # attempt to choose a reasonable default.
2932 "teardownPolicy": "A String", # Sets the policy for determining when to turndown worker pool.
2933 # Allowed values are: `TEARDOWN_ALWAYS`, `TEARDOWN_ON_SUCCESS`, and
2934 # `TEARDOWN_NEVER`.
2935 # `TEARDOWN_ALWAYS` means workers are always torn down regardless of whether
2936 # the job succeeds. `TEARDOWN_ON_SUCCESS` means workers are torn down
2937 # if the job succeeds. `TEARDOWN_NEVER` means the workers are never torn
2938 # down.
2939 #
2940 # If the workers are not torn down by the service, they will
2941 # continue to run and use Google Compute Engine VM resources in the
2942 # user's project until they are explicitly terminated by the user.
2943 # Because of this, Google recommends using the `TEARDOWN_ALWAYS`
2944 # policy except for small, manually supervised test jobs.
2945 #
2946 # If unknown or unspecified, the service will attempt to choose a reasonable
2947 # default.
2948 "onHostMaintenance": "A String", # The action to take on host maintenance, as defined by the Google
2949 # Compute Engine API.
2950 "ipConfiguration": "A String", # Configuration for VM IPs.
2951 "numThreadsPerWorker": 42, # The number of threads per worker harness. If empty or unspecified, the
2952 # service will choose a number of threads (according to the number of cores
2953 # on the selected machine type for batch, or 1 by convention for streaming).
2954 "poolArgs": { # Extra arguments for this worker pool.
2955 "a_key": "", # Properties of the object. Contains field @type with type URL.
2956 },
2957 "numWorkers": 42, # Number of Google Compute Engine workers in this pool needed to
2958 # execute the job. If zero or unspecified, the service will
2959 # attempt to choose a reasonable default.
2960 "workerHarnessContainerImage": "A String", # Required. Docker container image that executes the Cloud Dataflow worker
2961 # harness, residing in Google Container Registry.
2962 "subnetwork": "A String", # Subnetwork to which VMs will be assigned, if desired. Expected to be of
2963 # the form "regions/REGION/subnetworks/SUBNETWORK".
2964 "dataDisks": [ # Data disks that are used by a VM in this workflow.
2965 { # Describes the data disk used by a workflow job.
2966 "mountPoint": "A String", # Directory in a VM where disk is mounted.
2967 "sizeGb": 42, # Size of disk in GB. If zero or unspecified, the service will
2968 # attempt to choose a reasonable default.
2969 "diskType": "A String", # Disk storage type, as defined by Google Compute Engine. This
2970 # must be a disk type appropriate to the project and zone in which
2971 # the workers will run. If unknown or unspecified, the service
2972 # will attempt to choose a reasonable default.
2973 #
2974 # For example, the standard persistent disk type is a resource name
2975 # typically ending in "pd-standard". If SSD persistent disks are
2976 # available, the resource name typically ends with "pd-ssd". The
2977 # actual valid values are defined the Google Compute Engine API,
2978 # not by the Cloud Dataflow API; consult the Google Compute Engine
2979 # documentation for more information about determining the set of
2980 # available disk types for a particular project and zone.
2981 #
2982 # Google Compute Engine Disk types are local to a particular
2983 # project in a particular zone, and so the resource name will
2984 # typically look something like this:
2985 #
2986 # compute.googleapis.com/projects/project-id/zones/zone/diskTypes/pd-standard
Sai Cheemalapati4ba8c232017-06-06 18:46:08 -04002987 },
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07002988 ],
2989 "autoscalingSettings": { # Settings for WorkerPool autoscaling. # Settings for autoscaling of this WorkerPool.
2990 "maxNumWorkers": 42, # The maximum number of workers to cap scaling at.
2991 "algorithm": "A String", # The algorithm to use for autoscaling.
Takashi Matsuo06694102015-09-11 13:55:40 -07002992 },
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07002993 "defaultPackageSet": "A String", # The default package set to install. This allows the service to
2994 # select a default set of packages which are useful to worker
2995 # harnesses written in a particular language.
2996 "diskType": "A String", # Type of root disk for VMs. If empty or unspecified, the service will
2997 # attempt to choose a reasonable default.
2998 "metadata": { # Metadata to set on the Google Compute Engine VMs.
2999 "a_key": "A String",
3000         },
3001       },
3002 ],
3003     "tempStoragePrefix": "A String", # The prefix of the resources the system should use for temporary
3004         # storage. The system will append the suffix "/temp-{JOBNAME}" to
3005 # this resource prefix, where {JOBNAME} is the value of the
3006 # job_name field. The resulting bucket and object prefix is used
3007 # as the prefix of the resources used to store temporary data
3008 # needed during the job execution. NOTE: This will override the
3009 # value in taskrunner_settings.
3010 # The supported resource type is:
3011         #
3012         # Google Cloud Storage:
3013         #
3014         #   storage.googleapis.com/{bucket}/{object}
3015 # bucket.storage.googleapis.com/{object}
3016 },
3017 "location": "A String", # The [regional endpoint]
3018 # (https://cloud.google.com/dataflow/docs/concepts/regional-endpoints) that
3019 # contains this job.
3020 "tempFiles": [ # A set of files the system should be aware of that are used
3021 # for temporary storage. These temporary files will be
3022 # removed on job completion.
3023 # No duplicates are allowed.
3024 # No file patterns are supported.
3025 #
3026 # The supported files are:
3027 #
3028 # Google Cloud Storage:
3029 #
3030 # storage.googleapis.com/{bucket}/{object}
3031 # bucket.storage.googleapis.com/{object}
3032 "A String",
3033 ],
3034 "type": "A String", # The type of Cloud Dataflow job.
3035 "clientRequestId": "A String", # The client's unique identifier of the job, re-used across retried attempts.
3036 # If this field is set, the service will ensure its uniqueness.
3037 # The request to create a job will fail if the service has knowledge of a
3038 # previously submitted job with the same client's ID and job name.
3039 # The caller may use this field to ensure idempotence of job
3040 # creation across retried attempts to create a job.
3041 # By default, the field is empty and, in that case, the service ignores it.
3042 "createdFromSnapshotId": "A String", # If this is specified, the job's initial state is populated from the given
3043 # snapshot.
3044 "stepsLocation": "A String", # The GCS location where the steps are stored.
3045 "currentStateTime": "A String", # The timestamp associated with the current state.
3046 "startTime": "A String", # The timestamp when the job was started (transitioned to JOB_STATE_PENDING).
3047 # Flexible resource scheduling jobs are started with some delay after job
3048 # creation, so start_time is unset before start and is updated when the
3049 # job is started by the Cloud Dataflow service. For other jobs, start_time
3050     # always equals create_time and is immutable and set by the Cloud Dataflow
3051 # service.
3052 "createTime": "A String", # The timestamp when the job was initially created. Immutable and set by the
3053 # Cloud Dataflow service.
3054 "requestedState": "A String", # The job's requested state.
3055 #
3056 # `UpdateJob` may be used to switch between the `JOB_STATE_STOPPED` and
3057 # `JOB_STATE_RUNNING` states, by setting requested_state. `UpdateJob` may
3058 # also be used to directly set a job's requested state to
3059 # `JOB_STATE_CANCELLED` or `JOB_STATE_DONE`, irrevocably terminating the
3060 # job if it has not already reached a terminal state.
3061 "name": "A String", # The user-specified Cloud Dataflow job name.
3062 #
3063 # Only one Job with a given name may exist in a project at any
3064 # given time. If a caller attempts to create a Job with the same
3065 # name as an already-existing Job, the attempt returns the
3066 # existing Job.
3067 #
3068 # The name must match the regular expression
3069 # `[a-z]([-a-z0-9]{0,38}[a-z0-9])?`
3070 "steps": [ # Exactly one of step or steps_location should be specified.
3071 #
3072 # The top-level steps that constitute the entire job.
3073 { # Defines a particular step within a Cloud Dataflow job.
3074           #
3075           # A job consists of multiple steps, each of which performs some
3076 # specific operation as part of the overall job. Data is typically
3077 # passed from one step to another as part of the job.
3078 #
3079 # Here's an example of a sequence of steps which together implement a
3080 # Map-Reduce job:
3081 #
3082 # * Read a collection of data from some source, parsing the
3083 # collection's elements.
3084 #
3085 # * Validate the elements.
3086 #
3087 # * Apply a user-defined function to map each element to some value
3088 # and extract an element-specific key value.
3089 #
3090 # * Group elements with the same key into a single element with
3091 # that key, transforming a multiply-keyed collection into a
3092 # uniquely-keyed collection.
3093 #
3094 # * Write the elements out to some data sink.
3095 #
3096 # Note that the Cloud Dataflow service may be used to run many different
3097 # types of jobs, not just Map-Reduce.
3098 "kind": "A String", # The kind of step in the Cloud Dataflow job.
3099 "properties": { # Named properties associated with the step. Each kind of
3100 # predefined step has its own required set of properties.
3101 # Must be provided on Create. Only retrieved with JOB_VIEW_ALL.
3102 "a_key": "", # Properties of the object.
3103 },
3104 "name": "A String", # The name that identifies the step. This must be unique for each
3105 # step with respect to all other steps in the Cloud Dataflow job.
3106 },
3107 ],
3108 "replaceJobId": "A String", # If this job is an update of an existing job, this field is the job ID
3109 # of the job it replaced.
3110 #
3111 # When sending a `CreateJobRequest`, you can update a job by specifying it
3112 # here. The job named here is stopped, and its intermediate state is
3113 # transferred to this job.
3114 "currentState": "A String", # The current state of the job.
3115 #
3116 # Jobs are created in the `JOB_STATE_STOPPED` state unless otherwise
3117 # specified.
3118 #
3119 # A job in the `JOB_STATE_RUNNING` state may asynchronously enter a
3120 # terminal state. After a job has reached a terminal state, no
3121 # further state updates may be made.
3122 #
3123 # This field may be mutated by the Cloud Dataflow service;
3124 # callers cannot mutate it.
3125     "executionInfo": { # Deprecated. Additional information about how a Cloud Dataflow job will be
3126         # executed that isn't contained in the submitted job.
3127 "stages": { # A mapping from each stage to the information about that stage.
3128 "a_key": { # Contains information about how a particular
3129 # google.dataflow.v1beta3.Step will be executed.
3130 "stepName": [ # The steps associated with the execution stage.
3131 # Note that stages may have several steps, and that a given step
3132 # might be run by more than one stage.
3133 "A String",
3134 ],
3135           },
3136         },
3137       },
3138     },
3139   ],
3140 }</pre>
3141</div>
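<p>A minimal sketch of reading the documented job fields from a single page of <code>list()</code> results. It assumes a hypothetical <code>service</code> object built with <code>googleapiclient.discovery.build('dataflow', 'v1b3')</code> and application-default credentials; the <code>jobs</code> response key and the placeholder project ID are assumptions, while the field names come from the Job schema above.</p>

<pre>
from googleapiclient.discovery import build

# Build the Dataflow API client (assumes application-default credentials).
service = build('dataflow', 'v1b3')

# Fetch one page of jobs from a single regional endpoint.
response = service.projects().jobs().list(
    projectId='my-project-id',      # hypothetical project ID
    location='us-central1').execute()

# 'id', 'name', 'currentState' and 'createTime' are Job fields documented above.
for job in response.get('jobs', []):
    print(job.get('id'), job.get('name'), job.get('currentState'), job.get('createTime'))
</pre>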
3142
3143<div class="method">
3144 <code class="details" id="list_next">list_next(previous_request, previous_response)</code>
3145 <pre>Retrieves the next page of results.
3146
3147Args:
3148 previous_request: The request for the previous page. (required)
3149 previous_response: The response from the request for the previous page. (required)
3150
3151Returns:
3152 A request object that you can call 'execute()' on to request the next
3153 page. Returns None if there are no more items in the collection.
3154 </pre>
3155</div>
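<p>The helper above pairs with <code>list()</code> to walk an entire collection. This is a sketch under the same assumptions as the previous example (hypothetical <code>service</code> object and project ID); as documented, <code>list_next()</code> returns <code>None</code> once there are no more pages.</p>

<pre>
from googleapiclient.discovery import build

service = build('dataflow', 'v1b3')  # assumes application-default credentials

# Page through all jobs by feeding each request/response pair back into list_next().
request = service.projects().jobs().list(projectId='my-project-id',
                                         location='us-central1')
while request is not None:
    response = request.execute()
    for job in response.get('jobs', []):
        print(job.get('name'), job.get('currentState'))
    request = service.projects().jobs().list_next(
        previous_request=request, previous_response=response)
</pre>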
3156
3157<div class="method">
3158    <code class="details" id="snapshot">snapshot(projectId, jobId, body, x__xgafv=None)</code>
3159 <pre>Snapshot the state of a streaming job.
3160
3161Args:
3162 projectId: string, The project which owns the job to be snapshotted. (required)
3163 jobId: string, The job to be snapshotted. (required)
3164 body: object, The request body. (required)
3165 The object takes the form of:
3166
3167{ # Request to create a snapshot of a job.
3168 "location": "A String", # The location that contains this job.
3169 "ttl": "A String", # TTL for the snapshot.
3170 }
3171
3172 x__xgafv: string, V1 error format.
3173 Allowed values
3174 1 - v1 error format
3175 2 - v2 error format
3176
3177Returns:
3178 An object of the form:
3179
3180 { # Represents a snapshot of a job.
3181 "sourceJobId": "A String", # The job this snapshot was created from.
3182 "projectId": "A String", # The project this snapshot belongs to.
3183 "creationTime": "A String", # The time this snapshot was created.
3184 "state": "A String", # State of the snapshot.
3185 "ttl": "A String", # The time after which this snapshot will be automatically deleted.
3186 "id": "A String", # The unique ID of this snapshot.
3187 }</pre>
3188</div>
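<p>A hedged example of calling <code>snapshot()</code>. The request body uses only the two fields documented above (<code>location</code> and <code>ttl</code>); the <code>service</code> object, project ID, job ID, and TTL value are placeholders, not values taken from this document.</p>

<pre>
from googleapiclient.discovery import build

service = build('dataflow', 'v1b3')  # assumes application-default credentials

# 'location' and 'ttl' are the two documented request-body fields; the TTL is a
# duration string, and "604800s" (seven days) is only an illustrative value.
body = {
    'location': 'us-central1',
    'ttl': '604800s',
}
snapshot = service.projects().jobs().snapshot(
    projectId='my-project-id',                        # hypothetical project ID
    jobId='2019-06-14_00_00_00-1234567890123456789',  # hypothetical job ID
    body=body).execute()

# The response carries the documented fields, e.g. the snapshot ID and state.
print(snapshot.get('id'), snapshot.get('state'))
</pre>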
3189
3190<div class="method">
3191    <code class="details" id="update">update(projectId, jobId, body, location=None, x__xgafv=None)</code>
3192  <pre>Updates the state of an existing Cloud Dataflow job.
3193
3194To update the state of an existing job, we recommend using
3195`projects.locations.jobs.update` with a [regional endpoint]
3196(https://cloud.google.com/dataflow/docs/concepts/regional-endpoints). Using
3197`projects.jobs.update` is not recommended, as you can only update the state
3198of jobs that are running in `us-central1`.
3199
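As a minimal sketch (assuming a `service` object built with
googleapiclient.discovery.build('dataflow', 'v1b3'), application-default
credentials, a job running in `us-central1`, and placeholder project and job
IDs), a state-only update that requests cancellation by setting
`requestedState`, as described for that field below:

  from googleapiclient.discovery import build

  service = build('dataflow', 'v1b3')
  body = {'requestedState': 'JOB_STATE_CANCELLED'}
  service.projects().jobs().update(
      projectId='my-project-id',
      jobId='2019-06-14_00_00_00-1234567890123456789',
      location='us-central1',
      body=body).execute()
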
3200Args:
3201  projectId: string, The ID of the Cloud Platform project that the job belongs to. (required)
3202  jobId: string, The job ID. (required)
3203  body: object, The request body. (required)
3204 The object takes the form of:
3205
3206{ # Defines a job to be run by the Cloud Dataflow service.
3207    "labels": { # User-defined labels for this job.
3208 #
3209 # The labels map can contain no more than 64 entries. Entries of the labels
3210 # map are UTF8 strings that comply with the following restrictions:
3211 #
3212 # * Keys must conform to regexp: \p{Ll}\p{Lo}{0,62}
3213 # * Values must conform to regexp: [\p{Ll}\p{Lo}\p{N}_-]{0,63}
3214 # * Both keys and values are additionally constrained to be <= 128 bytes in
3215 # size.
3216 "a_key": "A String",
3217 },
3218    "jobMetadata": { # Metadata available primarily for filtering jobs. Will be included in the ListJob response and Job SUMMARY view.
3219        # This field is populated by the Dataflow service to support filtering jobs
3220        # by the metadata values provided here. Populated for ListJobs and all GetJob
3221        # views SUMMARY and higher.
3222 "sdkVersion": { # The version of the SDK used to run the job. # The SDK version used to run the job.
3223 "versionDisplayName": "A String", # A readable string describing the version of the SDK.
3224 "version": "A String", # The version of the SDK used to run the job.
3225 "sdkSupportStatus": "A String", # The support status for this SDK version.
3226 },
3227 "pubsubDetails": [ # Identification of a PubSub source used in the Dataflow job.
3228 { # Metadata for a PubSub connector used by the job.
3229 "topic": "A String", # Topic accessed in the connection.
3230 "subscription": "A String", # Subscription used in the connection.
3231 },
3232 ],
3233 "datastoreDetails": [ # Identification of a Datastore source used in the Dataflow job.
3234 { # Metadata for a Datastore connector used by the job.
3235 "projectId": "A String", # ProjectId accessed in the connection.
3236 "namespace": "A String", # Namespace used in the connection.
3237 },
3238 ],
3239 "fileDetails": [ # Identification of a File source used in the Dataflow job.
3240 { # Metadata for a File connector used by the job.
3241 "filePattern": "A String", # File Pattern used to access files by the connector.
3242 },
3243 ],
3244 "spannerDetails": [ # Identification of a Spanner source used in the Dataflow job.
3245 { # Metadata for a Spanner connector used by the job.
3246 "instanceId": "A String", # InstanceId accessed in the connection.
3247 "projectId": "A String", # ProjectId accessed in the connection.
3248 "databaseId": "A String", # DatabaseId accessed in the connection.
3249 },
3250 ],
3251 "bigTableDetails": [ # Identification of a BigTable source used in the Dataflow job.
3252 { # Metadata for a BigTable connector used by the job.
3253 "instanceId": "A String", # InstanceId accessed in the connection.
3254 "projectId": "A String", # ProjectId accessed in the connection.
3255 "tableId": "A String", # TableId accessed in the connection.
3256 },
3257 ],
3258 "bigqueryDetails": [ # Identification of a BigQuery source used in the Dataflow job.
3259 { # Metadata for a BigQuery connector used by the job.
3260 "projectId": "A String", # Project accessed in the connection.
3261 "dataset": "A String", # Dataset accessed in the connection.
3262 "table": "A String", # Table accessed in the connection.
3263 "query": "A String", # Query used to access data in the connection.
3264 },
3265 ],
3266 },
3267    "pipelineDescription": { # A descriptive representation of submitted pipeline as well as the executed
3268        # form. This data is provided by the Dataflow service for ease of visualizing
3269        # the pipeline and interpreting Dataflow provided metrics.
3270        # Preliminary field: The format of this data may change at any time.
3271        # A description of the user pipeline and stages through which it is executed.
3272        # Created by Cloud Dataflow service. Only retrieved with JOB_VIEW_DESCRIPTION or JOB_VIEW_ALL.
3273 "originalPipelineTransform": [ # Description of each transform in the pipeline and collections between them.
3274 { # Description of the type, names/ids, and input/outputs for a transform.
3275 "kind": "A String", # Type of transform.
3276 "name": "A String", # User provided name for this transform instance.
3277 "inputCollectionName": [ # User names for all collection inputs to this transform.
3278 "A String",
3279 ],
3280 "displayData": [ # Transform-specific display data.
3281 { # Data provided with a pipeline or transform to provide descriptive info.
3282 "shortStrValue": "A String", # A possible additional shorter value to display.
3283 # For example a java_class_name_value of com.mypackage.MyDoFn
3284 # will be stored with MyDoFn as the short_str_value and
3285 # com.mypackage.MyDoFn as the java_class_name value.
3286 # short_str_value can be displayed and java_class_name_value
3287 # will be displayed as a tooltip.
3288 "durationValue": "A String", # Contains value if the data is of duration type.
3289 "url": "A String", # An optional full URL.
3290 "floatValue": 3.14, # Contains value if the data is of float type.
3291 "namespace": "A String", # The namespace for the key. This is usually a class name or programming
3292 # language namespace (i.e. python module) which defines the display data.
3293 # This allows a dax monitoring system to specially handle the data
3294 # and perform custom rendering.
3295 "javaClassValue": "A String", # Contains value if the data is of java class type.
3296 "label": "A String", # An optional label to display in a dax UI for the element.
3297 "boolValue": True or False, # Contains value if the data is of a boolean type.
3298 "strValue": "A String", # Contains value if the data is of string type.
3299 "key": "A String", # The key identifying the display data.
3300 # This is intended to be used as a label for the display data
3301 # when viewed in a dax monitoring system.
3302 "int64Value": "A String", # Contains value if the data is of int64 type.
3303 "timestampValue": "A String", # Contains value if the data is of timestamp type.
3304 },
3305 ],
3306 "outputCollectionName": [ # User names for all collection outputs to this transform.
3307 "A String",
3308 ],
3309 "id": "A String", # SDK generated id of this transform instance.
3310 },
3311 ],
3312 "executionPipelineStage": [ # Description of each stage of execution of the pipeline.
3313 { # Description of the composing transforms, names/ids, and input/outputs of a
3314 # stage of execution. Some composing transforms and sources may have been
3315 # generated by the Dataflow service during execution planning.
3316 "componentSource": [ # Collections produced and consumed by component transforms of this stage.
3317 { # Description of an interstitial value between transforms in an execution
3318 # stage.
3319 "userName": "A String", # Human-readable name for this transform; may be user or system generated.
3320 "originalTransformOrCollection": "A String", # User name for the original user transform or collection with which this
3321 # source is most closely associated.
3322 "name": "A String", # Dataflow service generated name for this source.
3323 },
3324 ],
3325        "kind": "A String", # Type of transform this stage is executing.
3326 "name": "A String", # Dataflow service generated name for this stage.
3327 "outputSource": [ # Output sources for this stage.
3328 { # Description of an input or output of an execution stage.
3329 "userName": "A String", # Human-readable name for this source; may be user or system generated.
3330 "sizeBytes": "A String", # Size of the source, if measurable.
3331 "name": "A String", # Dataflow service generated name for this source.
3332 "originalTransformOrCollection": "A String", # User name for the original user transform or collection with which this
3333 # source is most closely associated.
3334 },
3335 ],
3336 "inputSource": [ # Input sources for this stage.
3337 { # Description of an input or output of an execution stage.
3338 "userName": "A String", # Human-readable name for this source; may be user or system generated.
3339 "sizeBytes": "A String", # Size of the source, if measurable.
3340 "name": "A String", # Dataflow service generated name for this source.
3341 "originalTransformOrCollection": "A String", # User name for the original user transform or collection with which this
3342 # source is most closely associated.
3343 },
3344 ],
3345 "componentTransform": [ # Transforms that comprise this execution stage.
3346 { # Description of a transform executed as part of an execution stage.
3347 "userName": "A String", # Human-readable name for this transform; may be user or system generated.
3348 "originalTransform": "A String", # User name for the original user transform with which this transform is
3349 # most closely associated.
3350 "name": "A String", # Dataflow service generated name for this source.
3351 },
3352 ],
3353 "id": "A String", # Dataflow service generated id for this stage.
3354 },
3355 ],
3356 "displayData": [ # Pipeline level display data.
3357 { # Data provided with a pipeline or transform to provide descriptive info.
3358 "shortStrValue": "A String", # A possible additional shorter value to display.
3359 # For example a java_class_name_value of com.mypackage.MyDoFn
3360 # will be stored with MyDoFn as the short_str_value and
3361 # com.mypackage.MyDoFn as the java_class_name value.
3362 # short_str_value can be displayed and java_class_name_value
3363 # will be displayed as a tooltip.
3364 "durationValue": "A String", # Contains value if the data is of duration type.
3365 "url": "A String", # An optional full URL.
3366 "floatValue": 3.14, # Contains value if the data is of float type.
3367 "namespace": "A String", # The namespace for the key. This is usually a class name or programming
3368 # language namespace (i.e. python module) which defines the display data.
3369 # This allows a dax monitoring system to specially handle the data
3370 # and perform custom rendering.
3371 "javaClassValue": "A String", # Contains value if the data is of java class type.
3372 "label": "A String", # An optional label to display in a dax UI for the element.
3373 "boolValue": True or False, # Contains value if the data is of a boolean type.
3374 "strValue": "A String", # Contains value if the data is of string type.
3375 "key": "A String", # The key identifying the display data.
3376 # This is intended to be used as a label for the display data
3377 # when viewed in a dax monitoring system.
3378 "int64Value": "A String", # Contains value if the data is of int64 type.
3379 "timestampValue": "A String", # Contains value if the data is of timestamp type.
3380 },
3381 ],
3382 },
3383 "stageStates": [ # This field may be mutated by the Cloud Dataflow service;
3384 # callers cannot mutate it.
3385 { # A message describing the state of a particular execution stage.
3386 "executionStageName": "A String", # The name of the execution stage.
3387        "executionStageState": "A String", # Execution stage states allow the same set of values as JobState.
3388 "currentStateTime": "A String", # The time at which the stage transitioned to this state.
3389 },
3390 ],
3391 "id": "A String", # The unique ID of this job.
3392 #
3393 # This field is set by the Cloud Dataflow service when the Job is
3394 # created, and is immutable for the life of the job.
3395 "replacedByJobId": "A String", # If another job is an update of this job (and thus, this job is in
3396 # `JOB_STATE_UPDATED`), this field contains the ID of that job.
3397 "projectId": "A String", # The ID of the Cloud Platform project that the job belongs to.
3398 "transformNameMapping": { # The map of transform name prefixes of the job to be replaced to the
3399 # corresponding name prefixes of the new job.
3400 "a_key": "A String",
3401 },
3402 "environment": { # Describes the environment in which a Dataflow Job runs. # The environment for the job.
3403 "version": { # A structure describing which components and their versions of the service
3404 # are required in order to run the job.
3405 "a_key": "", # Properties of the object.
3406 },
3407 "flexResourceSchedulingGoal": "A String", # Which Flexible Resource Scheduling mode to run in.
3408 "serviceKmsKeyName": "A String", # If set, contains the Cloud KMS key identifier used to encrypt data
3409 # at rest, AKA a Customer Managed Encryption Key (CMEK).
3410 #
3411 # Format:
3412 # projects/PROJECT_ID/locations/LOCATION/keyRings/KEY_RING/cryptoKeys/KEY
3413 "internalExperiments": { # Experimental settings.
3414 "a_key": "", # Properties of the object. Contains field @type with type URL.
3415 },
3416 "dataset": "A String", # The dataset for the current project where various workflow
3417 # related tables are stored.
3418 #
3419 # The supported resource type is:
3420 #
3421 # Google BigQuery:
3422 # bigquery.googleapis.com/{dataset}
3423 "experiments": [ # The list of experiments to enable.
3424 "A String",
3425 ],
3426 "serviceAccountEmail": "A String", # Identity to run virtual machines as. Defaults to the default account.
3427 "sdkPipelineOptions": { # The Cloud Dataflow SDK pipeline options specified by the user. These
3428 # options are passed through the service and are used to recreate the
3429 # SDK pipeline options on the worker in a language agnostic and platform
3430 # independent way.
3431 "a_key": "", # Properties of the object.
3432 },
3433 "userAgent": { # A description of the process that generated the request.
3434 "a_key": "", # Properties of the object.
3435 },
3436 "clusterManagerApiService": "A String", # The type of cluster manager API to use. If unknown or
3437 # unspecified, the service will attempt to choose a reasonable
3438 # default. This should be in the form of the API service name,
3439 # e.g. "compute.googleapis.com".
3440 "workerPools": [ # The worker pools. At least one "harness" worker pool must be
3441 # specified in order for the job to have workers.
3442 { # Describes one particular pool of Cloud Dataflow workers to be
3443 # instantiated by the Cloud Dataflow service in order to perform the
3444 # computations required by a job. Note that a workflow job may use
3445 # multiple pools, in order to match the various computational
3446 # requirements of the various stages of the job.
3447 "diskSourceImage": "A String", # Fully qualified source image for disks.
3448 "taskrunnerSettings": { # Taskrunner configuration settings. # Settings passed through to Google Compute Engine workers when
3449 # using the standard Dataflow task runner. Users should ignore
3450 # this field.
3451 "workflowFileName": "A String", # The file to store the workflow in.
3452 "logUploadLocation": "A String", # Indicates where to put logs. If this is not specified, the logs
3453 # will not be uploaded.
3454 #
3455 # The supported resource type is:
3456 #
3457 # Google Cloud Storage:
3458 # storage.googleapis.com/{bucket}/{object}
3459 # bucket.storage.googleapis.com/{object}
3460 "commandlinesFileName": "A String", # The file to store preprocessing commands in.
3461 "parallelWorkerSettings": { # Provides data to pass through to the worker harness. # The settings to pass to the parallel worker harness.
3462 "reportingEnabled": True or False, # Whether to send work progress updates to the service.
3463 "shuffleServicePath": "A String", # The Shuffle service path relative to the root URL, for example,
3464 # "shuffle/v1beta1".
3465 "workerId": "A String", # The ID of the worker running this pipeline.
3466 "baseUrl": "A String", # The base URL for accessing Google Cloud APIs.
3467 #
3468 # When workers access Google Cloud APIs, they logically do so via
3469 # relative URLs. If this field is specified, it supplies the base
3470 # URL to use for resolving these relative URLs. The normative
3471 # algorithm used is defined by RFC 1808, "Relative Uniform Resource
3472 # Locators".
3473 #
3474 # If not specified, the default value is "http://www.googleapis.com/"
3475 "servicePath": "A String", # The Cloud Dataflow service path relative to the root URL, for example,
3476 # "dataflow/v1b3/projects".
3477 "tempStoragePrefix": "A String", # The prefix of the resources the system should use for temporary
3478 # storage.
3479 #
3480 # The supported resource type is:
3481 #
3482 # Google Cloud Storage:
3483 #
3484 # storage.googleapis.com/{bucket}/{object}
3485 # bucket.storage.googleapis.com/{object}
3486 },
3487 "vmId": "A String", # The ID string of the VM.
3488 "baseTaskDir": "A String", # The location on the worker for task-specific subdirectories.
3489 "continueOnException": True or False, # Whether to continue taskrunner if an exception is hit.
3490 "oauthScopes": [ # The OAuth2 scopes to be requested by the taskrunner in order to
3491 # access the Cloud Dataflow API.
3492 "A String",
3493 ],
3494 "taskUser": "A String", # The UNIX user ID on the worker VM to use for tasks launched by
3495 # taskrunner; e.g. "root".
3496 "baseUrl": "A String", # The base URL for the taskrunner to use when accessing Google Cloud APIs.
3497 #
3498 # When workers access Google Cloud APIs, they logically do so via
3499 # relative URLs. If this field is specified, it supplies the base
3500 # URL to use for resolving these relative URLs. The normative
3501 # algorithm used is defined by RFC 1808, "Relative Uniform Resource
3502 # Locators".
3503 #
3504 # If not specified, the default value is "http://www.googleapis.com/"
3505 "taskGroup": "A String", # The UNIX group ID on the worker VM to use for tasks launched by
3506 # taskrunner; e.g. "wheel".
3507 "languageHint": "A String", # The suggested backend language.
3508 "logToSerialconsole": True or False, # Whether to send taskrunner log info to Google Compute Engine VM serial
3509 # console.
3510 "streamingWorkerMainClass": "A String", # The streaming worker main class name.
3511 "logDir": "A String", # The directory on the VM to store logs.
3512          "dataflowApiVersion": "A String", # The API version of the endpoint, e.g. "v1b3"
3513 "harnessCommand": "A String", # The command to launch the worker harness.
3514 "tempStoragePrefix": "A String", # The prefix of the resources the taskrunner should use for
3515 # temporary storage.
3516 #
3517 # The supported resource type is:
3518 #
3519 # Google Cloud Storage:
3520 # storage.googleapis.com/{bucket}/{object}
3521 # bucket.storage.googleapis.com/{object}
3522 "alsologtostderr": True or False, # Whether to also send taskrunner log info to stderr.
3523 },
3524 "kind": "A String", # The kind of the worker pool; currently only `harness` and `shuffle`
3525 # are supported.
3526 "packages": [ # Packages to be installed on workers.
3527 { # The packages that must be installed in order for a worker to run the
3528 # steps of the Cloud Dataflow job that will be assigned to its worker
3529 # pool.
3530 #
3531 # This is the mechanism by which the Cloud Dataflow SDK causes code to
3532 # be loaded onto the workers. For example, the Cloud Dataflow Java SDK
3533 # might use this to install jars containing the user's code and all of the
3534 # various dependencies (libraries, data files, etc.) required in order
3535 # for that code to run.
3536 "location": "A String", # The resource to read the package from. The supported resource type is:
3537 #
3538 # Google Cloud Storage:
3539 #
3540 # storage.googleapis.com/{bucket}
3541 # bucket.storage.googleapis.com/
3542 "name": "A String", # The name of the package.
3543 },
3544 ],
3545 "machineType": "A String", # Machine type (e.g. "n1-standard-1"). If empty or unspecified, the
3546 # service will attempt to choose a reasonable default.
3547 "network": "A String", # Network to which VMs will be assigned. If empty or unspecified,
3548 # the service will use the network "default".
3549 "zone": "A String", # Zone to run the worker pools in. If empty or unspecified, the service
3550 # will attempt to choose a reasonable default.
3551 "diskSizeGb": 42, # Size of root disk for VMs, in GB. If zero or unspecified, the service will
3552 # attempt to choose a reasonable default.
3553        "teardownPolicy": "A String", # Sets the policy for determining when to turn down the worker pool.
3554 # Allowed values are: `TEARDOWN_ALWAYS`, `TEARDOWN_ON_SUCCESS`, and
3555 # `TEARDOWN_NEVER`.
3556 # `TEARDOWN_ALWAYS` means workers are always torn down regardless of whether
3557 # the job succeeds. `TEARDOWN_ON_SUCCESS` means workers are torn down
3558 # if the job succeeds. `TEARDOWN_NEVER` means the workers are never torn
3559 # down.
3560 #
3561 # If the workers are not torn down by the service, they will
3562 # continue to run and use Google Compute Engine VM resources in the
3563 # user's project until they are explicitly terminated by the user.
3564 # Because of this, Google recommends using the `TEARDOWN_ALWAYS`
3565 # policy except for small, manually supervised test jobs.
3566 #
3567 # If unknown or unspecified, the service will attempt to choose a reasonable
3568 # default.
3569 "onHostMaintenance": "A String", # The action to take on host maintenance, as defined by the Google
3570 # Compute Engine API.
3571 "ipConfiguration": "A String", # Configuration for VM IPs.
3572 "numThreadsPerWorker": 42, # The number of threads per worker harness. If empty or unspecified, the
3573 # service will choose a number of threads (according to the number of cores
3574 # on the selected machine type for batch, or 1 by convention for streaming).
3575 "poolArgs": { # Extra arguments for this worker pool.
3576 "a_key": "", # Properties of the object. Contains field @type with type URL.
3577 },
3578 "numWorkers": 42, # Number of Google Compute Engine workers in this pool needed to
3579 # execute the job. If zero or unspecified, the service will
3580 # attempt to choose a reasonable default.
3581 "workerHarnessContainerImage": "A String", # Required. Docker container image that executes the Cloud Dataflow worker
3582 # harness, residing in Google Container Registry.
3583 "subnetwork": "A String", # Subnetwork to which VMs will be assigned, if desired. Expected to be of
3584 # the form "regions/REGION/subnetworks/SUBNETWORK".
3585 "dataDisks": [ # Data disks that are used by a VM in this workflow.
3586 { # Describes the data disk used by a workflow job.
3587 "mountPoint": "A String", # Directory in a VM where disk is mounted.
3588 "sizeGb": 42, # Size of disk in GB. If zero or unspecified, the service will
3589 # attempt to choose a reasonable default.
3590 "diskType": "A String", # Disk storage type, as defined by Google Compute Engine. This
3591 # must be a disk type appropriate to the project and zone in which
3592 # the workers will run. If unknown or unspecified, the service
3593 # will attempt to choose a reasonable default.
3594 #
3595 # For example, the standard persistent disk type is a resource name
3596 # typically ending in "pd-standard". If SSD persistent disks are
3597 # available, the resource name typically ends with "pd-ssd". The
3598            # actual valid values are defined by the Google Compute Engine API,
3599 # not by the Cloud Dataflow API; consult the Google Compute Engine
3600 # documentation for more information about determining the set of
3601 # available disk types for a particular project and zone.
3602 #
3603 # Google Compute Engine Disk types are local to a particular
3604 # project in a particular zone, and so the resource name will
3605 # typically look something like this:
3606 #
3607 # compute.googleapis.com/projects/project-id/zones/zone/diskTypes/pd-standard
3608 },
3609 ],
3610 "autoscalingSettings": { # Settings for WorkerPool autoscaling. # Settings for autoscaling of this WorkerPool.
3611 "maxNumWorkers": 42, # The maximum number of workers to cap scaling at.
3612 "algorithm": "A String", # The algorithm to use for autoscaling.
3613 },
3614 "defaultPackageSet": "A String", # The default package set to install. This allows the service to
3615 # select a default set of packages which are useful to worker
3616 # harnesses written in a particular language.
3617 "diskType": "A String", # Type of root disk for VMs. If empty or unspecified, the service will
3618 # attempt to choose a reasonable default.
3619 "metadata": { # Metadata to set on the Google Compute Engine VMs.
3620 "a_key": "A String",
3621 },
3622 },
3623 ],
3624 "tempStoragePrefix": "A String", # The prefix of the resources the system should use for temporary
3625        # storage. The system will append the suffix "/temp-{JOBNAME}" to
3626 # this resource prefix, where {JOBNAME} is the value of the
3627 # job_name field. The resulting bucket and object prefix is used
3628 # as the prefix of the resources used to store temporary data
3629 # needed during the job execution. NOTE: This will override the
3630 # value in taskrunner_settings.
3631 # The supported resource type is:
3632 #
3633 # Google Cloud Storage:
3634 #
3635 # storage.googleapis.com/{bucket}/{object}
3636 # bucket.storage.googleapis.com/{object}
3637 },
3638 "location": "A String", # The [regional endpoint]
3639 # (https://cloud.google.com/dataflow/docs/concepts/regional-endpoints) that
3640 # contains this job.
3641 "tempFiles": [ # A set of files the system should be aware of that are used
3642 # for temporary storage. These temporary files will be
3643 # removed on job completion.
3644 # No duplicates are allowed.
3645 # No file patterns are supported.
3646 #
3647 # The supported files are:
3648 #
3649 # Google Cloud Storage:
3650 #
3651 # storage.googleapis.com/{bucket}/{object}
3652 # bucket.storage.googleapis.com/{object}
3653 "A String",
3654 ],
3655 "type": "A String", # The type of Cloud Dataflow job.
3656 "clientRequestId": "A String", # The client's unique identifier of the job, re-used across retried attempts.
3657 # If this field is set, the service will ensure its uniqueness.
3658 # The request to create a job will fail if the service has knowledge of a
3659 # previously submitted job with the same client's ID and job name.
3660 # The caller may use this field to ensure idempotence of job
3661 # creation across retried attempts to create a job.
3662 # By default, the field is empty and, in that case, the service ignores it.
3663 "createdFromSnapshotId": "A String", # If this is specified, the job's initial state is populated from the given
3664 # snapshot.
3665 "stepsLocation": "A String", # The GCS location where the steps are stored.
3666 "currentStateTime": "A String", # The timestamp associated with the current state.
3667 "startTime": "A String", # The timestamp when the job was started (transitioned to JOB_STATE_PENDING).
3668 # Flexible resource scheduling jobs are started with some delay after job
3669 # creation, so start_time is unset before start and is updated when the
3670 # job is started by the Cloud Dataflow service. For other jobs, start_time
3671      # always equals create_time and is immutable and set by the Cloud Dataflow
3672 # service.
3673 "createTime": "A String", # The timestamp when the job was initially created. Immutable and set by the
3674 # Cloud Dataflow service.
3675 "requestedState": "A String", # The job's requested state.
3676 #
3677 # `UpdateJob` may be used to switch between the `JOB_STATE_STOPPED` and
3678 # `JOB_STATE_RUNNING` states, by setting requested_state. `UpdateJob` may
3679 # also be used to directly set a job's requested state to
3680 # `JOB_STATE_CANCELLED` or `JOB_STATE_DONE`, irrevocably terminating the
3681 # job if it has not already reached a terminal state.
3682 "name": "A String", # The user-specified Cloud Dataflow job name.
3683 #
3684 # Only one Job with a given name may exist in a project at any
3685 # given time. If a caller attempts to create a Job with the same
3686 # name as an already-existing Job, the attempt returns the
3687 # existing Job.
3688 #
3689 # The name must match the regular expression
3690 # `[a-z]([-a-z0-9]{0,38}[a-z0-9])?`
3691 "steps": [ # Exactly one of step or steps_location should be specified.
3692 #
3693 # The top-level steps that constitute the entire job.
3694 { # Defines a particular step within a Cloud Dataflow job.
3695 #
3696 # A job consists of multiple steps, each of which performs some
3697 # specific operation as part of the overall job. Data is typically
3698 # passed from one step to another as part of the job.
3699 #
3700 # Here's an example of a sequence of steps which together implement a
3701 # Map-Reduce job:
3702 #
3703 # * Read a collection of data from some source, parsing the
3704 # collection's elements.
3705 #
3706 # * Validate the elements.
3707 #
3708 # * Apply a user-defined function to map each element to some value
3709 # and extract an element-specific key value.
3710 #
3711 # * Group elements with the same key into a single element with
3712 # that key, transforming a multiply-keyed collection into a
3713 # uniquely-keyed collection.
3714 #
3715 # * Write the elements out to some data sink.
3716 #
3717 # Note that the Cloud Dataflow service may be used to run many different
3718 # types of jobs, not just Map-Reduce.
3719 "kind": "A String", # The kind of step in the Cloud Dataflow job.
3720 "properties": { # Named properties associated with the step. Each kind of
3721 # predefined step has its own required set of properties.
3722 # Must be provided on Create. Only retrieved with JOB_VIEW_ALL.
3723 "a_key": "", # Properties of the object.
3724 },
3725 "name": "A String", # The name that identifies the step. This must be unique for each
3726 # step with respect to all other steps in the Cloud Dataflow job.
3727 },
3728 ],
3729 "replaceJobId": "A String", # If this job is an update of an existing job, this field is the job ID
3730 # of the job it replaced.
3731 #
3732 # When sending a `CreateJobRequest`, you can update a job by specifying it
3733 # here. The job named here is stopped, and its intermediate state is
3734 # transferred to this job.
3735 "currentState": "A String", # The current state of the job.
3736 #
3737 # Jobs are created in the `JOB_STATE_STOPPED` state unless otherwise
3738 # specified.
3739 #
3740 # A job in the `JOB_STATE_RUNNING` state may asynchronously enter a
3741 # terminal state. After a job has reached a terminal state, no
3742 # further state updates may be made.
3743 #
3744 # This field may be mutated by the Cloud Dataflow service;
3745 # callers cannot mutate it.
3746    "executionInfo": { # Deprecated. Additional information about how a Cloud Dataflow job will be
3747        # executed that isn't contained in the submitted job.
3748 "stages": { # A mapping from each stage to the information about that stage.
3749 "a_key": { # Contains information about how a particular
3750 # google.dataflow.v1beta3.Step will be executed.
3751 "stepName": [ # The steps associated with the execution stage.
3752 # Note that stages may have several steps, and that a given step
3753 # might be run by more than one stage.
3754 "A String",
3755 ],
3756 },
3757 },
3758 },
3759}
3760
3761 location: string, The [regional endpoint]
3762(https://cloud.google.com/dataflow/docs/concepts/regional-endpoints) that
3763contains this job.
3764 x__xgafv: string, V1 error format.
3765 Allowed values
3766 1 - v1 error format
3767 2 - v2 error format
3768
3769Returns:
3770 An object of the form:
3771
3772 { # Defines a job to be run by the Cloud Dataflow service.
3773    "labels": { # User-defined labels for this job.
3774        #
3775        # The labels map can contain no more than 64 entries. Entries of the labels
3776 # map are UTF8 strings that comply with the following restrictions:
3777        #
3778        # * Keys must conform to regexp: \p{Ll}\p{Lo}{0,62}
3779 # * Values must conform to regexp: [\p{Ll}\p{Lo}\p{N}_-]{0,63}
3780 # * Both keys and values are additionally constrained to be <= 128 bytes in
3781 # size.
3782      "a_key": "A String",
3783 },
3784    "jobMetadata": { # Metadata available primarily for filtering jobs. Will be included in the ListJob response and Job SUMMARY view.
3785        # This field is populated by the Dataflow service to support filtering jobs
3786        # by the metadata values provided here. Populated for ListJobs and all GetJob
3787        # views SUMMARY and higher.
3788 "sdkVersion": { # The version of the SDK used to run the job. # The SDK version used to run the job.
3789 "versionDisplayName": "A String", # A readable string describing the version of the SDK.
3790 "version": "A String", # The version of the SDK used to run the job.
3791 "sdkSupportStatus": "A String", # The support status for this SDK version.
3792 },
3793 "pubsubDetails": [ # Identification of a PubSub source used in the Dataflow job.
3794 { # Metadata for a PubSub connector used by the job.
3795 "topic": "A String", # Topic accessed in the connection.
3796 "subscription": "A String", # Subscription used in the connection.
3797 },
3798 ],
3799 "datastoreDetails": [ # Identification of a Datastore source used in the Dataflow job.
3800 { # Metadata for a Datastore connector used by the job.
3801 "projectId": "A String", # ProjectId accessed in the connection.
3802 "namespace": "A String", # Namespace used in the connection.
3803 },
3804 ],
3805 "fileDetails": [ # Identification of a File source used in the Dataflow job.
3806 { # Metadata for a File connector used by the job.
3807 "filePattern": "A String", # File Pattern used to access files by the connector.
3808 },
3809 ],
3810 "spannerDetails": [ # Identification of a Spanner source used in the Dataflow job.
3811 { # Metadata for a Spanner connector used by the job.
3812 "instanceId": "A String", # InstanceId accessed in the connection.
3813 "projectId": "A String", # ProjectId accessed in the connection.
3814 "databaseId": "A String", # DatabaseId accessed in the connection.
3815 },
3816 ],
3817 "bigTableDetails": [ # Identification of a BigTable source used in the Dataflow job.
3818 { # Metadata for a BigTable connector used by the job.
3819 "instanceId": "A String", # InstanceId accessed in the connection.
3820 "projectId": "A String", # ProjectId accessed in the connection.
3821 "tableId": "A String", # TableId accessed in the connection.
3822 },
3823 ],
3824 "bigqueryDetails": [ # Identification of a BigQuery source used in the Dataflow job.
3825 { # Metadata for a BigQuery connector used by the job.
3826 "projectId": "A String", # Project accessed in the connection.
3827 "dataset": "A String", # Dataset accessed in the connection.
3828 "table": "A String", # Table accessed in the connection.
3829 "query": "A String", # Query used to access data in the connection.
3830 },
3831 ],
3832 },
3833    "pipelineDescription": { # A descriptive representation of submitted pipeline as well as the executed
3834        # form. This data is provided by the Dataflow service for ease of visualizing
3835        # the pipeline and interpreting Dataflow provided metrics.
3836        # Preliminary field: The format of this data may change at any time.
3837        # A description of the user pipeline and stages through which it is executed.
3838        # Created by Cloud Dataflow service. Only retrieved with JOB_VIEW_DESCRIPTION or JOB_VIEW_ALL.
3839 "originalPipelineTransform": [ # Description of each transform in the pipeline and collections between them.
3840 { # Description of the type, names/ids, and input/outputs for a transform.
3841 "kind": "A String", # Type of transform.
3842 "name": "A String", # User provided name for this transform instance.
3843 "inputCollectionName": [ # User names for all collection inputs to this transform.
3844 "A String",
3845 ],
3846 "displayData": [ # Transform-specific display data.
3847 { # Data provided with a pipeline or transform to provide descriptive info.
3848 "shortStrValue": "A String", # A possible additional shorter value to display.
3849 # For example a java_class_name_value of com.mypackage.MyDoFn
3850 # will be stored with MyDoFn as the short_str_value and
3851 # com.mypackage.MyDoFn as the java_class_name value.
3852 # short_str_value can be displayed and java_class_name_value
3853 # will be displayed as a tooltip.
3854 "durationValue": "A String", # Contains value if the data is of duration type.
3855 "url": "A String", # An optional full URL.
3856 "floatValue": 3.14, # Contains value if the data is of float type.
3857 "namespace": "A String", # The namespace for the key. This is usually a class name or programming
3858 # language namespace (i.e. python module) which defines the display data.
3859 # This allows a dax monitoring system to specially handle the data
3860 # and perform custom rendering.
3861 "javaClassValue": "A String", # Contains value if the data is of java class type.
3862 "label": "A String", # An optional label to display in a dax UI for the element.
3863 "boolValue": True or False, # Contains value if the data is of a boolean type.
3864 "strValue": "A String", # Contains value if the data is of string type.
3865 "key": "A String", # The key identifying the display data.
3866 # This is intended to be used as a label for the display data
3867 # when viewed in a dax monitoring system.
3868 "int64Value": "A String", # Contains value if the data is of int64 type.
3869 "timestampValue": "A String", # Contains value if the data is of timestamp type.
3870 },
3871 ],
3872 "outputCollectionName": [ # User names for all collection outputs to this transform.
3873 "A String",
3874 ],
3875 "id": "A String", # SDK generated id of this transform instance.
3876 },
3877 ],
3878 "executionPipelineStage": [ # Description of each stage of execution of the pipeline.
3879 { # Description of the composing transforms, names/ids, and input/outputs of a
3880 # stage of execution. Some composing transforms and sources may have been
3881 # generated by the Dataflow service during execution planning.
3882 "componentSource": [ # Collections produced and consumed by component transforms of this stage.
3883 { # Description of an interstitial value between transforms in an execution
3884 # stage.
3885 "userName": "A String", # Human-readable name for this transform; may be user or system generated.
3886 "originalTransformOrCollection": "A String", # User name for the original user transform or collection with which this
3887 # source is most closely associated.
3888 "name": "A String", # Dataflow service generated name for this source.
3889 },
3890 ],
3891        "kind": "A String", # Type of transform this stage is executing.
3892 "name": "A String", # Dataflow service generated name for this stage.
3893 "outputSource": [ # Output sources for this stage.
3894 { # Description of an input or output of an execution stage.
3895 "userName": "A String", # Human-readable name for this source; may be user or system generated.
3896 "sizeBytes": "A String", # Size of the source, if measurable.
3897 "name": "A String", # Dataflow service generated name for this source.
3898 "originalTransformOrCollection": "A String", # User name for the original user transform or collection with which this
3899 # source is most closely associated.
3900 },
3901 ],
3902 "inputSource": [ # Input sources for this stage.
3903 { # Description of an input or output of an execution stage.
3904 "userName": "A String", # Human-readable name for this source; may be user or system generated.
3905 "sizeBytes": "A String", # Size of the source, if measurable.
3906 "name": "A String", # Dataflow service generated name for this source.
3907 "originalTransformOrCollection": "A String", # User name for the original user transform or collection with which this
3908 # source is most closely associated.
3909 },
3910 ],
3911 "componentTransform": [ # Transforms that comprise this execution stage.
3912 { # Description of a transform executed as part of an execution stage.
3913 "userName": "A String", # Human-readable name for this transform; may be user or system generated.
3914 "originalTransform": "A String", # User name for the original user transform with which this transform is
3915 # most closely associated.
3916 "name": "A String", # Dataflow service generated name for this source.
3917 },
3918 ],
3919 "id": "A String", # Dataflow service generated id for this stage.
3920 },
3921 ],
3922 "displayData": [ # Pipeline level display data.
3923 { # Data provided with a pipeline or transform to provide descriptive info.
3924 "shortStrValue": "A String", # A possible additional shorter value to display.
3925 # For example a java_class_name_value of com.mypackage.MyDoFn
3926 # will be stored with MyDoFn as the short_str_value and
3927 # com.mypackage.MyDoFn as the java_class_name value.
3928 # short_str_value can be displayed and java_class_name_value
3929 # will be displayed as a tooltip.
3930 "durationValue": "A String", # Contains value if the data is of duration type.
3931 "url": "A String", # An optional full URL.
3932 "floatValue": 3.14, # Contains value if the data is of float type.
3933 "namespace": "A String", # The namespace for the key. This is usually a class name or programming
3934 # language namespace (i.e. python module) which defines the display data.
3935 # This allows a dax monitoring system to specially handle the data
3936 # and perform custom rendering.
3937 "javaClassValue": "A String", # Contains value if the data is of java class type.
3938 "label": "A String", # An optional label to display in a dax UI for the element.
3939 "boolValue": True or False, # Contains value if the data is of a boolean type.
3940 "strValue": "A String", # Contains value if the data is of string type.
3941 "key": "A String", # The key identifying the display data.
3942 # This is intended to be used as a label for the display data
3943 # when viewed in a dax monitoring system.
3944 "int64Value": "A String", # Contains value if the data is of int64 type.
3945 "timestampValue": "A String", # Contains value if the data is of timestamp type.
3946 },
3947 ],
3948 },
3949 "stageStates": [ # This field may be mutated by the Cloud Dataflow service;
3950 # callers cannot mutate it.
3951 { # A message describing the state of a particular execution stage.
3952 "executionStageName": "A String", # The name of the execution stage.
3953      "executionStageState": "A String", # Execution stage states allow the same set of values as JobState.
3954 "currentStateTime": "A String", # The time at which the stage transitioned to this state.
3955 },
3956 ],
3957 "id": "A String", # The unique ID of this job.
3958 #
3959 # This field is set by the Cloud Dataflow service when the Job is
3960 # created, and is immutable for the life of the job.
3961 "replacedByJobId": "A String", # If another job is an update of this job (and thus, this job is in
3962 # `JOB_STATE_UPDATED`), this field contains the ID of that job.
3963 "projectId": "A String", # The ID of the Cloud Platform project that the job belongs to.
3964    "transformNameMapping": { # The map of transform name prefixes of the job to be replaced to the
3965        # corresponding name prefixes of the new job.
3966      "a_key": "A String",
3967    },
3968    "environment": { # Describes the environment in which a Dataflow Job runs. # The environment for the job.
3969 "version": { # A structure describing which components and their versions of the service
3970 # are required in order to run the job.
3971        "a_key": "", # Properties of the object.
3972      },
3973      "flexResourceSchedulingGoal": "A String", # Which Flexible Resource Scheduling mode to run in.
3974 "serviceKmsKeyName": "A String", # If set, contains the Cloud KMS key identifier used to encrypt data
3975 # at rest, AKA a Customer Managed Encryption Key (CMEK).
3976          #
3977          # Format:
3978 # projects/PROJECT_ID/locations/LOCATION/keyRings/KEY_RING/cryptoKeys/KEY
3979      "internalExperiments": { # Experimental settings.
3980        "a_key": "", # Properties of the object. Contains field @type with type URL.
3981      },
3982      "dataset": "A String", # The dataset for the current project where various workflow
3983 # related tables are stored.
3984 #
3985 # The supported resource type is:
3986 #
3987 # Google BigQuery:
3988 # bigquery.googleapis.com/{dataset}
3989      "experiments": [ # The list of experiments to enable.
3990 "A String",
3991 ],
3992      "serviceAccountEmail": "A String", # Identity to run virtual machines as. Defaults to the default account.
3993      "sdkPipelineOptions": { # The Cloud Dataflow SDK pipeline options specified by the user. These
3994 # options are passed through the service and are used to recreate the
3995 # SDK pipeline options on the worker in a language agnostic and platform
3996 # independent way.
3997        "a_key": "", # Properties of the object.
3998 },
3999 "userAgent": { # A description of the process that generated the request.
4000 "a_key": "", # Properties of the object.
4001 },
4002      "clusterManagerApiService": "A String", # The type of cluster manager API to use. If unknown or
4003 # unspecified, the service will attempt to choose a reasonable
4004 # default. This should be in the form of the API service name,
4005 # e.g. "compute.googleapis.com".
4006 "workerPools": [ # The worker pools. At least one "harness" worker pool must be
4007 # specified in order for the job to have workers.
4008 { # Describes one particular pool of Cloud Dataflow workers to be
4009 # instantiated by the Cloud Dataflow service in order to perform the
4010 # computations required by a job. Note that a workflow job may use
4011 # multiple pools, in order to match the various computational
4012 # requirements of the various stages of the job.
4013          "diskSourceImage": "A String", # Fully qualified source image for disks.
4014          "taskrunnerSettings": { # Taskrunner configuration settings. # Settings passed through to Google Compute Engine workers when
4015 # using the standard Dataflow task runner. Users should ignore
4016 # this field.
4017 "workflowFileName": "A String", # The file to store the workflow in.
4018 "logUploadLocation": "A String", # Indicates where to put logs. If this is not specified, the logs
4019 # will not be uploaded.
4020 #
4021 # The supported resource type is:
4022 #
4023 # Google Cloud Storage:
4024 # storage.googleapis.com/{bucket}/{object}
4025 # bucket.storage.googleapis.com/{object}
Sai Cheemalapatie833b792017-03-24 15:06:46 -07004026 "commandlinesFileName": "A String", # The file to store preprocessing commands in.
Sai Cheemalapatic30d2b52017-03-13 12:12:03 -04004027 "parallelWorkerSettings": { # Provides data to pass through to the worker harness. # The settings to pass to the parallel worker harness.
4028 "reportingEnabled": True or False, # Whether to send work progress updates to the service.
4029 "shuffleServicePath": "A String", # The Shuffle service path relative to the root URL, for example,
4030 # "shuffle/v1beta1".
4031 "workerId": "A String", # The ID of the worker running this pipeline.
4032 "baseUrl": "A String", # The base URL for accessing Google Cloud APIs.
4033 #
4034 # When workers access Google Cloud APIs, they logically do so via
4035 # relative URLs. If this field is specified, it supplies the base
4036 # URL to use for resolving these relative URLs. The normative
4037 # algorithm used is defined by RFC 1808, "Relative Uniform Resource
4038 # Locators".
4039 #
4040 # If not specified, the default value is "http://www.googleapis.com/"
4041 "servicePath": "A String", # The Cloud Dataflow service path relative to the root URL, for example,
4042 # "dataflow/v1b3/projects".
4043 "tempStoragePrefix": "A String", # The prefix of the resources the system should use for temporary
4044 # storage.
4045 #
4046 # The supported resource type is:
4047 #
4048 # Google Cloud Storage:
4049 #
4050 # storage.googleapis.com/{bucket}/{object}
4051 # bucket.storage.googleapis.com/{object}
4052 },
Sai Cheemalapati4ba8c232017-06-06 18:46:08 -04004053 "vmId": "A String", # The ID string of the VM.
4054 "baseTaskDir": "A String", # The location on the worker for task-specific subdirectories.
4055 "continueOnException": True or False, # Whether to continue taskrunner if an exception is hit.
4056 "oauthScopes": [ # The OAuth2 scopes to be requested by the taskrunner in order to
4057 # access the Cloud Dataflow API.
4058 "A String",
4059 ],
4060 "taskUser": "A String", # The UNIX user ID on the worker VM to use for tasks launched by
4061 # taskrunner; e.g. "root".
Sai Cheemalapatic30d2b52017-03-13 12:12:03 -04004062 "baseUrl": "A String", # The base URL for the taskrunner to use when accessing Google Cloud APIs.
4063 #
4064 # When workers access Google Cloud APIs, they logically do so via
4065 # relative URLs. If this field is specified, it supplies the base
4066 # URL to use for resolving these relative URLs. The normative
4067 # algorithm used is defined by RFC 1808, "Relative Uniform Resource
4068 # Locators".
4069 #
4070 # If not specified, the default value is "http://www.googleapis.com/"
Sai Cheemalapati4ba8c232017-06-06 18:46:08 -04004071 "taskGroup": "A String", # The UNIX group ID on the worker VM to use for tasks launched by
4072 # taskrunner; e.g. "wheel".
4073 "languageHint": "A String", # The suggested backend language.
4074 "logToSerialconsole": True or False, # Whether to send taskrunner log info to Google Compute Engine VM serial
4075 # console.
4076 "streamingWorkerMainClass": "A String", # The streaming worker main class name.
4077 "logDir": "A String", # The directory on the VM to store logs.
4078 "dataflowApiVersion": "A String", # The API version of endpoint, e.g. "v1b3"
Sai Cheemalapatie833b792017-03-24 15:06:46 -07004079 "harnessCommand": "A String", # The command to launch the worker harness.
4080 "tempStoragePrefix": "A String", # The prefix of the resources the taskrunner should use for
4081 # temporary storage.
Sai Cheemalapatic30d2b52017-03-13 12:12:03 -04004082 #
Sai Cheemalapatie833b792017-03-24 15:06:46 -07004083 # The supported resource type is:
4084 #
4085 # Google Cloud Storage:
4086 # storage.googleapis.com/{bucket}/{object}
4087 # bucket.storage.googleapis.com/{object}
Sai Cheemalapati4ba8c232017-06-06 18:46:08 -04004088 "alsologtostderr": True or False, # Whether to also send taskrunner log info to stderr.
Takashi Matsuo06694102015-09-11 13:55:40 -07004089 },
Sai Cheemalapati4ba8c232017-06-06 18:46:08 -04004090 "kind": "A String", # The kind of the worker pool; currently only `harness` and `shuffle`
4091 # are supported.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07004092 "packages": [ # Packages to be installed on workers.
4093 { # The packages that must be installed in order for a worker to run the
4094 # steps of the Cloud Dataflow job that will be assigned to its worker
4095 # pool.
4096 #
4097 # This is the mechanism by which the Cloud Dataflow SDK causes code to
4098 # be loaded onto the workers. For example, the Cloud Dataflow Java SDK
4099 # might use this to install jars containing the user's code and all of the
4100 # various dependencies (libraries, data files, etc.) required in order
4101 # for that code to run.
4102 "location": "A String", # The resource to read the package from. The supported resource type is:
4103 #
4104 # Google Cloud Storage:
4105 #
4106 # storage.googleapis.com/{bucket}
4107 # bucket.storage.googleapis.com/
4108 "name": "A String", # The name of the package.
4109 },
4110 ],
Sai Cheemalapati4ba8c232017-06-06 18:46:08 -04004111 "machineType": "A String", # Machine type (e.g. "n1-standard-1"). If empty or unspecified, the
4112 # service will attempt to choose a reasonable default.
4113 "network": "A String", # Network to which VMs will be assigned. If empty or unspecified,
4114 # the service will use the network "default".
4115 "zone": "A String", # Zone to run the worker pools in. If empty or unspecified, the service
4116 # will attempt to choose a reasonable default.
4117 "diskSizeGb": 42, # Size of root disk for VMs, in GB. If zero or unspecified, the service will
4118 # attempt to choose a reasonable default.
Sai Cheemalapati4ba8c232017-06-06 18:46:08 -04004119 "teardownPolicy": "A String", # Sets the policy for determining when to turndown worker pool.
4120 # Allowed values are: `TEARDOWN_ALWAYS`, `TEARDOWN_ON_SUCCESS`, and
4121 # `TEARDOWN_NEVER`.
4122 # `TEARDOWN_ALWAYS` means workers are always torn down regardless of whether
4123 # the job succeeds. `TEARDOWN_ON_SUCCESS` means workers are torn down
4124 # if the job succeeds. `TEARDOWN_NEVER` means the workers are never torn
4125 # down.
4126 #
4127 # If the workers are not torn down by the service, they will
4128 # continue to run and use Google Compute Engine VM resources in the
4129 # user's project until they are explicitly terminated by the user.
4130 # Because of this, Google recommends using the `TEARDOWN_ALWAYS`
4131 # policy except for small, manually supervised test jobs.
4132 #
4133 # If unknown or unspecified, the service will attempt to choose a reasonable
4134 # default.
4135 "onHostMaintenance": "A String", # The action to take on host maintenance, as defined by the Google
4136 # Compute Engine API.
4137 "ipConfiguration": "A String", # Configuration for VM IPs.
4138 "numThreadsPerWorker": 42, # The number of threads per worker harness. If empty or unspecified, the
4139 # service will choose a number of threads (according to the number of cores
4140 # on the selected machine type for batch, or 1 by convention for streaming).
4141 "poolArgs": { # Extra arguments for this worker pool.
4142 "a_key": "", # Properties of the object. Contains field @type with type URL.
4143 },
4144 "numWorkers": 42, # Number of Google Compute Engine workers in this pool needed to
4145 # execute the job. If zero or unspecified, the service will
Sai Cheemalapatic30d2b52017-03-13 12:12:03 -04004146 # attempt to choose a reasonable default.
4147 "workerHarnessContainerImage": "A String", # Required. Docker container image that executes the Cloud Dataflow worker
4148 # harness, residing in Google Container Registry.
Sai Cheemalapati4ba8c232017-06-06 18:46:08 -04004149 "subnetwork": "A String", # Subnetwork to which VMs will be assigned, if desired. Expected to be of
4150 # the form "regions/REGION/subnetworks/SUBNETWORK".
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07004151 "dataDisks": [ # Data disks that are used by a VM in this workflow.
4152 { # Describes the data disk used by a workflow job.
4153 "mountPoint": "A String", # Directory in a VM where disk is mounted.
4154 "sizeGb": 42, # Size of disk in GB. If zero or unspecified, the service will
4155 # attempt to choose a reasonable default.
4156 "diskType": "A String", # Disk storage type, as defined by Google Compute Engine. This
4157 # must be a disk type appropriate to the project and zone in which
4158 # the workers will run. If unknown or unspecified, the service
4159 # will attempt to choose a reasonable default.
Sai Cheemalapati4ba8c232017-06-06 18:46:08 -04004160 #
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07004161 # For example, the standard persistent disk type is a resource name
4162 # typically ending in "pd-standard". If SSD persistent disks are
4163 # available, the resource name typically ends with "pd-ssd". The
4164 # actual valid values are defined the Google Compute Engine API,
4165 # not by the Cloud Dataflow API; consult the Google Compute Engine
4166 # documentation for more information about determining the set of
4167 # available disk types for a particular project and zone.
Sai Cheemalapati4ba8c232017-06-06 18:46:08 -04004168 #
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07004169 # Google Compute Engine Disk types are local to a particular
4170 # project in a particular zone, and so the resource name will
4171 # typically look something like this:
4172 #
4173 # compute.googleapis.com/projects/project-id/zones/zone/diskTypes/pd-standard
Sai Cheemalapati4ba8c232017-06-06 18:46:08 -04004174 },
4175 ],
4176 "autoscalingSettings": { # Settings for WorkerPool autoscaling. # Settings for autoscaling of this WorkerPool.
4177 "maxNumWorkers": 42, # The maximum number of workers to cap scaling at.
4178 "algorithm": "A String", # The algorithm to use for autoscaling.
4179 },
4180 "defaultPackageSet": "A String", # The default package set to install. This allows the service to
4181 # select a default set of packages which are useful to worker
4182 # harnesses written in a particular language.
4183 "diskType": "A String", # Type of root disk for VMs. If empty or unspecified, the service will
4184 # attempt to choose a reasonable default.
4185 "metadata": { # Metadata to set on the Google Compute Engine VMs.
4186 "a_key": "A String",
4187 },
Takashi Matsuo06694102015-09-11 13:55:40 -07004188 },
4189 ],
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07004190 "tempStoragePrefix": "A String", # The prefix of the resources the system should use for temporary
4191 # storage. The system will append the suffix "/temp-{JOBNAME} to
4192 # this resource prefix, where {JOBNAME} is the value of the
4193 # job_name field. The resulting bucket and object prefix is used
4194 # as the prefix of the resources used to store temporary data
4195 # needed during the job execution. NOTE: This will override the
4196 # value in taskrunner_settings.
4197 # The supported resource type is:
4198 #
4199 # Google Cloud Storage:
4200 #
4201 # storage.googleapis.com/{bucket}/{object}
4202 # bucket.storage.googleapis.com/{object}
Takashi Matsuo06694102015-09-11 13:55:40 -07004203 },
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07004204 "location": "A String", # The [regional endpoint]
4205 # (https://cloud.google.com/dataflow/docs/concepts/regional-endpoints) that
4206 # contains this job.
4207 "tempFiles": [ # A set of files the system should be aware of that are used
4208 # for temporary storage. These temporary files will be
4209 # removed on job completion.
4210 # No duplicates are allowed.
4211 # No file patterns are supported.
4212 #
4213 # The supported files are:
4214 #
4215 # Google Cloud Storage:
4216 #
4217 # storage.googleapis.com/{bucket}/{object}
4218 # bucket.storage.googleapis.com/{object}
4219 "A String",
4220 ],
4221 "type": "A String", # The type of Cloud Dataflow job.
4222 "clientRequestId": "A String", # The client's unique identifier of the job, re-used across retried attempts.
4223 # If this field is set, the service will ensure its uniqueness.
4224 # The request to create a job will fail if the service has knowledge of a
4225 # previously submitted job with the same client's ID and job name.
4226 # The caller may use this field to ensure idempotence of job
4227 # creation across retried attempts to create a job.
4228 # By default, the field is empty and, in that case, the service ignores it.
4229 "createdFromSnapshotId": "A String", # If this is specified, the job's initial state is populated from the given
4230 # snapshot.
4231 "stepsLocation": "A String", # The GCS location where the steps are stored.
4232 "currentStateTime": "A String", # The timestamp associated with the current state.
4233 "startTime": "A String", # The timestamp when the job was started (transitioned to JOB_STATE_PENDING).
4234 # Flexible resource scheduling jobs are started with some delay after job
4235 # creation, so start_time is unset before start and is updated when the
4236 # job is started by the Cloud Dataflow service. For other jobs, start_time
4237 # always equals to create_time and is immutable and set by the Cloud Dataflow
4238 # service.
4239 "createTime": "A String", # The timestamp when the job was initially created. Immutable and set by the
4240 # Cloud Dataflow service.
4241 "requestedState": "A String", # The job's requested state.
4242 #
4243 # `UpdateJob` may be used to switch between the `JOB_STATE_STOPPED` and
4244 # `JOB_STATE_RUNNING` states, by setting requested_state. `UpdateJob` may
4245 # also be used to directly set a job's requested state to
4246 # `JOB_STATE_CANCELLED` or `JOB_STATE_DONE`, irrevocably terminating the
4247 # job if it has not already reached a terminal state.
4248 "name": "A String", # The user-specified Cloud Dataflow job name.
4249 #
4250 # Only one Job with a given name may exist in a project at any
4251 # given time. If a caller attempts to create a Job with the same
4252 # name as an already-existing Job, the attempt returns the
4253 # existing Job.
4254 #
4255 # The name must match the regular expression
4256 # `[a-z]([-a-z0-9]{0,38}[a-z0-9])?`
4257 "steps": [ # Exactly one of step or steps_location should be specified.
4258 #
4259 # The top-level steps that constitute the entire job.
Sai Cheemalapatic30d2b52017-03-13 12:12:03 -04004260 { # Defines a particular step within a Cloud Dataflow job.
4261 #
4262 # A job consists of multiple steps, each of which performs some
4263 # specific operation as part of the overall job. Data is typically
4264 # passed from one step to another as part of the job.
4265 #
4266 # Here's an example of a sequence of steps which together implement a
4267 # Map-Reduce job:
4268 #
4269 # * Read a collection of data from some source, parsing the
4270 # collection's elements.
4271 #
4272 # * Validate the elements.
4273 #
4274 # * Apply a user-defined function to map each element to some value
4275 # and extract an element-specific key value.
4276 #
4277 # * Group elements with the same key into a single element with
4278 # that key, transforming a multiply-keyed collection into a
4279 # uniquely-keyed collection.
4280 #
4281 # * Write the elements out to some data sink.
4282 #
4283 # Note that the Cloud Dataflow service may be used to run many different
4284 # types of jobs, not just Map-Reduce.
4285 "kind": "A String", # The kind of step in the Cloud Dataflow job.
4286 "properties": { # Named properties associated with the step. Each kind of
4287 # predefined step has its own required set of properties.
4288 # Must be provided on Create. Only retrieved with JOB_VIEW_ALL.
Takashi Matsuo06694102015-09-11 13:55:40 -07004289 "a_key": "", # Properties of the object.
4290 },
Thomas Coffee2f245372017-03-27 10:39:26 -07004291 "name": "A String", # The name that identifies the step. This must be unique for each
4292 # step with respect to all other steps in the Cloud Dataflow job.
Takashi Matsuo06694102015-09-11 13:55:40 -07004293 },
4294 ],
Thomas Coffee2f245372017-03-27 10:39:26 -07004295 "replaceJobId": "A String", # If this job is an update of an existing job, this field is the job ID
4296 # of the job it replaced.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07004297 #
Thomas Coffee2f245372017-03-27 10:39:26 -07004298 # When sending a `CreateJobRequest`, you can update a job by specifying it
4299 # here. The job named here is stopped, and its intermediate state is
4300 # transferred to this job.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07004301 "currentState": "A String", # The current state of the job.
4302 #
4303 # Jobs are created in the `JOB_STATE_STOPPED` state unless otherwise
4304 # specified.
4305 #
4306 # A job in the `JOB_STATE_RUNNING` state may asynchronously enter a
4307 # terminal state. After a job has reached a terminal state, no
4308 # further state updates may be made.
4309 #
4310 # This field may be mutated by the Cloud Dataflow service;
4311 # callers cannot mutate it.
Sai Cheemalapatic30d2b52017-03-13 12:12:03 -04004312 "executionInfo": { # Additional information about how a Cloud Dataflow job will be executed that # Deprecated.
4313 # isn't contained in the submitted job.
Takashi Matsuo06694102015-09-11 13:55:40 -07004314 "stages": { # A mapping from each stage to the information about that stage.
Sai Cheemalapatic30d2b52017-03-13 12:12:03 -04004315 "a_key": { # Contains information about how a particular
4316 # google.dataflow.v1beta3.Step will be executed.
4317 "stepName": [ # The steps associated with the execution stage.
4318 # Note that stages may have several steps, and that a given step
4319 # might be run by more than one stage.
Nathaniel Manista4f877e52015-06-15 16:44:50 +00004320 "A String",
4321 ],
Nathaniel Manista4f877e52015-06-15 16:44:50 +00004322 },
Nathaniel Manista4f877e52015-06-15 16:44:50 +00004323 },
4324 },
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07004325 }</pre>
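<p>For orientation only, the following is a minimal, illustrative sketch (not part of the generated reference) of fetching a job with the Python client and reading a few of the fields documented in the object above. The project ID, job ID, and credential setup are placeholder assumptions; only the field names come from the schema shown here.</p>
<pre>
# Assumes google-api-python-client is installed and Application Default
# Credentials are available; 'my-project' and 'my-job-id' are placeholders.
from googleapiclient.discovery import build

dataflow = build('dataflow', 'v1b3')

# Fetch one job; JOB_VIEW_ALL is needed for fields such as "steps".
job = dataflow.projects().jobs().get(
    projectId='my-project',
    jobId='my-job-id',
    view='JOB_VIEW_ALL').execute()

# The response is a plain dict shaped like the Job object documented above.
print(job.get('name'), job.get('currentState'))

# Worker pool configuration lives under environment.workerPools.
for pool in job.get('environment', {}).get('workerPools', []):
    print(pool.get('machineType'), pool.get('numWorkers'))

# Steps describe the top-level operations that make up the job.
for step in job.get('steps', []):
    print(step.get('kind'), step.get('name'))
</pre>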
</div>

</body></html>