<html><body>
<style>

body, h1, h2, h3, div, span, p, pre, a {
  margin: 0;
  padding: 0;
  border: 0;
  font-weight: inherit;
  font-style: inherit;
  font-size: 100%;
  font-family: inherit;
  vertical-align: baseline;
}

body {
  font-size: 13px;
  padding: 1em;
}

h1 {
  font-size: 26px;
  margin-bottom: 1em;
}

h2 {
  font-size: 24px;
  margin-bottom: 1em;
}

h3 {
  font-size: 20px;
  margin-bottom: 1em;
  margin-top: 1em;
}

pre, code {
  line-height: 1.5;
  font-family: Monaco, 'DejaVu Sans Mono', 'Bitstream Vera Sans Mono', 'Lucida Console', monospace;
}

pre {
  margin-top: 0.5em;
}

h1, h2, h3, p {
  font-family: Arial, sans-serif;
}

h1, h2, h3 {
  border-bottom: solid #CCC 1px;
}

.toc_element {
  margin-top: 0.5em;
}

.firstline {
  margin-left: 2em;
}

.method {
  margin-top: 1em;
  border: solid 1px #CCC;
  padding: 1em;
  background: #EEE;
}

.details {
  font-weight: bold;
  font-size: 14px;
}

</style>

<h1><a href="dataflow_v1b3.html">Dataflow API</a> . <a href="dataflow_v1b3.projects.html">projects</a> . <a href="dataflow_v1b3.projects.locations.html">locations</a> . <a href="dataflow_v1b3.projects.locations.jobs.html">jobs</a></h1>
<h2>Instance Methods</h2>
<p class="toc_element">
  <code><a href="dataflow_v1b3.projects.locations.jobs.debug.html">debug()</a></code>
</p>
<p class="firstline">Returns the debug Resource.</p>

<p class="toc_element">
  <code><a href="dataflow_v1b3.projects.locations.jobs.messages.html">messages()</a></code>
</p>
<p class="firstline">Returns the messages Resource.</p>

<p class="toc_element">
  <code><a href="dataflow_v1b3.projects.locations.jobs.snapshots.html">snapshots()</a></code>
</p>
<p class="firstline">Returns the snapshots Resource.</p>

<p class="toc_element">
  <code><a href="dataflow_v1b3.projects.locations.jobs.workItems.html">workItems()</a></code>
</p>
<p class="firstline">Returns the workItems Resource.</p>

<p class="toc_element">
  <code><a href="#create">create(projectId, location, body, x__xgafv=None, replaceJobId=None, view=None)</a></code></p>
<p class="firstline">Creates a Cloud Dataflow job.</p>
<p class="toc_element">
  <code><a href="#get">get(projectId, location, jobId, x__xgafv=None, view=None)</a></code></p>
<p class="firstline">Gets the state of the specified Cloud Dataflow job.</p>
<p class="toc_element">
  <code><a href="#getMetrics">getMetrics(projectId, location, jobId, startTime=None, x__xgafv=None)</a></code></p>
<p class="firstline">Request the job status.</p>
<p class="toc_element">
  <code><a href="#list">list(projectId, location, pageSize=None, pageToken=None, x__xgafv=None, filter=None, view=None)</a></code></p>
<p class="firstline">List the jobs of a project.</p>
<p class="toc_element">
  <code><a href="#list_next">list_next(previous_request, previous_response)</a></code></p>
<p class="firstline">Retrieves the next page of results.</p>
<p class="toc_element">
  <code><a href="#snapshot">snapshot(projectId, location, jobId, body, x__xgafv=None)</a></code></p>
<p class="firstline">Snapshot the state of a streaming job.</p>
<p class="toc_element">
  <code><a href="#update">update(projectId, location, jobId, body, x__xgafv=None)</a></code></p>
<p class="firstline">Updates the state of an existing Cloud Dataflow job.</p>
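<p>A minimal usage sketch of this collection, assuming the <code>google-api-python-client</code> package and Application Default Credentials; the <code>projectId</code> and <code>location</code> values below are placeholders, not values from this reference.</p>
<pre>
# Hedged sketch: build the Dataflow client and page through the jobs in one
# regional endpoint. 'my-project' and 'us-central1' are placeholder values.
from googleapiclient.discovery import build

dataflow = build('dataflow', 'v1b3')  # uses Application Default Credentials
jobs = dataflow.projects().locations().jobs()

request = jobs.list(projectId='my-project', location='us-central1')
while request is not None:
  response = request.execute()
  for job in response.get('jobs', []):
    print(job.get('id'), job.get('currentState'))
  request = jobs.list_next(previous_request=request, previous_response=response)
</pre>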
<h3>Method Details</h3>
<div class="method">
    <code class="details" id="create">create(projectId, location, body, x__xgafv=None, replaceJobId=None, view=None)</code>
  <pre>Creates a Cloud Dataflow job.

To create a job, we recommend using `projects.locations.jobs.create` with a
[regional endpoint]
(https://cloud.google.com/dataflow/docs/concepts/regional-endpoints). Using
`projects.jobs.create` is not recommended, as your job will always start
in `us-central1`.

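For example, a schematic call through this collection might look like the
following. Here `dataflow` is assumed to be a service object returned by
googleapiclient.discovery.build('dataflow', 'v1b3'), and the body shown is a
placeholder fragment rather than a complete job specification:

  job = dataflow.projects().locations().jobs().create(
      projectId='my-project',        # placeholder project ID
      location='us-central1',        # regional endpoint that will own the job
      body={'name': 'example-job'},  # see the Job fields documented below
  ).execute()
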
Args:
  projectId: string, The ID of the Cloud Platform project that the job belongs to. (required)
  location: string, The [regional endpoint]
(https://cloud.google.com/dataflow/docs/concepts/regional-endpoints) that
contains this job. (required)
  body: object, The request body. (required)
    The object takes the form of:
136
Sai Cheemalapatic30d2b52017-03-13 12:12:03 -0400137{ # Defines a job to be run by the Cloud Dataflow service.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -0700138 "labels": { # User-defined labels for this job.
139 #
140 # The labels map can contain no more than 64 entries. Entries of the labels
141 # map are UTF8 strings that comply with the following restrictions:
142 #
143 # * Keys must conform to regexp: \p{Ll}\p{Lo}{0,62}
144 # * Values must conform to regexp: [\p{Ll}\p{Lo}\p{N}_-]{0,63}
145 # * Both keys and values are additionally constrained to be <= 128 bytes in
146 # size.
147 "a_key": "A String",
148 },
149 "jobMetadata": { # Metadata available primarily for filtering jobs. Will be included in the # This field is populated by the Dataflow service to support filtering jobs
150 # by the metadata values provided here. Populated for ListJobs and all GetJob
151 # views SUMMARY and higher.
152 # ListJob response and Job SUMMARY view.
153 "sdkVersion": { # The version of the SDK used to run the job. # The SDK version used to run the job.
154 "versionDisplayName": "A String", # A readable string describing the version of the SDK.
155 "version": "A String", # The version of the SDK used to run the job.
156 "sdkSupportStatus": "A String", # The support status for this SDK version.
157 },
158 "pubsubDetails": [ # Identification of a PubSub source used in the Dataflow job.
159 { # Metadata for a PubSub connector used by the job.
160 "topic": "A String", # Topic accessed in the connection.
161 "subscription": "A String", # Subscription used in the connection.
162 },
163 ],
164 "datastoreDetails": [ # Identification of a Datastore source used in the Dataflow job.
165 { # Metadata for a Datastore connector used by the job.
166 "projectId": "A String", # ProjectId accessed in the connection.
167 "namespace": "A String", # Namespace used in the connection.
168 },
169 ],
170 "fileDetails": [ # Identification of a File source used in the Dataflow job.
171 { # Metadata for a File connector used by the job.
172 "filePattern": "A String", # File Pattern used to access files by the connector.
173 },
174 ],
175 "spannerDetails": [ # Identification of a Spanner source used in the Dataflow job.
176 { # Metadata for a Spanner connector used by the job.
177 "instanceId": "A String", # InstanceId accessed in the connection.
178 "projectId": "A String", # ProjectId accessed in the connection.
179 "databaseId": "A String", # DatabaseId accessed in the connection.
180 },
181 ],
182 "bigTableDetails": [ # Identification of a BigTable source used in the Dataflow job.
183 { # Metadata for a BigTable connector used by the job.
184 "instanceId": "A String", # InstanceId accessed in the connection.
185 "projectId": "A String", # ProjectId accessed in the connection.
186 "tableId": "A String", # TableId accessed in the connection.
187 },
188 ],
189 "bigqueryDetails": [ # Identification of a BigQuery source used in the Dataflow job.
190 { # Metadata for a BigQuery connector used by the job.
191 "projectId": "A String", # Project accessed in the connection.
192 "dataset": "A String", # Dataset accessed in the connection.
193 "table": "A String", # Table accessed in the connection.
194 "query": "A String", # Query used to access data in the connection.
195 },
196 ],
197 },
198 "pipelineDescription": { # A descriptive representation of submitted pipeline as well as the executed # Preliminary field: The format of this data may change at any time.
199 # A description of the user pipeline and stages through which it is executed.
200 # Created by Cloud Dataflow service. Only retrieved with
201 # JOB_VIEW_DESCRIPTION or JOB_VIEW_ALL.
202 # form. This data is provided by the Dataflow service for ease of visualizing
203 # the pipeline and interpreting Dataflow provided metrics.
204 "originalPipelineTransform": [ # Description of each transform in the pipeline and collections between them.
205 { # Description of the type, names/ids, and input/outputs for a transform.
206 "kind": "A String", # Type of transform.
207 "name": "A String", # User provided name for this transform instance.
208 "inputCollectionName": [ # User names for all collection inputs to this transform.
209 "A String",
210 ],
211 "displayData": [ # Transform-specific display data.
212 { # Data provided with a pipeline or transform to provide descriptive info.
213 "shortStrValue": "A String", # A possible additional shorter value to display.
214 # For example a java_class_name_value of com.mypackage.MyDoFn
215 # will be stored with MyDoFn as the short_str_value and
216 # com.mypackage.MyDoFn as the java_class_name value.
217 # short_str_value can be displayed and java_class_name_value
218 # will be displayed as a tooltip.
219 "durationValue": "A String", # Contains value if the data is of duration type.
220 "url": "A String", # An optional full URL.
221 "floatValue": 3.14, # Contains value if the data is of float type.
222 "namespace": "A String", # The namespace for the key. This is usually a class name or programming
223 # language namespace (i.e. python module) which defines the display data.
224 # This allows a dax monitoring system to specially handle the data
225 # and perform custom rendering.
226 "javaClassValue": "A String", # Contains value if the data is of java class type.
227 "label": "A String", # An optional label to display in a dax UI for the element.
228 "boolValue": True or False, # Contains value if the data is of a boolean type.
229 "strValue": "A String", # Contains value if the data is of string type.
230 "key": "A String", # The key identifying the display data.
231 # This is intended to be used as a label for the display data
232 # when viewed in a dax monitoring system.
233 "int64Value": "A String", # Contains value if the data is of int64 type.
234 "timestampValue": "A String", # Contains value if the data is of timestamp type.
235 },
236 ],
237 "outputCollectionName": [ # User names for all collection outputs to this transform.
238 "A String",
239 ],
240 "id": "A String", # SDK generated id of this transform instance.
241 },
242 ],
243 "executionPipelineStage": [ # Description of each stage of execution of the pipeline.
244 { # Description of the composing transforms, names/ids, and input/outputs of a
245 # stage of execution. Some composing transforms and sources may have been
246 # generated by the Dataflow service during execution planning.
247 "componentSource": [ # Collections produced and consumed by component transforms of this stage.
248 { # Description of an interstitial value between transforms in an execution
249 # stage.
250 "userName": "A String", # Human-readable name for this transform; may be user or system generated.
251 "originalTransformOrCollection": "A String", # User name for the original user transform or collection with which this
252 # source is most closely associated.
253 "name": "A String", # Dataflow service generated name for this source.
254 },
255 ],
256 "kind": "A String", # Type of transform this stage is executing.
257 "name": "A String", # Dataflow service generated name for this stage.
258 "outputSource": [ # Output sources for this stage.
259 { # Description of an input or output of an execution stage.
260 "userName": "A String", # Human-readable name for this source; may be user or system generated.
261 "sizeBytes": "A String", # Size of the source, if measurable.
262 "name": "A String", # Dataflow service generated name for this source.
263 "originalTransformOrCollection": "A String", # User name for the original user transform or collection with which this
264 # source is most closely associated.
265 },
266 ],
267 "inputSource": [ # Input sources for this stage.
268 { # Description of an input or output of an execution stage.
269 "userName": "A String", # Human-readable name for this source; may be user or system generated.
270 "sizeBytes": "A String", # Size of the source, if measurable.
271 "name": "A String", # Dataflow service generated name for this source.
272 "originalTransformOrCollection": "A String", # User name for the original user transform or collection with which this
273 # source is most closely associated.
274 },
275 ],
276 "componentTransform": [ # Transforms that comprise this execution stage.
277 { # Description of a transform executed as part of an execution stage.
278 "userName": "A String", # Human-readable name for this transform; may be user or system generated.
279 "originalTransform": "A String", # User name for the original user transform with which this transform is
280 # most closely associated.
281 "name": "A String", # Dataflow service generated name for this source.
282 },
283 ],
284 "id": "A String", # Dataflow service generated id for this stage.
285 },
286 ],
287 "displayData": [ # Pipeline level display data.
288 { # Data provided with a pipeline or transform to provide descriptive info.
289 "shortStrValue": "A String", # A possible additional shorter value to display.
290 # For example a java_class_name_value of com.mypackage.MyDoFn
291 # will be stored with MyDoFn as the short_str_value and
292 # com.mypackage.MyDoFn as the java_class_name value.
293 # short_str_value can be displayed and java_class_name_value
294 # will be displayed as a tooltip.
295 "durationValue": "A String", # Contains value if the data is of duration type.
296 "url": "A String", # An optional full URL.
297 "floatValue": 3.14, # Contains value if the data is of float type.
298 "namespace": "A String", # The namespace for the key. This is usually a class name or programming
299 # language namespace (i.e. python module) which defines the display data.
300 # This allows a dax monitoring system to specially handle the data
301 # and perform custom rendering.
302 "javaClassValue": "A String", # Contains value if the data is of java class type.
303 "label": "A String", # An optional label to display in a dax UI for the element.
304 "boolValue": True or False, # Contains value if the data is of a boolean type.
305 "strValue": "A String", # Contains value if the data is of string type.
306 "key": "A String", # The key identifying the display data.
307 # This is intended to be used as a label for the display data
308 # when viewed in a dax monitoring system.
309 "int64Value": "A String", # Contains value if the data is of int64 type.
310 "timestampValue": "A String", # Contains value if the data is of timestamp type.
311 },
312 ],
313 },
314 "stageStates": [ # This field may be mutated by the Cloud Dataflow service;
315 # callers cannot mutate it.
316 { # A message describing the state of a particular execution stage.
317 "executionStageName": "A String", # The name of the execution stage.
318 "executionStageState": "A String", # Execution stage states allow the same set of values as JobState.
319 "currentStateTime": "A String", # The time at which the stage transitioned to this state.
320 },
321 ],
322 "id": "A String", # The unique ID of this job.
323 #
324 # This field is set by the Cloud Dataflow service when the Job is
325 # created, and is immutable for the life of the job.
326 "replacedByJobId": "A String", # If another job is an update of this job (and thus, this job is in
327 # `JOB_STATE_UPDATED`), this field contains the ID of that job.
328 "projectId": "A String", # The ID of the Cloud Platform project that the job belongs to.
329 "transformNameMapping": { # The map of transform name prefixes of the job to be replaced to the
330 # corresponding name prefixes of the new job.
331 "a_key": "A String",
332 },
333 "environment": { # Describes the environment in which a Dataflow Job runs. # The environment for the job.
334 "version": { # A structure describing which components and their versions of the service
335 # are required in order to run the job.
336 "a_key": "", # Properties of the object.
337 },
338 "flexResourceSchedulingGoal": "A String", # Which Flexible Resource Scheduling mode to run in.
339 "serviceKmsKeyName": "A String", # If set, contains the Cloud KMS key identifier used to encrypt data
340 # at rest, AKA a Customer Managed Encryption Key (CMEK).
341 #
342 # Format:
343 # projects/PROJECT_ID/locations/LOCATION/keyRings/KEY_RING/cryptoKeys/KEY
344 "internalExperiments": { # Experimental settings.
345 "a_key": "", # Properties of the object. Contains field @type with type URL.
346 },
347 "dataset": "A String", # The dataset for the current project where various workflow
348 # related tables are stored.
349 #
350 # The supported resource type is:
351 #
352 # Google BigQuery:
353 # bigquery.googleapis.com/{dataset}
354 "experiments": [ # The list of experiments to enable.
355 "A String",
356 ],
357 "serviceAccountEmail": "A String", # Identity to run virtual machines as. Defaults to the default account.
358 "sdkPipelineOptions": { # The Cloud Dataflow SDK pipeline options specified by the user. These
359 # options are passed through the service and are used to recreate the
360 # SDK pipeline options on the worker in a language agnostic and platform
361 # independent way.
362 "a_key": "", # Properties of the object.
363 },
364 "userAgent": { # A description of the process that generated the request.
365 "a_key": "", # Properties of the object.
366 },
367 "clusterManagerApiService": "A String", # The type of cluster manager API to use. If unknown or
368 # unspecified, the service will attempt to choose a reasonable
369 # default. This should be in the form of the API service name,
370 # e.g. "compute.googleapis.com".
371 "workerPools": [ # The worker pools. At least one "harness" worker pool must be
372 # specified in order for the job to have workers.
373 { # Describes one particular pool of Cloud Dataflow workers to be
374 # instantiated by the Cloud Dataflow service in order to perform the
375 # computations required by a job. Note that a workflow job may use
376 # multiple pools, in order to match the various computational
377 # requirements of the various stages of the job.
378 "diskSourceImage": "A String", # Fully qualified source image for disks.
379 "taskrunnerSettings": { # Taskrunner configuration settings. # Settings passed through to Google Compute Engine workers when
380 # using the standard Dataflow task runner. Users should ignore
381 # this field.
382 "workflowFileName": "A String", # The file to store the workflow in.
383 "logUploadLocation": "A String", # Indicates where to put logs. If this is not specified, the logs
384 # will not be uploaded.
385 #
386 # The supported resource type is:
387 #
388 # Google Cloud Storage:
389 # storage.googleapis.com/{bucket}/{object}
390 # bucket.storage.googleapis.com/{object}
391 "commandlinesFileName": "A String", # The file to store preprocessing commands in.
392 "parallelWorkerSettings": { # Provides data to pass through to the worker harness. # The settings to pass to the parallel worker harness.
393 "reportingEnabled": True or False, # Whether to send work progress updates to the service.
394 "shuffleServicePath": "A String", # The Shuffle service path relative to the root URL, for example,
395 # "shuffle/v1beta1".
396 "workerId": "A String", # The ID of the worker running this pipeline.
397 "baseUrl": "A String", # The base URL for accessing Google Cloud APIs.
398 #
399 # When workers access Google Cloud APIs, they logically do so via
400 # relative URLs. If this field is specified, it supplies the base
401 # URL to use for resolving these relative URLs. The normative
402 # algorithm used is defined by RFC 1808, "Relative Uniform Resource
403 # Locators".
404 #
405 # If not specified, the default value is "http://www.googleapis.com/"
406 "servicePath": "A String", # The Cloud Dataflow service path relative to the root URL, for example,
407 # "dataflow/v1b3/projects".
408 "tempStoragePrefix": "A String", # The prefix of the resources the system should use for temporary
409 # storage.
410 #
411 # The supported resource type is:
412 #
413 # Google Cloud Storage:
414 #
415 # storage.googleapis.com/{bucket}/{object}
416 # bucket.storage.googleapis.com/{object}
417 },
418 "vmId": "A String", # The ID string of the VM.
419 "baseTaskDir": "A String", # The location on the worker for task-specific subdirectories.
420 "continueOnException": True or False, # Whether to continue taskrunner if an exception is hit.
421 "oauthScopes": [ # The OAuth2 scopes to be requested by the taskrunner in order to
422 # access the Cloud Dataflow API.
423 "A String",
424 ],
425 "taskUser": "A String", # The UNIX user ID on the worker VM to use for tasks launched by
426 # taskrunner; e.g. "root".
427 "baseUrl": "A String", # The base URL for the taskrunner to use when accessing Google Cloud APIs.
428 #
429 # When workers access Google Cloud APIs, they logically do so via
430 # relative URLs. If this field is specified, it supplies the base
431 # URL to use for resolving these relative URLs. The normative
432 # algorithm used is defined by RFC 1808, "Relative Uniform Resource
433 # Locators".
434 #
435 # If not specified, the default value is "http://www.googleapis.com/"
436 "taskGroup": "A String", # The UNIX group ID on the worker VM to use for tasks launched by
437 # taskrunner; e.g. "wheel".
438 "languageHint": "A String", # The suggested backend language.
439 "logToSerialconsole": True or False, # Whether to send taskrunner log info to Google Compute Engine VM serial
440 # console.
441 "streamingWorkerMainClass": "A String", # The streaming worker main class name.
442 "logDir": "A String", # The directory on the VM to store logs.
443 "dataflowApiVersion": "A String", # The API version of endpoint, e.g. "v1b3"
444 "harnessCommand": "A String", # The command to launch the worker harness.
445 "tempStoragePrefix": "A String", # The prefix of the resources the taskrunner should use for
446 # temporary storage.
447 #
448 # The supported resource type is:
449 #
450 # Google Cloud Storage:
451 # storage.googleapis.com/{bucket}/{object}
452 # bucket.storage.googleapis.com/{object}
453 "alsologtostderr": True or False, # Whether to also send taskrunner log info to stderr.
454 },
455 "kind": "A String", # The kind of the worker pool; currently only `harness` and `shuffle`
456 # are supported.
457 "packages": [ # Packages to be installed on workers.
458 { # The packages that must be installed in order for a worker to run the
459 # steps of the Cloud Dataflow job that will be assigned to its worker
460 # pool.
461 #
462 # This is the mechanism by which the Cloud Dataflow SDK causes code to
463 # be loaded onto the workers. For example, the Cloud Dataflow Java SDK
464 # might use this to install jars containing the user's code and all of the
465 # various dependencies (libraries, data files, etc.) required in order
466 # for that code to run.
467 "location": "A String", # The resource to read the package from. The supported resource type is:
468 #
469 # Google Cloud Storage:
470 #
471 # storage.googleapis.com/{bucket}
472 # bucket.storage.googleapis.com/
473 "name": "A String", # The name of the package.
474 },
475 ],
476 "machineType": "A String", # Machine type (e.g. "n1-standard-1"). If empty or unspecified, the
477 # service will attempt to choose a reasonable default.
478 "network": "A String", # Network to which VMs will be assigned. If empty or unspecified,
479 # the service will use the network "default".
480 "zone": "A String", # Zone to run the worker pools in. If empty or unspecified, the service
481 # will attempt to choose a reasonable default.
482 "diskSizeGb": 42, # Size of root disk for VMs, in GB. If zero or unspecified, the service will
483 # attempt to choose a reasonable default.
484 "teardownPolicy": "A String", # Sets the policy for determining when to turndown worker pool.
485 # Allowed values are: `TEARDOWN_ALWAYS`, `TEARDOWN_ON_SUCCESS`, and
486 # `TEARDOWN_NEVER`.
487 # `TEARDOWN_ALWAYS` means workers are always torn down regardless of whether
488 # the job succeeds. `TEARDOWN_ON_SUCCESS` means workers are torn down
489 # if the job succeeds. `TEARDOWN_NEVER` means the workers are never torn
490 # down.
491 #
492 # If the workers are not torn down by the service, they will
493 # continue to run and use Google Compute Engine VM resources in the
494 # user's project until they are explicitly terminated by the user.
495 # Because of this, Google recommends using the `TEARDOWN_ALWAYS`
496 # policy except for small, manually supervised test jobs.
497 #
498 # If unknown or unspecified, the service will attempt to choose a reasonable
499 # default.
500 "onHostMaintenance": "A String", # The action to take on host maintenance, as defined by the Google
501 # Compute Engine API.
502 "ipConfiguration": "A String", # Configuration for VM IPs.
503 "numThreadsPerWorker": 42, # The number of threads per worker harness. If empty or unspecified, the
504 # service will choose a number of threads (according to the number of cores
505 # on the selected machine type for batch, or 1 by convention for streaming).
506 "poolArgs": { # Extra arguments for this worker pool.
507 "a_key": "", # Properties of the object. Contains field @type with type URL.
508 },
509 "numWorkers": 42, # Number of Google Compute Engine workers in this pool needed to
510 # execute the job. If zero or unspecified, the service will
511 # attempt to choose a reasonable default.
512 "workerHarnessContainerImage": "A String", # Required. Docker container image that executes the Cloud Dataflow worker
513 # harness, residing in Google Container Registry.
514 "subnetwork": "A String", # Subnetwork to which VMs will be assigned, if desired. Expected to be of
515 # the form "regions/REGION/subnetworks/SUBNETWORK".
516 "dataDisks": [ # Data disks that are used by a VM in this workflow.
517 { # Describes the data disk used by a workflow job.
518 "mountPoint": "A String", # Directory in a VM where disk is mounted.
519 "sizeGb": 42, # Size of disk in GB. If zero or unspecified, the service will
520 # attempt to choose a reasonable default.
521 "diskType": "A String", # Disk storage type, as defined by Google Compute Engine. This
522 # must be a disk type appropriate to the project and zone in which
523 # the workers will run. If unknown or unspecified, the service
524 # will attempt to choose a reasonable default.
525 #
526 # For example, the standard persistent disk type is a resource name
527 # typically ending in "pd-standard". If SSD persistent disks are
528 # available, the resource name typically ends with "pd-ssd". The
529 # actual valid values are defined by the Google Compute Engine API,
530 # not by the Cloud Dataflow API; consult the Google Compute Engine
531 # documentation for more information about determining the set of
532 # available disk types for a particular project and zone.
533 #
534 # Google Compute Engine Disk types are local to a particular
535 # project in a particular zone, and so the resource name will
536 # typically look something like this:
537 #
538 # compute.googleapis.com/projects/project-id/zones/zone/diskTypes/pd-standard
539 },
540 ],
541 "autoscalingSettings": { # Settings for WorkerPool autoscaling. # Settings for autoscaling of this WorkerPool.
542 "maxNumWorkers": 42, # The maximum number of workers to cap scaling at.
543 "algorithm": "A String", # The algorithm to use for autoscaling.
544 },
545 "defaultPackageSet": "A String", # The default package set to install. This allows the service to
546 # select a default set of packages which are useful to worker
547 # harnesses written in a particular language.
548 "diskType": "A String", # Type of root disk for VMs. If empty or unspecified, the service will
549 # attempt to choose a reasonable default.
550 "metadata": { # Metadata to set on the Google Compute Engine VMs.
551 "a_key": "A String",
552 },
553 },
554 ],
555 "tempStoragePrefix": "A String", # The prefix of the resources the system should use for temporary
556 # storage. The system will append the suffix "/temp-{JOBNAME} to
557 # this resource prefix, where {JOBNAME} is the value of the
558 # job_name field. The resulting bucket and object prefix is used
559 # as the prefix of the resources used to store temporary data
560 # needed during the job execution. NOTE: This will override the
561 # value in taskrunner_settings.
562 # The supported resource type is:
563 #
564 # Google Cloud Storage:
565 #
566 # storage.googleapis.com/{bucket}/{object}
567 # bucket.storage.googleapis.com/{object}
568 },
569 "location": "A String", # The [regional endpoint]
570 # (https://cloud.google.com/dataflow/docs/concepts/regional-endpoints) that
571 # contains this job.
572 "tempFiles": [ # A set of files the system should be aware of that are used
573 # for temporary storage. These temporary files will be
574 # removed on job completion.
575 # No duplicates are allowed.
576 # No file patterns are supported.
577 #
578 # The supported files are:
579 #
580 # Google Cloud Storage:
581 #
582 # storage.googleapis.com/{bucket}/{object}
583 # bucket.storage.googleapis.com/{object}
584 "A String",
585 ],
586 "type": "A String", # The type of Cloud Dataflow job.
587 "clientRequestId": "A String", # The client's unique identifier of the job, re-used across retried attempts.
588 # If this field is set, the service will ensure its uniqueness.
589 # The request to create a job will fail if the service has knowledge of a
590 # previously submitted job with the same client's ID and job name.
591 # The caller may use this field to ensure idempotence of job
592 # creation across retried attempts to create a job.
593 # By default, the field is empty and, in that case, the service ignores it.
594 "createdFromSnapshotId": "A String", # If this is specified, the job's initial state is populated from the given
595 # snapshot.
596 "stepsLocation": "A String", # The GCS location where the steps are stored.
597 "currentStateTime": "A String", # The timestamp associated with the current state.
598 "startTime": "A String", # The timestamp when the job was started (transitioned to JOB_STATE_PENDING).
599 # Flexible resource scheduling jobs are started with some delay after job
600 # creation, so start_time is unset before start and is updated when the
601 # job is started by the Cloud Dataflow service. For other jobs, start_time
602 # always equals to create_time and is immutable and set by the Cloud Dataflow
603 # service.
604 "createTime": "A String", # The timestamp when the job was initially created. Immutable and set by the
605 # Cloud Dataflow service.
606 "requestedState": "A String", # The job's requested state.
607 #
608 # `UpdateJob` may be used to switch between the `JOB_STATE_STOPPED` and
609 # `JOB_STATE_RUNNING` states, by setting requested_state. `UpdateJob` may
610 # also be used to directly set a job's requested state to
611 # `JOB_STATE_CANCELLED` or `JOB_STATE_DONE`, irrevocably terminating the
612 # job if it has not already reached a terminal state.
613 "name": "A String", # The user-specified Cloud Dataflow job name.
614 #
615 # Only one Job with a given name may exist in a project at any
616 # given time. If a caller attempts to create a Job with the same
617 # name as an already-existing Job, the attempt returns the
618 # existing Job.
619 #
620 # The name must match the regular expression
621 # `[a-z]([-a-z0-9]{0,38}[a-z0-9])?`
622 "steps": [ # Exactly one of step or steps_location should be specified.
623 #
624 # The top-level steps that constitute the entire job.
625 { # Defines a particular step within a Cloud Dataflow job.
626 #
627 # A job consists of multiple steps, each of which performs some
628 # specific operation as part of the overall job. Data is typically
629 # passed from one step to another as part of the job.
630 #
631 # Here's an example of a sequence of steps which together implement a
632 # Map-Reduce job:
633 #
634 # * Read a collection of data from some source, parsing the
635 # collection's elements.
636 #
637 # * Validate the elements.
638 #
639 # * Apply a user-defined function to map each element to some value
640 # and extract an element-specific key value.
641 #
642 # * Group elements with the same key into a single element with
643 # that key, transforming a multiply-keyed collection into a
644 # uniquely-keyed collection.
645 #
646 # * Write the elements out to some data sink.
647 #
648 # Note that the Cloud Dataflow service may be used to run many different
649 # types of jobs, not just Map-Reduce.
650 "kind": "A String", # The kind of step in the Cloud Dataflow job.
651 "properties": { # Named properties associated with the step. Each kind of
652 # predefined step has its own required set of properties.
653 # Must be provided on Create. Only retrieved with JOB_VIEW_ALL.
654 "a_key": "", # Properties of the object.
655 },
656 "name": "A String", # The name that identifies the step. This must be unique for each
657 # step with respect to all other steps in the Cloud Dataflow job.
658 },
659 ],
660 "replaceJobId": "A String", # If this job is an update of an existing job, this field is the job ID
661 # of the job it replaced.
662 #
663 # When sending a `CreateJobRequest`, you can update a job by specifying it
664 # here. The job named here is stopped, and its intermediate state is
665 # transferred to this job.
666 "currentState": "A String", # The current state of the job.
667 #
668 # Jobs are created in the `JOB_STATE_STOPPED` state unless otherwise
669 # specified.
670 #
671 # A job in the `JOB_STATE_RUNNING` state may asynchronously enter a
672 # terminal state. After a job has reached a terminal state, no
673 # further state updates may be made.
674 #
675 # This field may be mutated by the Cloud Dataflow service;
676 # callers cannot mutate it.
677 "executionInfo": { # Additional information about how a Cloud Dataflow job will be executed that # Deprecated.
678 # isn't contained in the submitted job.
679 "stages": { # A mapping from each stage to the information about that stage.
680 "a_key": { # Contains information about how a particular
681 # google.dataflow.v1beta3.Step will be executed.
682 "stepName": [ # The steps associated with the execution stage.
683 # Note that stages may have several steps, and that a given step
684 # might be run by more than one stage.
685 "A String",
686 ],
687 },
688 },
689 },
690}
691
692 x__xgafv: string, V1 error format.
693 Allowed values
694 1 - v1 error format
695 2 - v2 error format
696 replaceJobId: string, Deprecated. This field is now in the Job message.
697 view: string, The level of information requested in response.
698
699Returns:
700 An object of the form:
701
702 { # Defines a job to be run by the Cloud Dataflow service.
Sai Cheemalapatic30d2b52017-03-13 12:12:03 -0400703 "labels": { # User-defined labels for this job.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -0700704 #
Sai Cheemalapatic30d2b52017-03-13 12:12:03 -0400705 # The labels map can contain no more than 64 entries. Entries of the labels
706 # map are UTF8 strings that comply with the following restrictions:
Bu Sun Kim715bd7f2019-06-14 16:50:42 -0700707 #
Sai Cheemalapatic30d2b52017-03-13 12:12:03 -0400708 # * Keys must conform to regexp: \p{Ll}\p{Lo}{0,62}
709 # * Values must conform to regexp: [\p{Ll}\p{Lo}\p{N}_-]{0,63}
710 # * Both keys and values are additionally constrained to be <= 128 bytes in
711 # size.
Jon Wayne Parrott692617a2017-01-06 09:58:29 -0800712 "a_key": "A String",
713 },
Bu Sun Kim715bd7f2019-06-14 16:50:42 -0700714 "jobMetadata": { # Metadata available primarily for filtering jobs. Will be included in the # This field is populated by the Dataflow service to support filtering jobs
715 # by the metadata values provided here. Populated for ListJobs and all GetJob
716 # views SUMMARY and higher.
717 # ListJob response and Job SUMMARY view.
718 "sdkVersion": { # The version of the SDK used to run the job. # The SDK version used to run the job.
719 "versionDisplayName": "A String", # A readable string describing the version of the SDK.
720 "version": "A String", # The version of the SDK used to run the job.
721 "sdkSupportStatus": "A String", # The support status for this SDK version.
722 },
723 "pubsubDetails": [ # Identification of a PubSub source used in the Dataflow job.
724 { # Metadata for a PubSub connector used by the job.
725 "topic": "A String", # Topic accessed in the connection.
726 "subscription": "A String", # Subscription used in the connection.
727 },
728 ],
729 "datastoreDetails": [ # Identification of a Datastore source used in the Dataflow job.
730 { # Metadata for a Datastore connector used by the job.
731 "projectId": "A String", # ProjectId accessed in the connection.
732 "namespace": "A String", # Namespace used in the connection.
733 },
734 ],
735 "fileDetails": [ # Identification of a File source used in the Dataflow job.
736 { # Metadata for a File connector used by the job.
737 "filePattern": "A String", # File Pattern used to access files by the connector.
738 },
739 ],
740 "spannerDetails": [ # Identification of a Spanner source used in the Dataflow job.
741 { # Metadata for a Spanner connector used by the job.
742 "instanceId": "A String", # InstanceId accessed in the connection.
743 "projectId": "A String", # ProjectId accessed in the connection.
744 "databaseId": "A String", # DatabaseId accessed in the connection.
745 },
746 ],
747 "bigTableDetails": [ # Identification of a BigTable source used in the Dataflow job.
748 { # Metadata for a BigTable connector used by the job.
749 "instanceId": "A String", # InstanceId accessed in the connection.
750 "projectId": "A String", # ProjectId accessed in the connection.
751 "tableId": "A String", # TableId accessed in the connection.
752 },
753 ],
754 "bigqueryDetails": [ # Identification of a BigQuery source used in the Dataflow job.
755 { # Metadata for a BigQuery connector used by the job.
756 "projectId": "A String", # Project accessed in the connection.
757 "dataset": "A String", # Dataset accessed in the connection.
758 "table": "A String", # Table accessed in the connection.
759 "query": "A String", # Query used to access data in the connection.
760 },
761 ],
762 },
763 "pipelineDescription": { # A descriptive representation of submitted pipeline as well as the executed # Preliminary field: The format of this data may change at any time.
764 # A description of the user pipeline and stages through which it is executed.
765 # Created by Cloud Dataflow service. Only retrieved with
766 # JOB_VIEW_DESCRIPTION or JOB_VIEW_ALL.
767 # form. This data is provided by the Dataflow service for ease of visualizing
768 # the pipeline and interpreting Dataflow provided metrics.
769 "originalPipelineTransform": [ # Description of each transform in the pipeline and collections between them.
770 { # Description of the type, names/ids, and input/outputs for a transform.
771 "kind": "A String", # Type of transform.
772 "name": "A String", # User provided name for this transform instance.
773 "inputCollectionName": [ # User names for all collection inputs to this transform.
774 "A String",
775 ],
776 "displayData": [ # Transform-specific display data.
777 { # Data provided with a pipeline or transform to provide descriptive info.
778 "shortStrValue": "A String", # A possible additional shorter value to display.
779 # For example a java_class_name_value of com.mypackage.MyDoFn
780 # will be stored with MyDoFn as the short_str_value and
781 # com.mypackage.MyDoFn as the java_class_name value.
782 # short_str_value can be displayed and java_class_name_value
783 # will be displayed as a tooltip.
784 "durationValue": "A String", # Contains value if the data is of duration type.
785 "url": "A String", # An optional full URL.
786 "floatValue": 3.14, # Contains value if the data is of float type.
787 "namespace": "A String", # The namespace for the key. This is usually a class name or programming
788 # language namespace (i.e. python module) which defines the display data.
789 # This allows a dax monitoring system to specially handle the data
790 # and perform custom rendering.
791 "javaClassValue": "A String", # Contains value if the data is of java class type.
792 "label": "A String", # An optional label to display in a dax UI for the element.
793 "boolValue": True or False, # Contains value if the data is of a boolean type.
794 "strValue": "A String", # Contains value if the data is of string type.
795 "key": "A String", # The key identifying the display data.
796 # This is intended to be used as a label for the display data
797 # when viewed in a dax monitoring system.
798 "int64Value": "A String", # Contains value if the data is of int64 type.
799 "timestampValue": "A String", # Contains value if the data is of timestamp type.
800 },
801 ],
802 "outputCollectionName": [ # User names for all collection outputs to this transform.
803 "A String",
804 ],
805 "id": "A String", # SDK generated id of this transform instance.
806 },
807 ],
808 "executionPipelineStage": [ # Description of each stage of execution of the pipeline.
809 { # Description of the composing transforms, names/ids, and input/outputs of a
810 # stage of execution. Some composing transforms and sources may have been
811 # generated by the Dataflow service during execution planning.
812 "componentSource": [ # Collections produced and consumed by component transforms of this stage.
813 { # Description of an interstitial value between transforms in an execution
814 # stage.
815 "userName": "A String", # Human-readable name for this transform; may be user or system generated.
816 "originalTransformOrCollection": "A String", # User name for the original user transform or collection with which this
817 # source is most closely associated.
818 "name": "A String", # Dataflow service generated name for this source.
819 },
820 ],
821 "kind": "A String", # Type of transform this stage is executing.
822 "name": "A String", # Dataflow service generated name for this stage.
823 "outputSource": [ # Output sources for this stage.
824 { # Description of an input or output of an execution stage.
825 "userName": "A String", # Human-readable name for this source; may be user or system generated.
826 "sizeBytes": "A String", # Size of the source, if measurable.
827 "name": "A String", # Dataflow service generated name for this source.
828 "originalTransformOrCollection": "A String", # User name for the original user transform or collection with which this
829 # source is most closely associated.
830 },
831 ],
832 "inputSource": [ # Input sources for this stage.
833 { # Description of an input or output of an execution stage.
834 "userName": "A String", # Human-readable name for this source; may be user or system generated.
835 "sizeBytes": "A String", # Size of the source, if measurable.
836 "name": "A String", # Dataflow service generated name for this source.
837 "originalTransformOrCollection": "A String", # User name for the original user transform or collection with which this
838 # source is most closely associated.
839 },
840 ],
841 "componentTransform": [ # Transforms that comprise this execution stage.
842 { # Description of a transform executed as part of an execution stage.
843 "userName": "A String", # Human-readable name for this transform; may be user or system generated.
844 "originalTransform": "A String", # User name for the original user transform with which this transform is
845 # most closely associated.
846 "name": "A String", # Dataflow service generated name for this source.
847 },
848 ],
849 "id": "A String", # Dataflow service generated id for this stage.
850 },
851 ],
852 "displayData": [ # Pipeline level display data.
853 { # Data provided with a pipeline or transform to provide descriptive info.
854 "shortStrValue": "A String", # A possible additional shorter value to display.
855 # For example a java_class_name_value of com.mypackage.MyDoFn
856 # will be stored with MyDoFn as the short_str_value and
857 # com.mypackage.MyDoFn as the java_class_name value.
858 # short_str_value can be displayed and java_class_name_value
859 # will be displayed as a tooltip.
860 "durationValue": "A String", # Contains value if the data is of duration type.
861 "url": "A String", # An optional full URL.
862 "floatValue": 3.14, # Contains value if the data is of float type.
863 "namespace": "A String", # The namespace for the key. This is usually a class name or programming
864 # language namespace (i.e. python module) which defines the display data.
865 # This allows a dax monitoring system to specially handle the data
866 # and perform custom rendering.
867 "javaClassValue": "A String", # Contains value if the data is of java class type.
868 "label": "A String", # An optional label to display in a dax UI for the element.
869 "boolValue": True or False, # Contains value if the data is of a boolean type.
870 "strValue": "A String", # Contains value if the data is of string type.
871 "key": "A String", # The key identifying the display data.
872 # This is intended to be used as a label for the display data
873 # when viewed in a dax monitoring system.
874 "int64Value": "A String", # Contains value if the data is of int64 type.
875 "timestampValue": "A String", # Contains value if the data is of timestamp type.
876 },
877 ],
878 },
879 "stageStates": [ # This field may be mutated by the Cloud Dataflow service;
880 # callers cannot mutate it.
881 { # A message describing the state of a particular execution stage.
882 "executionStageName": "A String", # The name of the execution stage.
883 "executionStageState": "A String", # Execution stage states allow the same set of values as JobState.
884 "currentStateTime": "A String", # The time at which the stage transitioned to this state.
885 },
886 ],
887 "id": "A String", # The unique ID of this job.
888 #
889 # This field is set by the Cloud Dataflow service when the Job is
890 # created, and is immutable for the life of the job.
891 "replacedByJobId": "A String", # If another job is an update of this job (and thus, this job is in
892 # `JOB_STATE_UPDATED`), this field contains the ID of that job.
893 "projectId": "A String", # The ID of the Cloud Platform project that the job belongs to.
Sai Cheemalapatic30d2b52017-03-13 12:12:03 -0400894 "transformNameMapping": { # The map of transform name prefixes of the job to be replaced to the
895 # corresponding name prefixes of the new job.
Jon Wayne Parrott692617a2017-01-06 09:58:29 -0800896 "a_key": "A String",
897 },
Sai Cheemalapatic30d2b52017-03-13 12:12:03 -0400898 "environment": { # Describes the environment in which a Dataflow Job runs. # The environment for the job.
899 "version": { # A structure describing which components and their versions of the service
900 # are required in order to run the job.
Jon Wayne Parrott692617a2017-01-06 09:58:29 -0800901 "a_key": "", # Properties of the object.
902 },
Bu Sun Kim715bd7f2019-06-14 16:50:42 -0700903 "flexResourceSchedulingGoal": "A String", # Which Flexible Resource Scheduling mode to run in.
904 "serviceKmsKeyName": "A String", # If set, contains the Cloud KMS key identifier used to encrypt data
905 # at rest, AKA a Customer Managed Encryption Key (CMEK).
Sai Cheemalapatic30d2b52017-03-13 12:12:03 -0400906 #
Bu Sun Kim715bd7f2019-06-14 16:50:42 -0700907 # Format:
908 # projects/PROJECT_ID/locations/LOCATION/keyRings/KEY_RING/cryptoKeys/KEY
Jon Wayne Parrott692617a2017-01-06 09:58:29 -0800909 "internalExperiments": { # Experimental settings.
910 "a_key": "", # Properties of the object. Contains field @type with type URL.
911 },
Sai Cheemalapatic30d2b52017-03-13 12:12:03 -0400912 "dataset": "A String", # The dataset for the current project where various workflow
913 # related tables are stored.
914 #
915 # The supported resource type is:
916 #
917 # Google BigQuery:
918 # bigquery.googleapis.com/{dataset}
Jon Wayne Parrott692617a2017-01-06 09:58:29 -0800919 "experiments": [ # The list of experiments to enable.
920 "A String",
921 ],
922 "serviceAccountEmail": "A String", # Identity to run virtual machines as. Defaults to the default account.
Sai Cheemalapatic30d2b52017-03-13 12:12:03 -0400923 "sdkPipelineOptions": { # The Cloud Dataflow SDK pipeline options specified by the user. These
924 # options are passed through the service and are used to recreate the
925 # SDK pipeline options on the worker in a language agnostic and platform
926 # independent way.
Jon Wayne Parrott692617a2017-01-06 09:58:29 -0800927 "a_key": "", # Properties of the object.
928 },
929 "userAgent": { # A description of the process that generated the request.
930 "a_key": "", # Properties of the object.
931 },
Sai Cheemalapatic30d2b52017-03-13 12:12:03 -0400932 "clusterManagerApiService": "A String", # The type of cluster manager API to use. If unknown or
933 # unspecified, the service will attempt to choose a reasonable
934 # default. This should be in the form of the API service name,
935 # e.g. "compute.googleapis.com".
936 "workerPools": [ # The worker pools. At least one "harness" worker pool must be
937 # specified in order for the job to have workers.
938 { # Describes one particular pool of Cloud Dataflow workers to be
939 # instantiated by the Cloud Dataflow service in order to perform the
940 # computations required by a job. Note that a workflow job may use
941 # multiple pools, in order to match the various computational
942 # requirements of the various stages of the job.
Jon Wayne Parrott692617a2017-01-06 09:58:29 -0800943 "diskSourceImage": "A String", # Fully qualified source image for disks.
Sai Cheemalapatic30d2b52017-03-13 12:12:03 -0400944 "taskrunnerSettings": { # Taskrunner configuration settings. # Settings passed through to Google Compute Engine workers when
945 # using the standard Dataflow task runner. Users should ignore
946 # this field.
947 "workflowFileName": "A String", # The file to store the workflow in.
948 "logUploadLocation": "A String", # Indicates where to put logs. If this is not specified, the logs
949 # will not be uploaded.
950 #
951 # The supported resource type is:
952 #
953 # Google Cloud Storage:
954 # storage.googleapis.com/{bucket}/{object}
955 # bucket.storage.googleapis.com/{object}
Sai Cheemalapatie833b792017-03-24 15:06:46 -0700956 "commandlinesFileName": "A String", # The file to store preprocessing commands in.
Sai Cheemalapatic30d2b52017-03-13 12:12:03 -0400957 "parallelWorkerSettings": { # Provides data to pass through to the worker harness. # The settings to pass to the parallel worker harness.
958 "reportingEnabled": True or False, # Whether to send work progress updates to the service.
959 "shuffleServicePath": "A String", # The Shuffle service path relative to the root URL, for example,
960 # "shuffle/v1beta1".
961 "workerId": "A String", # The ID of the worker running this pipeline.
962 "baseUrl": "A String", # The base URL for accessing Google Cloud APIs.
963 #
964 # When workers access Google Cloud APIs, they logically do so via
965 # relative URLs. If this field is specified, it supplies the base
966 # URL to use for resolving these relative URLs. The normative
967 # algorithm used is defined by RFC 1808, "Relative Uniform Resource
968 # Locators".
969 #
970 # If not specified, the default value is "http://www.googleapis.com/"
971 "servicePath": "A String", # The Cloud Dataflow service path relative to the root URL, for example,
972 # "dataflow/v1b3/projects".
973 "tempStoragePrefix": "A String", # The prefix of the resources the system should use for temporary
974 # storage.
975 #
976 # The supported resource type is:
977 #
978 # Google Cloud Storage:
979 #
980 # storage.googleapis.com/{bucket}/{object}
981 # bucket.storage.googleapis.com/{object}
982 },
Sai Cheemalapati4ba8c232017-06-06 18:46:08 -0400983 "vmId": "A String", # The ID string of the VM.
984 "baseTaskDir": "A String", # The location on the worker for task-specific subdirectories.
985 "continueOnException": True or False, # Whether to continue taskrunner if an exception is hit.
986 "oauthScopes": [ # The OAuth2 scopes to be requested by the taskrunner in order to
987 # access the Cloud Dataflow API.
988 "A String",
989 ],
990 "taskUser": "A String", # The UNIX user ID on the worker VM to use for tasks launched by
991 # taskrunner; e.g. "root".
Sai Cheemalapatic30d2b52017-03-13 12:12:03 -0400992 "baseUrl": "A String", # The base URL for the taskrunner to use when accessing Google Cloud APIs.
993 #
994 # When workers access Google Cloud APIs, they logically do so via
995 # relative URLs. If this field is specified, it supplies the base
996 # URL to use for resolving these relative URLs. The normative
997 # algorithm used is defined by RFC 1808, "Relative Uniform Resource
998 # Locators".
999 #
1000 # If not specified, the default value is "http://www.googleapis.com/"
Sai Cheemalapati4ba8c232017-06-06 18:46:08 -04001001 "taskGroup": "A String", # The UNIX group ID on the worker VM to use for tasks launched by
1002 # taskrunner; e.g. "wheel".
1003 "languageHint": "A String", # The suggested backend language.
1004 "logToSerialconsole": True or False, # Whether to send taskrunner log info to Google Compute Engine VM serial
1005 # console.
1006 "streamingWorkerMainClass": "A String", # The streaming worker main class name.
1007 "logDir": "A String", # The directory on the VM to store logs.
1008 "dataflowApiVersion": "A String", # The API version of endpoint, e.g. "v1b3"
Sai Cheemalapatie833b792017-03-24 15:06:46 -07001009 "harnessCommand": "A String", # The command to launch the worker harness.
1010 "tempStoragePrefix": "A String", # The prefix of the resources the taskrunner should use for
1011 # temporary storage.
Sai Cheemalapatic30d2b52017-03-13 12:12:03 -04001012 #
Sai Cheemalapatie833b792017-03-24 15:06:46 -07001013 # The supported resource type is:
1014 #
1015 # Google Cloud Storage:
1016 # storage.googleapis.com/{bucket}/{object}
1017 # bucket.storage.googleapis.com/{object}
Sai Cheemalapati4ba8c232017-06-06 18:46:08 -04001018 "alsologtostderr": True or False, # Whether to also send taskrunner log info to stderr.
Jon Wayne Parrott692617a2017-01-06 09:58:29 -08001019 },
Sai Cheemalapati4ba8c232017-06-06 18:46:08 -04001020 "kind": "A String", # The kind of the worker pool; currently only `harness` and `shuffle`
1021 # are supported.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07001022 "packages": [ # Packages to be installed on workers.
1023 { # The packages that must be installed in order for a worker to run the
1024 # steps of the Cloud Dataflow job that will be assigned to its worker
1025 # pool.
1026 #
1027 # This is the mechanism by which the Cloud Dataflow SDK causes code to
1028 # be loaded onto the workers. For example, the Cloud Dataflow Java SDK
1029 # might use this to install jars containing the user's code and all of the
1030 # various dependencies (libraries, data files, etc.) required in order
1031 # for that code to run.
1032 "location": "A String", # The resource to read the package from. The supported resource type is:
1033 #
1034 # Google Cloud Storage:
1035 #
1036 # storage.googleapis.com/{bucket}
1037 # bucket.storage.googleapis.com/
1038 "name": "A String", # The name of the package.
1039 },
1040 ],
Sai Cheemalapati4ba8c232017-06-06 18:46:08 -04001041 "machineType": "A String", # Machine type (e.g. "n1-standard-1"). If empty or unspecified, the
1042 # service will attempt to choose a reasonable default.
1043 "network": "A String", # Network to which VMs will be assigned. If empty or unspecified,
1044 # the service will use the network "default".
1045 "zone": "A String", # Zone to run the worker pools in. If empty or unspecified, the service
1046 # will attempt to choose a reasonable default.
1047 "diskSizeGb": 42, # Size of root disk for VMs, in GB. If zero or unspecified, the service will
1048 # attempt to choose a reasonable default.
Sai Cheemalapati4ba8c232017-06-06 18:46:08 -04001049 "teardownPolicy": "A String", # Sets the policy for determining when to turn down the worker pool.
1050 # Allowed values are: `TEARDOWN_ALWAYS`, `TEARDOWN_ON_SUCCESS`, and
1051 # `TEARDOWN_NEVER`.
1052 # `TEARDOWN_ALWAYS` means workers are always torn down regardless of whether
1053 # the job succeeds. `TEARDOWN_ON_SUCCESS` means workers are torn down
1054 # if the job succeeds. `TEARDOWN_NEVER` means the workers are never torn
1055 # down.
1056 #
1057 # If the workers are not torn down by the service, they will
1058 # continue to run and use Google Compute Engine VM resources in the
1059 # user's project until they are explicitly terminated by the user.
1060 # Because of this, Google recommends using the `TEARDOWN_ALWAYS`
1061 # policy except for small, manually supervised test jobs.
1062 #
1063 # If unknown or unspecified, the service will attempt to choose a reasonable
1064 # default.
1065 "onHostMaintenance": "A String", # The action to take on host maintenance, as defined by the Google
1066 # Compute Engine API.
1067 "ipConfiguration": "A String", # Configuration for VM IPs.
1068 "numThreadsPerWorker": 42, # The number of threads per worker harness. If empty or unspecified, the
1069 # service will choose a number of threads (according to the number of cores
1070 # on the selected machine type for batch, or 1 by convention for streaming).
1071 "poolArgs": { # Extra arguments for this worker pool.
1072 "a_key": "", # Properties of the object. Contains field @type with type URL.
1073 },
1074 "numWorkers": 42, # Number of Google Compute Engine workers in this pool needed to
1075 # execute the job. If zero or unspecified, the service will
Sai Cheemalapatic30d2b52017-03-13 12:12:03 -04001076 # attempt to choose a reasonable default.
1077 "workerHarnessContainerImage": "A String", # Required. Docker container image that executes the Cloud Dataflow worker
1078 # harness, residing in Google Container Registry.
Sai Cheemalapati4ba8c232017-06-06 18:46:08 -04001079 "subnetwork": "A String", # Subnetwork to which VMs will be assigned, if desired. Expected to be of
1080 # the form "regions/REGION/subnetworks/SUBNETWORK".
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07001081 "dataDisks": [ # Data disks that are used by a VM in this workflow.
1082 { # Describes the data disk used by a workflow job.
1083 "mountPoint": "A String", # Directory in a VM where disk is mounted.
1084 "sizeGb": 42, # Size of disk in GB. If zero or unspecified, the service will
1085 # attempt to choose a reasonable default.
1086 "diskType": "A String", # Disk storage type, as defined by Google Compute Engine. This
1087 # must be a disk type appropriate to the project and zone in which
1088 # the workers will run. If unknown or unspecified, the service
1089 # will attempt to choose a reasonable default.
Sai Cheemalapati4ba8c232017-06-06 18:46:08 -04001090 #
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07001091 # For example, the standard persistent disk type is a resource name
1092 # typically ending in "pd-standard". If SSD persistent disks are
1093 # available, the resource name typically ends with "pd-ssd". The
1094 # actual valid values are defined by the Google Compute Engine API,
1095 # not by the Cloud Dataflow API; consult the Google Compute Engine
1096 # documentation for more information about determining the set of
1097 # available disk types for a particular project and zone.
Sai Cheemalapati4ba8c232017-06-06 18:46:08 -04001098 #
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07001099 # Google Compute Engine Disk types are local to a particular
1100 # project in a particular zone, and so the resource name will
1101 # typically look something like this:
1102 #
1103 # compute.googleapis.com/projects/project-id/zones/zone/diskTypes/pd-standard
Sai Cheemalapati4ba8c232017-06-06 18:46:08 -04001104 },
1105 ],
1106 "autoscalingSettings": { # Settings for WorkerPool autoscaling. # Settings for autoscaling of this WorkerPool.
1107 "maxNumWorkers": 42, # The maximum number of workers to cap scaling at.
1108 "algorithm": "A String", # The algorithm to use for autoscaling.
1109 },
1110 "defaultPackageSet": "A String", # The default package set to install. This allows the service to
1111 # select a default set of packages which are useful to worker
1112 # harnesses written in a particular language.
1113 "diskType": "A String", # Type of root disk for VMs. If empty or unspecified, the service will
1114 # attempt to choose a reasonable default.
1115 "metadata": { # Metadata to set on the Google Compute Engine VMs.
1116 "a_key": "A String",
1117 },
Jon Wayne Parrott692617a2017-01-06 09:58:29 -08001118 },
1119 ],
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07001120 "tempStoragePrefix": "A String", # The prefix of the resources the system should use for temporary
1121 # storage. The system will append the suffix "/temp-{JOBNAME}" to
1122 # this resource prefix, where {JOBNAME} is the value of the
1123 # job_name field. The resulting bucket and object prefix is used
1124 # as the prefix of the resources used to store temporary data
1125 # needed during the job execution. NOTE: This will override the
1126 # value in taskrunner_settings.
1127 # The supported resource type is:
1128 #
1129 # Google Cloud Storage:
1130 #
1131 # storage.googleapis.com/{bucket}/{object}
1132 # bucket.storage.googleapis.com/{object}
1133 },
1134 "location": "A String", # The [regional endpoint]
1135 # (https://cloud.google.com/dataflow/docs/concepts/regional-endpoints) that
1136 # contains this job.
1137 "tempFiles": [ # A set of files the system should be aware of that are used
1138 # for temporary storage. These temporary files will be
1139 # removed on job completion.
1140 # No duplicates are allowed.
1141 # No file patterns are supported.
1142 #
1143 # The supported files are:
1144 #
1145 # Google Cloud Storage:
1146 #
1147 # storage.googleapis.com/{bucket}/{object}
1148 # bucket.storage.googleapis.com/{object}
1149 "A String",
1150 ],
1151 "type": "A String", # The type of Cloud Dataflow job.
1152 "clientRequestId": "A String", # The client's unique identifier of the job, re-used across retried attempts.
1153 # If this field is set, the service will ensure its uniqueness.
1154 # The request to create a job will fail if the service has knowledge of a
1155 # previously submitted job with the same client's ID and job name.
1156 # The caller may use this field to ensure idempotence of job
1157 # creation across retried attempts to create a job.
1158 # By default, the field is empty and, in that case, the service ignores it.
1159 "createdFromSnapshotId": "A String", # If this is specified, the job's initial state is populated from the given
1160 # snapshot.
1161 "stepsLocation": "A String", # The GCS location where the steps are stored.
1162 "currentStateTime": "A String", # The timestamp associated with the current state.
1163 "startTime": "A String", # The timestamp when the job was started (transitioned to JOB_STATE_PENDING).
1164 # Flexible resource scheduling jobs are started with some delay after job
1165 # creation, so start_time is unset before start and is updated when the
1166 # job is started by the Cloud Dataflow service. For other jobs, start_time
1167 # always equals create_time and is immutable and set by the Cloud Dataflow
1168 # service.
1169 "createTime": "A String", # The timestamp when the job was initially created. Immutable and set by the
1170 # Cloud Dataflow service.
1171 "requestedState": "A String", # The job's requested state.
1172 #
1173 # `UpdateJob` may be used to switch between the `JOB_STATE_STOPPED` and
1174 # `JOB_STATE_RUNNING` states, by setting requested_state. `UpdateJob` may
1175 # also be used to directly set a job's requested state to
1176 # `JOB_STATE_CANCELLED` or `JOB_STATE_DONE`, irrevocably terminating the
1177 # job if it has not already reached a terminal state.
1178 "name": "A String", # The user-specified Cloud Dataflow job name.
1179 #
1180 # Only one Job with a given name may exist in a project at any
1181 # given time. If a caller attempts to create a Job with the same
1182 # name as an already-existing Job, the attempt returns the
1183 # existing Job.
1184 #
1185 # The name must match the regular expression
1186 # `[a-z]([-a-z0-9]{0,38}[a-z0-9])?`
1187 "steps": [ # Exactly one of step or steps_location should be specified.
1188 #
1189 # The top-level steps that constitute the entire job.
1190 { # Defines a particular step within a Cloud Dataflow job.
1191 #
1192 # A job consists of multiple steps, each of which performs some
1193 # specific operation as part of the overall job. Data is typically
1194 # passed from one step to another as part of the job.
1195 #
1196 # Here's an example of a sequence of steps which together implement a
1197 # Map-Reduce job:
1198 #
1199 # * Read a collection of data from some source, parsing the
1200 # collection's elements.
1201 #
1202 # * Validate the elements.
1203 #
1204 # * Apply a user-defined function to map each element to some value
1205 # and extract an element-specific key value.
1206 #
1207 # * Group elements with the same key into a single element with
1208 # that key, transforming a multiply-keyed collection into a
1209 # uniquely-keyed collection.
1210 #
1211 # * Write the elements out to some data sink.
1212 #
1213 # Note that the Cloud Dataflow service may be used to run many different
1214 # types of jobs, not just Map-Reduce.
1215 "kind": "A String", # The kind of step in the Cloud Dataflow job.
1216 "properties": { # Named properties associated with the step. Each kind of
1217 # predefined step has its own required set of properties.
1218 # Must be provided on Create. Only retrieved with JOB_VIEW_ALL.
1219 "a_key": "", # Properties of the object.
1220 },
1221 "name": "A String", # The name that identifies the step. This must be unique for each
1222 # step with respect to all other steps in the Cloud Dataflow job.
1223 },
1224 ],
1225 "replaceJobId": "A String", # If this job is an update of an existing job, this field is the job ID
1226 # of the job it replaced.
1227 #
1228 # When sending a `CreateJobRequest`, you can update a job by specifying it
1229 # here. The job named here is stopped, and its intermediate state is
1230 # transferred to this job.
1231 "currentState": "A String", # The current state of the job.
1232 #
1233 # Jobs are created in the `JOB_STATE_STOPPED` state unless otherwise
1234 # specified.
1235 #
1236 # A job in the `JOB_STATE_RUNNING` state may asynchronously enter a
1237 # terminal state. After a job has reached a terminal state, no
1238 # further state updates may be made.
1239 #
1240 # This field may be mutated by the Cloud Dataflow service;
1241 # callers cannot mutate it.
1242 "executionInfo": { # Additional information about how a Cloud Dataflow job will be executed that # Deprecated.
1243 # isn't contained in the submitted job.
1244 "stages": { # A mapping from each stage to the information about that stage.
1245 "a_key": { # Contains information about how a particular
1246 # google.dataflow.v1beta3.Step will be executed.
1247 "stepName": [ # The steps associated with the execution stage.
1248 # Note that stages may have several steps, and that a given step
1249 # might be run by more than one stage.
1250 "A String",
1251 ],
1252 },
1253 },
1254 },
1255 }</pre>
1256</div>
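<p>For illustration only (not part of the generated reference): the Job resource documented above notes that <code>requested_state</code> can be set through <code>UpdateJob</code> to cancel or terminate a job. A minimal, hedged sketch of doing that with the google-api-python-client library follows; the project, location, and job IDs are placeholders, and the <code>update</code> call on the same jobs resource plus Application Default Credentials are assumptions rather than content from this page.</p>
<pre>
# Hedged sketch: requesting cancellation by setting requested_state,
# as described in the requested_state field documentation above.
# PROJECT_ID and JOB_ID are hypothetical placeholders.
from googleapiclient.discovery import build

dataflow = build('dataflow', 'v1b3')  # assumes Application Default Credentials

updated = dataflow.projects().locations().jobs().update(
    projectId='PROJECT_ID',
    location='us-central1',
    jobId='JOB_ID',
    body={'requestedState': 'JOB_STATE_CANCELLED'},
).execute()

print(updated.get('currentState'), '->', updated.get('requestedState'))
</pre>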
1257
1258<div class="method">
1259 <code class="details" id="get">get(projectId, location, jobId, x__xgafv=None, view=None)</code>
1260 <pre>Gets the state of the specified Cloud Dataflow job.
1261
1262To get the state of a job, we recommend using `projects.locations.jobs.get`
1263with a [regional endpoint]
1264(https://cloud.google.com/dataflow/docs/concepts/regional-endpoints). Using
1265`projects.jobs.get` is not recommended, as you can only get the state of
1266jobs that are running in `us-central1`.
1267
1268Args:
1269 projectId: string, The ID of the Cloud Platform project that the job belongs to. (required)
1270 location: string, The [regional endpoint]
1271(https://cloud.google.com/dataflow/docs/concepts/regional-endpoints) that
1272contains this job. (required)
1273 jobId: string, The job ID. (required)
1274 x__xgafv: string, V1 error format.
1275 Allowed values
1276 1 - v1 error format
1277 2 - v2 error format
1278 view: string, The level of information requested in response.
1279
1280Returns:
1281 An object of the form:
1282
1283 { # Defines a job to be run by the Cloud Dataflow service.
1284 "labels": { # User-defined labels for this job.
1285 #
1286 # The labels map can contain no more than 64 entries. Entries of the labels
1287 # map are UTF8 strings that comply with the following restrictions:
1288 #
1289 # * Keys must conform to regexp: \p{Ll}\p{Lo}{0,62}
1290 # * Values must conform to regexp: [\p{Ll}\p{Lo}\p{N}_-]{0,63}
1291 # * Both keys and values are additionally constrained to be <= 128 bytes in
1292 # size.
1293 "a_key": "A String",
1294 },
1295 "jobMetadata": { # Metadata available primarily for filtering jobs. Will be included in the # This field is populated by the Dataflow service to support filtering jobs
1296 # by the metadata values provided here. Populated for ListJobs and all GetJob
1297 # views SUMMARY and higher.
1298 # ListJob response and Job SUMMARY view.
1299 "sdkVersion": { # The version of the SDK used to run the job. # The SDK version used to run the job.
1300 "versionDisplayName": "A String", # A readable string describing the version of the SDK.
1301 "version": "A String", # The version of the SDK used to run the job.
1302 "sdkSupportStatus": "A String", # The support status for this SDK version.
1303 },
1304 "pubsubDetails": [ # Identification of a PubSub source used in the Dataflow job.
1305 { # Metadata for a PubSub connector used by the job.
1306 "topic": "A String", # Topic accessed in the connection.
1307 "subscription": "A String", # Subscription used in the connection.
1308 },
1309 ],
1310 "datastoreDetails": [ # Identification of a Datastore source used in the Dataflow job.
1311 { # Metadata for a Datastore connector used by the job.
1312 "projectId": "A String", # ProjectId accessed in the connection.
1313 "namespace": "A String", # Namespace used in the connection.
1314 },
1315 ],
1316 "fileDetails": [ # Identification of a File source used in the Dataflow job.
1317 { # Metadata for a File connector used by the job.
1318 "filePattern": "A String", # File Pattern used to access files by the connector.
1319 },
1320 ],
1321 "spannerDetails": [ # Identification of a Spanner source used in the Dataflow job.
1322 { # Metadata for a Spanner connector used by the job.
1323 "instanceId": "A String", # InstanceId accessed in the connection.
1324 "projectId": "A String", # ProjectId accessed in the connection.
1325 "databaseId": "A String", # DatabaseId accessed in the connection.
1326 },
1327 ],
1328 "bigTableDetails": [ # Identification of a BigTable source used in the Dataflow job.
1329 { # Metadata for a BigTable connector used by the job.
1330 "instanceId": "A String", # InstanceId accessed in the connection.
1331 "projectId": "A String", # ProjectId accessed in the connection.
1332 "tableId": "A String", # TableId accessed in the connection.
1333 },
1334 ],
1335 "bigqueryDetails": [ # Identification of a BigQuery source used in the Dataflow job.
1336 { # Metadata for a BigQuery connector used by the job.
1337 "projectId": "A String", # Project accessed in the connection.
1338 "dataset": "A String", # Dataset accessed in the connection.
1339 "table": "A String", # Table accessed in the connection.
1340 "query": "A String", # Query used to access data in the connection.
1341 },
1342 ],
Jon Wayne Parrott692617a2017-01-06 09:58:29 -08001343 },
Sai Cheemalapatic30d2b52017-03-13 12:12:03 -04001344 "pipelineDescription": { # Preliminary field: The format of this data may change at any time.
1345 # A description of the user pipeline and stages through which it is executed.
1346 # Created by Cloud Dataflow service. Only retrieved with
1347 # JOB_VIEW_DESCRIPTION or JOB_VIEW_ALL.
1348 # A descriptive representation of submitted pipeline as well as the executed
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07001349 # form. This data is provided by the Dataflow service for ease of visualizing the pipeline and interpreting Dataflow provided metrics.
Sai Cheemalapatic30d2b52017-03-13 12:12:03 -04001350 "originalPipelineTransform": [ # Description of each transform in the pipeline and collections between them.
1351 { # Description of the type, names/ids, and input/outputs for a transform.
1352 "kind": "A String", # Type of transform.
1353 "name": "A String", # User provided name for this transform instance.
1354 "inputCollectionName": [ # User names for all collection inputs to this transform.
1355 "A String",
1356 ],
1357 "displayData": [ # Transform-specific display data.
1358 { # Data provided with a pipeline or transform to provide descriptive info.
Sai Cheemalapatic30d2b52017-03-13 12:12:03 -04001359 "shortStrValue": "A String", # A possible additional shorter value to display.
1360 # For example a java_class_name_value of com.mypackage.MyDoFn
1361 # will be stored with MyDoFn as the short_str_value and
1362 # com.mypackage.MyDoFn as the java_class_name value.
1363 # short_str_value can be displayed and java_class_name_value
1364 # will be displayed as a tooltip.
Sai Cheemalapati4ba8c232017-06-06 18:46:08 -04001365 "durationValue": "A String", # Contains value if the data is of duration type.
Sai Cheemalapatic30d2b52017-03-13 12:12:03 -04001366 "url": "A String", # An optional full URL.
1367 "floatValue": 3.14, # Contains value if the data is of float type.
1368 "namespace": "A String", # The namespace for the key. This is usually a class name or programming
1369 # language namespace (i.e. python module) which defines the display data.
1370 # This allows a dax monitoring system to specially handle the data
1371 # and perform custom rendering.
1372 "javaClassValue": "A String", # Contains value if the data is of java class type.
1373 "label": "A String", # An optional label to display in a dax UI for the element.
1374 "boolValue": True or False, # Contains value if the data is of a boolean type.
1375 "strValue": "A String", # Contains value if the data is of string type.
Sai Cheemalapati4ba8c232017-06-06 18:46:08 -04001376 "key": "A String", # The key identifying the display data.
1377 # This is intended to be used as a label for the display data
1378 # when viewed in a dax monitoring system.
Sai Cheemalapatic30d2b52017-03-13 12:12:03 -04001379 "int64Value": "A String", # Contains value if the data is of int64 type.
Sai Cheemalapati4ba8c232017-06-06 18:46:08 -04001380 "timestampValue": "A String", # Contains value if the data is of timestamp type.
Sai Cheemalapatic30d2b52017-03-13 12:12:03 -04001381 },
1382 ],
1383 "outputCollectionName": [ # User names for all collection outputs to this transform.
1384 "A String",
1385 ],
1386 "id": "A String", # SDK generated id of this transform instance.
1387 },
1388 ],
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07001389 "executionPipelineStage": [ # Description of each stage of execution of the pipeline.
1390 { # Description of the composing transforms, names/ids, and input/outputs of a
1391 # stage of execution. Some composing transforms and sources may have been
1392 # generated by the Dataflow service during execution planning.
1393 "componentSource": [ # Collections produced and consumed by component transforms of this stage.
1394 { # Description of an interstitial value between transforms in an execution
1395 # stage.
1396 "userName": "A String", # Human-readable name for this transform; may be user or system generated.
1397 "originalTransformOrCollection": "A String", # User name for the original user transform or collection with which this
1398 # source is most closely associated.
1399 "name": "A String", # Dataflow service generated name for this source.
1400 },
1401 ],
1402 "kind": "A String", # Type of tranform this stage is executing.
1403 "name": "A String", # Dataflow service generated name for this stage.
1404 "outputSource": [ # Output sources for this stage.
1405 { # Description of an input or output of an execution stage.
1406 "userName": "A String", # Human-readable name for this source; may be user or system generated.
1407 "sizeBytes": "A String", # Size of the source, if measurable.
1408 "name": "A String", # Dataflow service generated name for this source.
1409 "originalTransformOrCollection": "A String", # User name for the original user transform or collection with which this
1410 # source is most closely associated.
1411 },
1412 ],
1413 "inputSource": [ # Input sources for this stage.
1414 { # Description of an input or output of an execution stage.
1415 "userName": "A String", # Human-readable name for this source; may be user or system generated.
1416 "sizeBytes": "A String", # Size of the source, if measurable.
1417 "name": "A String", # Dataflow service generated name for this source.
1418 "originalTransformOrCollection": "A String", # User name for the original user transform or collection with which this
1419 # source is most closely associated.
1420 },
1421 ],
1422 "componentTransform": [ # Transforms that comprise this execution stage.
1423 { # Description of a transform executed as part of an execution stage.
1424 "userName": "A String", # Human-readable name for this transform; may be user or system generated.
1425 "originalTransform": "A String", # User name for the original user transform with which this transform is
1426 # most closely associated.
1427 "name": "A String", # Dataflow service generated name for this source.
1428 },
1429 ],
1430 "id": "A String", # Dataflow service generated id for this stage.
1431 },
1432 ],
Sai Cheemalapatic30d2b52017-03-13 12:12:03 -04001433 "displayData": [ # Pipeline level display data.
1434 { # Data provided with a pipeline or transform to provide descriptive info.
Sai Cheemalapatic30d2b52017-03-13 12:12:03 -04001435 "shortStrValue": "A String", # A possible additional shorter value to display.
1436 # For example a java_class_name_value of com.mypackage.MyDoFn
1437 # will be stored with MyDoFn as the short_str_value and
1438 # com.mypackage.MyDoFn as the java_class_name value.
1439 # short_str_value can be displayed and java_class_name_value
1440 # will be displayed as a tooltip.
Sai Cheemalapati4ba8c232017-06-06 18:46:08 -04001441 "durationValue": "A String", # Contains value if the data is of duration type.
Sai Cheemalapatic30d2b52017-03-13 12:12:03 -04001442 "url": "A String", # An optional full URL.
1443 "floatValue": 3.14, # Contains value if the data is of float type.
1444 "namespace": "A String", # The namespace for the key. This is usually a class name or programming
1445 # language namespace (i.e. python module) which defines the display data.
1446 # This allows a dax monitoring system to specially handle the data
1447 # and perform custom rendering.
1448 "javaClassValue": "A String", # Contains value if the data is of java class type.
1449 "label": "A String", # An optional label to display in a dax UI for the element.
1450 "boolValue": True or False, # Contains value if the data is of a boolean type.
1451 "strValue": "A String", # Contains value if the data is of string type.
Sai Cheemalapati4ba8c232017-06-06 18:46:08 -04001452 "key": "A String", # The key identifying the display data.
1453 # This is intended to be used as a label for the display data
1454 # when viewed in a dax monitoring system.
Sai Cheemalapatic30d2b52017-03-13 12:12:03 -04001455 "int64Value": "A String", # Contains value if the data is of int64 type.
Sai Cheemalapati4ba8c232017-06-06 18:46:08 -04001456 "timestampValue": "A String", # Contains value if the data is of timestamp type.
Sai Cheemalapatic30d2b52017-03-13 12:12:03 -04001457 },
1458 ],
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07001459 },
1460 "stageStates": [ # This field may be mutated by the Cloud Dataflow service;
1461 # callers cannot mutate it.
1462 { # A message describing the state of a particular execution stage.
1463 "executionStageName": "A String", # The name of the execution stage.
1464 "executionStageState": "A String", # Executions stage states allow the same set of values as JobState.
1465 "currentStateTime": "A String", # The time at which the stage transitioned to this state.
1466 },
1467 ],
1468 "id": "A String", # The unique ID of this job.
1469 #
1470 # This field is set by the Cloud Dataflow service when the Job is
1471 # created, and is immutable for the life of the job.
1472 "replacedByJobId": "A String", # If another job is an update of this job (and thus, this job is in
1473 # `JOB_STATE_UPDATED`), this field contains the ID of that job.
1474 "projectId": "A String", # The ID of the Cloud Platform project that the job belongs to.
1475 "transformNameMapping": { # The map of transform name prefixes of the job to be replaced to the
1476 # corresponding name prefixes of the new job.
1477 "a_key": "A String",
1478 },
1479 "environment": { # Describes the environment in which a Dataflow Job runs. # The environment for the job.
1480 "version": { # A structure describing which components and their versions of the service
1481 # are required in order to run the job.
1482 "a_key": "", # Properties of the object.
1483 },
1484 "flexResourceSchedulingGoal": "A String", # Which Flexible Resource Scheduling mode to run in.
1485 "serviceKmsKeyName": "A String", # If set, contains the Cloud KMS key identifier used to encrypt data
1486 # at rest, AKA a Customer Managed Encryption Key (CMEK).
1487 #
1488 # Format:
1489 # projects/PROJECT_ID/locations/LOCATION/keyRings/KEY_RING/cryptoKeys/KEY
1490 "internalExperiments": { # Experimental settings.
1491 "a_key": "", # Properties of the object. Contains field @type with type URL.
1492 },
1493 "dataset": "A String", # The dataset for the current project where various workflow
1494 # related tables are stored.
1495 #
1496 # The supported resource type is:
1497 #
1498 # Google BigQuery:
1499 # bigquery.googleapis.com/{dataset}
1500 "experiments": [ # The list of experiments to enable.
1501 "A String",
1502 ],
1503 "serviceAccountEmail": "A String", # Identity to run virtual machines as. Defaults to the default account.
1504 "sdkPipelineOptions": { # The Cloud Dataflow SDK pipeline options specified by the user. These
1505 # options are passed through the service and are used to recreate the
1506 # SDK pipeline options on the worker in a language agnostic and platform
1507 # independent way.
1508 "a_key": "", # Properties of the object.
1509 },
1510 "userAgent": { # A description of the process that generated the request.
1511 "a_key": "", # Properties of the object.
1512 },
1513 "clusterManagerApiService": "A String", # The type of cluster manager API to use. If unknown or
1514 # unspecified, the service will attempt to choose a reasonable
1515 # default. This should be in the form of the API service name,
1516 # e.g. "compute.googleapis.com".
1517 "workerPools": [ # The worker pools. At least one "harness" worker pool must be
1518 # specified in order for the job to have workers.
1519 { # Describes one particular pool of Cloud Dataflow workers to be
1520 # instantiated by the Cloud Dataflow service in order to perform the
1521 # computations required by a job. Note that a workflow job may use
1522 # multiple pools, in order to match the various computational
1523 # requirements of the various stages of the job.
1524 "diskSourceImage": "A String", # Fully qualified source image for disks.
1525 "taskrunnerSettings": { # Taskrunner configuration settings. # Settings passed through to Google Compute Engine workers when
1526 # using the standard Dataflow task runner. Users should ignore
1527 # this field.
1528 "workflowFileName": "A String", # The file to store the workflow in.
1529 "logUploadLocation": "A String", # Indicates where to put logs. If this is not specified, the logs
1530 # will not be uploaded.
1531 #
1532 # The supported resource type is:
1533 #
1534 # Google Cloud Storage:
1535 # storage.googleapis.com/{bucket}/{object}
1536 # bucket.storage.googleapis.com/{object}
1537 "commandlinesFileName": "A String", # The file to store preprocessing commands in.
1538 "parallelWorkerSettings": { # Provides data to pass through to the worker harness. # The settings to pass to the parallel worker harness.
1539 "reportingEnabled": True or False, # Whether to send work progress updates to the service.
1540 "shuffleServicePath": "A String", # The Shuffle service path relative to the root URL, for example,
1541 # "shuffle/v1beta1".
1542 "workerId": "A String", # The ID of the worker running this pipeline.
1543 "baseUrl": "A String", # The base URL for accessing Google Cloud APIs.
1544 #
1545 # When workers access Google Cloud APIs, they logically do so via
1546 # relative URLs. If this field is specified, it supplies the base
1547 # URL to use for resolving these relative URLs. The normative
1548 # algorithm used is defined by RFC 1808, "Relative Uniform Resource
1549 # Locators".
1550 #
1551 # If not specified, the default value is "http://www.googleapis.com/"
1552 "servicePath": "A String", # The Cloud Dataflow service path relative to the root URL, for example,
1553 # "dataflow/v1b3/projects".
1554 "tempStoragePrefix": "A String", # The prefix of the resources the system should use for temporary
1555 # storage.
1556 #
1557 # The supported resource type is:
1558 #
1559 # Google Cloud Storage:
1560 #
1561 # storage.googleapis.com/{bucket}/{object}
1562 # bucket.storage.googleapis.com/{object}
1563 },
1564 "vmId": "A String", # The ID string of the VM.
1565 "baseTaskDir": "A String", # The location on the worker for task-specific subdirectories.
1566 "continueOnException": True or False, # Whether to continue taskrunner if an exception is hit.
1567 "oauthScopes": [ # The OAuth2 scopes to be requested by the taskrunner in order to
1568 # access the Cloud Dataflow API.
1569 "A String",
1570 ],
1571 "taskUser": "A String", # The UNIX user ID on the worker VM to use for tasks launched by
1572 # taskrunner; e.g. "root".
1573 "baseUrl": "A String", # The base URL for the taskrunner to use when accessing Google Cloud APIs.
1574 #
1575 # When workers access Google Cloud APIs, they logically do so via
1576 # relative URLs. If this field is specified, it supplies the base
1577 # URL to use for resolving these relative URLs. The normative
1578 # algorithm used is defined by RFC 1808, "Relative Uniform Resource
1579 # Locators".
1580 #
1581 # If not specified, the default value is "http://www.googleapis.com/"
1582 "taskGroup": "A String", # The UNIX group ID on the worker VM to use for tasks launched by
1583 # taskrunner; e.g. "wheel".
1584 "languageHint": "A String", # The suggested backend language.
1585 "logToSerialconsole": True or False, # Whether to send taskrunner log info to Google Compute Engine VM serial
1586 # console.
1587 "streamingWorkerMainClass": "A String", # The streaming worker main class name.
1588 "logDir": "A String", # The directory on the VM to store logs.
1589 "dataflowApiVersion": "A String", # The API version of endpoint, e.g. "v1b3"
1590 "harnessCommand": "A String", # The command to launch the worker harness.
1591 "tempStoragePrefix": "A String", # The prefix of the resources the taskrunner should use for
1592 # temporary storage.
1593 #
1594 # The supported resource type is:
1595 #
1596 # Google Cloud Storage:
1597 # storage.googleapis.com/{bucket}/{object}
1598 # bucket.storage.googleapis.com/{object}
1599 "alsologtostderr": True or False, # Whether to also send taskrunner log info to stderr.
1600 },
1601 "kind": "A String", # The kind of the worker pool; currently only `harness` and `shuffle`
1602 # are supported.
1603 "packages": [ # Packages to be installed on workers.
1604 { # The packages that must be installed in order for a worker to run the
1605 # steps of the Cloud Dataflow job that will be assigned to its worker
1606 # pool.
1607 #
1608 # This is the mechanism by which the Cloud Dataflow SDK causes code to
1609 # be loaded onto the workers. For example, the Cloud Dataflow Java SDK
1610 # might use this to install jars containing the user's code and all of the
1611 # various dependencies (libraries, data files, etc.) required in order
1612 # for that code to run.
1613 "location": "A String", # The resource to read the package from. The supported resource type is:
1614 #
1615 # Google Cloud Storage:
1616 #
1617 # storage.googleapis.com/{bucket}
1618 # bucket.storage.googleapis.com/
1619 "name": "A String", # The name of the package.
Sai Cheemalapatic30d2b52017-03-13 12:12:03 -04001620 },
1621 ],
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07001622 "machineType": "A String", # Machine type (e.g. "n1-standard-1"). If empty or unspecified, the
1623 # service will attempt to choose a reasonable default.
1624 "network": "A String", # Network to which VMs will be assigned. If empty or unspecified,
1625 # the service will use the network "default".
1626 "zone": "A String", # Zone to run the worker pools in. If empty or unspecified, the service
1627 # will attempt to choose a reasonable default.
1628 "diskSizeGb": 42, # Size of root disk for VMs, in GB. If zero or unspecified, the service will
1629 # attempt to choose a reasonable default.
1630 "teardownPolicy": "A String", # Sets the policy for determining when to turndown worker pool.
1631 # Allowed values are: `TEARDOWN_ALWAYS`, `TEARDOWN_ON_SUCCESS`, and
1632 # `TEARDOWN_NEVER`.
1633 # `TEARDOWN_ALWAYS` means workers are always torn down regardless of whether
1634 # the job succeeds. `TEARDOWN_ON_SUCCESS` means workers are torn down
1635 # if the job succeeds. `TEARDOWN_NEVER` means the workers are never torn
1636 # down.
1637 #
1638 # If the workers are not torn down by the service, they will
1639 # continue to run and use Google Compute Engine VM resources in the
1640 # user's project until they are explicitly terminated by the user.
1641 # Because of this, Google recommends using the `TEARDOWN_ALWAYS`
1642 # policy except for small, manually supervised test jobs.
1643 #
1644 # If unknown or unspecified, the service will attempt to choose a reasonable
1645 # default.
1646 "onHostMaintenance": "A String", # The action to take on host maintenance, as defined by the Google
1647 # Compute Engine API.
1648 "ipConfiguration": "A String", # Configuration for VM IPs.
1649 "numThreadsPerWorker": 42, # The number of threads per worker harness. If empty or unspecified, the
1650 # service will choose a number of threads (according to the number of cores
1651 # on the selected machine type for batch, or 1 by convention for streaming).
1652 "poolArgs": { # Extra arguments for this worker pool.
1653 "a_key": "", # Properties of the object. Contains field @type with type URL.
1654 },
1655 "numWorkers": 42, # Number of Google Compute Engine workers in this pool needed to
1656 # execute the job. If zero or unspecified, the service will
1657 # attempt to choose a reasonable default.
1658 "workerHarnessContainerImage": "A String", # Required. Docker container image that executes the Cloud Dataflow worker
1659 # harness, residing in Google Container Registry.
1660 "subnetwork": "A String", # Subnetwork to which VMs will be assigned, if desired. Expected to be of
1661 # the form "regions/REGION/subnetworks/SUBNETWORK".
1662 "dataDisks": [ # Data disks that are used by a VM in this workflow.
1663 { # Describes the data disk used by a workflow job.
1664 "mountPoint": "A String", # Directory in a VM where disk is mounted.
1665 "sizeGb": 42, # Size of disk in GB. If zero or unspecified, the service will
1666 # attempt to choose a reasonable default.
1667 "diskType": "A String", # Disk storage type, as defined by Google Compute Engine. This
1668 # must be a disk type appropriate to the project and zone in which
1669 # the workers will run. If unknown or unspecified, the service
1670 # will attempt to choose a reasonable default.
1671 #
1672 # For example, the standard persistent disk type is a resource name
1673 # typically ending in "pd-standard". If SSD persistent disks are
1674 # available, the resource name typically ends with "pd-ssd". The
1675 # actual valid values are defined by the Google Compute Engine API,
1676 # not by the Cloud Dataflow API; consult the Google Compute Engine
1677 # documentation for more information about determining the set of
1678 # available disk types for a particular project and zone.
1679 #
1680 # Google Compute Engine Disk types are local to a particular
1681 # project in a particular zone, and so the resource name will
1682 # typically look something like this:
1683 #
1684 # compute.googleapis.com/projects/project-id/zones/zone/diskTypes/pd-standard
Sai Cheemalapatic30d2b52017-03-13 12:12:03 -04001685 },
1686 ],
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07001687 "autoscalingSettings": { # Settings for WorkerPool autoscaling. # Settings for autoscaling of this WorkerPool.
1688 "maxNumWorkers": 42, # The maximum number of workers to cap scaling at.
1689 "algorithm": "A String", # The algorithm to use for autoscaling.
1690 },
1691 "defaultPackageSet": "A String", # The default package set to install. This allows the service to
1692 # select a default set of packages which are useful to worker
1693 # harnesses written in a particular language.
1694 "diskType": "A String", # Type of root disk for VMs. If empty or unspecified, the service will
1695 # attempt to choose a reasonable default.
1696 "metadata": { # Metadata to set on the Google Compute Engine VMs.
1697 "a_key": "A String",
1698 },
Sai Cheemalapatic30d2b52017-03-13 12:12:03 -04001699 },
1700 ],
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07001701 "tempStoragePrefix": "A String", # The prefix of the resources the system should use for temporary
1702 # storage. The system will append the suffix "/temp-{JOBNAME}" to
1703 # this resource prefix, where {JOBNAME} is the value of the
1704 # job_name field. The resulting bucket and object prefix is used
1705 # as the prefix of the resources used to store temporary data
1706 # needed during the job execution. NOTE: This will override the
1707 # value in taskrunner_settings.
1708 # The supported resource type is:
1709 #
1710 # Google Cloud Storage:
1711 #
1712 # storage.googleapis.com/{bucket}/{object}
1713 # bucket.storage.googleapis.com/{object}
Sai Cheemalapatic30d2b52017-03-13 12:12:03 -04001714 },
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07001715 "location": "A String", # The [regional endpoint]
1716 # (https://cloud.google.com/dataflow/docs/concepts/regional-endpoints) that
1717 # contains this job.
1718 "tempFiles": [ # A set of files the system should be aware of that are used
1719 # for temporary storage. These temporary files will be
1720 # removed on job completion.
1721 # No duplicates are allowed.
1722 # No file patterns are supported.
1723 #
1724 # The supported files are:
1725 #
1726 # Google Cloud Storage:
1727 #
1728 # storage.googleapis.com/{bucket}/{object}
1729 # bucket.storage.googleapis.com/{object}
1730 "A String",
1731 ],
1732 "type": "A String", # The type of Cloud Dataflow job.
1733 "clientRequestId": "A String", # The client's unique identifier of the job, re-used across retried attempts.
1734 # If this field is set, the service will ensure its uniqueness.
1735 # The request to create a job will fail if the service has knowledge of a
1736 # previously submitted job with the same client's ID and job name.
1737 # The caller may use this field to ensure idempotence of job
1738 # creation across retried attempts to create a job.
1739 # By default, the field is empty and, in that case, the service ignores it.
1740 "createdFromSnapshotId": "A String", # If this is specified, the job's initial state is populated from the given
1741 # snapshot.
1742 "stepsLocation": "A String", # The GCS location where the steps are stored.
1743 "currentStateTime": "A String", # The timestamp associated with the current state.
1744 "startTime": "A String", # The timestamp when the job was started (transitioned to JOB_STATE_PENDING).
1745 # Flexible resource scheduling jobs are started with some delay after job
1746 # creation, so start_time is unset before start and is updated when the
1747 # job is started by the Cloud Dataflow service. For other jobs, start_time
1748 # always equals create_time and is immutable and set by the Cloud Dataflow
1749 # service.
1750 "createTime": "A String", # The timestamp when the job was initially created. Immutable and set by the
1751 # Cloud Dataflow service.
1752 "requestedState": "A String", # The job's requested state.
1753 #
1754 # `UpdateJob` may be used to switch between the `JOB_STATE_STOPPED` and
1755 # `JOB_STATE_RUNNING` states, by setting requested_state. `UpdateJob` may
1756 # also be used to directly set a job's requested state to
1757 # `JOB_STATE_CANCELLED` or `JOB_STATE_DONE`, irrevocably terminating the
1758 # job if it has not already reached a terminal state.
1759 "name": "A String", # The user-specified Cloud Dataflow job name.
1760 #
1761 # Only one Job with a given name may exist in a project at any
1762 # given time. If a caller attempts to create a Job with the same
1763 # name as an already-existing Job, the attempt returns the
1764 # existing Job.
1765 #
1766 # The name must match the regular expression
1767 # `[a-z]([-a-z0-9]{0,38}[a-z0-9])?`
1768 "steps": [ # Exactly one of step or steps_location should be specified.
1769 #
1770 # The top-level steps that constitute the entire job.
Sai Cheemalapatic30d2b52017-03-13 12:12:03 -04001771 { # Defines a particular step within a Cloud Dataflow job.
1772 #
1773 # A job consists of multiple steps, each of which performs some
1774 # specific operation as part of the overall job. Data is typically
1775 # passed from one step to another as part of the job.
1776 #
1777 # Here's an example of a sequence of steps which together implement a
1778 # Map-Reduce job:
1779 #
1780 # * Read a collection of data from some source, parsing the
1781 # collection's elements.
1782 #
1783 # * Validate the elements.
1784 #
1785 # * Apply a user-defined function to map each element to some value
1786 # and extract an element-specific key value.
1787 #
1788 # * Group elements with the same key into a single element with
1789 # that key, transforming a multiply-keyed collection into a
1790 # uniquely-keyed collection.
1791 #
1792 # * Write the elements out to some data sink.
1793 #
1794 # Note that the Cloud Dataflow service may be used to run many different
1795 # types of jobs, not just Map-Reduce.
1796 "kind": "A String", # The kind of step in the Cloud Dataflow job.
1797 "properties": { # Named properties associated with the step. Each kind of
1798 # predefined step has its own required set of properties.
1799 # Must be provided on Create. Only retrieved with JOB_VIEW_ALL.
Jon Wayne Parrott692617a2017-01-06 09:58:29 -08001800 "a_key": "", # Properties of the object.
1801 },
Thomas Coffee2f245372017-03-27 10:39:26 -07001802 "name": "A String", # The name that identifies the step. This must be unique for each
1803 # step with respect to all other steps in the Cloud Dataflow job.
Jon Wayne Parrott692617a2017-01-06 09:58:29 -08001804 },
1805 ],
Thomas Coffee2f245372017-03-27 10:39:26 -07001806 "replaceJobId": "A String", # If this job is an update of an existing job, this field is the job ID
1807 # of the job it replaced.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07001808 #
Thomas Coffee2f245372017-03-27 10:39:26 -07001809 # When sending a `CreateJobRequest`, you can update a job by specifying it
1810 # here. The job named here is stopped, and its intermediate state is
1811 # transferred to this job.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07001812 "currentState": "A String", # The current state of the job.
1813 #
1814 # Jobs are created in the `JOB_STATE_STOPPED` state unless otherwise
1815 # specified.
1816 #
1817 # A job in the `JOB_STATE_RUNNING` state may asynchronously enter a
1818 # terminal state. After a job has reached a terminal state, no
1819 # further state updates may be made.
1820 #
1821 # This field may be mutated by the Cloud Dataflow service;
1822 # callers cannot mutate it.
Sai Cheemalapatic30d2b52017-03-13 12:12:03 -04001823 "executionInfo": { # Deprecated. Additional information about how a Cloud Dataflow
1824 # job will be executed that isn't contained in the submitted job.
Jon Wayne Parrott692617a2017-01-06 09:58:29 -08001825 "stages": { # A mapping from each stage to the information about that stage.
Sai Cheemalapatic30d2b52017-03-13 12:12:03 -04001826 "a_key": { # Contains information about how a particular
1827 # google.dataflow.v1beta3.Step will be executed.
1828 "stepName": [ # The steps associated with the execution stage.
1829 # Note that stages may have several steps, and that a given step
1830 # might be run by more than one stage.
Jon Wayne Parrott692617a2017-01-06 09:58:29 -08001831 "A String",
1832 ],
1833 },
1834 },
1835 },
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07001836 }</pre>
Jon Wayne Parrott692617a2017-01-06 09:58:29 -08001837</div>
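<p>As a hedged usage sketch (assuming the google-api-python-client library and Application Default Credentials; the IDs below are placeholders), a call to this method might look like the following. The fields read from the response correspond to the Job structure documented above.</p>
<pre>
# Minimal sketch: fetching a job's state with projects.locations.jobs.get.
from googleapiclient.discovery import build

dataflow = build('dataflow', 'v1b3')  # assumes Application Default Credentials

job = dataflow.projects().locations().jobs().get(
    projectId='PROJECT_ID',      # placeholder project ID
    location='us-central1',      # regional endpoint that contains the job
    jobId='JOB_ID',              # placeholder job ID
    view='JOB_VIEW_SUMMARY',     # optional level of detail
).execute()

print(job['name'], job.get('currentState'), job.get('currentStateTime'))
</pre>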
1838
1839<div class="method">
1840 <code class="details" id="getMetrics">getMetrics(projectId, location, jobId, startTime=None, x__xgafv=None)</code>
1841 <pre>Request the job status.
1842
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07001843To request the status of a job, we recommend using
1844`projects.locations.jobs.getMetrics` with a [regional endpoint]
1845(https://cloud.google.com/dataflow/docs/concepts/regional-endpoints). Using
1846`projects.jobs.getMetrics` is not recommended, as you can only request the
1847status of jobs that are running in `us-central1`.
1848
Jon Wayne Parrott692617a2017-01-06 09:58:29 -08001849Args:
1850 projectId: string, A project id. (required)
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07001851 location: string, The [regional endpoint]
1852(https://cloud.google.com/dataflow/docs/concepts/regional-endpoints) that
1853contains the job specified by job_id. (required)
Jon Wayne Parrott692617a2017-01-06 09:58:29 -08001854 jobId: string, The job to get messages for. (required)
Sai Cheemalapatic30d2b52017-03-13 12:12:03 -04001855 startTime: string, Return only metric data that has changed since this time.
1856Default is to return all information about all metrics for the job.
Jon Wayne Parrott692617a2017-01-06 09:58:29 -08001857 x__xgafv: string, V1 error format.
Sai Cheemalapatic30d2b52017-03-13 12:12:03 -04001858 Allowed values
1859 1 - v1 error format
1860 2 - v2 error format
Jon Wayne Parrott692617a2017-01-06 09:58:29 -08001861
1862Returns:
1863 An object of the form:
1864
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07001865 { # JobMetrics contains a collection of metrics describing the detailed progress
Sai Cheemalapatic30d2b52017-03-13 12:12:03 -04001866 # of a Dataflow job. Metrics correspond to user-defined and system-defined
1867 # metrics in the job.
1868 #
1869 # This resource captures only the most recent values of each metric;
1870 # time-series data can be queried for them (under the same metric names)
1871 # from Cloud Monitoring.
Jon Wayne Parrott692617a2017-01-06 09:58:29 -08001872 "metrics": [ # All metrics for this job.
1873 { # Describes the state of a metric.
Sai Cheemalapatic30d2b52017-03-13 12:12:03 -04001874 "meanCount": "", # Worker-computed aggregate value for the "Mean" aggregation kind.
1875 # This holds the count of the aggregated values and is used in combination
1876 # with mean_sum above to obtain the actual mean aggregate value.
1877 # The only possible value type is Long.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07001878 "kind": "A String", # Metric aggregation kind. The possible metric aggregation kinds are
1879 # "Sum", "Max", "Min", "Mean", "Set", "And", "Or", and "Distribution".
1880 # The specified aggregation kind is case-insensitive.
1881 #
1882 # If omitted, this is not an aggregated value but instead
1883 # a single metric sample value.
Sai Cheemalapatic30d2b52017-03-13 12:12:03 -04001884 "set": "", # Worker-computed aggregate value for the "Set" aggregation kind. The only
1885 # possible value type is a list of Values whose type can be Long, Double,
1886 # or String, according to the metric's type. All Values in the list must
1887 # be of the same type.
1888 "name": { # Identifies a metric, by describing the source which generated the # Name of the metric.
1889 # metric.
1890 "origin": "A String", # Origin (namespace) of metric name. May be blank for user-define metrics;
1891 # will be "dataflow" for metrics defined by the Dataflow service or SDK.
Jon Wayne Parrott692617a2017-01-06 09:58:29 -08001892 "name": "A String", # Worker-defined metric name.
Sai Cheemalapatic30d2b52017-03-13 12:12:03 -04001893 "context": { # Zero or more labeled fields which identify the part of the job this
1894 # metric is associated with, such as the name of a step or collection.
1895 #
1896 # For example, built-in counters associated with steps will have
1897 # context['step'] = <step-name>. Counters associated with PCollections
1898 # in the SDK will have context['pcollection'] = <pcollection-name>.
Jon Wayne Parrott692617a2017-01-06 09:58:29 -08001899 "a_key": "A String",
1900 },
1901 },
Sai Cheemalapati4ba8c232017-06-06 18:46:08 -04001902 "meanSum": "", # Worker-computed aggregate value for the "Mean" aggregation kind.
1903 # This holds the sum of the aggregated values and is used in combination
1904 # with mean_count below to obtain the actual mean aggregate value.
1905 # The only possible value types are Long and Double.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07001906 "cumulative": True or False, # True if this metric is reported as the total cumulative aggregate
1907 # value accumulated since the worker started working on this WorkItem.
1908 # By default this is false, indicating that this metric is reported
1909 # as a delta that is not associated with any WorkItem.
1910 "updateTime": "A String", # Timestamp associated with the metric value. Optional when workers are
1911 # reporting work progress; it will be filled in responses from the
1912 # metrics API.
1913 "scalar": "", # Worker-computed aggregate value for aggregation kinds "Sum", "Max", "Min",
1914 # "And", and "Or". The possible value types are Long, Double, and Boolean.
Sai Cheemalapatic30d2b52017-03-13 12:12:03 -04001915 "internal": "", # Worker-computed aggregate value for internal use by the Dataflow
1916 # service.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07001917 "gauge": "", # A struct value describing properties of a Gauge.
1918 # Metrics of gauge type show the value of a metric across time, and are
1919 # aggregated based on the newest value.
1920 "distribution": "", # A struct value describing properties of a distribution of numeric values.
Jon Wayne Parrott692617a2017-01-06 09:58:29 -08001921 },
1922 ],
1923 "metricTime": "A String", # Timestamp as of which metric values are current.
1924 }</pre>
1925</div>
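<p>A hedged sketch of calling this method and walking the returned metric list is shown below; it assumes the google-api-python-client library and Application Default Credentials, and the identifiers are placeholders. The field access mirrors the JobMetrics structure documented above.</p>
<pre>
# Minimal sketch: reading job metrics with projects.locations.jobs.getMetrics.
from googleapiclient.discovery import build

dataflow = build('dataflow', 'v1b3')  # assumes Application Default Credentials

metrics = dataflow.projects().locations().jobs().getMetrics(
    projectId='PROJECT_ID',   # placeholder project ID
    location='us-central1',   # regional endpoint that contains the job
    jobId='JOB_ID',           # placeholder job ID
).execute()

for metric in metrics.get('metrics', []):
    name = metric['name']            # structured name: origin, name, context
    print(name.get('origin'), name['name'], metric.get('scalar'))
</pre>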
1926
1927<div class="method">
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07001928 <code class="details" id="list">list(projectId, location, pageSize=None, pageToken=None, x__xgafv=None, filter=None, view=None)</code>
Sai Cheemalapatic30d2b52017-03-13 12:12:03 -04001929 <pre>List the jobs of a project.
Jon Wayne Parrott692617a2017-01-06 09:58:29 -08001930
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07001931To list the jobs of a project in a region, we recommend using
1932`projects.locations.jobs.list` with a [regional endpoint]
1933(https://cloud.google.com/dataflow/docs/concepts/regional-endpoints). To
1934list all jobs across all regions, use `projects.jobs.aggregated`. Using
1935`projects.jobs.list` is not recommended, as you can only get the list of
1936jobs that are running in `us-central1`.
1937
Jon Wayne Parrott692617a2017-01-06 09:58:29 -08001938Args:
1939 projectId: string, The project which owns the jobs. (required)
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07001940 location: string, The [regional endpoint]
1941(https://cloud.google.com/dataflow/docs/concepts/regional-endpoints) that
1942contains this job. (required)
Sai Cheemalapatic30d2b52017-03-13 12:12:03 -04001943 pageSize: integer, If there are many jobs, limit response to at most this many.
1944The actual number of jobs returned will be the lesser of max_responses
1945and an unspecified server-defined limit.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07001946 pageToken: string, Set this to the 'next_page_token' field of a previous response
1947to request additional results in a long list.
Jon Wayne Parrott692617a2017-01-06 09:58:29 -08001948 x__xgafv: string, V1 error format.
Sai Cheemalapatic30d2b52017-03-13 12:12:03 -04001949 Allowed values
1950 1 - v1 error format
1951 2 - v2 error format
Jon Wayne Parrott692617a2017-01-06 09:58:29 -08001952 filter: string, The kind of filter to use.
Sai Cheemalapatic30d2b52017-03-13 12:12:03 -04001953 view: string, Level of information requested in response. Default is `JOB_VIEW_SUMMARY`.
Jon Wayne Parrott692617a2017-01-06 09:58:29 -08001954
1955Returns:
1956 An object of the form:
1957
Sai Cheemalapatic30d2b52017-03-13 12:12:03 -04001958 { # Response to a request to list Cloud Dataflow jobs. This may be a partial
1959 # response, depending on the page size in the ListJobsRequest.
Jon Wayne Parrott692617a2017-01-06 09:58:29 -08001960 "nextPageToken": "A String", # Set if there may be more results than fit in this response.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07001961 "failedLocation": [ # Zero or more messages describing the [regional endpoints]
1962 # (https://cloud.google.com/dataflow/docs/concepts/regional-endpoints) that
1963 # failed to respond.
1964 { # Indicates which [regional endpoint]
1965 # (https://cloud.google.com/dataflow/docs/concepts/regional-endpoints) failed
1966 # to respond to a request for data.
1967 "name": "A String", # The name of the [regional endpoint]
1968 # (https://cloud.google.com/dataflow/docs/concepts/regional-endpoints) that
1969 # failed to respond.
Sai Cheemalapati4ba8c232017-06-06 18:46:08 -04001970 },
1971 ],
Jon Wayne Parrott692617a2017-01-06 09:58:29 -08001972 "jobs": [ # A subset of the requested job information.
Sai Cheemalapatic30d2b52017-03-13 12:12:03 -04001973 { # Defines a job to be run by the Cloud Dataflow service.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07001974 "labels": { # User-defined labels for this job.
1975 #
1976 # The labels map can contain no more than 64 entries. Entries of the labels
1977 # map are UTF8 strings that comply with the following restrictions:
1978 #
1979 # * Keys must conform to regexp: \p{Ll}\p{Lo}{0,62}
1980 # * Values must conform to regexp: [\p{Ll}\p{Lo}\p{N}_-]{0,63}
1981 # * Both keys and values are additionally constrained to be <= 128 bytes in
1982 # size.
1983 "a_key": "A String",
1984 },
1985 "jobMetadata": { # Metadata available primarily for filtering jobs. Will be included in the # This field is populated by the Dataflow service to support filtering jobs
1986 # by the metadata values provided here. Populated for ListJobs and all GetJob
1987 # views SUMMARY and higher.
1988 # ListJob response and Job SUMMARY view.
1989 "sdkVersion": { # The version of the SDK used to run the job. # The SDK version used to run the job.
1990 "versionDisplayName": "A String", # A readable string describing the version of the SDK.
1991 "version": "A String", # The version of the SDK used to run the job.
1992 "sdkSupportStatus": "A String", # The support status for this SDK version.
Jon Wayne Parrott692617a2017-01-06 09:58:29 -08001993 },
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07001994 "pubsubDetails": [ # Identification of a PubSub source used in the Dataflow job.
1995 { # Metadata for a PubSub connector used by the job.
1996 "topic": "A String", # Topic accessed in the connection.
1997 "subscription": "A String", # Subscription used in the connection.
1998 },
1999 ],
2000 "datastoreDetails": [ # Identification of a Datastore source used in the Dataflow job.
2001 { # Metadata for a Datastore connector used by the job.
2002 "projectId": "A String", # ProjectId accessed in the connection.
2003 "namespace": "A String", # Namespace used in the connection.
2004 },
2005 ],
2006 "fileDetails": [ # Identification of a File source used in the Dataflow job.
2007 { # Metadata for a File connector used by the job.
2008 "filePattern": "A String", # File Pattern used to access files by the connector.
2009 },
2010 ],
2011 "spannerDetails": [ # Identification of a Spanner source used in the Dataflow job.
2012 { # Metadata for a Spanner connector used by the job.
2013 "instanceId": "A String", # InstanceId accessed in the connection.
2014 "projectId": "A String", # ProjectId accessed in the connection.
2015 "databaseId": "A String", # DatabaseId accessed in the connection.
2016 },
2017 ],
2018 "bigTableDetails": [ # Identification of a BigTable source used in the Dataflow job.
2019 { # Metadata for a BigTable connector used by the job.
2020 "instanceId": "A String", # InstanceId accessed in the connection.
2021 "projectId": "A String", # ProjectId accessed in the connection.
2022 "tableId": "A String", # TableId accessed in the connection.
2023 },
2024 ],
2025 "bigqueryDetails": [ # Identification of a BigQuery source used in the Dataflow job.
2026 { # Metadata for a BigQuery connector used by the job.
2027 "projectId": "A String", # Project accessed in the connection.
2028 "dataset": "A String", # Dataset accessed in the connection.
2029 "table": "A String", # Table accessed in the connection.
2030 "query": "A String", # Query used to access data in the connection.
2031 },
2032 ],
2033 },
2034 "pipelineDescription": { # A descriptive representation of submitted pipeline as well as the executed # Preliminary field: The format of this data may change at any time.
2035 # A description of the user pipeline and stages through which it is executed.
2036 # Created by Cloud Dataflow service. Only retrieved with
2037 # JOB_VIEW_DESCRIPTION or JOB_VIEW_ALL.
2038 # form. This data is provided by the Dataflow service for ease of visualizing
2039 # the pipeline and interpreting Dataflow provided metrics.
2040 "originalPipelineTransform": [ # Description of each transform in the pipeline and collections between them.
2041 { # Description of the type, names/ids, and input/outputs for a transform.
2042 "kind": "A String", # Type of transform.
2043 "name": "A String", # User provided name for this transform instance.
2044 "inputCollectionName": [ # User names for all collection inputs to this transform.
2045 "A String",
2046 ],
2047 "displayData": [ # Transform-specific display data.
2048 { # Data provided with a pipeline or transform to provide descriptive info.
2049 "shortStrValue": "A String", # A possible additional shorter value to display.
2050 # For example a java_class_name_value of com.mypackage.MyDoFn
2051 # will be stored with MyDoFn as the short_str_value and
2052 # com.mypackage.MyDoFn as the java_class_name value.
2053 # short_str_value can be displayed and java_class_name_value
2054 # will be displayed as a tooltip.
2055 "durationValue": "A String", # Contains value if the data is of duration type.
2056 "url": "A String", # An optional full URL.
2057 "floatValue": 3.14, # Contains value if the data is of float type.
2058 "namespace": "A String", # The namespace for the key. This is usually a class name or programming
2059 # language namespace (i.e. python module) which defines the display data.
2060 # This allows a dax monitoring system to specially handle the data
2061 # and perform custom rendering.
2062 "javaClassValue": "A String", # Contains value if the data is of java class type.
2063 "label": "A String", # An optional label to display in a dax UI for the element.
2064 "boolValue": True or False, # Contains value if the data is of a boolean type.
2065 "strValue": "A String", # Contains value if the data is of string type.
2066 "key": "A String", # The key identifying the display data.
2067 # This is intended to be used as a label for the display data
2068 # when viewed in a dax monitoring system.
2069 "int64Value": "A String", # Contains value if the data is of int64 type.
2070 "timestampValue": "A String", # Contains value if the data is of timestamp type.
2071 },
2072 ],
2073 "outputCollectionName": [ # User names for all collection outputs to this transform.
2074 "A String",
2075 ],
2076 "id": "A String", # SDK generated id of this transform instance.
2077 },
2078 ],
2079 "executionPipelineStage": [ # Description of each stage of execution of the pipeline.
2080 { # Description of the composing transforms, names/ids, and input/outputs of a
2081 # stage of execution. Some composing transforms and sources may have been
2082 # generated by the Dataflow service during execution planning.
2083 "componentSource": [ # Collections produced and consumed by component transforms of this stage.
2084 { # Description of an interstitial value between transforms in an execution
2085 # stage.
2086 "userName": "A String", # Human-readable name for this transform; may be user or system generated.
2087 "originalTransformOrCollection": "A String", # User name for the original user transform or collection with which this
2088 # source is most closely associated.
2089 "name": "A String", # Dataflow service generated name for this source.
2090 },
2091 ],
2092 "kind": "A String", # Type of tranform this stage is executing.
2093 "name": "A String", # Dataflow service generated name for this stage.
2094 "outputSource": [ # Output sources for this stage.
2095 { # Description of an input or output of an execution stage.
2096 "userName": "A String", # Human-readable name for this source; may be user or system generated.
2097 "sizeBytes": "A String", # Size of the source, if measurable.
2098 "name": "A String", # Dataflow service generated name for this source.
2099 "originalTransformOrCollection": "A String", # User name for the original user transform or collection with which this
2100 # source is most closely associated.
2101 },
2102 ],
2103 "inputSource": [ # Input sources for this stage.
2104 { # Description of an input or output of an execution stage.
2105 "userName": "A String", # Human-readable name for this source; may be user or system generated.
2106 "sizeBytes": "A String", # Size of the source, if measurable.
2107 "name": "A String", # Dataflow service generated name for this source.
2108 "originalTransformOrCollection": "A String", # User name for the original user transform or collection with which this
2109 # source is most closely associated.
2110 },
2111 ],
2112 "componentTransform": [ # Transforms that comprise this execution stage.
2113 { # Description of a transform executed as part of an execution stage.
2114 "userName": "A String", # Human-readable name for this transform; may be user or system generated.
2115 "originalTransform": "A String", # User name for the original user transform with which this transform is
2116 # most closely associated.
2117 "name": "A String", # Dataflow service generated name for this source.
2118 },
2119 ],
2120 "id": "A String", # Dataflow service generated id for this stage.
2121 },
2122 ],
2123 "displayData": [ # Pipeline level display data.
2124 { # Data provided with a pipeline or transform to provide descriptive info.
2125 "shortStrValue": "A String", # A possible additional shorter value to display.
2126 # For example a java_class_name_value of com.mypackage.MyDoFn
2127 # will be stored with MyDoFn as the short_str_value and
2128 # com.mypackage.MyDoFn as the java_class_name value.
2129 # short_str_value can be displayed and java_class_name_value
2130 # will be displayed as a tooltip.
2131 "durationValue": "A String", # Contains value if the data is of duration type.
2132 "url": "A String", # An optional full URL.
2133 "floatValue": 3.14, # Contains value if the data is of float type.
2134 "namespace": "A String", # The namespace for the key. This is usually a class name or programming
2135 # language namespace (i.e. python module) which defines the display data.
2136 # This allows a dax monitoring system to specially handle the data
2137 # and perform custom rendering.
2138 "javaClassValue": "A String", # Contains value if the data is of java class type.
2139 "label": "A String", # An optional label to display in a dax UI for the element.
2140 "boolValue": True or False, # Contains value if the data is of a boolean type.
2141 "strValue": "A String", # Contains value if the data is of string type.
2142 "key": "A String", # The key identifying the display data.
2143 # This is intended to be used as a label for the display data
2144 # when viewed in a dax monitoring system.
2145 "int64Value": "A String", # Contains value if the data is of int64 type.
2146 "timestampValue": "A String", # Contains value if the data is of timestamp type.
2147 },
2148 ],
2149 },
2150 "stageStates": [ # This field may be mutated by the Cloud Dataflow service;
2151 # callers cannot mutate it.
2152 { # A message describing the state of a particular execution stage.
2153 "executionStageName": "A String", # The name of the execution stage.
2154 "executionStageState": "A String", # Executions stage states allow the same set of values as JobState.
2155 "currentStateTime": "A String", # The time at which the stage transitioned to this state.
Jon Wayne Parrott692617a2017-01-06 09:58:29 -08002156 },
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07002157 ],
2158 "id": "A String", # The unique ID of this job.
2159 #
2160 # This field is set by the Cloud Dataflow service when the Job is
2161 # created, and is immutable for the life of the job.
2162 "replacedByJobId": "A String", # If another job is an update of this job (and thus, this job is in
2163 # `JOB_STATE_UPDATED`), this field contains the ID of that job.
2164 "projectId": "A String", # The ID of the Cloud Platform project that the job belongs to.
2165 "transformNameMapping": { # The map of transform name prefixes of the job to be replaced to the
2166 # corresponding name prefixes of the new job.
2167 "a_key": "A String",
2168 },
2169 "environment": { # Describes the environment in which a Dataflow Job runs. # The environment for the job.
2170 "version": { # A structure describing which components and their versions of the service
2171 # are required in order to run the job.
2172 "a_key": "", # Properties of the object.
2173 },
2174 "flexResourceSchedulingGoal": "A String", # Which Flexible Resource Scheduling mode to run in.
2175 "serviceKmsKeyName": "A String", # If set, contains the Cloud KMS key identifier used to encrypt data
2176 # at rest, AKA a Customer Managed Encryption Key (CMEK).
Sai Cheemalapati4ba8c232017-06-06 18:46:08 -04002177 #
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07002178 # Format:
2179 # projects/PROJECT_ID/locations/LOCATION/keyRings/KEY_RING/cryptoKeys/KEY
2180 "internalExperiments": { # Experimental settings.
2181 "a_key": "", # Properties of the object. Contains field @type with type URL.
2182 },
2183 "dataset": "A String", # The dataset for the current project where various workflow
2184 # related tables are stored.
2185 #
2186 # The supported resource type is:
2187 #
2188 # Google BigQuery:
2189 # bigquery.googleapis.com/{dataset}
2190 "experiments": [ # The list of experiments to enable.
2191 "A String",
2192 ],
2193 "serviceAccountEmail": "A String", # Identity to run virtual machines as. Defaults to the default account.
2194 "sdkPipelineOptions": { # The Cloud Dataflow SDK pipeline options specified by the user. These
2195 # options are passed through the service and are used to recreate the
2196 # SDK pipeline options on the worker in a language agnostic and platform
2197 # independent way.
2198 "a_key": "", # Properties of the object.
2199 },
2200 "userAgent": { # A description of the process that generated the request.
2201 "a_key": "", # Properties of the object.
2202 },
2203 "clusterManagerApiService": "A String", # The type of cluster manager API to use. If unknown or
2204 # unspecified, the service will attempt to choose a reasonable
2205 # default. This should be in the form of the API service name,
2206 # e.g. "compute.googleapis.com".
2207 "workerPools": [ # The worker pools. At least one "harness" worker pool must be
2208 # specified in order for the job to have workers.
2209 { # Describes one particular pool of Cloud Dataflow workers to be
2210 # instantiated by the Cloud Dataflow service in order to perform the
2211 # computations required by a job. Note that a workflow job may use
2212 # multiple pools, in order to match the various computational
2213 # requirements of the various stages of the job.
2214 "diskSourceImage": "A String", # Fully qualified source image for disks.
2215 "taskrunnerSettings": { # Taskrunner configuration settings. # Settings passed through to Google Compute Engine workers when
2216 # using the standard Dataflow task runner. Users should ignore
2217 # this field.
2218 "workflowFileName": "A String", # The file to store the workflow in.
2219 "logUploadLocation": "A String", # Indicates where to put logs. If this is not specified, the logs
2220 # will not be uploaded.
2221 #
2222 # The supported resource type is:
2223 #
2224 # Google Cloud Storage:
2225 # storage.googleapis.com/{bucket}/{object}
2226 # bucket.storage.googleapis.com/{object}
2227 "commandlinesFileName": "A String", # The file to store preprocessing commands in.
2228 "parallelWorkerSettings": { # Provides data to pass through to the worker harness. # The settings to pass to the parallel worker harness.
2229 "reportingEnabled": True or False, # Whether to send work progress updates to the service.
2230 "shuffleServicePath": "A String", # The Shuffle service path relative to the root URL, for example,
2231 # "shuffle/v1beta1".
2232 "workerId": "A String", # The ID of the worker running this pipeline.
2233 "baseUrl": "A String", # The base URL for accessing Google Cloud APIs.
Sai Cheemalapatic30d2b52017-03-13 12:12:03 -04002234 #
2235 # When workers access Google Cloud APIs, they logically do so via
2236 # relative URLs. If this field is specified, it supplies the base
2237 # URL to use for resolving these relative URLs. The normative
2238 # algorithm used is defined by RFC 1808, "Relative Uniform Resource
2239 # Locators".
2240 #
2241 # If not specified, the default value is "http://www.googleapis.com/"
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07002242 "servicePath": "A String", # The Cloud Dataflow service path relative to the root URL, for example,
2243 # "dataflow/v1b3/projects".
2244 "tempStoragePrefix": "A String", # The prefix of the resources the system should use for temporary
2245 # storage.
Sai Cheemalapatic30d2b52017-03-13 12:12:03 -04002246 #
Sai Cheemalapatie833b792017-03-24 15:06:46 -07002247 # The supported resource type is:
2248 #
2249 # Google Cloud Storage:
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07002250 #
Sai Cheemalapatie833b792017-03-24 15:06:46 -07002251 # storage.googleapis.com/{bucket}/{object}
2252 # bucket.storage.googleapis.com/{object}
Jon Wayne Parrott692617a2017-01-06 09:58:29 -08002253 },
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07002254 "vmId": "A String", # The ID string of the VM.
2255 "baseTaskDir": "A String", # The location on the worker for task-specific subdirectories.
2256 "continueOnException": True or False, # Whether to continue taskrunner if an exception is hit.
2257 "oauthScopes": [ # The OAuth2 scopes to be requested by the taskrunner in order to
2258 # access the Cloud Dataflow API.
2259 "A String",
Jon Wayne Parrott692617a2017-01-06 09:58:29 -08002260 ],
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07002261 "taskUser": "A String", # The UNIX user ID on the worker VM to use for tasks launched by
2262 # taskrunner; e.g. "root".
2263 "baseUrl": "A String", # The base URL for the taskrunner to use when accessing Google Cloud APIs.
Sai Cheemalapati4ba8c232017-06-06 18:46:08 -04002264 #
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07002265 # When workers access Google Cloud APIs, they logically do so via
2266 # relative URLs. If this field is specified, it supplies the base
2267 # URL to use for resolving these relative URLs. The normative
2268 # algorithm used is defined by RFC 1808, "Relative Uniform Resource
2269 # Locators".
Sai Cheemalapati4ba8c232017-06-06 18:46:08 -04002270 #
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07002271 # If not specified, the default value is "http://www.googleapis.com/"
2272 "taskGroup": "A String", # The UNIX group ID on the worker VM to use for tasks launched by
2273 # taskrunner; e.g. "wheel".
2274 "languageHint": "A String", # The suggested backend language.
2275 "logToSerialconsole": True or False, # Whether to send taskrunner log info to Google Compute Engine VM serial
2276 # console.
2277 "streamingWorkerMainClass": "A String", # The streaming worker main class name.
2278 "logDir": "A String", # The directory on the VM to store logs.
2279 "dataflowApiVersion": "A String", # The API version of endpoint, e.g. "v1b3"
2280 "harnessCommand": "A String", # The command to launch the worker harness.
2281 "tempStoragePrefix": "A String", # The prefix of the resources the taskrunner should use for
2282 # temporary storage.
2283 #
2284 # The supported resource type is:
2285 #
2286 # Google Cloud Storage:
2287 # storage.googleapis.com/{bucket}/{object}
2288 # bucket.storage.googleapis.com/{object}
2289 "alsologtostderr": True or False, # Whether to also send taskrunner log info to stderr.
2290 },
2291 "kind": "A String", # The kind of the worker pool; currently only `harness` and `shuffle`
2292 # are supported.
2293 "packages": [ # Packages to be installed on workers.
2294 { # The packages that must be installed in order for a worker to run the
2295 # steps of the Cloud Dataflow job that will be assigned to its worker
2296 # pool.
2297 #
2298 # This is the mechanism by which the Cloud Dataflow SDK causes code to
2299 # be loaded onto the workers. For example, the Cloud Dataflow Java SDK
2300 # might use this to install jars containing the user's code and all of the
2301 # various dependencies (libraries, data files, etc.) required in order
2302 # for that code to run.
2303 "location": "A String", # The resource to read the package from. The supported resource type is:
Sai Cheemalapati4ba8c232017-06-06 18:46:08 -04002304 #
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07002305 # Google Cloud Storage:
2306 #
2307 # storage.googleapis.com/{bucket}
2308 # bucket.storage.googleapis.com/
2309 "name": "A String", # The name of the package.
Sai Cheemalapati4ba8c232017-06-06 18:46:08 -04002310 },
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07002311 ],
2312 "machineType": "A String", # Machine type (e.g. "n1-standard-1"). If empty or unspecified, the
2313 # service will attempt to choose a reasonable default.
2314 "network": "A String", # Network to which VMs will be assigned. If empty or unspecified,
2315 # the service will use the network "default".
2316 "zone": "A String", # Zone to run the worker pools in. If empty or unspecified, the service
2317 # will attempt to choose a reasonable default.
2318 "diskSizeGb": 42, # Size of root disk for VMs, in GB. If zero or unspecified, the service will
2319 # attempt to choose a reasonable default.
2320 "teardownPolicy": "A String", # Sets the policy for determining when to turndown worker pool.
2321 # Allowed values are: `TEARDOWN_ALWAYS`, `TEARDOWN_ON_SUCCESS`, and
2322 # `TEARDOWN_NEVER`.
2323 # `TEARDOWN_ALWAYS` means workers are always torn down regardless of whether
2324 # the job succeeds. `TEARDOWN_ON_SUCCESS` means workers are torn down
2325 # if the job succeeds. `TEARDOWN_NEVER` means the workers are never torn
2326 # down.
2327 #
2328 # If the workers are not torn down by the service, they will
2329 # continue to run and use Google Compute Engine VM resources in the
2330 # user's project until they are explicitly terminated by the user.
2331 # Because of this, Google recommends using the `TEARDOWN_ALWAYS`
2332 # policy except for small, manually supervised test jobs.
2333 #
2334 # If unknown or unspecified, the service will attempt to choose a reasonable
2335 # default.
2336 "onHostMaintenance": "A String", # The action to take on host maintenance, as defined by the Google
2337 # Compute Engine API.
2338 "ipConfiguration": "A String", # Configuration for VM IPs.
2339 "numThreadsPerWorker": 42, # The number of threads per worker harness. If empty or unspecified, the
2340 # service will choose a number of threads (according to the number of cores
2341 # on the selected machine type for batch, or 1 by convention for streaming).
2342 "poolArgs": { # Extra arguments for this worker pool.
2343 "a_key": "", # Properties of the object. Contains field @type with type URL.
2344 },
2345 "numWorkers": 42, # Number of Google Compute Engine workers in this pool needed to
2346 # execute the job. If zero or unspecified, the service will
2347 # attempt to choose a reasonable default.
2348 "workerHarnessContainerImage": "A String", # Required. Docker container image that executes the Cloud Dataflow worker
2349 # harness, residing in Google Container Registry.
2350 "subnetwork": "A String", # Subnetwork to which VMs will be assigned, if desired. Expected to be of
2351 # the form "regions/REGION/subnetworks/SUBNETWORK".
2352 "dataDisks": [ # Data disks that are used by a VM in this workflow.
2353 { # Describes the data disk used by a workflow job.
2354 "mountPoint": "A String", # Directory in a VM where disk is mounted.
2355 "sizeGb": 42, # Size of disk in GB. If zero or unspecified, the service will
2356 # attempt to choose a reasonable default.
2357 "diskType": "A String", # Disk storage type, as defined by Google Compute Engine. This
2358 # must be a disk type appropriate to the project and zone in which
2359 # the workers will run. If unknown or unspecified, the service
2360 # will attempt to choose a reasonable default.
2361 #
2362 # For example, the standard persistent disk type is a resource name
2363 # typically ending in "pd-standard". If SSD persistent disks are
2364 # available, the resource name typically ends with "pd-ssd". The
2365 # actual valid values are defined by the Google Compute Engine API,
2366 # not by the Cloud Dataflow API; consult the Google Compute Engine
2367 # documentation for more information about determining the set of
2368 # available disk types for a particular project and zone.
2369 #
2370 # Google Compute Engine Disk types are local to a particular
2371 # project in a particular zone, and so the resource name will
2372 # typically look something like this:
2373 #
2374 # compute.googleapis.com/projects/project-id/zones/zone/diskTypes/pd-standard
Sai Cheemalapati4ba8c232017-06-06 18:46:08 -04002375 },
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07002376 ],
2377 "autoscalingSettings": { # Settings for WorkerPool autoscaling. # Settings for autoscaling of this WorkerPool.
2378 "maxNumWorkers": 42, # The maximum number of workers to cap scaling at.
2379 "algorithm": "A String", # The algorithm to use for autoscaling.
Jon Wayne Parrott692617a2017-01-06 09:58:29 -08002380 },
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07002381 "defaultPackageSet": "A String", # The default package set to install. This allows the service to
2382 # select a default set of packages which are useful to worker
2383 # harnesses written in a particular language.
2384 "diskType": "A String", # Type of root disk for VMs. If empty or unspecified, the service will
2385 # attempt to choose a reasonable default.
2386 "metadata": { # Metadata to set on the Google Compute Engine VMs.
2387 "a_key": "A String",
Sai Cheemalapatic30d2b52017-03-13 12:12:03 -04002388 },
Jon Wayne Parrott692617a2017-01-06 09:58:29 -08002389 },
2390 ],
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07002391 "tempStoragePrefix": "A String", # The prefix of the resources the system should use for temporary
2392 # storage. The system will append the suffix "/temp-{JOBNAME}" to
2393 # this resource prefix, where {JOBNAME} is the value of the
2394 # job_name field. The resulting bucket and object prefix is used
2395 # as the prefix of the resources used to store temporary data
2396 # needed during the job execution. NOTE: This will override the
2397 # value in taskrunner_settings.
2398 # The supported resource type is:
Sai Cheemalapatic30d2b52017-03-13 12:12:03 -04002399 #
2400 # Google Cloud Storage:
2401 #
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07002402 # storage.googleapis.com/{bucket}/{object}
2403 # bucket.storage.googleapis.com/{object}
2404 },
2405 "location": "A String", # The [regional endpoint]
2406 # (https://cloud.google.com/dataflow/docs/concepts/regional-endpoints) that
2407 # contains this job.
2408 "tempFiles": [ # A set of files the system should be aware of that are used
2409 # for temporary storage. These temporary files will be
2410 # removed on job completion.
2411 # No duplicates are allowed.
2412 # No file patterns are supported.
2413 #
2414 # The supported files are:
2415 #
2416 # Google Cloud Storage:
2417 #
2418 # storage.googleapis.com/{bucket}/{object}
2419 # bucket.storage.googleapis.com/{object}
2420 "A String",
2421 ],
2422 "type": "A String", # The type of Cloud Dataflow job.
2423 "clientRequestId": "A String", # The client's unique identifier of the job, re-used across retried attempts.
2424 # If this field is set, the service will ensure its uniqueness.
2425 # The request to create a job will fail if the service has knowledge of a
2426 # previously submitted job with the same client's ID and job name.
2427 # The caller may use this field to ensure idempotence of job
2428 # creation across retried attempts to create a job.
2429 # By default, the field is empty and, in that case, the service ignores it.
2430 "createdFromSnapshotId": "A String", # If this is specified, the job's initial state is populated from the given
2431 # snapshot.
2432 "stepsLocation": "A String", # The GCS location where the steps are stored.
2433 "currentStateTime": "A String", # The timestamp associated with the current state.
2434 "startTime": "A String", # The timestamp when the job was started (transitioned to JOB_STATE_PENDING).
2435 # Flexible resource scheduling jobs are started with some delay after job
2436 # creation, so start_time is unset before start and is updated when the
2437 # job is started by the Cloud Dataflow service. For other jobs, start_time
2438 # always equals create_time and is immutable and set by the Cloud Dataflow
2439 # service.
2440 "createTime": "A String", # The timestamp when the job was initially created. Immutable and set by the
2441 # Cloud Dataflow service.
2442 "requestedState": "A String", # The job's requested state.
2443 #
2444 # `UpdateJob` may be used to switch between the `JOB_STATE_STOPPED` and
2445 # `JOB_STATE_RUNNING` states, by setting requested_state. `UpdateJob` may
2446 # also be used to directly set a job's requested state to
2447 # `JOB_STATE_CANCELLED` or `JOB_STATE_DONE`, irrevocably terminating the
2448 # job if it has not already reached a terminal state.
2449 "name": "A String", # The user-specified Cloud Dataflow job name.
2450 #
2451 # Only one Job with a given name may exist in a project at any
2452 # given time. If a caller attempts to create a Job with the same
2453 # name as an already-existing Job, the attempt returns the
2454 # existing Job.
2455 #
2456 # The name must match the regular expression
2457 # `[a-z]([-a-z0-9]{0,38}[a-z0-9])?`
2458 "steps": [ # Exactly one of step or steps_location should be specified.
2459 #
2460 # The top-level steps that constitute the entire job.
2461 { # Defines a particular step within a Cloud Dataflow job.
Sai Cheemalapatic30d2b52017-03-13 12:12:03 -04002462 #
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07002463 # A job consists of multiple steps, each of which performs some
2464 # specific operation as part of the overall job. Data is typically
2465 # passed from one step to another as part of the job.
2466 #
2467 # Here's an example of a sequence of steps which together implement a
2468 # Map-Reduce job:
2469 #
2470 # * Read a collection of data from some source, parsing the
2471 # collection's elements.
2472 #
2473 # * Validate the elements.
2474 #
2475 # * Apply a user-defined function to map each element to some value
2476 # and extract an element-specific key value.
2477 #
2478 # * Group elements with the same key into a single element with
2479 # that key, transforming a multiply-keyed collection into a
2480 # uniquely-keyed collection.
2481 #
2482 # * Write the elements out to some data sink.
2483 #
2484 # Note that the Cloud Dataflow service may be used to run many different
2485 # types of jobs, not just Map-Reduce.
2486 "kind": "A String", # The kind of step in the Cloud Dataflow job.
2487 "properties": { # Named properties associated with the step. Each kind of
2488 # predefined step has its own required set of properties.
2489 # Must be provided on Create. Only retrieved with JOB_VIEW_ALL.
2490 "a_key": "", # Properties of the object.
2491 },
2492 "name": "A String", # The name that identifies the step. This must be unique for each
2493 # step with respect to all other steps in the Cloud Dataflow job.
2494 },
2495 ],
2496 "replaceJobId": "A String", # If this job is an update of an existing job, this field is the job ID
2497 # of the job it replaced.
2498 #
2499 # When sending a `CreateJobRequest`, you can update a job by specifying it
2500 # here. The job named here is stopped, and its intermediate state is
2501 # transferred to this job.
2502 "currentState": "A String", # The current state of the job.
2503 #
2504 # Jobs are created in the `JOB_STATE_STOPPED` state unless otherwise
2505 # specified.
2506 #
2507 # A job in the `JOB_STATE_RUNNING` state may asynchronously enter a
2508 # terminal state. After a job has reached a terminal state, no
2509 # further state updates may be made.
2510 #
2511 # This field may be mutated by the Cloud Dataflow service;
2512 # callers cannot mutate it.
2513 "executionInfo": { # Additional information about how a Cloud Dataflow job will be executed that # Deprecated.
2514 # isn't contained in the submitted job.
2515 "stages": { # A mapping from each stage to the information about that stage.
2516 "a_key": { # Contains information about how a particular
2517 # google.dataflow.v1beta3.Step will be executed.
2518 "stepName": [ # The steps associated with the execution stage.
2519 # Note that stages may have several steps, and that a given step
2520 # might be run by more than one stage.
2521 "A String",
2522 ],
Jon Wayne Parrott692617a2017-01-06 09:58:29 -08002523 },
2524 },
2525 },
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07002526 },
Jon Wayne Parrott692617a2017-01-06 09:58:29 -08002527 ],
2528 }</pre>
2529</div>
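<p>The following is a minimal, hypothetical usage sketch (not part of the generated reference) showing how a caller might read the list() response object shown above. It assumes a client built with googleapiclient.discovery.build('dataflow', 'v1b3'), available application default credentials, and placeholder project/location values.</p>
<pre>
# Hypothetical sketch: read the ListJobsResponse documented above.
from googleapiclient.discovery import build

dataflow = build('dataflow', 'v1b3')
response = dataflow.projects().locations().jobs().list(
    projectId='my-project', location='us-central1').execute()

# "failedLocation" lists regional endpoints that failed to respond.
for failed in response.get('failedLocation', []):
    print('No response from regional endpoint:', failed['name'])

# "jobs" holds the subset of job information returned for this page.
for job in response.get('jobs', []):
    print(job['id'], job.get('name'), job.get('currentState'))
</pre>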
2530
2531<div class="method">
2532 <code class="details" id="list_next">list_next(previous_request, previous_response)</code>
2533 <pre>Retrieves the next page of results.
2534
2535Args:
2536 previous_request: The request for the previous page. (required)
2537 previous_response: The response from the request for the previous page. (required)
2538
2539Returns:
2540 A request object that you can call 'execute()' on to request the next
2541 page. Returns None if there are no more items in the collection.
2542 </pre>
2543</div>
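<p>A minimal pagination sketch (illustrative only, not generated reference material); it assumes the same discovery-built <code>dataflow</code> client and placeholder project/location values as the sketch above.</p>
<pre>
# Hypothetical sketch: page through all jobs with list() / list_next().
request = dataflow.projects().locations().jobs().list(
    projectId='my-project', location='us-central1')
while request is not None:
    response = request.execute()
    for job in response.get('jobs', []):
        print(job['id'], job.get('currentState'))
    # list_next() returns None once there are no more pages.
    request = dataflow.projects().locations().jobs().list_next(
        previous_request=request, previous_response=response)
</pre>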
2544
2545<div class="method">
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07002546 <code class="details" id="snapshot">snapshot(projectId, location, jobId, body, x__xgafv=None)</code>
2547 <pre>Snapshot the state of a streaming job.
2548
2549Args:
2550 projectId: string, The project which owns the job to be snapshotted. (required)
2551 location: string, The location that contains this job. (required)
2552 jobId: string, The job to be snapshotted. (required)
2553 body: object, The request body. (required)
2554 The object takes the form of:
2555
2556{ # Request to create a snapshot of a job.
2557 "location": "A String", # The location that contains this job.
2558 "ttl": "A String", # TTL for the snapshot.
2559 }
2560
2561 x__xgafv: string, V1 error format.
2562 Allowed values
2563 1 - v1 error format
2564 2 - v2 error format
2565
2566Returns:
2567 An object of the form:
2568
2569 { # Represents a snapshot of a job.
2570 "sourceJobId": "A String", # The job this snapshot was created from.
2571 "projectId": "A String", # The project this snapshot belongs to.
2572 "creationTime": "A String", # The time this snapshot was created.
2573 "state": "A String", # State of the snapshot.
2574 "ttl": "A String", # The time after which this snapshot will be automatically deleted.
2575 "id": "A String", # The unique ID of this snapshot.
2576 }</pre>
2577</div>
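<p>A hypothetical usage sketch for snapshot(); it assumes the same discovery-built <code>dataflow</code> client as above, and the project, location, job ID, and TTL values are placeholders for an existing streaming job.</p>
<pre>
# Hypothetical sketch: snapshot the state of a streaming job.
snapshot = dataflow.projects().locations().jobs().snapshot(
    projectId='my-project',
    location='us-central1',
    jobId='JOB_ID',
    body={'location': 'us-central1', 'ttl': '604800s'}).execute()
print(snapshot['id'], snapshot['state'])
</pre>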
2578
2579<div class="method">
Jon Wayne Parrott692617a2017-01-06 09:58:29 -08002580 <code class="details" id="update">update(projectId, location, jobId, body, x__xgafv=None)</code>
Sai Cheemalapatic30d2b52017-03-13 12:12:03 -04002581 <pre>Updates the state of an existing Cloud Dataflow job.
Jon Wayne Parrott692617a2017-01-06 09:58:29 -08002582
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07002583To update the state of an existing job, we recommend using
2584`projects.locations.jobs.update` with a [regional endpoint]
2585(https://cloud.google.com/dataflow/docs/concepts/regional-endpoints). Using
2586`projects.jobs.update` is not recommended, as you can only update the state
2587of jobs that are running in `us-central1`.
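
For illustration only (a hypothetical sketch, not part of the generated
reference), assuming a discovery-built client named `dataflow` and placeholder
project, location, and job ID values, a request to cancel a running job might
look like:

  dataflow.projects().locations().jobs().update(
      projectId='my-project', location='us-central1', jobId='JOB_ID',
      body={'requestedState': 'JOB_STATE_CANCELLED'}).execute()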
2588
Jon Wayne Parrott692617a2017-01-06 09:58:29 -08002589Args:
Sai Cheemalapatic30d2b52017-03-13 12:12:03 -04002590 projectId: string, The ID of the Cloud Platform project that the job belongs to. (required)
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07002591 location: string, The [regional endpoint]
2592(https://cloud.google.com/dataflow/docs/concepts/regional-endpoints) that
2593contains this job. (required)
Sai Cheemalapatic30d2b52017-03-13 12:12:03 -04002594 jobId: string, The job ID. (required)
Jon Wayne Parrott692617a2017-01-06 09:58:29 -08002595 body: object, The request body. (required)
2596 The object takes the form of:
2597
Sai Cheemalapatic30d2b52017-03-13 12:12:03 -04002598{ # Defines a job to be run by the Cloud Dataflow service.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07002599 "labels": { # User-defined labels for this job.
2600 #
2601 # The labels map can contain no more than 64 entries. Entries of the labels
2602 # map are UTF8 strings that comply with the following restrictions:
2603 #
2604 # * Keys must conform to regexp: \p{Ll}\p{Lo}{0,62}
2605 # * Values must conform to regexp: [\p{Ll}\p{Lo}\p{N}_-]{0,63}
2606 # * Both keys and values are additionally constrained to be <= 128 bytes in
2607 # size.
2608 "a_key": "A String",
2609 },
2610 "jobMetadata": { # Metadata available primarily for filtering jobs. Will be included in the # This field is populated by the Dataflow service to support filtering jobs
2611 # by the metadata values provided here. Populated for ListJobs and all GetJob
2612 # views SUMMARY and higher.
2613 # ListJob response and Job SUMMARY view.
2614 "sdkVersion": { # The version of the SDK used to run the job. # The SDK version used to run the job.
2615 "versionDisplayName": "A String", # A readable string describing the version of the SDK.
2616 "version": "A String", # The version of the SDK used to run the job.
2617 "sdkSupportStatus": "A String", # The support status for this SDK version.
2618 },
2619 "pubsubDetails": [ # Identification of a PubSub source used in the Dataflow job.
2620 { # Metadata for a PubSub connector used by the job.
2621 "topic": "A String", # Topic accessed in the connection.
2622 "subscription": "A String", # Subscription used in the connection.
2623 },
2624 ],
2625 "datastoreDetails": [ # Identification of a Datastore source used in the Dataflow job.
2626 { # Metadata for a Datastore connector used by the job.
2627 "projectId": "A String", # ProjectId accessed in the connection.
2628 "namespace": "A String", # Namespace used in the connection.
2629 },
2630 ],
2631 "fileDetails": [ # Identification of a File source used in the Dataflow job.
2632 { # Metadata for a File connector used by the job.
2633 "filePattern": "A String", # File Pattern used to access files by the connector.
2634 },
2635 ],
2636 "spannerDetails": [ # Identification of a Spanner source used in the Dataflow job.
2637 { # Metadata for a Spanner connector used by the job.
2638 "instanceId": "A String", # InstanceId accessed in the connection.
2639 "projectId": "A String", # ProjectId accessed in the connection.
2640 "databaseId": "A String", # DatabaseId accessed in the connection.
2641 },
2642 ],
2643 "bigTableDetails": [ # Identification of a BigTable source used in the Dataflow job.
2644 { # Metadata for a BigTable connector used by the job.
2645 "instanceId": "A String", # InstanceId accessed in the connection.
2646 "projectId": "A String", # ProjectId accessed in the connection.
2647 "tableId": "A String", # TableId accessed in the connection.
2648 },
2649 ],
2650 "bigqueryDetails": [ # Identification of a BigQuery source used in the Dataflow job.
2651 { # Metadata for a BigQuery connector used by the job.
2652 "projectId": "A String", # Project accessed in the connection.
2653 "dataset": "A String", # Dataset accessed in the connection.
2654 "table": "A String", # Table accessed in the connection.
2655 "query": "A String", # Query used to access data in the connection.
2656 },
2657 ],
2658 },
2659 "pipelineDescription": { # A descriptive representation of submitted pipeline as well as the executed # Preliminary field: The format of this data may change at any time.
2660 # A description of the user pipeline and stages through which it is executed.
2661 # Created by Cloud Dataflow service. Only retrieved with
2662 # JOB_VIEW_DESCRIPTION or JOB_VIEW_ALL.
2663 # form. This data is provided by the Dataflow service for ease of visualizing
2664 # the pipeline and interpreting Dataflow provided metrics.
2665 "originalPipelineTransform": [ # Description of each transform in the pipeline and collections between them.
2666 { # Description of the type, names/ids, and input/outputs for a transform.
2667 "kind": "A String", # Type of transform.
2668 "name": "A String", # User provided name for this transform instance.
2669 "inputCollectionName": [ # User names for all collection inputs to this transform.
2670 "A String",
2671 ],
2672 "displayData": [ # Transform-specific display data.
2673 { # Data provided with a pipeline or transform to provide descriptive info.
2674 "shortStrValue": "A String", # A possible additional shorter value to display.
2675 # For example a java_class_name_value of com.mypackage.MyDoFn
2676 # will be stored with MyDoFn as the short_str_value and
2677 # com.mypackage.MyDoFn as the java_class_name value.
2678 # short_str_value can be displayed and java_class_name_value
2679 # will be displayed as a tooltip.
2680 "durationValue": "A String", # Contains value if the data is of duration type.
2681 "url": "A String", # An optional full URL.
2682 "floatValue": 3.14, # Contains value if the data is of float type.
2683 "namespace": "A String", # The namespace for the key. This is usually a class name or programming
2684 # language namespace (i.e. python module) which defines the display data.
2685 # This allows a dax monitoring system to specially handle the data
2686 # and perform custom rendering.
2687 "javaClassValue": "A String", # Contains value if the data is of java class type.
2688 "label": "A String", # An optional label to display in a dax UI for the element.
2689 "boolValue": True or False, # Contains value if the data is of a boolean type.
2690 "strValue": "A String", # Contains value if the data is of string type.
2691 "key": "A String", # The key identifying the display data.
2692 # This is intended to be used as a label for the display data
2693 # when viewed in a dax monitoring system.
2694 "int64Value": "A String", # Contains value if the data is of int64 type.
2695 "timestampValue": "A String", # Contains value if the data is of timestamp type.
2696 },
2697 ],
2698 "outputCollectionName": [ # User names for all collection outputs to this transform.
2699 "A String",
2700 ],
2701 "id": "A String", # SDK generated id of this transform instance.
2702 },
2703 ],
2704 "executionPipelineStage": [ # Description of each stage of execution of the pipeline.
2705 { # Description of the composing transforms, names/ids, and input/outputs of a
2706 # stage of execution. Some composing transforms and sources may have been
2707 # generated by the Dataflow service during execution planning.
2708 "componentSource": [ # Collections produced and consumed by component transforms of this stage.
2709 { # Description of an interstitial value between transforms in an execution
2710 # stage.
2711 "userName": "A String", # Human-readable name for this transform; may be user or system generated.
2712 "originalTransformOrCollection": "A String", # User name for the original user transform or collection with which this
2713 # source is most closely associated.
2714 "name": "A String", # Dataflow service generated name for this source.
2715 },
2716 ],
2717 "kind": "A String", # Type of tranform this stage is executing.
2718 "name": "A String", # Dataflow service generated name for this stage.
2719 "outputSource": [ # Output sources for this stage.
2720 { # Description of an input or output of an execution stage.
2721 "userName": "A String", # Human-readable name for this source; may be user or system generated.
2722 "sizeBytes": "A String", # Size of the source, if measurable.
2723 "name": "A String", # Dataflow service generated name for this source.
2724 "originalTransformOrCollection": "A String", # User name for the original user transform or collection with which this
2725 # source is most closely associated.
2726 },
2727 ],
2728 "inputSource": [ # Input sources for this stage.
2729 { # Description of an input or output of an execution stage.
2730 "userName": "A String", # Human-readable name for this source; may be user or system generated.
2731 "sizeBytes": "A String", # Size of the source, if measurable.
2732 "name": "A String", # Dataflow service generated name for this source.
2733 "originalTransformOrCollection": "A String", # User name for the original user transform or collection with which this
2734 # source is most closely associated.
2735 },
2736 ],
2737 "componentTransform": [ # Transforms that comprise this execution stage.
2738 { # Description of a transform executed as part of an execution stage.
2739 "userName": "A String", # Human-readable name for this transform; may be user or system generated.
2740 "originalTransform": "A String", # User name for the original user transform with which this transform is
2741 # most closely associated.
2742 "name": "A String", # Dataflow service generated name for this source.
2743 },
2744 ],
2745 "id": "A String", # Dataflow service generated id for this stage.
2746 },
2747 ],
2748 "displayData": [ # Pipeline level display data.
2749 { # Data provided with a pipeline or transform to provide descriptive info.
2750 "shortStrValue": "A String", # A possible additional shorter value to display.
2751 # For example a java_class_name_value of com.mypackage.MyDoFn
2752 # will be stored with MyDoFn as the short_str_value and
2753 # com.mypackage.MyDoFn as the java_class_name value.
2754 # short_str_value can be displayed and java_class_name_value
2755 # will be displayed as a tooltip.
2756 "durationValue": "A String", # Contains value if the data is of duration type.
2757 "url": "A String", # An optional full URL.
2758 "floatValue": 3.14, # Contains value if the data is of float type.
2759 "namespace": "A String", # The namespace for the key. This is usually a class name or programming
2760 # language namespace (i.e. python module) which defines the display data.
2761 # This allows a dax monitoring system to specially handle the data
2762 # and perform custom rendering.
2763 "javaClassValue": "A String", # Contains value if the data is of java class type.
2764 "label": "A String", # An optional label to display in a dax UI for the element.
2765 "boolValue": True or False, # Contains value if the data is of a boolean type.
2766 "strValue": "A String", # Contains value if the data is of string type.
2767 "key": "A String", # The key identifying the display data.
2768 # This is intended to be used as a label for the display data
2769 # when viewed in a dax monitoring system.
2770 "int64Value": "A String", # Contains value if the data is of int64 type.
2771 "timestampValue": "A String", # Contains value if the data is of timestamp type.
2772 },
2773 ],
2774 },
2775 "stageStates": [ # This field may be mutated by the Cloud Dataflow service;
2776 # callers cannot mutate it.
2777 { # A message describing the state of a particular execution stage.
2778 "executionStageName": "A String", # The name of the execution stage.
2779 "executionStageState": "A String", # Executions stage states allow the same set of values as JobState.
2780 "currentStateTime": "A String", # The time at which the stage transitioned to this state.
2781 },
2782 ],
2783 "id": "A String", # The unique ID of this job.
2784 #
2785 # This field is set by the Cloud Dataflow service when the Job is
2786 # created, and is immutable for the life of the job.
2787 "replacedByJobId": "A String", # If another job is an update of this job (and thus, this job is in
2788 # `JOB_STATE_UPDATED`), this field contains the ID of that job.
2789 "projectId": "A String", # The ID of the Cloud Platform project that the job belongs to.
2790 "transformNameMapping": { # The map of transform name prefixes of the job to be replaced to the
2791 # corresponding name prefixes of the new job.
2792 "a_key": "A String",
2793 },
2794 "environment": { # Describes the environment in which a Dataflow Job runs. # The environment for the job.
2795 "version": { # A structure describing which components and their versions of the service
2796 # are required in order to run the job.
2797 "a_key": "", # Properties of the object.
2798 },
2799 "flexResourceSchedulingGoal": "A String", # Which Flexible Resource Scheduling mode to run in.
2800 "serviceKmsKeyName": "A String", # If set, contains the Cloud KMS key identifier used to encrypt data
2801 # at rest, AKA a Customer Managed Encryption Key (CMEK).
2802 #
2803 # Format:
2804 # projects/PROJECT_ID/locations/LOCATION/keyRings/KEY_RING/cryptoKeys/KEY
2805 "internalExperiments": { # Experimental settings.
2806 "a_key": "", # Properties of the object. Contains field @type with type URL.
2807 },
2808 "dataset": "A String", # The dataset for the current project where various workflow
2809 # related tables are stored.
2810 #
2811 # The supported resource type is:
2812 #
2813 # Google BigQuery:
2814 # bigquery.googleapis.com/{dataset}
2815 "experiments": [ # The list of experiments to enable.
2816 "A String",
2817 ],
2818 "serviceAccountEmail": "A String", # Identity to run virtual machines as. Defaults to the default account.
2819 "sdkPipelineOptions": { # The Cloud Dataflow SDK pipeline options specified by the user. These
2820 # options are passed through the service and are used to recreate the
2821 # SDK pipeline options on the worker in a language agnostic and platform
2822 # independent way.
2823 "a_key": "", # Properties of the object.
2824 },
2825 "userAgent": { # A description of the process that generated the request.
2826 "a_key": "", # Properties of the object.
2827 },
2828 "clusterManagerApiService": "A String", # The type of cluster manager API to use. If unknown or
2829 # unspecified, the service will attempt to choose a reasonable
2830 # default. This should be in the form of the API service name,
2831 # e.g. "compute.googleapis.com".
2832 "workerPools": [ # The worker pools. At least one "harness" worker pool must be
2833 # specified in order for the job to have workers.
2834 { # Describes one particular pool of Cloud Dataflow workers to be
2835 # instantiated by the Cloud Dataflow service in order to perform the
2836 # computations required by a job. Note that a workflow job may use
2837 # multiple pools, in order to match the various computational
2838 # requirements of the various stages of the job.
2839 "diskSourceImage": "A String", # Fully qualified source image for disks.
2840 "taskrunnerSettings": { # Taskrunner configuration settings. # Settings passed through to Google Compute Engine workers when
2841 # using the standard Dataflow task runner. Users should ignore
2842 # this field.
2843 "workflowFileName": "A String", # The file to store the workflow in.
2844 "logUploadLocation": "A String", # Indicates where to put logs. If this is not specified, the logs
2845 # will not be uploaded.
2846 #
2847 # The supported resource type is:
2848 #
2849 # Google Cloud Storage:
2850 # storage.googleapis.com/{bucket}/{object}
2851 # bucket.storage.googleapis.com/{object}
2852 "commandlinesFileName": "A String", # The file to store preprocessing commands in.
2853 "parallelWorkerSettings": { # Provides data to pass through to the worker harness. # The settings to pass to the parallel worker harness.
2854 "reportingEnabled": True or False, # Whether to send work progress updates to the service.
2855 "shuffleServicePath": "A String", # The Shuffle service path relative to the root URL, for example,
2856 # "shuffle/v1beta1".
2857 "workerId": "A String", # The ID of the worker running this pipeline.
2858 "baseUrl": "A String", # The base URL for accessing Google Cloud APIs.
2859 #
2860 # When workers access Google Cloud APIs, they logically do so via
2861 # relative URLs. If this field is specified, it supplies the base
2862 # URL to use for resolving these relative URLs. The normative
2863 # algorithm used is defined by RFC 1808, "Relative Uniform Resource
2864 # Locators".
2865 #
2866 # If not specified, the default value is "http://www.googleapis.com/"
2867 "servicePath": "A String", # The Cloud Dataflow service path relative to the root URL, for example,
2868 # "dataflow/v1b3/projects".
2869 "tempStoragePrefix": "A String", # The prefix of the resources the system should use for temporary
2870 # storage.
2871 #
2872 # The supported resource type is:
2873 #
2874 # Google Cloud Storage:
2875 #
2876 # storage.googleapis.com/{bucket}/{object}
2877 # bucket.storage.googleapis.com/{object}
2878 },
2879 "vmId": "A String", # The ID string of the VM.
2880 "baseTaskDir": "A String", # The location on the worker for task-specific subdirectories.
2881 "continueOnException": True or False, # Whether to continue taskrunner if an exception is hit.
2882 "oauthScopes": [ # The OAuth2 scopes to be requested by the taskrunner in order to
2883 # access the Cloud Dataflow API.
2884 "A String",
2885 ],
2886 "taskUser": "A String", # The UNIX user ID on the worker VM to use for tasks launched by
2887 # taskrunner; e.g. "root".
2888 "baseUrl": "A String", # The base URL for the taskrunner to use when accessing Google Cloud APIs.
2889 #
2890 # When workers access Google Cloud APIs, they logically do so via
2891 # relative URLs. If this field is specified, it supplies the base
2892 # URL to use for resolving these relative URLs. The normative
2893 # algorithm used is defined by RFC 1808, "Relative Uniform Resource
2894 # Locators".
2895 #
2896 # If not specified, the default value is "http://www.googleapis.com/"
2897 "taskGroup": "A String", # The UNIX group ID on the worker VM to use for tasks launched by
2898 # taskrunner; e.g. "wheel".
2899 "languageHint": "A String", # The suggested backend language.
2900 "logToSerialconsole": True or False, # Whether to send taskrunner log info to Google Compute Engine VM serial
2901 # console.
2902 "streamingWorkerMainClass": "A String", # The streaming worker main class name.
2903 "logDir": "A String", # The directory on the VM to store logs.
2904 "dataflowApiVersion": "A String", # The API version of endpoint, e.g. "v1b3"
2905 "harnessCommand": "A String", # The command to launch the worker harness.
2906 "tempStoragePrefix": "A String", # The prefix of the resources the taskrunner should use for
2907 # temporary storage.
2908 #
2909 # The supported resource type is:
2910 #
2911 # Google Cloud Storage:
2912 # storage.googleapis.com/{bucket}/{object}
2913 # bucket.storage.googleapis.com/{object}
2914 "alsologtostderr": True or False, # Whether to also send taskrunner log info to stderr.
2915 },
2916 "kind": "A String", # The kind of the worker pool; currently only `harness` and `shuffle`
2917 # are supported.
2918 "packages": [ # Packages to be installed on workers.
2919 { # The packages that must be installed in order for a worker to run the
2920 # steps of the Cloud Dataflow job that will be assigned to its worker
2921 # pool.
2922 #
2923 # This is the mechanism by which the Cloud Dataflow SDK causes code to
2924 # be loaded onto the workers. For example, the Cloud Dataflow Java SDK
2925 # might use this to install jars containing the user's code and all of the
2926 # various dependencies (libraries, data files, etc.) required in order
2927 # for that code to run.
2928 "location": "A String", # The resource to read the package from. The supported resource type is:
2929 #
2930 # Google Cloud Storage:
2931 #
2932 # storage.googleapis.com/{bucket}
2933 # bucket.storage.googleapis.com/
2934 "name": "A String", # The name of the package.
2935 },
2936 ],
2937 "machineType": "A String", # Machine type (e.g. "n1-standard-1"). If empty or unspecified, the
2938 # service will attempt to choose a reasonable default.
2939 "network": "A String", # Network to which VMs will be assigned. If empty or unspecified,
2940 # the service will use the network "default".
2941 "zone": "A String", # Zone to run the worker pools in. If empty or unspecified, the service
2942 # will attempt to choose a reasonable default.
2943 "diskSizeGb": 42, # Size of root disk for VMs, in GB. If zero or unspecified, the service will
2944 # attempt to choose a reasonable default.
2945 "teardownPolicy": "A String", # Sets the policy for determining when to turndown worker pool.
2946 # Allowed values are: `TEARDOWN_ALWAYS`, `TEARDOWN_ON_SUCCESS`, and
2947 # `TEARDOWN_NEVER`.
2948 # `TEARDOWN_ALWAYS` means workers are always torn down regardless of whether
2949 # the job succeeds. `TEARDOWN_ON_SUCCESS` means workers are torn down
2950 # if the job succeeds. `TEARDOWN_NEVER` means the workers are never torn
2951 # down.
2952 #
2953 # If the workers are not torn down by the service, they will
2954 # continue to run and use Google Compute Engine VM resources in the
2955 # user's project until they are explicitly terminated by the user.
2956 # Because of this, Google recommends using the `TEARDOWN_ALWAYS`
2957 # policy except for small, manually supervised test jobs.
2958 #
2959 # If unknown or unspecified, the service will attempt to choose a reasonable
2960 # default.
2961 "onHostMaintenance": "A String", # The action to take on host maintenance, as defined by the Google
2962 # Compute Engine API.
2963 "ipConfiguration": "A String", # Configuration for VM IPs.
2964 "numThreadsPerWorker": 42, # The number of threads per worker harness. If empty or unspecified, the
2965 # service will choose a number of threads (according to the number of cores
2966 # on the selected machine type for batch, or 1 by convention for streaming).
2967 "poolArgs": { # Extra arguments for this worker pool.
2968 "a_key": "", # Properties of the object. Contains field @type with type URL.
2969 },
2970 "numWorkers": 42, # Number of Google Compute Engine workers in this pool needed to
2971 # execute the job. If zero or unspecified, the service will
2972 # attempt to choose a reasonable default.
2973 "workerHarnessContainerImage": "A String", # Required. Docker container image that executes the Cloud Dataflow worker
2974 # harness, residing in Google Container Registry.
2975 "subnetwork": "A String", # Subnetwork to which VMs will be assigned, if desired. Expected to be of
2976 # the form "regions/REGION/subnetworks/SUBNETWORK".
2977 "dataDisks": [ # Data disks that are used by a VM in this workflow.
2978 { # Describes the data disk used by a workflow job.
2979 "mountPoint": "A String", # Directory in a VM where disk is mounted.
2980 "sizeGb": 42, # Size of disk in GB. If zero or unspecified, the service will
2981 # attempt to choose a reasonable default.
2982 "diskType": "A String", # Disk storage type, as defined by Google Compute Engine. This
2983 # must be a disk type appropriate to the project and zone in which
2984 # the workers will run. If unknown or unspecified, the service
2985 # will attempt to choose a reasonable default.
2986 #
2987 # For example, the standard persistent disk type is a resource name
2988 # typically ending in "pd-standard". If SSD persistent disks are
2989 # available, the resource name typically ends with "pd-ssd". The
2990 # actual valid values are defined by the Google Compute Engine API,
2991 # not by the Cloud Dataflow API; consult the Google Compute Engine
2992 # documentation for more information about determining the set of
2993 # available disk types for a particular project and zone.
2994 #
2995 # Google Compute Engine Disk types are local to a particular
2996 # project in a particular zone, and so the resource name will
2997 # typically look something like this:
2998 #
2999 # compute.googleapis.com/projects/project-id/zones/zone/diskTypes/pd-standard
3000 },
3001 ],
3002 "autoscalingSettings": { # Settings for WorkerPool autoscaling. # Settings for autoscaling of this WorkerPool.
3003 "maxNumWorkers": 42, # The maximum number of workers to cap scaling at.
3004 "algorithm": "A String", # The algorithm to use for autoscaling.
3005 },
3006 "defaultPackageSet": "A String", # The default package set to install. This allows the service to
3007 # select a default set of packages which are useful to worker
3008 # harnesses written in a particular language.
3009 "diskType": "A String", # Type of root disk for VMs. If empty or unspecified, the service will
3010 # attempt to choose a reasonable default.
3011 "metadata": { # Metadata to set on the Google Compute Engine VMs.
3012 "a_key": "A String",
3013 },
3014 },
3015 ],
3016 "tempStoragePrefix": "A String", # The prefix of the resources the system should use for temporary
3017 # storage. The system will append the suffix "/temp-{JOBNAME}" to
3018 # this resource prefix, where {JOBNAME} is the value of the
3019 # job_name field. The resulting bucket and object prefix is used
3020 # as the prefix of the resources used to store temporary data
3021 # needed during the job execution. NOTE: This will override the
3022 # value in taskrunner_settings.
3023 # The supported resource type is:
3024 #
3025 # Google Cloud Storage:
3026 #
3027 # storage.googleapis.com/{bucket}/{object}
3028 # bucket.storage.googleapis.com/{object}
3029 },
3030 "location": "A String", # The [regional endpoint]
3031 # (https://cloud.google.com/dataflow/docs/concepts/regional-endpoints) that
3032 # contains this job.
3033 "tempFiles": [ # A set of files the system should be aware of that are used
3034 # for temporary storage. These temporary files will be
3035 # removed on job completion.
3036 # No duplicates are allowed.
3037 # No file patterns are supported.
3038 #
3039 # The supported files are:
3040 #
3041 # Google Cloud Storage:
3042 #
3043 # storage.googleapis.com/{bucket}/{object}
3044 # bucket.storage.googleapis.com/{object}
3045 "A String",
3046 ],
3047 "type": "A String", # The type of Cloud Dataflow job.
3048 "clientRequestId": "A String", # The client's unique identifier of the job, re-used across retried attempts.
3049 # If this field is set, the service will ensure its uniqueness.
3050 # The request to create a job will fail if the service has knowledge of a
3051 # previously submitted job with the same client's ID and job name.
3052 # The caller may use this field to ensure idempotence of job
3053 # creation across retried attempts to create a job.
3054 # By default, the field is empty and, in that case, the service ignores it.
3055 "createdFromSnapshotId": "A String", # If this is specified, the job's initial state is populated from the given
3056 # snapshot.
3057 "stepsLocation": "A String", # The GCS location where the steps are stored.
3058 "currentStateTime": "A String", # The timestamp associated with the current state.
3059 "startTime": "A String", # The timestamp when the job was started (transitioned to JOB_STATE_PENDING).
3060 # Flexible resource scheduling jobs are started with some delay after job
3061 # creation, so start_time is unset before start and is updated when the
3062 # job is started by the Cloud Dataflow service. For other jobs, start_time
3063 # always equals create_time and is immutable and set by the Cloud Dataflow
3064 # service.
3065 "createTime": "A String", # The timestamp when the job was initially created. Immutable and set by the
3066 # Cloud Dataflow service.
3067 "requestedState": "A String", # The job's requested state.
3068 #
3069 # `UpdateJob` may be used to switch between the `JOB_STATE_STOPPED` and
3070 # `JOB_STATE_RUNNING` states, by setting requested_state. `UpdateJob` may
3071 # also be used to directly set a job's requested state to
3072 # `JOB_STATE_CANCELLED` or `JOB_STATE_DONE`, irrevocably terminating the
3073 # job if it has not already reached a terminal state.
3074 "name": "A String", # The user-specified Cloud Dataflow job name.
3075 #
3076 # Only one Job with a given name may exist in a project at any
3077 # given time. If a caller attempts to create a Job with the same
3078 # name as an already-existing Job, the attempt returns the
3079 # existing Job.
3080 #
3081 # The name must match the regular expression
3082 # `[a-z]([-a-z0-9]{0,38}[a-z0-9])?`
3083 "steps": [ # Exactly one of step or steps_location should be specified.
3084 #
3085 # The top-level steps that constitute the entire job.
3086 { # Defines a particular step within a Cloud Dataflow job.
3087 #
3088 # A job consists of multiple steps, each of which performs some
3089 # specific operation as part of the overall job. Data is typically
3090 # passed from one step to another as part of the job.
3091 #
3092 # Here's an example of a sequence of steps which together implement a
3093 # Map-Reduce job:
3094 #
3095 # * Read a collection of data from some source, parsing the
3096 # collection's elements.
3097 #
3098 # * Validate the elements.
3099 #
3100 # * Apply a user-defined function to map each element to some value
3101 # and extract an element-specific key value.
3102 #
3103 # * Group elements with the same key into a single element with
3104 # that key, transforming a multiply-keyed collection into a
3105 # uniquely-keyed collection.
3106 #
3107 # * Write the elements out to some data sink.
3108 #
3109 # Note that the Cloud Dataflow service may be used to run many different
3110 # types of jobs, not just Map-Reduce.
3111 "kind": "A String", # The kind of step in the Cloud Dataflow job.
3112 "properties": { # Named properties associated with the step. Each kind of
3113 # predefined step has its own required set of properties.
3114 # Must be provided on Create. Only retrieved with JOB_VIEW_ALL.
3115 "a_key": "", # Properties of the object.
3116 },
3117 "name": "A String", # The name that identifies the step. This must be unique for each
3118 # step with respect to all other steps in the Cloud Dataflow job.
3119 },
3120 ],
3121 "replaceJobId": "A String", # If this job is an update of an existing job, this field is the job ID
3122 # of the job it replaced.
3123 #
3124 # When sending a `CreateJobRequest`, you can update a job by specifying it
3125 # here. The job named here is stopped, and its intermediate state is
3126 # transferred to this job.
3127 "currentState": "A String", # The current state of the job.
3128 #
3129 # Jobs are created in the `JOB_STATE_STOPPED` state unless otherwise
3130 # specified.
3131 #
3132 # A job in the `JOB_STATE_RUNNING` state may asynchronously enter a
3133 # terminal state. After a job has reached a terminal state, no
3134 # further state updates may be made.
3135 #
3136 # This field may be mutated by the Cloud Dataflow service;
3137 # callers cannot mutate it.
3138 "executionInfo": { # Deprecated. Additional information about how a Cloud Dataflow job will be executed
3139 # that isn't contained in the submitted job.
3140 "stages": { # A mapping from each stage to the information about that stage.
3141 "a_key": { # Contains information about how a particular
3142 # google.dataflow.v1beta3.Step will be executed.
3143 "stepName": [ # The steps associated with the execution stage.
3144 # Note that stages may have several steps, and that a given step
3145 # might be run by more than one stage.
3146 "A String",
3147 ],
3148 },
3149 },
3150 },
3151}
3152
3153 x__xgafv: string, V1 error format.
3154 Allowed values
3155 1 - v1 error format
3156 2 - v2 error format
3157
3158Returns:
3159 An object of the form:
3160
3161 { # Defines a job to be run by the Cloud Dataflow service.
3162 "labels": { # User-defined labels for this job.
3163 #
3164 # The labels map can contain no more than 64 entries. Entries of the labels
3165 # map are UTF8 strings that comply with the following restrictions:
3166 #
3167 # * Keys must conform to regexp: \p{Ll}\p{Lo}{0,62}
3168 # * Values must conform to regexp: [\p{Ll}\p{Lo}\p{N}_-]{0,63}
3169 # * Both keys and values are additionally constrained to be <= 128 bytes in
3170 # size.
3171 "a_key": "A String",
3172 },
3173 "jobMetadata": { # This field is populated by the Dataflow service to support filtering jobs
3174 # by the metadata values provided here. Populated for ListJobs and all GetJob
3175 # views SUMMARY and higher.
3176 # Metadata available primarily for filtering jobs. Will be included in the ListJob response and Job SUMMARY view.
3177 "sdkVersion": { # The version of the SDK used to run the job. # The SDK version used to run the job.
3178 "versionDisplayName": "A String", # A readable string describing the version of the SDK.
3179 "version": "A String", # The version of the SDK used to run the job.
3180 "sdkSupportStatus": "A String", # The support status for this SDK version.
3181 },
3182 "pubsubDetails": [ # Identification of a PubSub source used in the Dataflow job.
3183 { # Metadata for a PubSub connector used by the job.
3184 "topic": "A String", # Topic accessed in the connection.
3185 "subscription": "A String", # Subscription used in the connection.
3186 },
3187 ],
3188 "datastoreDetails": [ # Identification of a Datastore source used in the Dataflow job.
3189 { # Metadata for a Datastore connector used by the job.
3190 "projectId": "A String", # ProjectId accessed in the connection.
3191 "namespace": "A String", # Namespace used in the connection.
3192 },
3193 ],
3194 "fileDetails": [ # Identification of a File source used in the Dataflow job.
3195 { # Metadata for a File connector used by the job.
3196 "filePattern": "A String", # File Pattern used to access files by the connector.
3197 },
3198 ],
3199 "spannerDetails": [ # Identification of a Spanner source used in the Dataflow job.
3200 { # Metadata for a Spanner connector used by the job.
3201 "instanceId": "A String", # InstanceId accessed in the connection.
3202 "projectId": "A String", # ProjectId accessed in the connection.
3203 "databaseId": "A String", # DatabaseId accessed in the connection.
3204 },
3205 ],
3206 "bigTableDetails": [ # Identification of a BigTable source used in the Dataflow job.
3207 { # Metadata for a BigTable connector used by the job.
3208 "instanceId": "A String", # InstanceId accessed in the connection.
3209 "projectId": "A String", # ProjectId accessed in the connection.
3210 "tableId": "A String", # TableId accessed in the connection.
3211 },
3212 ],
3213 "bigqueryDetails": [ # Identification of a BigQuery source used in the Dataflow job.
3214 { # Metadata for a BigQuery connector used by the job.
3215 "projectId": "A String", # Project accessed in the connection.
3216 "dataset": "A String", # Dataset accessed in the connection.
3217 "table": "A String", # Table accessed in the connection.
3218 "query": "A String", # Query used to access data in the connection.
3219 },
3220 ],
3221 },
3222 "pipelineDescription": { # Preliminary field: The format of this data may change at any time.
3223 # A description of the user pipeline and stages through which it is executed.
3224 # Created by Cloud Dataflow service. Only retrieved with
3225 # JOB_VIEW_DESCRIPTION or JOB_VIEW_ALL.
3226 # A descriptive representation of submitted pipeline as well as the executed form. This data
3227 # is provided by the Dataflow service for ease of visualizing the pipeline and interpreting Dataflow provided metrics.
3228 "originalPipelineTransform": [ # Description of each transform in the pipeline and collections between them.
3229 { # Description of the type, names/ids, and input/outputs for a transform.
3230 "kind": "A String", # Type of transform.
3231 "name": "A String", # User provided name for this transform instance.
3232 "inputCollectionName": [ # User names for all collection inputs to this transform.
3233 "A String",
3234 ],
3235 "displayData": [ # Transform-specific display data.
3236 { # Data provided with a pipeline or transform to provide descriptive info.
3237 "shortStrValue": "A String", # A possible additional shorter value to display.
3238 # For example a java_class_name_value of com.mypackage.MyDoFn
3239 # will be stored with MyDoFn as the short_str_value and
3240 # com.mypackage.MyDoFn as the java_class_name value.
3241 # short_str_value can be displayed and java_class_name_value
3242 # will be displayed as a tooltip.
3243 "durationValue": "A String", # Contains value if the data is of duration type.
3244 "url": "A String", # An optional full URL.
3245 "floatValue": 3.14, # Contains value if the data is of float type.
3246 "namespace": "A String", # The namespace for the key. This is usually a class name or programming
3247 # language namespace (i.e. python module) which defines the display data.
3248 # This allows a dax monitoring system to specially handle the data
3249 # and perform custom rendering.
3250 "javaClassValue": "A String", # Contains value if the data is of java class type.
3251 "label": "A String", # An optional label to display in a dax UI for the element.
3252 "boolValue": True or False, # Contains value if the data is of a boolean type.
3253 "strValue": "A String", # Contains value if the data is of string type.
3254 "key": "A String", # The key identifying the display data.
3255 # This is intended to be used as a label for the display data
3256 # when viewed in a dax monitoring system.
3257 "int64Value": "A String", # Contains value if the data is of int64 type.
3258 "timestampValue": "A String", # Contains value if the data is of timestamp type.
3259 },
3260 ],
3261 "outputCollectionName": [ # User names for all collection outputs to this transform.
3262 "A String",
3263 ],
3264 "id": "A String", # SDK generated id of this transform instance.
3265 },
3266 ],
3267 "executionPipelineStage": [ # Description of each stage of execution of the pipeline.
3268 { # Description of the composing transforms, names/ids, and input/outputs of a
3269 # stage of execution. Some composing transforms and sources may have been
3270 # generated by the Dataflow service during execution planning.
3271 "componentSource": [ # Collections produced and consumed by component transforms of this stage.
3272 { # Description of an interstitial value between transforms in an execution
3273 # stage.
3274 "userName": "A String", # Human-readable name for this transform; may be user or system generated.
3275 "originalTransformOrCollection": "A String", # User name for the original user transform or collection with which this
3276 # source is most closely associated.
3277 "name": "A String", # Dataflow service generated name for this source.
3278 },
3279 ],
3280 "kind": "A String", # Type of transform this stage is executing.
3281 "name": "A String", # Dataflow service generated name for this stage.
3282 "outputSource": [ # Output sources for this stage.
3283 { # Description of an input or output of an execution stage.
3284 "userName": "A String", # Human-readable name for this source; may be user or system generated.
3285 "sizeBytes": "A String", # Size of the source, if measurable.
3286 "name": "A String", # Dataflow service generated name for this source.
3287 "originalTransformOrCollection": "A String", # User name for the original user transform or collection with which this
3288 # source is most closely associated.
3289 },
3290 ],
3291 "inputSource": [ # Input sources for this stage.
3292 { # Description of an input or output of an execution stage.
3293 "userName": "A String", # Human-readable name for this source; may be user or system generated.
3294 "sizeBytes": "A String", # Size of the source, if measurable.
3295 "name": "A String", # Dataflow service generated name for this source.
3296 "originalTransformOrCollection": "A String", # User name for the original user transform or collection with which this
3297 # source is most closely associated.
3298 },
3299 ],
3300 "componentTransform": [ # Transforms that comprise this execution stage.
3301 { # Description of a transform executed as part of an execution stage.
3302 "userName": "A String", # Human-readable name for this transform; may be user or system generated.
3303 "originalTransform": "A String", # User name for the original user transform with which this transform is
3304 # most closely associated.
3305 "name": "A String", # Dataflow service generated name for this source.
3306 },
3307 ],
3308 "id": "A String", # Dataflow service generated id for this stage.
3309 },
3310 ],
3311 "displayData": [ # Pipeline level display data.
3312 { # Data provided with a pipeline or transform to provide descriptive info.
3313 "shortStrValue": "A String", # A possible additional shorter value to display.
3314 # For example a java_class_name_value of com.mypackage.MyDoFn
3315 # will be stored with MyDoFn as the short_str_value and
3316 # com.mypackage.MyDoFn as the java_class_name value.
3317 # short_str_value can be displayed and java_class_name_value
3318 # will be displayed as a tooltip.
3319 "durationValue": "A String", # Contains value if the data is of duration type.
3320 "url": "A String", # An optional full URL.
3321 "floatValue": 3.14, # Contains value if the data is of float type.
3322 "namespace": "A String", # The namespace for the key. This is usually a class name or programming
3323 # language namespace (i.e. python module) which defines the display data.
3324 # This allows a dax monitoring system to specially handle the data
3325 # and perform custom rendering.
3326 "javaClassValue": "A String", # Contains value if the data is of java class type.
3327 "label": "A String", # An optional label to display in a dax UI for the element.
3328 "boolValue": True or False, # Contains value if the data is of a boolean type.
3329 "strValue": "A String", # Contains value if the data is of string type.
3330 "key": "A String", # The key identifying the display data.
3331 # This is intended to be used as a label for the display data
3332 # when viewed in a dax monitoring system.
3333 "int64Value": "A String", # Contains value if the data is of int64 type.
3334 "timestampValue": "A String", # Contains value if the data is of timestamp type.
3335 },
3336 ],
3337 },
3338 "stageStates": [ # This field may be mutated by the Cloud Dataflow service;
3339 # callers cannot mutate it.
3340 { # A message describing the state of a particular execution stage.
3341 "executionStageName": "A String", # The name of the execution stage.
3342 "executionStageState": "A String", # Execution stage states allow the same set of values as JobState.
3343 "currentStateTime": "A String", # The time at which the stage transitioned to this state.
3344 },
3345 ],
3346 "id": "A String", # The unique ID of this job.
3347 #
3348 # This field is set by the Cloud Dataflow service when the Job is
3349 # created, and is immutable for the life of the job.
3350 "replacedByJobId": "A String", # If another job is an update of this job (and thus, this job is in
3351 # `JOB_STATE_UPDATED`), this field contains the ID of that job.
3352 "projectId": "A String", # The ID of the Cloud Platform project that the job belongs to.
3353 "transformNameMapping": { # The map of transform name prefixes of the job to be replaced to the
3354 # corresponding name prefixes of the new job.
3355 "a_key": "A String",
3356 },
3357 "environment": { # Describes the environment in which a Dataflow Job runs. # The environment for the job.
3358 "version": { # A structure describing which components and their versions of the service
3359 # are required in order to run the job.
3360 "a_key": "", # Properties of the object.
3361 },
3362 "flexResourceSchedulingGoal": "A String", # Which Flexible Resource Scheduling mode to run in.
3363 "serviceKmsKeyName": "A String", # If set, contains the Cloud KMS key identifier used to encrypt data
3364 # at rest, AKA a Customer Managed Encryption Key (CMEK).
3365 #
3366 # Format:
3367 # projects/PROJECT_ID/locations/LOCATION/keyRings/KEY_RING/cryptoKeys/KEY
3368 "internalExperiments": { # Experimental settings.
3369 "a_key": "", # Properties of the object. Contains field @type with type URL.
3370 },
3371 "dataset": "A String", # The dataset for the current project where various workflow
3372 # related tables are stored.
3373 #
3374 # The supported resource type is:
3375 #
3376 # Google BigQuery:
3377 # bigquery.googleapis.com/{dataset}
3378 "experiments": [ # The list of experiments to enable.
3379 "A String",
3380 ],
3381 "serviceAccountEmail": "A String", # Identity to run virtual machines as. Defaults to the default account.
3382 "sdkPipelineOptions": { # The Cloud Dataflow SDK pipeline options specified by the user. These
3383 # options are passed through the service and are used to recreate the
3384 # SDK pipeline options on the worker in a language agnostic and platform
3385 # independent way.
3386 "a_key": "", # Properties of the object.
3387 },
3388 "userAgent": { # A description of the process that generated the request.
3389 "a_key": "", # Properties of the object.
3390 },
3391 "clusterManagerApiService": "A String", # The type of cluster manager API to use. If unknown or
3392 # unspecified, the service will attempt to choose a reasonable
3393 # default. This should be in the form of the API service name,
3394 # e.g. "compute.googleapis.com".
3395 "workerPools": [ # The worker pools. At least one "harness" worker pool must be
3396 # specified in order for the job to have workers.
3397 { # Describes one particular pool of Cloud Dataflow workers to be
3398 # instantiated by the Cloud Dataflow service in order to perform the
3399 # computations required by a job. Note that a workflow job may use
3400 # multiple pools, in order to match the various computational
3401 # requirements of the various stages of the job.
3402 "diskSourceImage": "A String", # Fully qualified source image for disks.
3403 "taskrunnerSettings": { # Taskrunner configuration settings. # Settings passed through to Google Compute Engine workers when
3404 # using the standard Dataflow task runner. Users should ignore
3405 # this field.
3406 "workflowFileName": "A String", # The file to store the workflow in.
3407 "logUploadLocation": "A String", # Indicates where to put logs. If this is not specified, the logs
3408 # will not be uploaded.
3409 #
3410 # The supported resource type is:
3411 #
3412 # Google Cloud Storage:
3413 # storage.googleapis.com/{bucket}/{object}
3414 # bucket.storage.googleapis.com/{object}
3415 "commandlinesFileName": "A String", # The file to store preprocessing commands in.
3416 "parallelWorkerSettings": { # Provides data to pass through to the worker harness. # The settings to pass to the parallel worker harness.
3417 "reportingEnabled": True or False, # Whether to send work progress updates to the service.
3418 "shuffleServicePath": "A String", # The Shuffle service path relative to the root URL, for example,
3419 # "shuffle/v1beta1".
3420 "workerId": "A String", # The ID of the worker running this pipeline.
3421 "baseUrl": "A String", # The base URL for accessing Google Cloud APIs.
3422 #
3423 # When workers access Google Cloud APIs, they logically do so via
3424 # relative URLs. If this field is specified, it supplies the base
3425 # URL to use for resolving these relative URLs. The normative
3426 # algorithm used is defined by RFC 1808, "Relative Uniform Resource
3427 # Locators".
3428 #
3429 # If not specified, the default value is "http://www.googleapis.com/"
3430 "servicePath": "A String", # The Cloud Dataflow service path relative to the root URL, for example,
3431 # "dataflow/v1b3/projects".
3432 "tempStoragePrefix": "A String", # The prefix of the resources the system should use for temporary
3433 # storage.
3434 #
3435 # The supported resource type is:
3436 #
3437 # Google Cloud Storage:
3438 #
3439 # storage.googleapis.com/{bucket}/{object}
3440 # bucket.storage.googleapis.com/{object}
3441 },
3442 "vmId": "A String", # The ID string of the VM.
3443 "baseTaskDir": "A String", # The location on the worker for task-specific subdirectories.
3444 "continueOnException": True or False, # Whether to continue taskrunner if an exception is hit.
3445 "oauthScopes": [ # The OAuth2 scopes to be requested by the taskrunner in order to
3446 # access the Cloud Dataflow API.
3447 "A String",
3448 ],
3449 "taskUser": "A String", # The UNIX user ID on the worker VM to use for tasks launched by
3450 # taskrunner; e.g. "root".
3451 "baseUrl": "A String", # The base URL for the taskrunner to use when accessing Google Cloud APIs.
3452 #
3453 # When workers access Google Cloud APIs, they logically do so via
3454 # relative URLs. If this field is specified, it supplies the base
3455 # URL to use for resolving these relative URLs. The normative
3456 # algorithm used is defined by RFC 1808, "Relative Uniform Resource
3457 # Locators".
3458 #
3459 # If not specified, the default value is "http://www.googleapis.com/"
3460 "taskGroup": "A String", # The UNIX group ID on the worker VM to use for tasks launched by
3461 # taskrunner; e.g. "wheel".
3462 "languageHint": "A String", # The suggested backend language.
3463 "logToSerialconsole": True or False, # Whether to send taskrunner log info to Google Compute Engine VM serial
3464 # console.
3465 "streamingWorkerMainClass": "A String", # The streaming worker main class name.
3466 "logDir": "A String", # The directory on the VM to store logs.
3467 "dataflowApiVersion": "A String", # The API version of endpoint, e.g. "v1b3"
3468 "harnessCommand": "A String", # The command to launch the worker harness.
3469 "tempStoragePrefix": "A String", # The prefix of the resources the taskrunner should use for
3470 # temporary storage.
3471 #
3472 # The supported resource type is:
3473 #
3474 # Google Cloud Storage:
3475 # storage.googleapis.com/{bucket}/{object}
3476 # bucket.storage.googleapis.com/{object}
3477 "alsologtostderr": True or False, # Whether to also send taskrunner log info to stderr.
3478 },
3479 "kind": "A String", # The kind of the worker pool; currently only `harness` and `shuffle`
3480 # are supported.
3481 "packages": [ # Packages to be installed on workers.
3482 { # The packages that must be installed in order for a worker to run the
3483 # steps of the Cloud Dataflow job that will be assigned to its worker
3484 # pool.
3485 #
3486 # This is the mechanism by which the Cloud Dataflow SDK causes code to
3487 # be loaded onto the workers. For example, the Cloud Dataflow Java SDK
3488 # might use this to install jars containing the user's code and all of the
3489 # various dependencies (libraries, data files, etc.) required in order
3490 # for that code to run.
3491 "location": "A String", # The resource to read the package from. The supported resource type is:
3492 #
3493 # Google Cloud Storage:
3494 #
3495 # storage.googleapis.com/{bucket}
3496 # bucket.storage.googleapis.com/
3497 "name": "A String", # The name of the package.
3498 },
3499 ],
3500 "machineType": "A String", # Machine type (e.g. "n1-standard-1"). If empty or unspecified, the
3501 # service will attempt to choose a reasonable default.
3502 "network": "A String", # Network to which VMs will be assigned. If empty or unspecified,
3503 # the service will use the network "default".
3504 "zone": "A String", # Zone to run the worker pools in. If empty or unspecified, the service
3505 # will attempt to choose a reasonable default.
3506 "diskSizeGb": 42, # Size of root disk for VMs, in GB. If zero or unspecified, the service will
3507 # attempt to choose a reasonable default.
3508 "teardownPolicy": "A String", # Sets the policy for determining when to turn down the worker pool.
3509 # Allowed values are: `TEARDOWN_ALWAYS`, `TEARDOWN_ON_SUCCESS`, and
3510 # `TEARDOWN_NEVER`.
3511 # `TEARDOWN_ALWAYS` means workers are always torn down regardless of whether
3512 # the job succeeds. `TEARDOWN_ON_SUCCESS` means workers are torn down
3513 # if the job succeeds. `TEARDOWN_NEVER` means the workers are never torn
3514 # down.
3515 #
3516 # If the workers are not torn down by the service, they will
3517 # continue to run and use Google Compute Engine VM resources in the
3518 # user's project until they are explicitly terminated by the user.
3519 # Because of this, Google recommends using the `TEARDOWN_ALWAYS`
3520 # policy except for small, manually supervised test jobs.
3521 #
3522 # If unknown or unspecified, the service will attempt to choose a reasonable
3523 # default.
3524 "onHostMaintenance": "A String", # The action to take on host maintenance, as defined by the Google
3525 # Compute Engine API.
3526 "ipConfiguration": "A String", # Configuration for VM IPs.
3527 "numThreadsPerWorker": 42, # The number of threads per worker harness. If empty or unspecified, the
3528 # service will choose a number of threads (according to the number of cores
3529 # on the selected machine type for batch, or 1 by convention for streaming).
3530 "poolArgs": { # Extra arguments for this worker pool.
3531 "a_key": "", # Properties of the object. Contains field @type with type URL.
3532 },
3533 "numWorkers": 42, # Number of Google Compute Engine workers in this pool needed to
3534 # execute the job. If zero or unspecified, the service will
3535 # attempt to choose a reasonable default.
3536 "workerHarnessContainerImage": "A String", # Required. Docker container image that executes the Cloud Dataflow worker
3537 # harness, residing in Google Container Registry.
3538 "subnetwork": "A String", # Subnetwork to which VMs will be assigned, if desired. Expected to be of
3539 # the form "regions/REGION/subnetworks/SUBNETWORK".
3540 "dataDisks": [ # Data disks that are used by a VM in this workflow.
3541 { # Describes the data disk used by a workflow job.
3542 "mountPoint": "A String", # Directory in a VM where disk is mounted.
3543 "sizeGb": 42, # Size of disk in GB. If zero or unspecified, the service will
3544 # attempt to choose a reasonable default.
3545 "diskType": "A String", # Disk storage type, as defined by Google Compute Engine. This
3546 # must be a disk type appropriate to the project and zone in which
3547 # the workers will run. If unknown or unspecified, the service
3548 # will attempt to choose a reasonable default.
3549 #
3550 # For example, the standard persistent disk type is a resource name
3551 # typically ending in "pd-standard". If SSD persistent disks are
3552 # available, the resource name typically ends with "pd-ssd". The
3553 # actual valid values are defined by the Google Compute Engine API,
3554 # not by the Cloud Dataflow API; consult the Google Compute Engine
3555 # documentation for more information about determining the set of
3556 # available disk types for a particular project and zone.
3557 #
3558 # Google Compute Engine Disk types are local to a particular
3559 # project in a particular zone, and so the resource name will
3560 # typically look something like this:
3561 #
3562 # compute.googleapis.com/projects/project-id/zones/zone/diskTypes/pd-standard
3563 },
3564 ],
3565 "autoscalingSettings": { # Settings for WorkerPool autoscaling. # Settings for autoscaling of this WorkerPool.
3566 "maxNumWorkers": 42, # The maximum number of workers to cap scaling at.
3567 "algorithm": "A String", # The algorithm to use for autoscaling.
3568 },
3569 "defaultPackageSet": "A String", # The default package set to install. This allows the service to
3570 # select a default set of packages which are useful to worker
3571 # harnesses written in a particular language.
3572 "diskType": "A String", # Type of root disk for VMs. If empty or unspecified, the service will
3573 # attempt to choose a reasonable default.
3574 "metadata": { # Metadata to set on the Google Compute Engine VMs.
3575 "a_key": "A String",
3576 },
3577 },
3578 ],
3579 "tempStoragePrefix": "A String", # The prefix of the resources the system should use for temporary
3580 # storage. The system will append the suffix "/temp-{JOBNAME}" to
3581 # this resource prefix, where {JOBNAME} is the value of the
3582 # job_name field. The resulting bucket and object prefix is used
3583 # as the prefix of the resources used to store temporary data
3584 # needed during the job execution. NOTE: This will override the
3585 # value in taskrunner_settings.
3586 # The supported resource type is:
3587 #
3588 # Google Cloud Storage:
3589 #
3590 # storage.googleapis.com/{bucket}/{object}
3591 # bucket.storage.googleapis.com/{object}
3592 },
3593 "location": "A String", # The [regional endpoint]
3594 # (https://cloud.google.com/dataflow/docs/concepts/regional-endpoints) that
3595 # contains this job.
3596 "tempFiles": [ # A set of files the system should be aware of that are used
3597 # for temporary storage. These temporary files will be
3598 # removed on job completion.
3599 # No duplicates are allowed.
3600 # No file patterns are supported.
3601 #
3602 # The supported files are:
3603 #
3604 # Google Cloud Storage:
3605 #
3606 # storage.googleapis.com/{bucket}/{object}
3607 # bucket.storage.googleapis.com/{object}
3608 "A String",
3609 ],
3610 "type": "A String", # The type of Cloud Dataflow job.
3611 "clientRequestId": "A String", # The client's unique identifier of the job, re-used across retried attempts.
3612 # If this field is set, the service will ensure its uniqueness.
3613 # The request to create a job will fail if the service has knowledge of a
3614 # previously submitted job with the same client's ID and job name.
3615 # The caller may use this field to ensure idempotence of job
3616 # creation across retried attempts to create a job.
3617 # By default, the field is empty and, in that case, the service ignores it.
3618 "createdFromSnapshotId": "A String", # If this is specified, the job's initial state is populated from the given
3619 # snapshot.
3620 "stepsLocation": "A String", # The GCS location where the steps are stored.
3621 "currentStateTime": "A String", # The timestamp associated with the current state.
3622 "startTime": "A String", # The timestamp when the job was started (transitioned to JOB_STATE_PENDING).
3623 # Flexible resource scheduling jobs are started with some delay after job
3624 # creation, so start_time is unset before start and is updated when the
3625 # job is started by the Cloud Dataflow service. For other jobs, start_time
3626 # always equals create_time and is immutable and set by the Cloud Dataflow
3627 # service.
3628 "createTime": "A String", # The timestamp when the job was initially created. Immutable and set by the
3629 # Cloud Dataflow service.
3630 "requestedState": "A String", # The job's requested state.
3631 #
3632 # `UpdateJob` may be used to switch between the `JOB_STATE_STOPPED` and
3633 # `JOB_STATE_RUNNING` states, by setting requested_state. `UpdateJob` may
3634 # also be used to directly set a job's requested state to
3635 # `JOB_STATE_CANCELLED` or `JOB_STATE_DONE`, irrevocably terminating the
3636 # job if it has not already reached a terminal state.
3637 "name": "A String", # The user-specified Cloud Dataflow job name.
3638 #
3639 # Only one Job with a given name may exist in a project at any
3640 # given time. If a caller attempts to create a Job with the same
3641 # name as an already-existing Job, the attempt returns the
3642 # existing Job.
3643 #
3644 # The name must match the regular expression
3645 # `[a-z]([-a-z0-9]{0,38}[a-z0-9])?`
3646 "steps": [ # Exactly one of step or steps_location should be specified.
3647 #
3648 # The top-level steps that constitute the entire job.
3649 { # Defines a particular step within a Cloud Dataflow job.
3650 #
3651 # A job consists of multiple steps, each of which performs some
3652 # specific operation as part of the overall job. Data is typically
3653 # passed from one step to another as part of the job.
3654 #
3655 # Here's an example of a sequence of steps which together implement a
3656 # Map-Reduce job:
3657 #
3658 # * Read a collection of data from some source, parsing the
3659 # collection's elements.
3660 #
3661 # * Validate the elements.
3662 #
3663 # * Apply a user-defined function to map each element to some value
3664 # and extract an element-specific key value.
3665 #
3666 # * Group elements with the same key into a single element with
3667 # that key, transforming a multiply-keyed collection into a
3668 # uniquely-keyed collection.
3669 #
3670 # * Write the elements out to some data sink.
3671 #
3672 # Note that the Cloud Dataflow service may be used to run many different
3673 # types of jobs, not just Map-Reduce.
3674 "kind": "A String", # The kind of step in the Cloud Dataflow job.
3675 "properties": { # Named properties associated with the step. Each kind of
3676 # predefined step has its own required set of properties.
3677 # Must be provided on Create. Only retrieved with JOB_VIEW_ALL.
3678 "a_key": "", # Properties of the object.
3679 },
3680 "name": "A String", # The name that identifies the step. This must be unique for each
3681 # step with respect to all other steps in the Cloud Dataflow job.
3682 },
3683 ],
3684 "replaceJobId": "A String", # If this job is an update of an existing job, this field is the job ID
3685 # of the job it replaced.
3686 #
3687 # When sending a `CreateJobRequest`, you can update a job by specifying it
3688 # here. The job named here is stopped, and its intermediate state is
3689 # transferred to this job.
3690 "currentState": "A String", # The current state of the job.
3691 #
3692 # Jobs are created in the `JOB_STATE_STOPPED` state unless otherwise
3693 # specified.
3694 #
3695 # A job in the `JOB_STATE_RUNNING` state may asynchronously enter a
3696 # terminal state. After a job has reached a terminal state, no
3697 # further state updates may be made.
3698 #
3699 # This field may be mutated by the Cloud Dataflow service;
3700 # callers cannot mutate it.
3701 "executionInfo": { # Deprecated. Additional information about how a Cloud Dataflow job will be executed
3702 # that isn't contained in the submitted job.
3703 "stages": { # A mapping from each stage to the information about that stage.
3704 "a_key": { # Contains information about how a particular
3705 # google.dataflow.v1beta3.Step will be executed.
3706 "stepName": [ # The steps associated with the execution stage.
3707 # Note that stages may have several steps, and that a given step
3708 # might be run by more than one stage.
3709 "A String",
3710 ],
3711 },
3712 },
3713 },
3714 }</pre>
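<p>For orientation, the sketch below shows one way to read a few of the Job fields documented above with the generated Python client. It is illustrative only, not an authoritative recipe: it assumes the google-api-python-client package and Application Default Credentials are available, and the project, region, and job ID are placeholder values.</p>
<pre>
# Illustrative sketch only; assumes google-api-python-client and Application
# Default Credentials. The project, region, and job ID below are placeholders.
from googleapiclient import discovery

# Build a Discovery-based client for the Dataflow API (v1b3).
service = discovery.build('dataflow', 'v1b3')
jobs = service.projects().locations().jobs()

# Fetch a single job in SUMMARY view and print a few fields from the Job
# object described above.
job = jobs.get(projectId='my-project', location='us-central1',
               jobId='2019-06-14_00_00_00-1234567890123456789',
               view='JOB_VIEW_SUMMARY').execute()
print(job.get('id'), job.get('name'), job.get('currentState'))
</pre>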
3715 </div>
3716
3717</body></html>