Jon Wayne Parrott692617a2017-01-06 09:58:29 -08001<html><body>
2<style>
3
4body, h1, h2, h3, div, span, p, pre, a {
5 margin: 0;
6 padding: 0;
7 border: 0;
8 font-weight: inherit;
9 font-style: inherit;
10 font-size: 100%;
11 font-family: inherit;
12 vertical-align: baseline;
13}
14
15body {
16 font-size: 13px;
17 padding: 1em;
18}
19
20h1 {
21 font-size: 26px;
22 margin-bottom: 1em;
23}
24
25h2 {
26 font-size: 24px;
27 margin-bottom: 1em;
28}
29
30h3 {
31 font-size: 20px;
32 margin-bottom: 1em;
33 margin-top: 1em;
34}
35
36pre, code {
37 line-height: 1.5;
38 font-family: Monaco, 'DejaVu Sans Mono', 'Bitstream Vera Sans Mono', 'Lucida Console', monospace;
39}
40
41pre {
42 margin-top: 0.5em;
43}
44
45h1, h2, h3, p {
46 font-family: Arial, sans serif;
47}
48
49h1, h2, h3 {
50 border-bottom: solid #CCC 1px;
51}
52
53.toc_element {
54 margin-top: 0.5em;
55}
56
57.firstline {
58 margin-left: 2 em;
59}
60
61.method {
62 margin-top: 1em;
63 border: solid 1px #CCC;
64 padding: 1em;
65 background: #EEE;
66}
67
68.details {
69 font-weight: bold;
70 font-size: 14px;
71}
72
73</style>
74
<h1><a href="dataflow_v1b3.html">Dataflow API</a> . <a href="dataflow_v1b3.projects.html">projects</a> . <a href="dataflow_v1b3.projects.locations.html">locations</a> . <a href="dataflow_v1b3.projects.locations.jobs.html">jobs</a></h1>
<h2>Instance Methods</h2>
<p class="toc_element">
  <code><a href="dataflow_v1b3.projects.locations.jobs.debug.html">debug()</a></code>
</p>
<p class="firstline">Returns the debug Resource.</p>

<p class="toc_element">
  <code><a href="dataflow_v1b3.projects.locations.jobs.messages.html">messages()</a></code>
</p>
<p class="firstline">Returns the messages Resource.</p>

<p class="toc_element">
  <code><a href="dataflow_v1b3.projects.locations.jobs.snapshots.html">snapshots()</a></code>
</p>
<p class="firstline">Returns the snapshots Resource.</p>

<p class="toc_element">
  <code><a href="dataflow_v1b3.projects.locations.jobs.workItems.html">workItems()</a></code>
</p>
<p class="firstline">Returns the workItems Resource.</p>

<p class="toc_element">
  <code><a href="#create">create(projectId, location, body=None, x__xgafv=None, replaceJobId=None, view=None)</a></code></p>
<p class="firstline">Creates a Cloud Dataflow job.</p>
<p class="toc_element">
  <code><a href="#get">get(projectId, location, jobId, x__xgafv=None, view=None)</a></code></p>
<p class="firstline">Gets the state of the specified Cloud Dataflow job.</p>
<p class="toc_element">
  <code><a href="#getMetrics">getMetrics(projectId, location, jobId, startTime=None, x__xgafv=None)</a></code></p>
<p class="firstline">Request the job status.</p>
<p class="toc_element">
  <code><a href="#list">list(projectId, location, pageSize=None, pageToken=None, x__xgafv=None, filter=None, view=None)</a></code></p>
<p class="firstline">List the jobs of a project.</p>
<p class="toc_element">
  <code><a href="#list_next">list_next(previous_request, previous_response)</a></code></p>
<p class="firstline">Retrieves the next page of results.</p>
<p class="toc_element">
  <code><a href="#snapshot">snapshot(projectId, location, jobId, body=None, x__xgafv=None)</a></code></p>
<p class="firstline">Snapshot the state of a streaming job.</p>
<p class="toc_element">
  <code><a href="#update">update(projectId, location, jobId, body=None, x__xgafv=None)</a></code></p>
<p class="firstline">Updates the state of an existing Cloud Dataflow job.</p>
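<p>For orientation, a minimal usage sketch (not part of the generated reference) showing how this resource is typically reached from the Python client; the project ID and region below are placeholders, and application-default credentials are assumed:</p>
<pre>
from googleapiclient.discovery import build

# Build the Dataflow v1b3 client; credentials are resolved from the environment.
service = build('dataflow', 'v1b3')

# The methods documented below are reached through this collection.
jobs = service.projects().locations().jobs()

# Example: page through the jobs of a project in one region with list()/list_next().
request = jobs.list(projectId='my-project', location='us-central1')
while request is not None:
    response = request.execute()
    for job in response.get('jobs', []):
        print(job['id'], job.get('currentState'))
    request = jobs.list_next(previous_request=request, previous_response=response)
</pre>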
<h3>Method Details</h3>
<div class="method">
  <code class="details" id="create">create(projectId, location, body=None, x__xgafv=None, replaceJobId=None, view=None)</code>
  <pre>Creates a Cloud Dataflow job.

To create a job, we recommend using `projects.locations.jobs.create` with a
[regional endpoint]
(https://cloud.google.com/dataflow/docs/concepts/regional-endpoints). Using
`projects.jobs.create` is not recommended, as your job will always start
in `us-central1`.

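Illustrative sketch only (assumes a `service` client built with
`build('dataflow', 'v1b3')` as in the example above; the project, region and
job body are placeholders — a real Job body carries the fields described below):

  job = {'name': 'example-job'}  # minimal placeholder; see the Job fields below
  response = service.projects().locations().jobs().create(
      projectId='my-project',
      location='us-west1',  # explicit regional endpoint, avoids defaulting to us-central1
      body=job,
  ).execute()
  print(response['id'], response.get('currentState'))
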
Jon Wayne Parrott692617a2017-01-06 09:58:29 -0800129Args:
Sai Cheemalapatic30d2b52017-03-13 12:12:03 -0400130 projectId: string, The ID of the Cloud Platform project that the job belongs to. (required)
Bu Sun Kim715bd7f2019-06-14 16:50:42 -0700131 location: string, The [regional endpoint]
132(https://cloud.google.com/dataflow/docs/concepts/regional-endpoints) that
133contains this job. (required)
Dan O'Mearadd494642020-05-01 07:42:23 -0700134 body: object, The request body.
Jon Wayne Parrott692617a2017-01-06 09:58:29 -0800135 The object takes the form of:
136
Sai Cheemalapatic30d2b52017-03-13 12:12:03 -0400137{ # Defines a job to be run by the Cloud Dataflow service.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -0700138 "labels": { # User-defined labels for this job.
139 #
140 # The labels map can contain no more than 64 entries. Entries of the labels
141 # map are UTF8 strings that comply with the following restrictions:
142 #
143 # * Keys must conform to regexp: \p{Ll}\p{Lo}{0,62}
144 # * Values must conform to regexp: [\p{Ll}\p{Lo}\p{N}_-]{0,63}
Dan O'Mearadd494642020-05-01 07:42:23 -0700145 # * Both keys and values are additionally constrained to be &lt;= 128 bytes in
Bu Sun Kim715bd7f2019-06-14 16:50:42 -0700146 # size.
147 "a_key": "A String",
148 },
149 "jobMetadata": { # Metadata available primarily for filtering jobs. Will be included in the # This field is populated by the Dataflow service to support filtering jobs
150 # by the metadata values provided here. Populated for ListJobs and all GetJob
151 # views SUMMARY and higher.
152 # ListJob response and Job SUMMARY view.
153 "sdkVersion": { # The version of the SDK used to run the job. # The SDK version used to run the job.
154 "versionDisplayName": "A String", # A readable string describing the version of the SDK.
155 "version": "A String", # The version of the SDK used to run the job.
156 "sdkSupportStatus": "A String", # The support status for this SDK version.
157 },
158 "pubsubDetails": [ # Identification of a PubSub source used in the Dataflow job.
159 { # Metadata for a PubSub connector used by the job.
160 "topic": "A String", # Topic accessed in the connection.
161 "subscription": "A String", # Subscription used in the connection.
162 },
163 ],
164 "datastoreDetails": [ # Identification of a Datastore source used in the Dataflow job.
165 { # Metadata for a Datastore connector used by the job.
166 "projectId": "A String", # ProjectId accessed in the connection.
167 "namespace": "A String", # Namespace used in the connection.
168 },
169 ],
170 "fileDetails": [ # Identification of a File source used in the Dataflow job.
171 { # Metadata for a File connector used by the job.
172 "filePattern": "A String", # File Pattern used to access files by the connector.
173 },
174 ],
175 "spannerDetails": [ # Identification of a Spanner source used in the Dataflow job.
176 { # Metadata for a Spanner connector used by the job.
177 "instanceId": "A String", # InstanceId accessed in the connection.
178 "projectId": "A String", # ProjectId accessed in the connection.
179 "databaseId": "A String", # DatabaseId accessed in the connection.
180 },
181 ],
182 "bigTableDetails": [ # Identification of a BigTable source used in the Dataflow job.
183 { # Metadata for a BigTable connector used by the job.
184 "instanceId": "A String", # InstanceId accessed in the connection.
185 "projectId": "A String", # ProjectId accessed in the connection.
186 "tableId": "A String", # TableId accessed in the connection.
187 },
188 ],
189 "bigqueryDetails": [ # Identification of a BigQuery source used in the Dataflow job.
190 { # Metadata for a BigQuery connector used by the job.
191 "projectId": "A String", # Project accessed in the connection.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -0700192 "query": "A String", # Query used to access data in the connection.
Dan O'Mearadd494642020-05-01 07:42:23 -0700193 "table": "A String", # Table accessed in the connection.
194 "dataset": "A String", # Dataset accessed in the connection.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -0700195 },
196 ],
197 },
198 "pipelineDescription": { # A descriptive representation of submitted pipeline as well as the executed # Preliminary field: The format of this data may change at any time.
199 # A description of the user pipeline and stages through which it is executed.
200 # Created by Cloud Dataflow service. Only retrieved with
201 # JOB_VIEW_DESCRIPTION or JOB_VIEW_ALL.
202 # form. This data is provided by the Dataflow service for ease of visualizing
203 # the pipeline and interpreting Dataflow provided metrics.
204 "originalPipelineTransform": [ # Description of each transform in the pipeline and collections between them.
205 { # Description of the type, names/ids, and input/outputs for a transform.
206 "kind": "A String", # Type of transform.
207 "name": "A String", # User provided name for this transform instance.
208 "inputCollectionName": [ # User names for all collection inputs to this transform.
209 "A String",
210 ],
211 "displayData": [ # Transform-specific display data.
212 { # Data provided with a pipeline or transform to provide descriptive info.
Dan O'Mearadd494642020-05-01 07:42:23 -0700213 "key": "A String", # The key identifying the display data.
214 # This is intended to be used as a label for the display data
215 # when viewed in a dax monitoring system.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -0700216 "shortStrValue": "A String", # A possible additional shorter value to display.
217 # For example a java_class_name_value of com.mypackage.MyDoFn
218 # will be stored with MyDoFn as the short_str_value and
219 # com.mypackage.MyDoFn as the java_class_name value.
220 # short_str_value can be displayed and java_class_name_value
221 # will be displayed as a tooltip.
Dan O'Mearadd494642020-05-01 07:42:23 -0700222 "timestampValue": "A String", # Contains value if the data is of timestamp type.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -0700223 "url": "A String", # An optional full URL.
224 "floatValue": 3.14, # Contains value if the data is of float type.
225 "namespace": "A String", # The namespace for the key. This is usually a class name or programming
226 # language namespace (i.e. python module) which defines the display data.
227 # This allows a dax monitoring system to specially handle the data
228 # and perform custom rendering.
229 "javaClassValue": "A String", # Contains value if the data is of java class type.
230 "label": "A String", # An optional label to display in a dax UI for the element.
231 "boolValue": True or False, # Contains value if the data is of a boolean type.
232 "strValue": "A String", # Contains value if the data is of string type.
Dan O'Mearadd494642020-05-01 07:42:23 -0700233 "durationValue": "A String", # Contains value if the data is of duration type.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -0700234 "int64Value": "A String", # Contains value if the data is of int64 type.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -0700235 },
236 ],
237 "outputCollectionName": [ # User names for all collection outputs to this transform.
238 "A String",
239 ],
240 "id": "A String", # SDK generated id of this transform instance.
241 },
242 ],
243 "executionPipelineStage": [ # Description of each stage of execution of the pipeline.
244 { # Description of the composing transforms, names/ids, and input/outputs of a
245 # stage of execution. Some composing transforms and sources may have been
246 # generated by the Dataflow service during execution planning.
247 "componentSource": [ # Collections produced and consumed by component transforms of this stage.
248 { # Description of an interstitial value between transforms in an execution
249 # stage.
250 "userName": "A String", # Human-readable name for this transform; may be user or system generated.
251 "originalTransformOrCollection": "A String", # User name for the original user transform or collection with which this
252 # source is most closely associated.
253 "name": "A String", # Dataflow service generated name for this source.
254 },
255 ],
256 "kind": "A String", # Type of tranform this stage is executing.
257 "name": "A String", # Dataflow service generated name for this stage.
258 "outputSource": [ # Output sources for this stage.
259 { # Description of an input or output of an execution stage.
260 "userName": "A String", # Human-readable name for this source; may be user or system generated.
261 "sizeBytes": "A String", # Size of the source, if measurable.
262 "name": "A String", # Dataflow service generated name for this source.
263 "originalTransformOrCollection": "A String", # User name for the original user transform or collection with which this
264 # source is most closely associated.
265 },
266 ],
267 "inputSource": [ # Input sources for this stage.
268 { # Description of an input or output of an execution stage.
269 "userName": "A String", # Human-readable name for this source; may be user or system generated.
270 "sizeBytes": "A String", # Size of the source, if measurable.
271 "name": "A String", # Dataflow service generated name for this source.
272 "originalTransformOrCollection": "A String", # User name for the original user transform or collection with which this
273 # source is most closely associated.
274 },
275 ],
276 "componentTransform": [ # Transforms that comprise this execution stage.
277 { # Description of a transform executed as part of an execution stage.
278 "userName": "A String", # Human-readable name for this transform; may be user or system generated.
279 "originalTransform": "A String", # User name for the original user transform with which this transform is
280 # most closely associated.
281 "name": "A String", # Dataflow service generated name for this source.
282 },
283 ],
284 "id": "A String", # Dataflow service generated id for this stage.
285 },
286 ],
287 "displayData": [ # Pipeline level display data.
288 { # Data provided with a pipeline or transform to provide descriptive info.
Dan O'Mearadd494642020-05-01 07:42:23 -0700289 "key": "A String", # The key identifying the display data.
290 # This is intended to be used as a label for the display data
291 # when viewed in a dax monitoring system.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -0700292 "shortStrValue": "A String", # A possible additional shorter value to display.
293 # For example a java_class_name_value of com.mypackage.MyDoFn
294 # will be stored with MyDoFn as the short_str_value and
295 # com.mypackage.MyDoFn as the java_class_name value.
296 # short_str_value can be displayed and java_class_name_value
297 # will be displayed as a tooltip.
Dan O'Mearadd494642020-05-01 07:42:23 -0700298 "timestampValue": "A String", # Contains value if the data is of timestamp type.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -0700299 "url": "A String", # An optional full URL.
300 "floatValue": 3.14, # Contains value if the data is of float type.
301 "namespace": "A String", # The namespace for the key. This is usually a class name or programming
302 # language namespace (i.e. python module) which defines the display data.
303 # This allows a dax monitoring system to specially handle the data
304 # and perform custom rendering.
305 "javaClassValue": "A String", # Contains value if the data is of java class type.
306 "label": "A String", # An optional label to display in a dax UI for the element.
307 "boolValue": True or False, # Contains value if the data is of a boolean type.
308 "strValue": "A String", # Contains value if the data is of string type.
Dan O'Mearadd494642020-05-01 07:42:23 -0700309 "durationValue": "A String", # Contains value if the data is of duration type.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -0700310 "int64Value": "A String", # Contains value if the data is of int64 type.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -0700311 },
312 ],
313 },
314 "stageStates": [ # This field may be mutated by the Cloud Dataflow service;
315 # callers cannot mutate it.
316 { # A message describing the state of a particular execution stage.
317 "executionStageName": "A String", # The name of the execution stage.
318 "executionStageState": "A String", # Executions stage states allow the same set of values as JobState.
319 "currentStateTime": "A String", # The time at which the stage transitioned to this state.
320 },
321 ],
322 "id": "A String", # The unique ID of this job.
323 #
324 # This field is set by the Cloud Dataflow service when the Job is
325 # created, and is immutable for the life of the job.
326 "replacedByJobId": "A String", # If another job is an update of this job (and thus, this job is in
327 # `JOB_STATE_UPDATED`), this field contains the ID of that job.
328 "projectId": "A String", # The ID of the Cloud Platform project that the job belongs to.
329 "transformNameMapping": { # The map of transform name prefixes of the job to be replaced to the
330 # corresponding name prefixes of the new job.
331 "a_key": "A String",
332 },
333 "environment": { # Describes the environment in which a Dataflow Job runs. # The environment for the job.
Dan O'Mearadd494642020-05-01 07:42:23 -0700334 "workerRegion": "A String", # The Compute Engine region
335 # (https://cloud.google.com/compute/docs/regions-zones/regions-zones) in
336 # which worker processing should occur, e.g. "us-west1". Mutually exclusive
337 # with worker_zone. If neither worker_region nor worker_zone is specified,
338 # default to the control plane's region.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -0700339 "version": { # A structure describing which components and their versions of the service
340 # are required in order to run the job.
341 "a_key": "", # Properties of the object.
342 },
343 "flexResourceSchedulingGoal": "A String", # Which Flexible Resource Scheduling mode to run in.
344 "serviceKmsKeyName": "A String", # If set, contains the Cloud KMS key identifier used to encrypt data
345 # at rest, AKA a Customer Managed Encryption Key (CMEK).
346 #
347 # Format:
348 # projects/PROJECT_ID/locations/LOCATION/keyRings/KEY_RING/cryptoKeys/KEY
349 "internalExperiments": { # Experimental settings.
350 "a_key": "", # Properties of the object. Contains field @type with type URL.
351 },
352 "dataset": "A String", # The dataset for the current project where various workflow
353 # related tables are stored.
354 #
355 # The supported resource type is:
356 #
357 # Google BigQuery:
358 # bigquery.googleapis.com/{dataset}
359 "experiments": [ # The list of experiments to enable.
360 "A String",
361 ],
362 "serviceAccountEmail": "A String", # Identity to run virtual machines as. Defaults to the default account.
363 "sdkPipelineOptions": { # The Cloud Dataflow SDK pipeline options specified by the user. These
364 # options are passed through the service and are used to recreate the
365 # SDK pipeline options on the worker in a language agnostic and platform
366 # independent way.
367 "a_key": "", # Properties of the object.
368 },
369 "userAgent": { # A description of the process that generated the request.
370 "a_key": "", # Properties of the object.
371 },
Dan O'Mearadd494642020-05-01 07:42:23 -0700372 "workerZone": "A String", # The Compute Engine zone
373 # (https://cloud.google.com/compute/docs/regions-zones/regions-zones) in
374 # which worker processing should occur, e.g. "us-west1-a". Mutually exclusive
375 # with worker_region. If neither worker_region nor worker_zone is specified,
376 # a zone in the control plane's region is chosen based on available capacity.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -0700377 "workerPools": [ # The worker pools. At least one "harness" worker pool must be
378 # specified in order for the job to have workers.
379 { # Describes one particular pool of Cloud Dataflow workers to be
380 # instantiated by the Cloud Dataflow service in order to perform the
381 # computations required by a job. Note that a workflow job may use
382 # multiple pools, in order to match the various computational
383 # requirements of the various stages of the job.
Dan O'Mearadd494642020-05-01 07:42:23 -0700384 "workerHarnessContainerImage": "A String", # Required. Docker container image that executes the Cloud Dataflow worker
385 # harness, residing in Google Container Registry.
386 #
387 # Deprecated for the Fn API path. Use sdk_harness_container_images instead.
388 "ipConfiguration": "A String", # Configuration for VM IPs.
389 "autoscalingSettings": { # Settings for WorkerPool autoscaling. # Settings for autoscaling of this WorkerPool.
390 "maxNumWorkers": 42, # The maximum number of workers to cap scaling at.
391 "algorithm": "A String", # The algorithm to use for autoscaling.
392 },
Bu Sun Kim715bd7f2019-06-14 16:50:42 -0700393 "diskSourceImage": "A String", # Fully qualified source image for disks.
Dan O'Mearadd494642020-05-01 07:42:23 -0700394 "network": "A String", # Network to which VMs will be assigned. If empty or unspecified,
395 # the service will use the network "default".
396 "zone": "A String", # Zone to run the worker pools in. If empty or unspecified, the service
397 # will attempt to choose a reasonable default.
398 "metadata": { # Metadata to set on the Google Compute Engine VMs.
399 "a_key": "A String",
400 },
401 "machineType": "A String", # Machine type (e.g. "n1-standard-1"). If empty or unspecified, the
402 # service will attempt to choose a reasonable default.
403 "onHostMaintenance": "A String", # The action to take on host maintenance, as defined by the Google
404 # Compute Engine API.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -0700405 "taskrunnerSettings": { # Taskrunner configuration settings. # Settings passed through to Google Compute Engine workers when
406 # using the standard Dataflow task runner. Users should ignore
407 # this field.
408 "workflowFileName": "A String", # The file to store the workflow in.
409 "logUploadLocation": "A String", # Indicates where to put logs. If this is not specified, the logs
410 # will not be uploaded.
411 #
412 # The supported resource type is:
413 #
414 # Google Cloud Storage:
415 # storage.googleapis.com/{bucket}/{object}
416 # bucket.storage.googleapis.com/{object}
417 "commandlinesFileName": "A String", # The file to store preprocessing commands in.
Dan O'Mearadd494642020-05-01 07:42:23 -0700418 "alsologtostderr": True or False, # Whether to also send taskrunner log info to stderr.
419 "continueOnException": True or False, # Whether to continue taskrunner if an exception is hit.
420 "baseTaskDir": "A String", # The location on the worker for task-specific subdirectories.
421 "vmId": "A String", # The ID string of the VM.
422 "taskGroup": "A String", # The UNIX group ID on the worker VM to use for tasks launched by
423 # taskrunner; e.g. "wheel".
424 "taskUser": "A String", # The UNIX user ID on the worker VM to use for tasks launched by
425 # taskrunner; e.g. "root".
426 "oauthScopes": [ # The OAuth2 scopes to be requested by the taskrunner in order to
427 # access the Cloud Dataflow API.
428 "A String",
429 ],
430 "languageHint": "A String", # The suggested backend language.
431 "logToSerialconsole": True or False, # Whether to send taskrunner log info to Google Compute Engine VM serial
432 # console.
433 "streamingWorkerMainClass": "A String", # The streaming worker main class name.
434 "logDir": "A String", # The directory on the VM to store logs.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -0700435 "parallelWorkerSettings": { # Provides data to pass through to the worker harness. # The settings to pass to the parallel worker harness.
436 "reportingEnabled": True or False, # Whether to send work progress updates to the service.
437 "shuffleServicePath": "A String", # The Shuffle service path relative to the root URL, for example,
438 # "shuffle/v1beta1".
439 "workerId": "A String", # The ID of the worker running this pipeline.
440 "baseUrl": "A String", # The base URL for accessing Google Cloud APIs.
441 #
442 # When workers access Google Cloud APIs, they logically do so via
443 # relative URLs. If this field is specified, it supplies the base
444 # URL to use for resolving these relative URLs. The normative
445 # algorithm used is defined by RFC 1808, "Relative Uniform Resource
446 # Locators".
447 #
448 # If not specified, the default value is "http://www.googleapis.com/"
449 "servicePath": "A String", # The Cloud Dataflow service path relative to the root URL, for example,
450 # "dataflow/v1b3/projects".
451 "tempStoragePrefix": "A String", # The prefix of the resources the system should use for temporary
452 # storage.
453 #
454 # The supported resource type is:
455 #
456 # Google Cloud Storage:
457 #
458 # storage.googleapis.com/{bucket}/{object}
459 # bucket.storage.googleapis.com/{object}
460 },
Bu Sun Kim715bd7f2019-06-14 16:50:42 -0700461 "dataflowApiVersion": "A String", # The API version of endpoint, e.g. "v1b3"
462 "harnessCommand": "A String", # The command to launch the worker harness.
463 "tempStoragePrefix": "A String", # The prefix of the resources the taskrunner should use for
464 # temporary storage.
465 #
466 # The supported resource type is:
467 #
468 # Google Cloud Storage:
469 # storage.googleapis.com/{bucket}/{object}
470 # bucket.storage.googleapis.com/{object}
Dan O'Mearadd494642020-05-01 07:42:23 -0700471 "baseUrl": "A String", # The base URL for the taskrunner to use when accessing Google Cloud APIs.
472 #
473 # When workers access Google Cloud APIs, they logically do so via
474 # relative URLs. If this field is specified, it supplies the base
475 # URL to use for resolving these relative URLs. The normative
476 # algorithm used is defined by RFC 1808, "Relative Uniform Resource
477 # Locators".
478 #
479 # If not specified, the default value is "http://www.googleapis.com/"
Bu Sun Kim715bd7f2019-06-14 16:50:42 -0700480 },
Dan O'Mearadd494642020-05-01 07:42:23 -0700481 "numThreadsPerWorker": 42, # The number of threads per worker harness. If empty or unspecified, the
482 # service will choose a number of threads (according to the number of cores
483 # on the selected machine type for batch, or 1 by convention for streaming).
484 "poolArgs": { # Extra arguments for this worker pool.
485 "a_key": "", # Properties of the object. Contains field @type with type URL.
486 },
Bu Sun Kim715bd7f2019-06-14 16:50:42 -0700487 "packages": [ # Packages to be installed on workers.
488 { # The packages that must be installed in order for a worker to run the
489 # steps of the Cloud Dataflow job that will be assigned to its worker
490 # pool.
491 #
492 # This is the mechanism by which the Cloud Dataflow SDK causes code to
493 # be loaded onto the workers. For example, the Cloud Dataflow Java SDK
494 # might use this to install jars containing the user's code and all of the
495 # various dependencies (libraries, data files, etc.) required in order
496 # for that code to run.
497 "location": "A String", # The resource to read the package from. The supported resource type is:
498 #
499 # Google Cloud Storage:
500 #
501 # storage.googleapis.com/{bucket}
502 # bucket.storage.googleapis.com/
503 "name": "A String", # The name of the package.
504 },
505 ],
Dan O'Mearadd494642020-05-01 07:42:23 -0700506 "defaultPackageSet": "A String", # The default package set to install. This allows the service to
507 # select a default set of packages which are useful to worker
508 # harnesses written in a particular language.
509 "kind": "A String", # The kind of the worker pool; currently only `harness` and `shuffle`
510 # are supported.
511 "diskType": "A String", # Type of root disk for VMs. If empty or unspecified, the service will
Bu Sun Kim715bd7f2019-06-14 16:50:42 -0700512 # attempt to choose a reasonable default.
513 "teardownPolicy": "A String", # Sets the policy for determining when to turndown worker pool.
514 # Allowed values are: `TEARDOWN_ALWAYS`, `TEARDOWN_ON_SUCCESS`, and
515 # `TEARDOWN_NEVER`.
516 # `TEARDOWN_ALWAYS` means workers are always torn down regardless of whether
517 # the job succeeds. `TEARDOWN_ON_SUCCESS` means workers are torn down
518 # if the job succeeds. `TEARDOWN_NEVER` means the workers are never torn
519 # down.
520 #
521 # If the workers are not torn down by the service, they will
522 # continue to run and use Google Compute Engine VM resources in the
523 # user's project until they are explicitly terminated by the user.
524 # Because of this, Google recommends using the `TEARDOWN_ALWAYS`
525 # policy except for small, manually supervised test jobs.
526 #
527 # If unknown or unspecified, the service will attempt to choose a reasonable
528 # default.
Dan O'Mearadd494642020-05-01 07:42:23 -0700529 "diskSizeGb": 42, # Size of root disk for VMs, in GB. If zero or unspecified, the service will
530 # attempt to choose a reasonable default.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -0700531 "numWorkers": 42, # Number of Google Compute Engine workers in this pool needed to
532 # execute the job. If zero or unspecified, the service will
533 # attempt to choose a reasonable default.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -0700534 "subnetwork": "A String", # Subnetwork to which VMs will be assigned, if desired. Expected to be of
535 # the form "regions/REGION/subnetworks/SUBNETWORK".
536 "dataDisks": [ # Data disks that are used by a VM in this workflow.
537 { # Describes the data disk used by a workflow job.
538 "mountPoint": "A String", # Directory in a VM where disk is mounted.
539 "sizeGb": 42, # Size of disk in GB. If zero or unspecified, the service will
540 # attempt to choose a reasonable default.
541 "diskType": "A String", # Disk storage type, as defined by Google Compute Engine. This
542 # must be a disk type appropriate to the project and zone in which
543 # the workers will run. If unknown or unspecified, the service
544 # will attempt to choose a reasonable default.
545 #
546 # For example, the standard persistent disk type is a resource name
547 # typically ending in "pd-standard". If SSD persistent disks are
548 # available, the resource name typically ends with "pd-ssd". The
549 # actual valid values are defined by the Google Compute Engine API,
550 # not by the Cloud Dataflow API; consult the Google Compute Engine
551 # documentation for more information about determining the set of
552 # available disk types for a particular project and zone.
553 #
554 # Google Compute Engine Disk types are local to a particular
555 # project in a particular zone, and so the resource name will
556 # typically look something like this:
557 #
558 # compute.googleapis.com/projects/project-id/zones/zone/diskTypes/pd-standard
559 },
560 ],
Dan O'Mearadd494642020-05-01 07:42:23 -0700561 "sdkHarnessContainerImages": [ # Set of SDK harness containers needed to execute this pipeline. This will
562 # only be set in the Fn API path. For non-cross-language pipelines this
563 # should have only one entry. Cross-language pipelines will have two or more
564 # entries.
565 { # Defines an SDK harness container for executing Dataflow pipelines.
566 "containerImage": "A String", # A docker container image that resides in Google Container Registry.
567 "useSingleCorePerContainer": True or False, # If true, recommends the Dataflow service to use only one core per SDK
568 # container instance with this image. If false (or unset) recommends using
569 # more than one core per SDK container instance with this image for
570 # efficiency. Note that Dataflow service may choose to override this property
571 # if needed.
572 },
573 ],
Bu Sun Kim715bd7f2019-06-14 16:50:42 -0700574 },
575 ],
Dan O'Mearadd494642020-05-01 07:42:23 -0700576 "clusterManagerApiService": "A String", # The type of cluster manager API to use. If unknown or
577 # unspecified, the service will attempt to choose a reasonable
578 # default. This should be in the form of the API service name,
579 # e.g. "compute.googleapis.com".
Bu Sun Kim715bd7f2019-06-14 16:50:42 -0700580 "tempStoragePrefix": "A String", # The prefix of the resources the system should use for temporary
581 # storage. The system will append the suffix "/temp-{JOBNAME}" to
582 # this resource prefix, where {JOBNAME} is the value of the
583 # job_name field. The resulting bucket and object prefix is used
584 # as the prefix of the resources used to store temporary data
585 # needed during the job execution. NOTE: This will override the
586 # value in taskrunner_settings.
587 # The supported resource type is:
588 #
589 # Google Cloud Storage:
590 #
591 # storage.googleapis.com/{bucket}/{object}
592 # bucket.storage.googleapis.com/{object}
593 },
594 "location": "A String", # The [regional endpoint]
595 # (https://cloud.google.com/dataflow/docs/concepts/regional-endpoints) that
596 # contains this job.
597 "tempFiles": [ # A set of files the system should be aware of that are used
598 # for temporary storage. These temporary files will be
599 # removed on job completion.
600 # No duplicates are allowed.
601 # No file patterns are supported.
602 #
603 # The supported files are:
604 #
605 # Google Cloud Storage:
606 #
607 # storage.googleapis.com/{bucket}/{object}
608 # bucket.storage.googleapis.com/{object}
609 "A String",
610 ],
611 "type": "A String", # The type of Cloud Dataflow job.
612 "clientRequestId": "A String", # The client's unique identifier of the job, re-used across retried attempts.
613 # If this field is set, the service will ensure its uniqueness.
614 # The request to create a job will fail if the service has knowledge of a
615 # previously submitted job with the same client's ID and job name.
616 # The caller may use this field to ensure idempotence of job
617 # creation across retried attempts to create a job.
618 # By default, the field is empty and, in that case, the service ignores it.
619 "createdFromSnapshotId": "A String", # If this is specified, the job's initial state is populated from the given
620 # snapshot.
621 "stepsLocation": "A String", # The GCS location where the steps are stored.
622 "currentStateTime": "A String", # The timestamp associated with the current state.
623 "startTime": "A String", # The timestamp when the job was started (transitioned to JOB_STATE_PENDING).
624 # Flexible resource scheduling jobs are started with some delay after job
625 # creation, so start_time is unset before start and is updated when the
626 # job is started by the Cloud Dataflow service. For other jobs, start_time
627 # always equals create_time and is immutable and set by the Cloud Dataflow
628 # service.
629 "createTime": "A String", # The timestamp when the job was initially created. Immutable and set by the
630 # Cloud Dataflow service.
631 "requestedState": "A String", # The job's requested state.
632 #
633 # `UpdateJob` may be used to switch between the `JOB_STATE_STOPPED` and
634 # `JOB_STATE_RUNNING` states, by setting requested_state. `UpdateJob` may
635 # also be used to directly set a job's requested state to
636 # `JOB_STATE_CANCELLED` or `JOB_STATE_DONE`, irrevocably terminating the
637 # job if it has not already reached a terminal state.
638 "name": "A String", # The user-specified Cloud Dataflow job name.
639 #
640 # Only one Job with a given name may exist in a project at any
641 # given time. If a caller attempts to create a Job with the same
642 # name as an already-existing Job, the attempt returns the
643 # existing Job.
644 #
645 # The name must match the regular expression
646 # `[a-z]([-a-z0-9]{0,38}[a-z0-9])?`
647 "steps": [ # Exactly one of step or steps_location should be specified.
648 #
649 # The top-level steps that constitute the entire job.
650 { # Defines a particular step within a Cloud Dataflow job.
651 #
652 # A job consists of multiple steps, each of which performs some
653 # specific operation as part of the overall job. Data is typically
654 # passed from one step to another as part of the job.
655 #
656 # Here's an example of a sequence of steps which together implement a
657 # Map-Reduce job:
658 #
659 # * Read a collection of data from some source, parsing the
660 # collection's elements.
661 #
662 # * Validate the elements.
663 #
664 # * Apply a user-defined function to map each element to some value
665 # and extract an element-specific key value.
666 #
667 # * Group elements with the same key into a single element with
668 # that key, transforming a multiply-keyed collection into a
669 # uniquely-keyed collection.
670 #
671 # * Write the elements out to some data sink.
672 #
673 # Note that the Cloud Dataflow service may be used to run many different
674 # types of jobs, not just Map-Reduce.
675 "kind": "A String", # The kind of step in the Cloud Dataflow job.
Dan O'Mearadd494642020-05-01 07:42:23 -0700676 "name": "A String", # The name that identifies the step. This must be unique for each
677 # step with respect to all other steps in the Cloud Dataflow job.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -0700678 "properties": { # Named properties associated with the step. Each kind of
679 # predefined step has its own required set of properties.
680 # Must be provided on Create. Only retrieved with JOB_VIEW_ALL.
681 "a_key": "", # Properties of the object.
682 },
Bu Sun Kim715bd7f2019-06-14 16:50:42 -0700683 },
684 ],
685 "replaceJobId": "A String", # If this job is an update of an existing job, this field is the job ID
686 # of the job it replaced.
687 #
688 # When sending a `CreateJobRequest`, you can update a job by specifying it
689 # here. The job named here is stopped, and its intermediate state is
690 # transferred to this job.
691 "currentState": "A String", # The current state of the job.
692 #
693 # Jobs are created in the `JOB_STATE_STOPPED` state unless otherwise
694 # specified.
695 #
696 # A job in the `JOB_STATE_RUNNING` state may asynchronously enter a
697 # terminal state. After a job has reached a terminal state, no
698 # further state updates may be made.
699 #
700 # This field may be mutated by the Cloud Dataflow service;
701 # callers cannot mutate it.
702 "executionInfo": { # Additional information about how a Cloud Dataflow job will be executed that # Deprecated.
703 # isn't contained in the submitted job.
704 "stages": { # A mapping from each stage to the information about that stage.
705 "a_key": { # Contains information about how a particular
706 # google.dataflow.v1beta3.Step will be executed.
707 "stepName": [ # The steps associated with the execution stage.
708 # Note that stages may have several steps, and that a given step
709 # might be run by more than one stage.
710 "A String",
711 ],
712 },
713 },
714 },
715}
716
717 x__xgafv: string, V1 error format.
718 Allowed values
719 1 - v1 error format
720 2 - v2 error format
721 replaceJobId: string, Deprecated. This field is now in the Job message.
722 view: string, The level of information requested in response.
723
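For illustration only: the dict returned by execute() is the Job described
below. A common follow-up (a sketch, assuming the same `service` object as in
the examples above) is to re-read the job with get() until it reaches a
terminal state:

  import time

  def wait_for_terminal_state(jobs, project_id, location, job_id, poll_seconds=30):
      # This terminal-state set is illustrative; consult JobState for the full list.
      terminal = {'JOB_STATE_DONE', 'JOB_STATE_FAILED', 'JOB_STATE_CANCELLED'}
      while True:
          job = jobs.get(projectId=project_id, location=location, jobId=job_id).execute()
          if job.get('currentState') in terminal:
              return job
          time.sleep(poll_seconds)
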
724Returns:
725 An object of the form:
726
727 { # Defines a job to be run by the Cloud Dataflow service.
Sai Cheemalapatic30d2b52017-03-13 12:12:03 -0400728 "labels": { # User-defined labels for this job.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -0700729 #
Sai Cheemalapatic30d2b52017-03-13 12:12:03 -0400730 # The labels map can contain no more than 64 entries. Entries of the labels
731 # map are UTF8 strings that comply with the following restrictions:
Bu Sun Kim715bd7f2019-06-14 16:50:42 -0700732 #
Sai Cheemalapatic30d2b52017-03-13 12:12:03 -0400733 # * Keys must conform to regexp: \p{Ll}\p{Lo}{0,62}
734 # * Values must conform to regexp: [\p{Ll}\p{Lo}\p{N}_-]{0,63}
Dan O'Mearadd494642020-05-01 07:42:23 -0700735 # * Both keys and values are additionally constrained to be &lt;= 128 bytes in
Sai Cheemalapatic30d2b52017-03-13 12:12:03 -0400736 # size.
Jon Wayne Parrott692617a2017-01-06 09:58:29 -0800737 "a_key": "A String",
738 },
Bu Sun Kim715bd7f2019-06-14 16:50:42 -0700739 "jobMetadata": { # Metadata available primarily for filtering jobs. Will be included in the # This field is populated by the Dataflow service to support filtering jobs
740 # by the metadata values provided here. Populated for ListJobs and all GetJob
741 # views SUMMARY and higher.
742 # ListJob response and Job SUMMARY view.
743 "sdkVersion": { # The version of the SDK used to run the job. # The SDK version used to run the job.
744 "versionDisplayName": "A String", # A readable string describing the version of the SDK.
745 "version": "A String", # The version of the SDK used to run the job.
746 "sdkSupportStatus": "A String", # The support status for this SDK version.
747 },
748 "pubsubDetails": [ # Identification of a PubSub source used in the Dataflow job.
749 { # Metadata for a PubSub connector used by the job.
750 "topic": "A String", # Topic accessed in the connection.
751 "subscription": "A String", # Subscription used in the connection.
752 },
753 ],
754 "datastoreDetails": [ # Identification of a Datastore source used in the Dataflow job.
755 { # Metadata for a Datastore connector used by the job.
756 "projectId": "A String", # ProjectId accessed in the connection.
757 "namespace": "A String", # Namespace used in the connection.
758 },
759 ],
760 "fileDetails": [ # Identification of a File source used in the Dataflow job.
761 { # Metadata for a File connector used by the job.
762 "filePattern": "A String", # File Pattern used to access files by the connector.
763 },
764 ],
765 "spannerDetails": [ # Identification of a Spanner source used in the Dataflow job.
766 { # Metadata for a Spanner connector used by the job.
767 "instanceId": "A String", # InstanceId accessed in the connection.
768 "projectId": "A String", # ProjectId accessed in the connection.
769 "databaseId": "A String", # DatabaseId accessed in the connection.
770 },
771 ],
772 "bigTableDetails": [ # Identification of a BigTable source used in the Dataflow job.
773 { # Metadata for a BigTable connector used by the job.
774 "instanceId": "A String", # InstanceId accessed in the connection.
775 "projectId": "A String", # ProjectId accessed in the connection.
776 "tableId": "A String", # TableId accessed in the connection.
777 },
778 ],
779 "bigqueryDetails": [ # Identification of a BigQuery source used in the Dataflow job.
780 { # Metadata for a BigQuery connector used by the job.
781 "projectId": "A String", # Project accessed in the connection.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -0700782 "query": "A String", # Query used to access data in the connection.
Dan O'Mearadd494642020-05-01 07:42:23 -0700783 "table": "A String", # Table accessed in the connection.
784 "dataset": "A String", # Dataset accessed in the connection.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -0700785 },
786 ],
787 },
788 "pipelineDescription": { # A descriptive representation of submitted pipeline as well as the executed # Preliminary field: The format of this data may change at any time.
789 # A description of the user pipeline and stages through which it is executed.
790 # Created by Cloud Dataflow service. Only retrieved with
791 # JOB_VIEW_DESCRIPTION or JOB_VIEW_ALL.
792 # form. This data is provided by the Dataflow service for ease of visualizing
793 # the pipeline and interpreting Dataflow provided metrics.
794 "originalPipelineTransform": [ # Description of each transform in the pipeline and collections between them.
795 { # Description of the type, names/ids, and input/outputs for a transform.
796 "kind": "A String", # Type of transform.
797 "name": "A String", # User provided name for this transform instance.
798 "inputCollectionName": [ # User names for all collection inputs to this transform.
799 "A String",
800 ],
801 "displayData": [ # Transform-specific display data.
802 { # Data provided with a pipeline or transform to provide descriptive info.
Dan O'Mearadd494642020-05-01 07:42:23 -0700803 "key": "A String", # The key identifying the display data.
804 # This is intended to be used as a label for the display data
805 # when viewed in a dax monitoring system.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -0700806 "shortStrValue": "A String", # A possible additional shorter value to display.
807 # For example a java_class_name_value of com.mypackage.MyDoFn
808 # will be stored with MyDoFn as the short_str_value and
809 # com.mypackage.MyDoFn as the java_class_name value.
810 # short_str_value can be displayed and java_class_name_value
811 # will be displayed as a tooltip.
Dan O'Mearadd494642020-05-01 07:42:23 -0700812 "timestampValue": "A String", # Contains value if the data is of timestamp type.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -0700813 "url": "A String", # An optional full URL.
814 "floatValue": 3.14, # Contains value if the data is of float type.
815 "namespace": "A String", # The namespace for the key. This is usually a class name or programming
816 # language namespace (i.e. python module) which defines the display data.
817 # This allows a dax monitoring system to specially handle the data
818 # and perform custom rendering.
819 "javaClassValue": "A String", # Contains value if the data is of java class type.
820 "label": "A String", # An optional label to display in a dax UI for the element.
821 "boolValue": True or False, # Contains value if the data is of a boolean type.
822 "strValue": "A String", # Contains value if the data is of string type.
Dan O'Mearadd494642020-05-01 07:42:23 -0700823 "durationValue": "A String", # Contains value if the data is of duration type.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -0700824 "int64Value": "A String", # Contains value if the data is of int64 type.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -0700825 },
826 ],
827 "outputCollectionName": [ # User names for all collection outputs to this transform.
828 "A String",
829 ],
830 "id": "A String", # SDK generated id of this transform instance.
831 },
832 ],
833 "executionPipelineStage": [ # Description of each stage of execution of the pipeline.
834 { # Description of the composing transforms, names/ids, and input/outputs of a
835 # stage of execution. Some composing transforms and sources may have been
836 # generated by the Dataflow service during execution planning.
837 "componentSource": [ # Collections produced and consumed by component transforms of this stage.
838 { # Description of an interstitial value between transforms in an execution
839 # stage.
840 "userName": "A String", # Human-readable name for this transform; may be user or system generated.
841 "originalTransformOrCollection": "A String", # User name for the original user transform or collection with which this
842 # source is most closely associated.
843 "name": "A String", # Dataflow service generated name for this source.
844 },
845 ],
846 "kind": "A String", # Type of tranform this stage is executing.
847 "name": "A String", # Dataflow service generated name for this stage.
848 "outputSource": [ # Output sources for this stage.
849 { # Description of an input or output of an execution stage.
850 "userName": "A String", # Human-readable name for this source; may be user or system generated.
851 "sizeBytes": "A String", # Size of the source, if measurable.
852 "name": "A String", # Dataflow service generated name for this source.
853 "originalTransformOrCollection": "A String", # User name for the original user transform or collection with which this
854 # source is most closely associated.
855 },
856 ],
857 "inputSource": [ # Input sources for this stage.
858 { # Description of an input or output of an execution stage.
859 "userName": "A String", # Human-readable name for this source; may be user or system generated.
860 "sizeBytes": "A String", # Size of the source, if measurable.
861 "name": "A String", # Dataflow service generated name for this source.
862 "originalTransformOrCollection": "A String", # User name for the original user transform or collection with which this
863 # source is most closely associated.
864 },
865 ],
866 "componentTransform": [ # Transforms that comprise this execution stage.
867 { # Description of a transform executed as part of an execution stage.
868 "userName": "A String", # Human-readable name for this transform; may be user or system generated.
869 "originalTransform": "A String", # User name for the original user transform with which this transform is
870 # most closely associated.
871 "name": "A String", # Dataflow service generated name for this source.
872 },
873 ],
874 "id": "A String", # Dataflow service generated id for this stage.
875 },
876 ],
877 "displayData": [ # Pipeline level display data.
878 { # Data provided with a pipeline or transform to provide descriptive info.
Dan O'Mearadd494642020-05-01 07:42:23 -0700879 "key": "A String", # The key identifying the display data.
880 # This is intended to be used as a label for the display data
881 # when viewed in a dax monitoring system.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -0700882 "shortStrValue": "A String", # A possible additional shorter value to display.
883 # For example a java_class_name_value of com.mypackage.MyDoFn
884 # will be stored with MyDoFn as the short_str_value and
885 # com.mypackage.MyDoFn as the java_class_name value.
886 # short_str_value can be displayed and java_class_name_value
887 # will be displayed as a tooltip.
Dan O'Mearadd494642020-05-01 07:42:23 -0700888 "timestampValue": "A String", # Contains value if the data is of timestamp type.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -0700889 "url": "A String", # An optional full URL.
890 "floatValue": 3.14, # Contains value if the data is of float type.
891 "namespace": "A String", # The namespace for the key. This is usually a class name or programming
892 # language namespace (i.e. python module) which defines the display data.
893 # This allows a dax monitoring system to specially handle the data
894 # and perform custom rendering.
895 "javaClassValue": "A String", # Contains value if the data is of java class type.
896 "label": "A String", # An optional label to display in a dax UI for the element.
897 "boolValue": True or False, # Contains value if the data is of a boolean type.
898 "strValue": "A String", # Contains value if the data is of string type.
Dan O'Mearadd494642020-05-01 07:42:23 -0700899 "durationValue": "A String", # Contains value if the data is of duration type.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -0700900 "int64Value": "A String", # Contains value if the data is of int64 type.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -0700901 },
902 ],
903 },
904 "stageStates": [ # This field may be mutated by the Cloud Dataflow service;
905 # callers cannot mutate it.
906 { # A message describing the state of a particular execution stage.
907 "executionStageName": "A String", # The name of the execution stage.
908 "executionStageState": "A String", # Executions stage states allow the same set of values as JobState.
909 "currentStateTime": "A String", # The time at which the stage transitioned to this state.
910 },
911 ],
912 "id": "A String", # The unique ID of this job.
913 #
914 # This field is set by the Cloud Dataflow service when the Job is
915 # created, and is immutable for the life of the job.
916 "replacedByJobId": "A String", # If another job is an update of this job (and thus, this job is in
917 # `JOB_STATE_UPDATED`), this field contains the ID of that job.
918 "projectId": "A String", # The ID of the Cloud Platform project that the job belongs to.
Sai Cheemalapatic30d2b52017-03-13 12:12:03 -0400919 "transformNameMapping": { # The map of transform name prefixes of the job to be replaced to the
920 # corresponding name prefixes of the new job.
Jon Wayne Parrott692617a2017-01-06 09:58:29 -0800921 "a_key": "A String",
922 },
Sai Cheemalapatic30d2b52017-03-13 12:12:03 -0400923 "environment": { # Describes the environment in which a Dataflow Job runs. # The environment for the job.
Dan O'Mearadd494642020-05-01 07:42:23 -0700924 "workerRegion": "A String", # The Compute Engine region
925 # (https://cloud.google.com/compute/docs/regions-zones/regions-zones) in
926 # which worker processing should occur, e.g. "us-west1". Mutually exclusive
927 # with worker_zone. If neither worker_region nor worker_zone is specified,
928 # default to the control plane's region.
Sai Cheemalapatic30d2b52017-03-13 12:12:03 -0400929 "version": { # A structure describing which components and their versions of the service
930 # are required in order to run the job.
Jon Wayne Parrott692617a2017-01-06 09:58:29 -0800931 "a_key": "", # Properties of the object.
932 },
Bu Sun Kim715bd7f2019-06-14 16:50:42 -0700933 "flexResourceSchedulingGoal": "A String", # Which Flexible Resource Scheduling mode to run in.
934 "serviceKmsKeyName": "A String", # If set, contains the Cloud KMS key identifier used to encrypt data
935 # at rest, AKA a Customer Managed Encryption Key (CMEK).
Sai Cheemalapatic30d2b52017-03-13 12:12:03 -0400936 #
Bu Sun Kim715bd7f2019-06-14 16:50:42 -0700937 # Format:
938 # projects/PROJECT_ID/locations/LOCATION/keyRings/KEY_RING/cryptoKeys/KEY
Jon Wayne Parrott692617a2017-01-06 09:58:29 -0800939 "internalExperiments": { # Experimental settings.
940 "a_key": "", # Properties of the object. Contains field @type with type URL.
941 },
Sai Cheemalapatic30d2b52017-03-13 12:12:03 -0400942 "dataset": "A String", # The dataset for the current project where various workflow
943 # related tables are stored.
944 #
945 # The supported resource type is:
946 #
947 # Google BigQuery:
948 # bigquery.googleapis.com/{dataset}
Jon Wayne Parrott692617a2017-01-06 09:58:29 -0800949 "experiments": [ # The list of experiments to enable.
950 "A String",
951 ],
952 "serviceAccountEmail": "A String", # Identity to run virtual machines as. Defaults to the default account.
Sai Cheemalapatic30d2b52017-03-13 12:12:03 -0400953 "sdkPipelineOptions": { # The Cloud Dataflow SDK pipeline options specified by the user. These
954 # options are passed through the service and are used to recreate the
955 # SDK pipeline options on the worker in a language agnostic and platform
956 # independent way.
Jon Wayne Parrott692617a2017-01-06 09:58:29 -0800957 "a_key": "", # Properties of the object.
958 },
959 "userAgent": { # A description of the process that generated the request.
960 "a_key": "", # Properties of the object.
961 },
Dan O'Mearadd494642020-05-01 07:42:23 -0700962 "workerZone": "A String", # The Compute Engine zone
963 # (https://cloud.google.com/compute/docs/regions-zones/regions-zones) in
964 # which worker processing should occur, e.g. "us-west1-a". Mutually exclusive
965 # with worker_region. If neither worker_region nor worker_zone is specified,
966 # a zone in the control plane's region is chosen based on available capacity.
Sai Cheemalapatic30d2b52017-03-13 12:12:03 -0400967 "workerPools": [ # The worker pools. At least one "harness" worker pool must be
968 # specified in order for the job to have workers.
969 { # Describes one particular pool of Cloud Dataflow workers to be
970 # instantiated by the Cloud Dataflow service in order to perform the
971 # computations required by a job. Note that a workflow job may use
972 # multiple pools, in order to match the various computational
973 # requirements of the various stages of the job.
Dan O'Mearadd494642020-05-01 07:42:23 -0700974 "workerHarnessContainerImage": "A String", # Required. Docker container image that executes the Cloud Dataflow worker
975 # harness, residing in Google Container Registry.
976 #
977 # Deprecated for the Fn API path. Use sdk_harness_container_images instead.
978 "ipConfiguration": "A String", # Configuration for VM IPs.
979 "autoscalingSettings": { # Settings for WorkerPool autoscaling. # Settings for autoscaling of this WorkerPool.
980 "maxNumWorkers": 42, # The maximum number of workers to cap scaling at.
981 "algorithm": "A String", # The algorithm to use for autoscaling.
982 },
Jon Wayne Parrott692617a2017-01-06 09:58:29 -0800983 "diskSourceImage": "A String", # Fully qualified source image for disks.
Dan O'Mearadd494642020-05-01 07:42:23 -0700984 "network": "A String", # Network to which VMs will be assigned. If empty or unspecified,
985 # the service will use the network "default".
986 "zone": "A String", # Zone to run the worker pools in. If empty or unspecified, the service
987 # will attempt to choose a reasonable default.
988 "metadata": { # Metadata to set on the Google Compute Engine VMs.
989 "a_key": "A String",
990 },
991 "machineType": "A String", # Machine type (e.g. "n1-standard-1"). If empty or unspecified, the
992 # service will attempt to choose a reasonable default.
993 "onHostMaintenance": "A String", # The action to take on host maintenance, as defined by the Google
994 # Compute Engine API.
Sai Cheemalapatic30d2b52017-03-13 12:12:03 -0400995 "taskrunnerSettings": { # Taskrunner configuration settings. # Settings passed through to Google Compute Engine workers when
996 # using the standard Dataflow task runner. Users should ignore
997 # this field.
998 "workflowFileName": "A String", # The file to store the workflow in.
999 "logUploadLocation": "A String", # Indicates where to put logs. If this is not specified, the logs
1000 # will not be uploaded.
1001 #
1002 # The supported resource type is:
1003 #
1004 # Google Cloud Storage:
1005 # storage.googleapis.com/{bucket}/{object}
1006 # bucket.storage.googleapis.com/{object}
Sai Cheemalapatie833b792017-03-24 15:06:46 -07001007 "commandlinesFileName": "A String", # The file to store preprocessing commands in.
Dan O'Mearadd494642020-05-01 07:42:23 -07001008 "alsologtostderr": True or False, # Whether to also send taskrunner log info to stderr.
1009 "continueOnException": True or False, # Whether to continue taskrunner if an exception is hit.
1010 "baseTaskDir": "A String", # The location on the worker for task-specific subdirectories.
1011 "vmId": "A String", # The ID string of the VM.
1012 "taskGroup": "A String", # The UNIX group ID on the worker VM to use for tasks launched by
1013 # taskrunner; e.g. "wheel".
1014 "taskUser": "A String", # The UNIX user ID on the worker VM to use for tasks launched by
1015 # taskrunner; e.g. "root".
1016 "oauthScopes": [ # The OAuth2 scopes to be requested by the taskrunner in order to
1017 # access the Cloud Dataflow API.
1018 "A String",
1019 ],
1020 "languageHint": "A String", # The suggested backend language.
1021 "logToSerialconsole": True or False, # Whether to send taskrunner log info to Google Compute Engine VM serial
1022 # console.
1023 "streamingWorkerMainClass": "A String", # The streaming worker main class name.
1024 "logDir": "A String", # The directory on the VM to store logs.
Sai Cheemalapatic30d2b52017-03-13 12:12:03 -04001025 "parallelWorkerSettings": { # Provides data to pass through to the worker harness. # The settings to pass to the parallel worker harness.
1026 "reportingEnabled": True or False, # Whether to send work progress updates to the service.
1027 "shuffleServicePath": "A String", # The Shuffle service path relative to the root URL, for example,
1028 # "shuffle/v1beta1".
1029 "workerId": "A String", # The ID of the worker running this pipeline.
1030 "baseUrl": "A String", # The base URL for accessing Google Cloud APIs.
1031 #
1032 # When workers access Google Cloud APIs, they logically do so via
1033 # relative URLs. If this field is specified, it supplies the base
1034 # URL to use for resolving these relative URLs. The normative
1035 # algorithm used is defined by RFC 1808, "Relative Uniform Resource
1036 # Locators".
1037 #
1038 # If not specified, the default value is "http://www.googleapis.com/"
1039 "servicePath": "A String", # The Cloud Dataflow service path relative to the root URL, for example,
1040 # "dataflow/v1b3/projects".
1041 "tempStoragePrefix": "A String", # The prefix of the resources the system should use for temporary
1042 # storage.
1043 #
1044 # The supported resource type is:
1045 #
1046 # Google Cloud Storage:
1047 #
1048 # storage.googleapis.com/{bucket}/{object}
1049 # bucket.storage.googleapis.com/{object}
1050 },
Sai Cheemalapati4ba8c232017-06-06 18:46:08 -04001051      "dataflowApiVersion": "A String", # The API version of the endpoint, e.g. "v1b3"
Sai Cheemalapatie833b792017-03-24 15:06:46 -07001052 "harnessCommand": "A String", # The command to launch the worker harness.
1053 "tempStoragePrefix": "A String", # The prefix of the resources the taskrunner should use for
1054 # temporary storage.
Sai Cheemalapatic30d2b52017-03-13 12:12:03 -04001055 #
Sai Cheemalapatie833b792017-03-24 15:06:46 -07001056 # The supported resource type is:
1057 #
1058 # Google Cloud Storage:
1059 # storage.googleapis.com/{bucket}/{object}
1060 # bucket.storage.googleapis.com/{object}
Dan O'Mearadd494642020-05-01 07:42:23 -07001061 "baseUrl": "A String", # The base URL for the taskrunner to use when accessing Google Cloud APIs.
1062 #
1063 # When workers access Google Cloud APIs, they logically do so via
1064 # relative URLs. If this field is specified, it supplies the base
1065 # URL to use for resolving these relative URLs. The normative
1066 # algorithm used is defined by RFC 1808, "Relative Uniform Resource
1067 # Locators".
1068 #
1069 # If not specified, the default value is "http://www.googleapis.com/"
Jon Wayne Parrott692617a2017-01-06 09:58:29 -08001070 },
Dan O'Mearadd494642020-05-01 07:42:23 -07001071 "numThreadsPerWorker": 42, # The number of threads per worker harness. If empty or unspecified, the
1072 # service will choose a number of threads (according to the number of cores
1073 # on the selected machine type for batch, or 1 by convention for streaming).
1074 "poolArgs": { # Extra arguments for this worker pool.
1075 "a_key": "", # Properties of the object. Contains field @type with type URL.
1076 },
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07001077 "packages": [ # Packages to be installed on workers.
1078 { # The packages that must be installed in order for a worker to run the
1079 # steps of the Cloud Dataflow job that will be assigned to its worker
1080 # pool.
1081 #
1082 # This is the mechanism by which the Cloud Dataflow SDK causes code to
1083 # be loaded onto the workers. For example, the Cloud Dataflow Java SDK
1084 # might use this to install jars containing the user's code and all of the
1085 # various dependencies (libraries, data files, etc.) required in order
1086 # for that code to run.
1087 "location": "A String", # The resource to read the package from. The supported resource type is:
1088 #
1089 # Google Cloud Storage:
1090 #
1091 # storage.googleapis.com/{bucket}
1092 # bucket.storage.googleapis.com/
1093 "name": "A String", # The name of the package.
1094 },
1095 ],
Dan O'Mearadd494642020-05-01 07:42:23 -07001096 "defaultPackageSet": "A String", # The default package set to install. This allows the service to
1097 # select a default set of packages which are useful to worker
1098 # harnesses written in a particular language.
1099 "kind": "A String", # The kind of the worker pool; currently only `harness` and `shuffle`
1100 # are supported.
1101 "diskType": "A String", # Type of root disk for VMs. If empty or unspecified, the service will
Sai Cheemalapati4ba8c232017-06-06 18:46:08 -04001102 # attempt to choose a reasonable default.
Sai Cheemalapati4ba8c232017-06-06 18:46:08 -04001103      "teardownPolicy": "A String", # Sets the policy for determining when to turn down the worker pool.
1104 # Allowed values are: `TEARDOWN_ALWAYS`, `TEARDOWN_ON_SUCCESS`, and
1105 # `TEARDOWN_NEVER`.
1106 # `TEARDOWN_ALWAYS` means workers are always torn down regardless of whether
1107 # the job succeeds. `TEARDOWN_ON_SUCCESS` means workers are torn down
1108 # if the job succeeds. `TEARDOWN_NEVER` means the workers are never torn
1109 # down.
1110 #
1111 # If the workers are not torn down by the service, they will
1112 # continue to run and use Google Compute Engine VM resources in the
1113 # user's project until they are explicitly terminated by the user.
1114 # Because of this, Google recommends using the `TEARDOWN_ALWAYS`
1115 # policy except for small, manually supervised test jobs.
1116 #
1117 # If unknown or unspecified, the service will attempt to choose a reasonable
1118 # default.
Dan O'Mearadd494642020-05-01 07:42:23 -07001119 "diskSizeGb": 42, # Size of root disk for VMs, in GB. If zero or unspecified, the service will
1120 # attempt to choose a reasonable default.
Sai Cheemalapati4ba8c232017-06-06 18:46:08 -04001121 "numWorkers": 42, # Number of Google Compute Engine workers in this pool needed to
1122 # execute the job. If zero or unspecified, the service will
Sai Cheemalapatic30d2b52017-03-13 12:12:03 -04001123 # attempt to choose a reasonable default.
Sai Cheemalapati4ba8c232017-06-06 18:46:08 -04001124 "subnetwork": "A String", # Subnetwork to which VMs will be assigned, if desired. Expected to be of
1125 # the form "regions/REGION/subnetworks/SUBNETWORK".
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07001126 "dataDisks": [ # Data disks that are used by a VM in this workflow.
1127 { # Describes the data disk used by a workflow job.
1128 "mountPoint": "A String", # Directory in a VM where disk is mounted.
1129 "sizeGb": 42, # Size of disk in GB. If zero or unspecified, the service will
1130 # attempt to choose a reasonable default.
1131 "diskType": "A String", # Disk storage type, as defined by Google Compute Engine. This
1132 # must be a disk type appropriate to the project and zone in which
1133 # the workers will run. If unknown or unspecified, the service
1134 # will attempt to choose a reasonable default.
Sai Cheemalapati4ba8c232017-06-06 18:46:08 -04001135 #
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07001136 # For example, the standard persistent disk type is a resource name
1137 # typically ending in "pd-standard". If SSD persistent disks are
1138 # available, the resource name typically ends with "pd-ssd". The
1139          # actual valid values are defined by the Google Compute Engine API,
1140 # not by the Cloud Dataflow API; consult the Google Compute Engine
1141 # documentation for more information about determining the set of
1142 # available disk types for a particular project and zone.
Sai Cheemalapati4ba8c232017-06-06 18:46:08 -04001143 #
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07001144 # Google Compute Engine Disk types are local to a particular
1145 # project in a particular zone, and so the resource name will
1146 # typically look something like this:
1147 #
1148 # compute.googleapis.com/projects/project-id/zones/zone/diskTypes/pd-standard
Sai Cheemalapati4ba8c232017-06-06 18:46:08 -04001149 },
1150 ],
Dan O'Mearadd494642020-05-01 07:42:23 -07001151 "sdkHarnessContainerImages": [ # Set of SDK harness containers needed to execute this pipeline. This will
1152 # only be set in the Fn API path. For non-cross-language pipelines this
1153 # should have only one entry. Cross-language pipelines will have two or more
1154 # entries.
1155        { # Defines an SDK harness container for executing Dataflow pipelines.
1156 "containerImage": "A String", # A docker container image that resides in Google Container Registry.
1157 "useSingleCorePerContainer": True or False, # If true, recommends the Dataflow service to use only one core per SDK
1158 # container instance with this image. If false (or unset) recommends using
1159 # more than one core per SDK container instance with this image for
1160 # efficiency. Note that Dataflow service may choose to override this property
1161 # if needed.
1162 },
1163 ],
Jon Wayne Parrott692617a2017-01-06 09:58:29 -08001164 },
1165 ],
Dan O'Mearadd494642020-05-01 07:42:23 -07001166 "clusterManagerApiService": "A String", # The type of cluster manager API to use. If unknown or
1167 # unspecified, the service will attempt to choose a reasonable
1168 # default. This should be in the form of the API service name,
1169 # e.g. "compute.googleapis.com".
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07001170 "tempStoragePrefix": "A String", # The prefix of the resources the system should use for temporary
1171        # storage. The system will append the suffix "/temp-{JOBNAME}" to
1172 # this resource prefix, where {JOBNAME} is the value of the
1173 # job_name field. The resulting bucket and object prefix is used
1174 # as the prefix of the resources used to store temporary data
1175 # needed during the job execution. NOTE: This will override the
1176 # value in taskrunner_settings.
1177 # The supported resource type is:
1178 #
1179 # Google Cloud Storage:
1180 #
1181 # storage.googleapis.com/{bucket}/{object}
1182 # bucket.storage.googleapis.com/{object}
1183 },
1184 "location": "A String", # The [regional endpoint]
1185 # (https://cloud.google.com/dataflow/docs/concepts/regional-endpoints) that
1186 # contains this job.
1187 "tempFiles": [ # A set of files the system should be aware of that are used
1188 # for temporary storage. These temporary files will be
1189 # removed on job completion.
1190 # No duplicates are allowed.
1191 # No file patterns are supported.
1192 #
1193 # The supported files are:
1194 #
1195 # Google Cloud Storage:
1196 #
1197 # storage.googleapis.com/{bucket}/{object}
1198 # bucket.storage.googleapis.com/{object}
1199 "A String",
1200 ],
1201 "type": "A String", # The type of Cloud Dataflow job.
1202 "clientRequestId": "A String", # The client's unique identifier of the job, re-used across retried attempts.
1203 # If this field is set, the service will ensure its uniqueness.
1204 # The request to create a job will fail if the service has knowledge of a
1205 # previously submitted job with the same client's ID and job name.
1206 # The caller may use this field to ensure idempotence of job
1207 # creation across retried attempts to create a job.
1208 # By default, the field is empty and, in that case, the service ignores it.
1209 "createdFromSnapshotId": "A String", # If this is specified, the job's initial state is populated from the given
1210 # snapshot.
1211 "stepsLocation": "A String", # The GCS location where the steps are stored.
1212 "currentStateTime": "A String", # The timestamp associated with the current state.
1213 "startTime": "A String", # The timestamp when the job was started (transitioned to JOB_STATE_PENDING).
1214 # Flexible resource scheduling jobs are started with some delay after job
1215 # creation, so start_time is unset before start and is updated when the
1216 # job is started by the Cloud Dataflow service. For other jobs, start_time
1217    # always equals create_time and is immutable and set by the Cloud Dataflow
1218 # service.
1219 "createTime": "A String", # The timestamp when the job was initially created. Immutable and set by the
1220 # Cloud Dataflow service.
1221 "requestedState": "A String", # The job's requested state.
1222 #
1223 # `UpdateJob` may be used to switch between the `JOB_STATE_STOPPED` and
1224 # `JOB_STATE_RUNNING` states, by setting requested_state. `UpdateJob` may
1225 # also be used to directly set a job's requested state to
1226 # `JOB_STATE_CANCELLED` or `JOB_STATE_DONE`, irrevocably terminating the
1227 # job if it has not already reached a terminal state.
1228 "name": "A String", # The user-specified Cloud Dataflow job name.
1229 #
1230 # Only one Job with a given name may exist in a project at any
1231 # given time. If a caller attempts to create a Job with the same
1232 # name as an already-existing Job, the attempt returns the
1233 # existing Job.
1234 #
1235 # The name must match the regular expression
1236 # `[a-z]([-a-z0-9]{0,38}[a-z0-9])?`
1237 "steps": [ # Exactly one of step or steps_location should be specified.
1238 #
1239 # The top-level steps that constitute the entire job.
1240 { # Defines a particular step within a Cloud Dataflow job.
1241 #
1242 # A job consists of multiple steps, each of which performs some
1243 # specific operation as part of the overall job. Data is typically
1244 # passed from one step to another as part of the job.
1245 #
1246 # Here's an example of a sequence of steps which together implement a
1247 # Map-Reduce job:
1248 #
1249 # * Read a collection of data from some source, parsing the
1250 # collection's elements.
1251 #
1252 # * Validate the elements.
1253 #
1254 # * Apply a user-defined function to map each element to some value
1255 # and extract an element-specific key value.
1256 #
1257 # * Group elements with the same key into a single element with
1258 # that key, transforming a multiply-keyed collection into a
1259 # uniquely-keyed collection.
1260 #
1261 # * Write the elements out to some data sink.
1262 #
1263 # Note that the Cloud Dataflow service may be used to run many different
1264 # types of jobs, not just Map-Reduce.
1265 "kind": "A String", # The kind of step in the Cloud Dataflow job.
Dan O'Mearadd494642020-05-01 07:42:23 -07001266 "name": "A String", # The name that identifies the step. This must be unique for each
1267 # step with respect to all other steps in the Cloud Dataflow job.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07001268 "properties": { # Named properties associated with the step. Each kind of
1269 # predefined step has its own required set of properties.
1270 # Must be provided on Create. Only retrieved with JOB_VIEW_ALL.
1271 "a_key": "", # Properties of the object.
1272 },
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07001273 },
1274 ],
1275 "replaceJobId": "A String", # If this job is an update of an existing job, this field is the job ID
1276 # of the job it replaced.
1277 #
1278 # When sending a `CreateJobRequest`, you can update a job by specifying it
1279 # here. The job named here is stopped, and its intermediate state is
1280 # transferred to this job.
1281 "currentState": "A String", # The current state of the job.
1282 #
1283 # Jobs are created in the `JOB_STATE_STOPPED` state unless otherwise
1284 # specified.
1285 #
1286 # A job in the `JOB_STATE_RUNNING` state may asynchronously enter a
1287 # terminal state. After a job has reached a terminal state, no
1288 # further state updates may be made.
1289 #
1290 # This field may be mutated by the Cloud Dataflow service;
1291 # callers cannot mutate it.
1292 "executionInfo": { # Additional information about how a Cloud Dataflow job will be executed that # Deprecated.
1293 # isn't contained in the submitted job.
1294 "stages": { # A mapping from each stage to the information about that stage.
1295 "a_key": { # Contains information about how a particular
1296 # google.dataflow.v1beta3.Step will be executed.
1297 "stepName": [ # The steps associated with the execution stage.
1298 # Note that stages may have several steps, and that a given step
1299 # might be run by more than one stage.
1300 "A String",
1301 ],
1302 },
1303 },
1304 },
1305 }</pre>
1306</div>
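<p>Below is a minimal, hypothetical sketch of submitting the Job body above with the google-api-python-client library. The project ID, bucket, job name, and the JOB_TYPE_BATCH value are placeholder assumptions and only a few of the documented fields are shown; in practice the Cloud Dataflow SDK builds and submits this body for you.</p>
<pre>
# Hypothetical sketch: projects.locations.jobs.create via google-api-python-client.
# Assumes application-default credentials are available in the environment.
from googleapiclient.discovery import build

dataflow = build('dataflow', 'v1b3')

job_body = {
    'name': 'example-job',        # must match [a-z]([-a-z0-9]{0,38}[a-z0-9])?
    'type': 'JOB_TYPE_BATCH',     # assumed enum value; the schema only says "A String"
    'environment': {
        'tempStoragePrefix': 'storage.googleapis.com/my-bucket/temp',
        'workerPools': [
            {'kind': 'harness', 'numWorkers': 0},  # 0 lets the service pick a default
        ],
    },
    # Exactly one of 'steps' or 'stepsLocation' should also be supplied;
    # normally the SDK that builds the pipeline fills this in.
}

request = dataflow.projects().locations().jobs().create(
    projectId='my-project', location='us-central1', body=job_body)
response = request.execute()
print(response['id'], response['currentState'])
</pre>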
1307
1308<div class="method">
1309 <code class="details" id="get">get(projectId, location, jobId, x__xgafv=None, view=None)</code>
1310 <pre>Gets the state of the specified Cloud Dataflow job.
1311
1312To get the state of a job, we recommend using `projects.locations.jobs.get`
1313with a [regional endpoint]
1314(https://cloud.google.com/dataflow/docs/concepts/regional-endpoints). Using
1315`projects.jobs.get` is not recommended, as you can only get the state of
1316jobs that are running in `us-central1`.
1317
1318Args:
1319 projectId: string, The ID of the Cloud Platform project that the job belongs to. (required)
1320 location: string, The [regional endpoint]
1321(https://cloud.google.com/dataflow/docs/concepts/regional-endpoints) that
1322contains this job. (required)
1323 jobId: string, The job ID. (required)
1324 x__xgafv: string, V1 error format.
1325 Allowed values
1326 1 - v1 error format
1327 2 - v2 error format
1328 view: string, The level of information requested in response.
1329
1330Returns:
1331 An object of the form:
1332
1333 { # Defines a job to be run by the Cloud Dataflow service.
1334 "labels": { # User-defined labels for this job.
1335 #
1336 # The labels map can contain no more than 64 entries. Entries of the labels
1337 # map are UTF8 strings that comply with the following restrictions:
1338 #
1339 # * Keys must conform to regexp: \p{Ll}\p{Lo}{0,62}
1340 # * Values must conform to regexp: [\p{Ll}\p{Lo}\p{N}_-]{0,63}
Dan O'Mearadd494642020-05-01 07:42:23 -07001341 # * Both keys and values are additionally constrained to be &lt;= 128 bytes in
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07001342 # size.
1343 "a_key": "A String",
1344 },
1345 "jobMetadata": { # Metadata available primarily for filtering jobs. Will be included in the # This field is populated by the Dataflow service to support filtering jobs
1346 # by the metadata values provided here. Populated for ListJobs and all GetJob
1347 # views SUMMARY and higher.
1348 # ListJob response and Job SUMMARY view.
1349 "sdkVersion": { # The version of the SDK used to run the job. # The SDK version used to run the job.
1350 "versionDisplayName": "A String", # A readable string describing the version of the SDK.
1351 "version": "A String", # The version of the SDK used to run the job.
1352 "sdkSupportStatus": "A String", # The support status for this SDK version.
1353 },
1354 "pubsubDetails": [ # Identification of a PubSub source used in the Dataflow job.
1355 { # Metadata for a PubSub connector used by the job.
1356 "topic": "A String", # Topic accessed in the connection.
1357 "subscription": "A String", # Subscription used in the connection.
1358 },
1359 ],
1360 "datastoreDetails": [ # Identification of a Datastore source used in the Dataflow job.
1361 { # Metadata for a Datastore connector used by the job.
1362 "projectId": "A String", # ProjectId accessed in the connection.
1363 "namespace": "A String", # Namespace used in the connection.
1364 },
1365 ],
1366 "fileDetails": [ # Identification of a File source used in the Dataflow job.
1367 { # Metadata for a File connector used by the job.
1368 "filePattern": "A String", # File Pattern used to access files by the connector.
1369 },
1370 ],
1371 "spannerDetails": [ # Identification of a Spanner source used in the Dataflow job.
1372 { # Metadata for a Spanner connector used by the job.
1373 "instanceId": "A String", # InstanceId accessed in the connection.
1374 "projectId": "A String", # ProjectId accessed in the connection.
1375 "databaseId": "A String", # DatabaseId accessed in the connection.
1376 },
1377 ],
1378 "bigTableDetails": [ # Identification of a BigTable source used in the Dataflow job.
1379 { # Metadata for a BigTable connector used by the job.
1380 "instanceId": "A String", # InstanceId accessed in the connection.
1381 "projectId": "A String", # ProjectId accessed in the connection.
1382 "tableId": "A String", # TableId accessed in the connection.
1383 },
1384 ],
1385 "bigqueryDetails": [ # Identification of a BigQuery source used in the Dataflow job.
1386 { # Metadata for a BigQuery connector used by the job.
1387 "projectId": "A String", # Project accessed in the connection.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07001388 "query": "A String", # Query used to access data in the connection.
Dan O'Mearadd494642020-05-01 07:42:23 -07001389 "table": "A String", # Table accessed in the connection.
1390 "dataset": "A String", # Dataset accessed in the connection.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07001391 },
1392 ],
Jon Wayne Parrott692617a2017-01-06 09:58:29 -08001393 },
Sai Cheemalapatic30d2b52017-03-13 12:12:03 -04001394 "pipelineDescription": { # A descriptive representation of submitted pipeline as well as the executed # Preliminary field: The format of this data may change at any time.
1395 # A description of the user pipeline and stages through which it is executed.
1396 # Created by Cloud Dataflow service. Only retrieved with
1397 # JOB_VIEW_DESCRIPTION or JOB_VIEW_ALL.
1398 # form. This data is provided by the Dataflow service for ease of visualizing
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07001399 # the pipeline and interpreting Dataflow provided metrics.
Sai Cheemalapatic30d2b52017-03-13 12:12:03 -04001400 "originalPipelineTransform": [ # Description of each transform in the pipeline and collections between them.
1401 { # Description of the type, names/ids, and input/outputs for a transform.
1402 "kind": "A String", # Type of transform.
1403 "name": "A String", # User provided name for this transform instance.
1404 "inputCollectionName": [ # User names for all collection inputs to this transform.
1405 "A String",
1406 ],
1407 "displayData": [ # Transform-specific display data.
1408 { # Data provided with a pipeline or transform to provide descriptive info.
Dan O'Mearadd494642020-05-01 07:42:23 -07001409 "key": "A String", # The key identifying the display data.
1410 # This is intended to be used as a label for the display data
1411 # when viewed in a dax monitoring system.
Sai Cheemalapatic30d2b52017-03-13 12:12:03 -04001412 "shortStrValue": "A String", # A possible additional shorter value to display.
1413 # For example a java_class_name_value of com.mypackage.MyDoFn
1414 # will be stored with MyDoFn as the short_str_value and
1415 # com.mypackage.MyDoFn as the java_class_name value.
1416 # short_str_value can be displayed and java_class_name_value
1417 # will be displayed as a tooltip.
Dan O'Mearadd494642020-05-01 07:42:23 -07001418 "timestampValue": "A String", # Contains value if the data is of timestamp type.
Sai Cheemalapatic30d2b52017-03-13 12:12:03 -04001419 "url": "A String", # An optional full URL.
1420 "floatValue": 3.14, # Contains value if the data is of float type.
1421 "namespace": "A String", # The namespace for the key. This is usually a class name or programming
1422 # language namespace (i.e. python module) which defines the display data.
1423 # This allows a dax monitoring system to specially handle the data
1424 # and perform custom rendering.
1425 "javaClassValue": "A String", # Contains value if the data is of java class type.
1426 "label": "A String", # An optional label to display in a dax UI for the element.
1427 "boolValue": True or False, # Contains value if the data is of a boolean type.
1428 "strValue": "A String", # Contains value if the data is of string type.
Dan O'Mearadd494642020-05-01 07:42:23 -07001429 "durationValue": "A String", # Contains value if the data is of duration type.
Sai Cheemalapatic30d2b52017-03-13 12:12:03 -04001430 "int64Value": "A String", # Contains value if the data is of int64 type.
1431 },
1432 ],
1433 "outputCollectionName": [ # User names for all collection outputs to this transform.
1434 "A String",
1435 ],
1436 "id": "A String", # SDK generated id of this transform instance.
1437 },
1438 ],
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07001439 "executionPipelineStage": [ # Description of each stage of execution of the pipeline.
1440 { # Description of the composing transforms, names/ids, and input/outputs of a
1441 # stage of execution. Some composing transforms and sources may have been
1442 # generated by the Dataflow service during execution planning.
1443 "componentSource": [ # Collections produced and consumed by component transforms of this stage.
1444 { # Description of an interstitial value between transforms in an execution
1445 # stage.
1446 "userName": "A String", # Human-readable name for this transform; may be user or system generated.
1447 "originalTransformOrCollection": "A String", # User name for the original user transform or collection with which this
1448 # source is most closely associated.
1449 "name": "A String", # Dataflow service generated name for this source.
1450 },
1451 ],
1452 "kind": "A String", # Type of tranform this stage is executing.
1453 "name": "A String", # Dataflow service generated name for this stage.
1454 "outputSource": [ # Output sources for this stage.
1455 { # Description of an input or output of an execution stage.
1456 "userName": "A String", # Human-readable name for this source; may be user or system generated.
1457 "sizeBytes": "A String", # Size of the source, if measurable.
1458 "name": "A String", # Dataflow service generated name for this source.
1459 "originalTransformOrCollection": "A String", # User name for the original user transform or collection with which this
1460 # source is most closely associated.
1461 },
1462 ],
1463 "inputSource": [ # Input sources for this stage.
1464 { # Description of an input or output of an execution stage.
1465 "userName": "A String", # Human-readable name for this source; may be user or system generated.
1466 "sizeBytes": "A String", # Size of the source, if measurable.
1467 "name": "A String", # Dataflow service generated name for this source.
1468 "originalTransformOrCollection": "A String", # User name for the original user transform or collection with which this
1469 # source is most closely associated.
1470 },
1471 ],
1472 "componentTransform": [ # Transforms that comprise this execution stage.
1473 { # Description of a transform executed as part of an execution stage.
1474 "userName": "A String", # Human-readable name for this transform; may be user or system generated.
1475 "originalTransform": "A String", # User name for the original user transform with which this transform is
1476 # most closely associated.
1477 "name": "A String", # Dataflow service generated name for this source.
1478 },
1479 ],
1480 "id": "A String", # Dataflow service generated id for this stage.
1481 },
1482 ],
Sai Cheemalapatic30d2b52017-03-13 12:12:03 -04001483 "displayData": [ # Pipeline level display data.
1484 { # Data provided with a pipeline or transform to provide descriptive info.
Dan O'Mearadd494642020-05-01 07:42:23 -07001485 "key": "A String", # The key identifying the display data.
1486 # This is intended to be used as a label for the display data
1487 # when viewed in a dax monitoring system.
Sai Cheemalapatic30d2b52017-03-13 12:12:03 -04001488 "shortStrValue": "A String", # A possible additional shorter value to display.
1489 # For example a java_class_name_value of com.mypackage.MyDoFn
1490 # will be stored with MyDoFn as the short_str_value and
1491 # com.mypackage.MyDoFn as the java_class_name value.
1492 # short_str_value can be displayed and java_class_name_value
1493 # will be displayed as a tooltip.
Dan O'Mearadd494642020-05-01 07:42:23 -07001494 "timestampValue": "A String", # Contains value if the data is of timestamp type.
Sai Cheemalapatic30d2b52017-03-13 12:12:03 -04001495 "url": "A String", # An optional full URL.
1496 "floatValue": 3.14, # Contains value if the data is of float type.
1497 "namespace": "A String", # The namespace for the key. This is usually a class name or programming
1498 # language namespace (i.e. python module) which defines the display data.
1499 # This allows a dax monitoring system to specially handle the data
1500 # and perform custom rendering.
1501 "javaClassValue": "A String", # Contains value if the data is of java class type.
1502 "label": "A String", # An optional label to display in a dax UI for the element.
1503 "boolValue": True or False, # Contains value if the data is of a boolean type.
1504 "strValue": "A String", # Contains value if the data is of string type.
Dan O'Mearadd494642020-05-01 07:42:23 -07001505 "durationValue": "A String", # Contains value if the data is of duration type.
Sai Cheemalapatic30d2b52017-03-13 12:12:03 -04001506 "int64Value": "A String", # Contains value if the data is of int64 type.
1507 },
1508 ],
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07001509 },
1510 "stageStates": [ # This field may be mutated by the Cloud Dataflow service;
1511 # callers cannot mutate it.
1512 { # A message describing the state of a particular execution stage.
1513 "executionStageName": "A String", # The name of the execution stage.
1514 "executionStageState": "A String", # Executions stage states allow the same set of values as JobState.
1515 "currentStateTime": "A String", # The time at which the stage transitioned to this state.
1516 },
1517 ],
1518 "id": "A String", # The unique ID of this job.
1519 #
1520 # This field is set by the Cloud Dataflow service when the Job is
1521 # created, and is immutable for the life of the job.
1522 "replacedByJobId": "A String", # If another job is an update of this job (and thus, this job is in
1523 # `JOB_STATE_UPDATED`), this field contains the ID of that job.
1524 "projectId": "A String", # The ID of the Cloud Platform project that the job belongs to.
1525 "transformNameMapping": { # The map of transform name prefixes of the job to be replaced to the
1526 # corresponding name prefixes of the new job.
1527 "a_key": "A String",
1528 },
1529 "environment": { # Describes the environment in which a Dataflow Job runs. # The environment for the job.
Dan O'Mearadd494642020-05-01 07:42:23 -07001530 "workerRegion": "A String", # The Compute Engine region
1531 # (https://cloud.google.com/compute/docs/regions-zones/regions-zones) in
1532 # which worker processing should occur, e.g. "us-west1". Mutually exclusive
1533 # with worker_zone. If neither worker_region nor worker_zone is specified,
1534 # default to the control plane's region.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07001535 "version": { # A structure describing which components and their versions of the service
1536 # are required in order to run the job.
1537 "a_key": "", # Properties of the object.
1538 },
1539 "flexResourceSchedulingGoal": "A String", # Which Flexible Resource Scheduling mode to run in.
1540 "serviceKmsKeyName": "A String", # If set, contains the Cloud KMS key identifier used to encrypt data
1541 # at rest, AKA a Customer Managed Encryption Key (CMEK).
1542 #
1543 # Format:
1544 # projects/PROJECT_ID/locations/LOCATION/keyRings/KEY_RING/cryptoKeys/KEY
1545 "internalExperiments": { # Experimental settings.
1546 "a_key": "", # Properties of the object. Contains field @type with type URL.
1547 },
1548 "dataset": "A String", # The dataset for the current project where various workflow
1549 # related tables are stored.
1550 #
1551 # The supported resource type is:
1552 #
1553 # Google BigQuery:
1554 # bigquery.googleapis.com/{dataset}
1555 "experiments": [ # The list of experiments to enable.
1556 "A String",
1557 ],
1558 "serviceAccountEmail": "A String", # Identity to run virtual machines as. Defaults to the default account.
1559 "sdkPipelineOptions": { # The Cloud Dataflow SDK pipeline options specified by the user. These
1560 # options are passed through the service and are used to recreate the
1561 # SDK pipeline options on the worker in a language agnostic and platform
1562 # independent way.
1563 "a_key": "", # Properties of the object.
1564 },
1565 "userAgent": { # A description of the process that generated the request.
1566 "a_key": "", # Properties of the object.
1567 },
Dan O'Mearadd494642020-05-01 07:42:23 -07001568 "workerZone": "A String", # The Compute Engine zone
1569 # (https://cloud.google.com/compute/docs/regions-zones/regions-zones) in
1570 # which worker processing should occur, e.g. "us-west1-a". Mutually exclusive
1571 # with worker_region. If neither worker_region nor worker_zone is specified,
1572 # a zone in the control plane's region is chosen based on available capacity.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07001573 "workerPools": [ # The worker pools. At least one "harness" worker pool must be
1574 # specified in order for the job to have workers.
1575 { # Describes one particular pool of Cloud Dataflow workers to be
1576 # instantiated by the Cloud Dataflow service in order to perform the
1577 # computations required by a job. Note that a workflow job may use
1578 # multiple pools, in order to match the various computational
1579 # requirements of the various stages of the job.
Dan O'Mearadd494642020-05-01 07:42:23 -07001580 "workerHarnessContainerImage": "A String", # Required. Docker container image that executes the Cloud Dataflow worker
1581 # harness, residing in Google Container Registry.
1582 #
1583 # Deprecated for the Fn API path. Use sdk_harness_container_images instead.
1584 "ipConfiguration": "A String", # Configuration for VM IPs.
1585 "autoscalingSettings": { # Settings for WorkerPool autoscaling. # Settings for autoscaling of this WorkerPool.
1586 "maxNumWorkers": 42, # The maximum number of workers to cap scaling at.
1587 "algorithm": "A String", # The algorithm to use for autoscaling.
1588 },
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07001589 "diskSourceImage": "A String", # Fully qualified source image for disks.
Dan O'Mearadd494642020-05-01 07:42:23 -07001590 "network": "A String", # Network to which VMs will be assigned. If empty or unspecified,
1591 # the service will use the network "default".
1592 "zone": "A String", # Zone to run the worker pools in. If empty or unspecified, the service
1593 # will attempt to choose a reasonable default.
1594 "metadata": { # Metadata to set on the Google Compute Engine VMs.
1595 "a_key": "A String",
1596 },
1597 "machineType": "A String", # Machine type (e.g. "n1-standard-1"). If empty or unspecified, the
1598 # service will attempt to choose a reasonable default.
1599 "onHostMaintenance": "A String", # The action to take on host maintenance, as defined by the Google
1600 # Compute Engine API.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07001601 "taskrunnerSettings": { # Taskrunner configuration settings. # Settings passed through to Google Compute Engine workers when
1602 # using the standard Dataflow task runner. Users should ignore
1603 # this field.
1604 "workflowFileName": "A String", # The file to store the workflow in.
1605 "logUploadLocation": "A String", # Indicates where to put logs. If this is not specified, the logs
1606 # will not be uploaded.
1607 #
1608 # The supported resource type is:
1609 #
1610 # Google Cloud Storage:
1611 # storage.googleapis.com/{bucket}/{object}
1612 # bucket.storage.googleapis.com/{object}
1613 "commandlinesFileName": "A String", # The file to store preprocessing commands in.
Dan O'Mearadd494642020-05-01 07:42:23 -07001614 "alsologtostderr": True or False, # Whether to also send taskrunner log info to stderr.
1615 "continueOnException": True or False, # Whether to continue taskrunner if an exception is hit.
1616 "baseTaskDir": "A String", # The location on the worker for task-specific subdirectories.
1617 "vmId": "A String", # The ID string of the VM.
1618 "taskGroup": "A String", # The UNIX group ID on the worker VM to use for tasks launched by
1619 # taskrunner; e.g. "wheel".
1620 "taskUser": "A String", # The UNIX user ID on the worker VM to use for tasks launched by
1621 # taskrunner; e.g. "root".
1622 "oauthScopes": [ # The OAuth2 scopes to be requested by the taskrunner in order to
1623 # access the Cloud Dataflow API.
1624 "A String",
1625 ],
1626 "languageHint": "A String", # The suggested backend language.
1627 "logToSerialconsole": True or False, # Whether to send taskrunner log info to Google Compute Engine VM serial
1628 # console.
1629 "streamingWorkerMainClass": "A String", # The streaming worker main class name.
1630 "logDir": "A String", # The directory on the VM to store logs.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07001631 "parallelWorkerSettings": { # Provides data to pass through to the worker harness. # The settings to pass to the parallel worker harness.
1632 "reportingEnabled": True or False, # Whether to send work progress updates to the service.
1633 "shuffleServicePath": "A String", # The Shuffle service path relative to the root URL, for example,
1634 # "shuffle/v1beta1".
1635 "workerId": "A String", # The ID of the worker running this pipeline.
1636 "baseUrl": "A String", # The base URL for accessing Google Cloud APIs.
1637 #
1638 # When workers access Google Cloud APIs, they logically do so via
1639 # relative URLs. If this field is specified, it supplies the base
1640 # URL to use for resolving these relative URLs. The normative
1641 # algorithm used is defined by RFC 1808, "Relative Uniform Resource
1642 # Locators".
1643 #
1644 # If not specified, the default value is "http://www.googleapis.com/"
1645 "servicePath": "A String", # The Cloud Dataflow service path relative to the root URL, for example,
1646 # "dataflow/v1b3/projects".
1647 "tempStoragePrefix": "A String", # The prefix of the resources the system should use for temporary
1648 # storage.
1649 #
1650 # The supported resource type is:
1651 #
1652 # Google Cloud Storage:
1653 #
1654 # storage.googleapis.com/{bucket}/{object}
1655 # bucket.storage.googleapis.com/{object}
1656 },
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07001657          "dataflowApiVersion": "A String", # The API version of the endpoint, e.g. "v1b3"
1658 "harnessCommand": "A String", # The command to launch the worker harness.
1659 "tempStoragePrefix": "A String", # The prefix of the resources the taskrunner should use for
1660 # temporary storage.
1661 #
1662 # The supported resource type is:
1663 #
1664 # Google Cloud Storage:
1665 # storage.googleapis.com/{bucket}/{object}
1666 # bucket.storage.googleapis.com/{object}
Dan O'Mearadd494642020-05-01 07:42:23 -07001667 "baseUrl": "A String", # The base URL for the taskrunner to use when accessing Google Cloud APIs.
1668 #
1669 # When workers access Google Cloud APIs, they logically do so via
1670 # relative URLs. If this field is specified, it supplies the base
1671 # URL to use for resolving these relative URLs. The normative
1672 # algorithm used is defined by RFC 1808, "Relative Uniform Resource
1673 # Locators".
1674 #
1675 # If not specified, the default value is "http://www.googleapis.com/"
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07001676 },
Dan O'Mearadd494642020-05-01 07:42:23 -07001677 "numThreadsPerWorker": 42, # The number of threads per worker harness. If empty or unspecified, the
1678 # service will choose a number of threads (according to the number of cores
1679 # on the selected machine type for batch, or 1 by convention for streaming).
1680 "poolArgs": { # Extra arguments for this worker pool.
1681 "a_key": "", # Properties of the object. Contains field @type with type URL.
1682 },
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07001683 "packages": [ # Packages to be installed on workers.
1684 { # The packages that must be installed in order for a worker to run the
1685 # steps of the Cloud Dataflow job that will be assigned to its worker
1686 # pool.
1687 #
1688 # This is the mechanism by which the Cloud Dataflow SDK causes code to
1689 # be loaded onto the workers. For example, the Cloud Dataflow Java SDK
1690 # might use this to install jars containing the user's code and all of the
1691 # various dependencies (libraries, data files, etc.) required in order
1692 # for that code to run.
1693 "location": "A String", # The resource to read the package from. The supported resource type is:
1694 #
1695 # Google Cloud Storage:
1696 #
1697 # storage.googleapis.com/{bucket}
1698 # bucket.storage.googleapis.com/
1699 "name": "A String", # The name of the package.
Sai Cheemalapatic30d2b52017-03-13 12:12:03 -04001700 },
1701 ],
Dan O'Mearadd494642020-05-01 07:42:23 -07001702 "defaultPackageSet": "A String", # The default package set to install. This allows the service to
1703 # select a default set of packages which are useful to worker
1704 # harnesses written in a particular language.
1705 "kind": "A String", # The kind of the worker pool; currently only `harness` and `shuffle`
1706 # are supported.
1707 "diskType": "A String", # Type of root disk for VMs. If empty or unspecified, the service will
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07001708 # attempt to choose a reasonable default.
1709 "teardownPolicy": "A String", # Sets the policy for determining when to turndown worker pool.
1710 # Allowed values are: `TEARDOWN_ALWAYS`, `TEARDOWN_ON_SUCCESS`, and
1711 # `TEARDOWN_NEVER`.
1712 # `TEARDOWN_ALWAYS` means workers are always torn down regardless of whether
1713 # the job succeeds. `TEARDOWN_ON_SUCCESS` means workers are torn down
1714 # if the job succeeds. `TEARDOWN_NEVER` means the workers are never torn
1715 # down.
1716 #
1717 # If the workers are not torn down by the service, they will
1718 # continue to run and use Google Compute Engine VM resources in the
1719 # user's project until they are explicitly terminated by the user.
1720 # Because of this, Google recommends using the `TEARDOWN_ALWAYS`
1721 # policy except for small, manually supervised test jobs.
1722 #
1723 # If unknown or unspecified, the service will attempt to choose a reasonable
1724 # default.
Dan O'Mearadd494642020-05-01 07:42:23 -07001725 "diskSizeGb": 42, # Size of root disk for VMs, in GB. If zero or unspecified, the service will
1726 # attempt to choose a reasonable default.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07001727 "numWorkers": 42, # Number of Google Compute Engine workers in this pool needed to
1728 # execute the job. If zero or unspecified, the service will
1729 # attempt to choose a reasonable default.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07001730 "subnetwork": "A String", # Subnetwork to which VMs will be assigned, if desired. Expected to be of
1731 # the form "regions/REGION/subnetworks/SUBNETWORK".
1732 "dataDisks": [ # Data disks that are used by a VM in this workflow.
1733 { # Describes the data disk used by a workflow job.
1734 "mountPoint": "A String", # Directory in a VM where disk is mounted.
1735 "sizeGb": 42, # Size of disk in GB. If zero or unspecified, the service will
1736 # attempt to choose a reasonable default.
1737 "diskType": "A String", # Disk storage type, as defined by Google Compute Engine. This
1738 # must be a disk type appropriate to the project and zone in which
1739 # the workers will run. If unknown or unspecified, the service
1740 # will attempt to choose a reasonable default.
1741 #
1742 # For example, the standard persistent disk type is a resource name
1743 # typically ending in "pd-standard". If SSD persistent disks are
1744 # available, the resource name typically ends with "pd-ssd". The
1745              # actual valid values are defined by the Google Compute Engine API,
1746 # not by the Cloud Dataflow API; consult the Google Compute Engine
1747 # documentation for more information about determining the set of
1748 # available disk types for a particular project and zone.
1749 #
1750 # Google Compute Engine Disk types are local to a particular
1751 # project in a particular zone, and so the resource name will
1752 # typically look something like this:
1753 #
1754 # compute.googleapis.com/projects/project-id/zones/zone/diskTypes/pd-standard
Sai Cheemalapatic30d2b52017-03-13 12:12:03 -04001755 },
1756 ],
Dan O'Mearadd494642020-05-01 07:42:23 -07001757 "sdkHarnessContainerImages": [ # Set of SDK harness containers needed to execute this pipeline. This will
1758 # only be set in the Fn API path. For non-cross-language pipelines this
1759 # should have only one entry. Cross-language pipelines will have two or more
1760 # entries.
1761            { # Defines an SDK harness container for executing Dataflow pipelines.
1762 "containerImage": "A String", # A docker container image that resides in Google Container Registry.
1763 "useSingleCorePerContainer": True or False, # If true, recommends the Dataflow service to use only one core per SDK
1764 # container instance with this image. If false (or unset) recommends using
1765 # more than one core per SDK container instance with this image for
1766 # efficiency. Note that Dataflow service may choose to override this property
1767 # if needed.
1768 },
1769 ],
Sai Cheemalapatic30d2b52017-03-13 12:12:03 -04001770 },
1771 ],
Dan O'Mearadd494642020-05-01 07:42:23 -07001772 "clusterManagerApiService": "A String", # The type of cluster manager API to use. If unknown or
1773 # unspecified, the service will attempt to choose a reasonable
1774 # default. This should be in the form of the API service name,
1775 # e.g. "compute.googleapis.com".
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07001776 "tempStoragePrefix": "A String", # The prefix of the resources the system should use for temporary
1777          # storage. The system will append the suffix "/temp-{JOBNAME}" to
1778 # this resource prefix, where {JOBNAME} is the value of the
1779 # job_name field. The resulting bucket and object prefix is used
1780 # as the prefix of the resources used to store temporary data
1781 # needed during the job execution. NOTE: This will override the
1782 # value in taskrunner_settings.
1783 # The supported resource type is:
1784 #
1785 # Google Cloud Storage:
1786 #
1787 # storage.googleapis.com/{bucket}/{object}
1788 # bucket.storage.googleapis.com/{object}
Sai Cheemalapatic30d2b52017-03-13 12:12:03 -04001789 },
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07001790 "location": "A String", # The [regional endpoint]
1791 # (https://cloud.google.com/dataflow/docs/concepts/regional-endpoints) that
1792 # contains this job.
1793 "tempFiles": [ # A set of files the system should be aware of that are used
1794 # for temporary storage. These temporary files will be
1795 # removed on job completion.
1796 # No duplicates are allowed.
1797 # No file patterns are supported.
1798 #
1799 # The supported files are:
1800 #
1801 # Google Cloud Storage:
1802 #
1803 # storage.googleapis.com/{bucket}/{object}
1804 # bucket.storage.googleapis.com/{object}
1805 "A String",
1806 ],
1807 "type": "A String", # The type of Cloud Dataflow job.
1808 "clientRequestId": "A String", # The client's unique identifier of the job, re-used across retried attempts.
1809 # If this field is set, the service will ensure its uniqueness.
1810 # The request to create a job will fail if the service has knowledge of a
1811 # previously submitted job with the same client's ID and job name.
1812 # The caller may use this field to ensure idempotence of job
1813 # creation across retried attempts to create a job.
1814 # By default, the field is empty and, in that case, the service ignores it.
1815 "createdFromSnapshotId": "A String", # If this is specified, the job's initial state is populated from the given
1816 # snapshot.
1817 "stepsLocation": "A String", # The GCS location where the steps are stored.
1818 "currentStateTime": "A String", # The timestamp associated with the current state.
1819 "startTime": "A String", # The timestamp when the job was started (transitioned to JOB_STATE_PENDING).
1820 # Flexible resource scheduling jobs are started with some delay after job
1821 # creation, so start_time is unset before start and is updated when the
1822 # job is started by the Cloud Dataflow service. For other jobs, start_time
1823        # always equals create_time and is immutable and set by the Cloud Dataflow
1824 # service.
1825 "createTime": "A String", # The timestamp when the job was initially created. Immutable and set by the
1826 # Cloud Dataflow service.
1827 "requestedState": "A String", # The job's requested state.
1828 #
1829 # `UpdateJob` may be used to switch between the `JOB_STATE_STOPPED` and
1830 # `JOB_STATE_RUNNING` states, by setting requested_state. `UpdateJob` may
1831 # also be used to directly set a job's requested state to
1832 # `JOB_STATE_CANCELLED` or `JOB_STATE_DONE`, irrevocably terminating the
1833 # job if it has not already reached a terminal state.
1834 "name": "A String", # The user-specified Cloud Dataflow job name.
1835 #
1836 # Only one Job with a given name may exist in a project at any
1837 # given time. If a caller attempts to create a Job with the same
1838 # name as an already-existing Job, the attempt returns the
1839 # existing Job.
1840 #
1841 # The name must match the regular expression
1842 # `[a-z]([-a-z0-9]{0,38}[a-z0-9])?`
1843 "steps": [ # Exactly one of step or steps_location should be specified.
1844 #
1845 # The top-level steps that constitute the entire job.
Sai Cheemalapatic30d2b52017-03-13 12:12:03 -04001846 { # Defines a particular step within a Cloud Dataflow job.
1847 #
1848 # A job consists of multiple steps, each of which performs some
1849 # specific operation as part of the overall job. Data is typically
1850 # passed from one step to another as part of the job.
1851 #
1852 # Here's an example of a sequence of steps which together implement a
1853 # Map-Reduce job:
1854 #
1855 # * Read a collection of data from some source, parsing the
1856 # collection's elements.
1857 #
1858 # * Validate the elements.
1859 #
1860 # * Apply a user-defined function to map each element to some value
1861 # and extract an element-specific key value.
1862 #
1863 # * Group elements with the same key into a single element with
1864 # that key, transforming a multiply-keyed collection into a
1865 # uniquely-keyed collection.
1866 #
1867 # * Write the elements out to some data sink.
1868 #
1869 # Note that the Cloud Dataflow service may be used to run many different
1870 # types of jobs, not just Map-Reduce.
1871 "kind": "A String", # The kind of step in the Cloud Dataflow job.
Dan O'Mearadd494642020-05-01 07:42:23 -07001872 "name": "A String", # The name that identifies the step. This must be unique for each
1873 # step with respect to all other steps in the Cloud Dataflow job.
Sai Cheemalapatic30d2b52017-03-13 12:12:03 -04001874 "properties": { # Named properties associated with the step. Each kind of
1875 # predefined step has its own required set of properties.
1876 # Must be provided on Create. Only retrieved with JOB_VIEW_ALL.
Jon Wayne Parrott692617a2017-01-06 09:58:29 -08001877 "a_key": "", # Properties of the object.
1878 },
1879 },
1880 ],
Thomas Coffee2f245372017-03-27 10:39:26 -07001881 "replaceJobId": "A String", # If this job is an update of an existing job, this field is the job ID
1882 # of the job it replaced.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07001883 #
Thomas Coffee2f245372017-03-27 10:39:26 -07001884 # When sending a `CreateJobRequest`, you can update a job by specifying it
1885 # here. The job named here is stopped, and its intermediate state is
1886 # transferred to this job.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07001887 "currentState": "A String", # The current state of the job.
1888 #
1889 # Jobs are created in the `JOB_STATE_STOPPED` state unless otherwise
1890 # specified.
1891 #
1892 # A job in the `JOB_STATE_RUNNING` state may asynchronously enter a
1893 # terminal state. After a job has reached a terminal state, no
1894 # further state updates may be made.
1895 #
1896 # This field may be mutated by the Cloud Dataflow service;
1897 # callers cannot mutate it.
Sai Cheemalapatic30d2b52017-03-13 12:12:03 -04001898 "executionInfo": { # Additional information about how a Cloud Dataflow job will be executed that # Deprecated.
1899 # isn't contained in the submitted job.
Jon Wayne Parrott692617a2017-01-06 09:58:29 -08001900 "stages": { # A mapping from each stage to the information about that stage.
Sai Cheemalapatic30d2b52017-03-13 12:12:03 -04001901 "a_key": { # Contains information about how a particular
1902 # google.dataflow.v1beta3.Step will be executed.
1903 "stepName": [ # The steps associated with the execution stage.
1904 # Note that stages may have several steps, and that a given step
1905 # might be run by more than one stage.
Jon Wayne Parrott692617a2017-01-06 09:58:29 -08001906 "A String",
1907 ],
1908 },
1909 },
1910 },
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07001911 }</pre>
Jon Wayne Parrott692617a2017-01-06 09:58:29 -08001912</div>
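<p>Below is a minimal, hypothetical sketch of polling this method with the google-api-python-client library until the job reaches a terminal state. The project ID, job ID, and the specific JOB_STATE_* and JOB_VIEW_SUMMARY values are placeholder assumptions rather than values confirmed by this reference.</p>
<pre>
# Hypothetical sketch: projects.locations.jobs.get via google-api-python-client,
# polling currentState until a terminal value is reached. Assumes default credentials.
import time
from googleapiclient.discovery import build

dataflow = build('dataflow', 'v1b3')

TERMINAL_STATES = {'JOB_STATE_DONE', 'JOB_STATE_FAILED',
                   'JOB_STATE_CANCELLED', 'JOB_STATE_UPDATED'}  # assumed terminal states

while True:
    job = dataflow.projects().locations().jobs().get(
        projectId='my-project', location='us-central1',
        jobId='2020-01-01_00_00_00-1234567890', view='JOB_VIEW_SUMMARY').execute()
    print(job.get('currentState'), job.get('currentStateTime'))
    if job.get('currentState') in TERMINAL_STATES:
        break
    time.sleep(30)  # re-check the job state every 30 seconds
</pre>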
1913
1914<div class="method">
1915 <code class="details" id="getMetrics">getMetrics(projectId, location, jobId, startTime=None, x__xgafv=None)</code>
1916 <pre>Request the job status.
1917
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07001918To request the status of a job, we recommend using
1919`projects.locations.jobs.getMetrics` with a [regional endpoint]
1920(https://cloud.google.com/dataflow/docs/concepts/regional-endpoints). Using
1921`projects.jobs.getMetrics` is not recommended, as you can only request the
1922status of jobs that are running in `us-central1`.
1923
Jon Wayne Parrott692617a2017-01-06 09:58:29 -08001924Args:
1925 projectId: string, A project id. (required)
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07001926 location: string, The [regional endpoint]
1927(https://cloud.google.com/dataflow/docs/concepts/regional-endpoints) that
1928contains the job specified by job_id. (required)
Jon Wayne Parrott692617a2017-01-06 09:58:29 -08001929 jobId: string, The job to get messages for. (required)
Sai Cheemalapatic30d2b52017-03-13 12:12:03 -04001930 startTime: string, Return only metric data that has changed since this time.
1931Default is to return all information about all metrics for the job.
Jon Wayne Parrott692617a2017-01-06 09:58:29 -08001932 x__xgafv: string, V1 error format.
Sai Cheemalapatic30d2b52017-03-13 12:12:03 -04001933 Allowed values
1934 1 - v1 error format
1935 2 - v2 error format
Jon Wayne Parrott692617a2017-01-06 09:58:29 -08001936
1937Returns:
1938 An object of the form:
1939
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07001940 { # JobMetrics contains a collection of metrics describing the detailed progress
Sai Cheemalapatic30d2b52017-03-13 12:12:03 -04001941 # of a Dataflow job. Metrics correspond to user-defined and system-defined
1942 # metrics in the job.
1943 #
1944 # This resource captures only the most recent values of each metric;
1945 # time-series data can be queried for them (under the same metric names)
1946 # from Cloud Monitoring.
Jon Wayne Parrott692617a2017-01-06 09:58:29 -08001947 "metrics": [ # All metrics for this job.
1948 { # Describes the state of a metric.
Sai Cheemalapatic30d2b52017-03-13 12:12:03 -04001949 "meanCount": "", # Worker-computed aggregate value for the "Mean" aggregation kind.
1950 # This holds the count of the aggregated values and is used in combination
              # with mean_sum to obtain the actual mean aggregate value.
1952 # The only possible value type is Long.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07001953 "kind": "A String", # Metric aggregation kind. The possible metric aggregation kinds are
1954 # "Sum", "Max", "Min", "Mean", "Set", "And", "Or", and "Distribution".
1955 # The specified aggregation kind is case-insensitive.
1956 #
1957 # If omitted, this is not an aggregated value but instead
1958 # a single metric sample value.
Sai Cheemalapatic30d2b52017-03-13 12:12:03 -04001959 "set": "", # Worker-computed aggregate value for the "Set" aggregation kind. The only
1960 # possible value type is a list of Values whose type can be Long, Double,
1961 # or String, according to the metric's type. All Values in the list must
1962 # be of the same type.
1963 "name": { # Identifies a metric, by describing the source which generated the # Name of the metric.
1964 # metric.
1965 "origin": "A String", # Origin (namespace) of metric name. May be blank for user-define metrics;
1966 # will be "dataflow" for metrics defined by the Dataflow service or SDK.
Jon Wayne Parrott692617a2017-01-06 09:58:29 -08001967 "name": "A String", # Worker-defined metric name.
Sai Cheemalapatic30d2b52017-03-13 12:12:03 -04001968 "context": { # Zero or more labeled fields which identify the part of the job this
1969 # metric is associated with, such as the name of a step or collection.
1970 #
1971 # For example, built-in counters associated with steps will have
Dan O'Mearadd494642020-05-01 07:42:23 -07001972 # context['step'] = &lt;step-name&gt;. Counters associated with PCollections
1973 # in the SDK will have context['pcollection'] = &lt;pcollection-name&gt;.
Jon Wayne Parrott692617a2017-01-06 09:58:29 -08001974 "a_key": "A String",
1975 },
1976 },
Sai Cheemalapati4ba8c232017-06-06 18:46:08 -04001977 "meanSum": "", # Worker-computed aggregate value for the "Mean" aggregation kind.
1978 # This holds the sum of the aggregated values and is used in combination
              # with mean_count to obtain the actual mean aggregate value.
1980 # The only possible value types are Long and Double.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07001981 "cumulative": True or False, # True if this metric is reported as the total cumulative aggregate
1982 # value accumulated since the worker started working on this WorkItem.
1983 # By default this is false, indicating that this metric is reported
1984 # as a delta that is not associated with any WorkItem.
1985 "updateTime": "A String", # Timestamp associated with the metric value. Optional when workers are
1986 # reporting work progress; it will be filled in responses from the
1987 # metrics API.
1988 "scalar": "", # Worker-computed aggregate value for aggregation kinds "Sum", "Max", "Min",
1989 # "And", and "Or". The possible value types are Long, Double, and Boolean.
Sai Cheemalapatic30d2b52017-03-13 12:12:03 -04001990 "internal": "", # Worker-computed aggregate value for internal use by the Dataflow
1991 # service.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07001992 "gauge": "", # A struct value describing properties of a Gauge.
              # Metrics of gauge type show the value of a metric across time, and are
1994 # aggregated based on the newest value.
1995 "distribution": "", # A struct value describing properties of a distribution of numeric values.
Jon Wayne Parrott692617a2017-01-06 09:58:29 -08001996 },
1997 ],
1998 "metricTime": "A String", # Timestamp as of which metric values are current.
1999 }</pre>
2000</div>
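<p>A minimal, hypothetical sketch of calling `projects.locations.jobs.getMetrics` with the Python client library; the project, region, and job IDs are placeholders, `startTime` is an optional example value, and Application Default Credentials are assumed.</p>
<pre>
# Hypothetical usage sketch: request job metrics and print scalar values.
from googleapiclient.discovery import build

dataflow = build('dataflow', 'v1b3')  # assumes Application Default Credentials

metrics = dataflow.projects().locations().jobs().getMetrics(
    projectId='my-project',            # placeholder
    location='us-central1',            # regional endpoint that owns the job
    jobId='my-job-id',                 # placeholder
    startTime='2020-01-01T00:00:00Z'   # optional: only metrics changed since this time
).execute()

for m in metrics.get('metrics', []):
    name = m.get('name', {})
    print(name.get('origin'), name.get('name'), m.get('scalar'))
</pre>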
2001
2002<div class="method">
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07002003 <code class="details" id="list">list(projectId, location, pageSize=None, pageToken=None, x__xgafv=None, filter=None, view=None)</code>
Sai Cheemalapatic30d2b52017-03-13 12:12:03 -04002004 <pre>List the jobs of a project.
Jon Wayne Parrott692617a2017-01-06 09:58:29 -08002005
To list the jobs of a project in a region, we recommend using
`projects.locations.jobs.list` with a [regional endpoint]
(https://cloud.google.com/dataflow/docs/concepts/regional-endpoints). To
list all jobs across all regions, use `projects.jobs.aggregated`. Using
`projects.jobs.list` is not recommended, as you can only get the list of
jobs that are running in `us-central1`.
2012
Jon Wayne Parrott692617a2017-01-06 09:58:29 -08002013Args:
2014 projectId: string, The project which owns the jobs. (required)
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07002015 location: string, The [regional endpoint]
2016(https://cloud.google.com/dataflow/docs/concepts/regional-endpoints) that
2017contains this job. (required)
Sai Cheemalapatic30d2b52017-03-13 12:12:03 -04002018 pageSize: integer, If there are many jobs, limit response to at most this many.
The actual number of jobs returned will be the lesser of page_size
2020and an unspecified server-defined limit.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07002021 pageToken: string, Set this to the 'next_page_token' field of a previous response
2022to request additional results in a long list.
Jon Wayne Parrott692617a2017-01-06 09:58:29 -08002023 x__xgafv: string, V1 error format.
Sai Cheemalapatic30d2b52017-03-13 12:12:03 -04002024 Allowed values
2025 1 - v1 error format
2026 2 - v2 error format
Jon Wayne Parrott692617a2017-01-06 09:58:29 -08002027 filter: string, The kind of filter to use.
Sai Cheemalapatic30d2b52017-03-13 12:12:03 -04002028 view: string, Level of information requested in response. Default is `JOB_VIEW_SUMMARY`.
Jon Wayne Parrott692617a2017-01-06 09:58:29 -08002029
2030Returns:
2031 An object of the form:
2032
Dan O'Mearadd494642020-05-01 07:42:23 -07002033 { # Response to a request to list Cloud Dataflow jobs in a project. This might
2034 # be a partial response, depending on the page size in the ListJobsRequest.
2035 # However, if the project does not have any jobs, an instance of
        # ListJobsResponse is not returned and the request's response
2037 # body is empty {}.
Jon Wayne Parrott692617a2017-01-06 09:58:29 -08002038 "nextPageToken": "A String", # Set if there may be more results than fit in this response.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07002039 "failedLocation": [ # Zero or more messages describing the [regional endpoints]
2040 # (https://cloud.google.com/dataflow/docs/concepts/regional-endpoints) that
2041 # failed to respond.
2042 { # Indicates which [regional endpoint]
2043 # (https://cloud.google.com/dataflow/docs/concepts/regional-endpoints) failed
2044 # to respond to a request for data.
2045 "name": "A String", # The name of the [regional endpoint]
2046 # (https://cloud.google.com/dataflow/docs/concepts/regional-endpoints) that
2047 # failed to respond.
Sai Cheemalapati4ba8c232017-06-06 18:46:08 -04002048 },
2049 ],
Jon Wayne Parrott692617a2017-01-06 09:58:29 -08002050 "jobs": [ # A subset of the requested job information.
Sai Cheemalapatic30d2b52017-03-13 12:12:03 -04002051 { # Defines a job to be run by the Cloud Dataflow service.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07002052 "labels": { # User-defined labels for this job.
2053 #
2054 # The labels map can contain no more than 64 entries. Entries of the labels
2055 # map are UTF8 strings that comply with the following restrictions:
2056 #
2057 # * Keys must conform to regexp: \p{Ll}\p{Lo}{0,62}
2058 # * Values must conform to regexp: [\p{Ll}\p{Lo}\p{N}_-]{0,63}
Dan O'Mearadd494642020-05-01 07:42:23 -07002059 # * Both keys and values are additionally constrained to be &lt;= 128 bytes in
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07002060 # size.
2061 "a_key": "A String",
2062 },
2063 "jobMetadata": { # Metadata available primarily for filtering jobs. Will be included in the # This field is populated by the Dataflow service to support filtering jobs
2064 # by the metadata values provided here. Populated for ListJobs and all GetJob
2065 # views SUMMARY and higher.
2066 # ListJob response and Job SUMMARY view.
2067 "sdkVersion": { # The version of the SDK used to run the job. # The SDK version used to run the job.
2068 "versionDisplayName": "A String", # A readable string describing the version of the SDK.
2069 "version": "A String", # The version of the SDK used to run the job.
2070 "sdkSupportStatus": "A String", # The support status for this SDK version.
Jon Wayne Parrott692617a2017-01-06 09:58:29 -08002071 },
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07002072 "pubsubDetails": [ # Identification of a PubSub source used in the Dataflow job.
2073 { # Metadata for a PubSub connector used by the job.
2074 "topic": "A String", # Topic accessed in the connection.
2075 "subscription": "A String", # Subscription used in the connection.
2076 },
2077 ],
2078 "datastoreDetails": [ # Identification of a Datastore source used in the Dataflow job.
2079 { # Metadata for a Datastore connector used by the job.
2080 "projectId": "A String", # ProjectId accessed in the connection.
2081 "namespace": "A String", # Namespace used in the connection.
2082 },
2083 ],
2084 "fileDetails": [ # Identification of a File source used in the Dataflow job.
2085 { # Metadata for a File connector used by the job.
2086 "filePattern": "A String", # File Pattern used to access files by the connector.
2087 },
2088 ],
2089 "spannerDetails": [ # Identification of a Spanner source used in the Dataflow job.
2090 { # Metadata for a Spanner connector used by the job.
2091 "instanceId": "A String", # InstanceId accessed in the connection.
2092 "projectId": "A String", # ProjectId accessed in the connection.
2093 "databaseId": "A String", # DatabaseId accessed in the connection.
2094 },
2095 ],
2096 "bigTableDetails": [ # Identification of a BigTable source used in the Dataflow job.
2097 { # Metadata for a BigTable connector used by the job.
2098 "instanceId": "A String", # InstanceId accessed in the connection.
2099 "projectId": "A String", # ProjectId accessed in the connection.
2100 "tableId": "A String", # TableId accessed in the connection.
2101 },
2102 ],
2103 "bigqueryDetails": [ # Identification of a BigQuery source used in the Dataflow job.
2104 { # Metadata for a BigQuery connector used by the job.
2105 "projectId": "A String", # Project accessed in the connection.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07002106 "query": "A String", # Query used to access data in the connection.
Dan O'Mearadd494642020-05-01 07:42:23 -07002107 "table": "A String", # Table accessed in the connection.
2108 "dataset": "A String", # Dataset accessed in the connection.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07002109 },
2110 ],
2111 },
2112 "pipelineDescription": { # A descriptive representation of submitted pipeline as well as the executed # Preliminary field: The format of this data may change at any time.
2113 # A description of the user pipeline and stages through which it is executed.
2114 # Created by Cloud Dataflow service. Only retrieved with
2115 # JOB_VIEW_DESCRIPTION or JOB_VIEW_ALL.
2116 # form. This data is provided by the Dataflow service for ease of visualizing
2117 # the pipeline and interpreting Dataflow provided metrics.
2118 "originalPipelineTransform": [ # Description of each transform in the pipeline and collections between them.
2119 { # Description of the type, names/ids, and input/outputs for a transform.
2120 "kind": "A String", # Type of transform.
2121 "name": "A String", # User provided name for this transform instance.
2122 "inputCollectionName": [ # User names for all collection inputs to this transform.
2123 "A String",
2124 ],
2125 "displayData": [ # Transform-specific display data.
2126 { # Data provided with a pipeline or transform to provide descriptive info.
Dan O'Mearadd494642020-05-01 07:42:23 -07002127 "key": "A String", # The key identifying the display data.
2128 # This is intended to be used as a label for the display data
2129 # when viewed in a dax monitoring system.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07002130 "shortStrValue": "A String", # A possible additional shorter value to display.
2131 # For example a java_class_name_value of com.mypackage.MyDoFn
2132 # will be stored with MyDoFn as the short_str_value and
2133 # com.mypackage.MyDoFn as the java_class_name value.
2134 # short_str_value can be displayed and java_class_name_value
2135 # will be displayed as a tooltip.
Dan O'Mearadd494642020-05-01 07:42:23 -07002136 "timestampValue": "A String", # Contains value if the data is of timestamp type.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07002137 "url": "A String", # An optional full URL.
2138 "floatValue": 3.14, # Contains value if the data is of float type.
2139 "namespace": "A String", # The namespace for the key. This is usually a class name or programming
2140 # language namespace (i.e. python module) which defines the display data.
2141 # This allows a dax monitoring system to specially handle the data
2142 # and perform custom rendering.
2143 "javaClassValue": "A String", # Contains value if the data is of java class type.
2144 "label": "A String", # An optional label to display in a dax UI for the element.
2145 "boolValue": True or False, # Contains value if the data is of a boolean type.
2146 "strValue": "A String", # Contains value if the data is of string type.
Dan O'Mearadd494642020-05-01 07:42:23 -07002147 "durationValue": "A String", # Contains value if the data is of duration type.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07002148 "int64Value": "A String", # Contains value if the data is of int64 type.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07002149 },
2150 ],
2151 "outputCollectionName": [ # User names for all collection outputs to this transform.
2152 "A String",
2153 ],
2154 "id": "A String", # SDK generated id of this transform instance.
2155 },
2156 ],
2157 "executionPipelineStage": [ # Description of each stage of execution of the pipeline.
2158 { # Description of the composing transforms, names/ids, and input/outputs of a
2159 # stage of execution. Some composing transforms and sources may have been
2160 # generated by the Dataflow service during execution planning.
2161 "componentSource": [ # Collections produced and consumed by component transforms of this stage.
2162 { # Description of an interstitial value between transforms in an execution
2163 # stage.
2164 "userName": "A String", # Human-readable name for this transform; may be user or system generated.
2165 "originalTransformOrCollection": "A String", # User name for the original user transform or collection with which this
2166 # source is most closely associated.
2167 "name": "A String", # Dataflow service generated name for this source.
2168 },
2169 ],
            "kind": "A String", # Type of transform this stage is executing.
2171 "name": "A String", # Dataflow service generated name for this stage.
2172 "outputSource": [ # Output sources for this stage.
2173 { # Description of an input or output of an execution stage.
2174 "userName": "A String", # Human-readable name for this source; may be user or system generated.
2175 "sizeBytes": "A String", # Size of the source, if measurable.
2176 "name": "A String", # Dataflow service generated name for this source.
2177 "originalTransformOrCollection": "A String", # User name for the original user transform or collection with which this
2178 # source is most closely associated.
2179 },
2180 ],
2181 "inputSource": [ # Input sources for this stage.
2182 { # Description of an input or output of an execution stage.
2183 "userName": "A String", # Human-readable name for this source; may be user or system generated.
2184 "sizeBytes": "A String", # Size of the source, if measurable.
2185 "name": "A String", # Dataflow service generated name for this source.
2186 "originalTransformOrCollection": "A String", # User name for the original user transform or collection with which this
2187 # source is most closely associated.
2188 },
2189 ],
2190 "componentTransform": [ # Transforms that comprise this execution stage.
2191 { # Description of a transform executed as part of an execution stage.
2192 "userName": "A String", # Human-readable name for this transform; may be user or system generated.
2193 "originalTransform": "A String", # User name for the original user transform with which this transform is
2194 # most closely associated.
2195 "name": "A String", # Dataflow service generated name for this source.
2196 },
2197 ],
2198 "id": "A String", # Dataflow service generated id for this stage.
2199 },
2200 ],
2201 "displayData": [ # Pipeline level display data.
2202 { # Data provided with a pipeline or transform to provide descriptive info.
Dan O'Mearadd494642020-05-01 07:42:23 -07002203 "key": "A String", # The key identifying the display data.
2204 # This is intended to be used as a label for the display data
2205 # when viewed in a dax monitoring system.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07002206 "shortStrValue": "A String", # A possible additional shorter value to display.
2207 # For example a java_class_name_value of com.mypackage.MyDoFn
2208 # will be stored with MyDoFn as the short_str_value and
2209 # com.mypackage.MyDoFn as the java_class_name value.
2210 # short_str_value can be displayed and java_class_name_value
2211 # will be displayed as a tooltip.
Dan O'Mearadd494642020-05-01 07:42:23 -07002212 "timestampValue": "A String", # Contains value if the data is of timestamp type.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07002213 "url": "A String", # An optional full URL.
2214 "floatValue": 3.14, # Contains value if the data is of float type.
2215 "namespace": "A String", # The namespace for the key. This is usually a class name or programming
2216 # language namespace (i.e. python module) which defines the display data.
2217 # This allows a dax monitoring system to specially handle the data
2218 # and perform custom rendering.
2219 "javaClassValue": "A String", # Contains value if the data is of java class type.
2220 "label": "A String", # An optional label to display in a dax UI for the element.
2221 "boolValue": True or False, # Contains value if the data is of a boolean type.
2222 "strValue": "A String", # Contains value if the data is of string type.
Dan O'Mearadd494642020-05-01 07:42:23 -07002223 "durationValue": "A String", # Contains value if the data is of duration type.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07002224 "int64Value": "A String", # Contains value if the data is of int64 type.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07002225 },
2226 ],
2227 },
2228 "stageStates": [ # This field may be mutated by the Cloud Dataflow service;
2229 # callers cannot mutate it.
2230 { # A message describing the state of a particular execution stage.
2231 "executionStageName": "A String", # The name of the execution stage.
          "executionStageState": "A String", # Execution stage states allow the same set of values as JobState.
2233 "currentStateTime": "A String", # The time at which the stage transitioned to this state.
Jon Wayne Parrott692617a2017-01-06 09:58:29 -08002234 },
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07002235 ],
2236 "id": "A String", # The unique ID of this job.
2237 #
2238 # This field is set by the Cloud Dataflow service when the Job is
2239 # created, and is immutable for the life of the job.
2240 "replacedByJobId": "A String", # If another job is an update of this job (and thus, this job is in
2241 # `JOB_STATE_UPDATED`), this field contains the ID of that job.
2242 "projectId": "A String", # The ID of the Cloud Platform project that the job belongs to.
2243 "transformNameMapping": { # The map of transform name prefixes of the job to be replaced to the
2244 # corresponding name prefixes of the new job.
2245 "a_key": "A String",
2246 },
2247 "environment": { # Describes the environment in which a Dataflow Job runs. # The environment for the job.
Dan O'Mearadd494642020-05-01 07:42:23 -07002248 "workerRegion": "A String", # The Compute Engine region
2249 # (https://cloud.google.com/compute/docs/regions-zones/regions-zones) in
2250 # which worker processing should occur, e.g. "us-west1". Mutually exclusive
2251 # with worker_zone. If neither worker_region nor worker_zone is specified,
2252 # default to the control plane's region.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07002253 "version": { # A structure describing which components and their versions of the service
2254 # are required in order to run the job.
2255 "a_key": "", # Properties of the object.
2256 },
2257 "flexResourceSchedulingGoal": "A String", # Which Flexible Resource Scheduling mode to run in.
2258 "serviceKmsKeyName": "A String", # If set, contains the Cloud KMS key identifier used to encrypt data
2259 # at rest, AKA a Customer Managed Encryption Key (CMEK).
Sai Cheemalapati4ba8c232017-06-06 18:46:08 -04002260 #
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07002261 # Format:
2262 # projects/PROJECT_ID/locations/LOCATION/keyRings/KEY_RING/cryptoKeys/KEY
2263 "internalExperiments": { # Experimental settings.
2264 "a_key": "", # Properties of the object. Contains field @type with type URL.
2265 },
2266 "dataset": "A String", # The dataset for the current project where various workflow
2267 # related tables are stored.
2268 #
2269 # The supported resource type is:
2270 #
2271 # Google BigQuery:
2272 # bigquery.googleapis.com/{dataset}
2273 "experiments": [ # The list of experiments to enable.
2274 "A String",
2275 ],
2276 "serviceAccountEmail": "A String", # Identity to run virtual machines as. Defaults to the default account.
2277 "sdkPipelineOptions": { # The Cloud Dataflow SDK pipeline options specified by the user. These
2278 # options are passed through the service and are used to recreate the
2279 # SDK pipeline options on the worker in a language agnostic and platform
2280 # independent way.
2281 "a_key": "", # Properties of the object.
2282 },
2283 "userAgent": { # A description of the process that generated the request.
2284 "a_key": "", # Properties of the object.
2285 },
Dan O'Mearadd494642020-05-01 07:42:23 -07002286 "workerZone": "A String", # The Compute Engine zone
2287 # (https://cloud.google.com/compute/docs/regions-zones/regions-zones) in
2288 # which worker processing should occur, e.g. "us-west1-a". Mutually exclusive
2289 # with worker_region. If neither worker_region nor worker_zone is specified,
2290 # a zone in the control plane's region is chosen based on available capacity.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07002291 "workerPools": [ # The worker pools. At least one "harness" worker pool must be
2292 # specified in order for the job to have workers.
2293 { # Describes one particular pool of Cloud Dataflow workers to be
2294 # instantiated by the Cloud Dataflow service in order to perform the
2295 # computations required by a job. Note that a workflow job may use
2296 # multiple pools, in order to match the various computational
2297 # requirements of the various stages of the job.
Dan O'Mearadd494642020-05-01 07:42:23 -07002298 "workerHarnessContainerImage": "A String", # Required. Docker container image that executes the Cloud Dataflow worker
2299 # harness, residing in Google Container Registry.
2300 #
2301 # Deprecated for the Fn API path. Use sdk_harness_container_images instead.
2302 "ipConfiguration": "A String", # Configuration for VM IPs.
2303 "autoscalingSettings": { # Settings for WorkerPool autoscaling. # Settings for autoscaling of this WorkerPool.
2304 "maxNumWorkers": 42, # The maximum number of workers to cap scaling at.
2305 "algorithm": "A String", # The algorithm to use for autoscaling.
2306 },
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07002307 "diskSourceImage": "A String", # Fully qualified source image for disks.
Dan O'Mearadd494642020-05-01 07:42:23 -07002308 "network": "A String", # Network to which VMs will be assigned. If empty or unspecified,
2309 # the service will use the network "default".
2310 "zone": "A String", # Zone to run the worker pools in. If empty or unspecified, the service
2311 # will attempt to choose a reasonable default.
2312 "metadata": { # Metadata to set on the Google Compute Engine VMs.
2313 "a_key": "A String",
2314 },
2315 "machineType": "A String", # Machine type (e.g. "n1-standard-1"). If empty or unspecified, the
2316 # service will attempt to choose a reasonable default.
2317 "onHostMaintenance": "A String", # The action to take on host maintenance, as defined by the Google
2318 # Compute Engine API.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07002319 "taskrunnerSettings": { # Taskrunner configuration settings. # Settings passed through to Google Compute Engine workers when
2320 # using the standard Dataflow task runner. Users should ignore
2321 # this field.
2322 "workflowFileName": "A String", # The file to store the workflow in.
2323 "logUploadLocation": "A String", # Indicates where to put logs. If this is not specified, the logs
2324 # will not be uploaded.
2325 #
2326 # The supported resource type is:
2327 #
2328 # Google Cloud Storage:
2329 # storage.googleapis.com/{bucket}/{object}
2330 # bucket.storage.googleapis.com/{object}
2331 "commandlinesFileName": "A String", # The file to store preprocessing commands in.
Dan O'Mearadd494642020-05-01 07:42:23 -07002332 "alsologtostderr": True or False, # Whether to also send taskrunner log info to stderr.
2333 "continueOnException": True or False, # Whether to continue taskrunner if an exception is hit.
2334 "baseTaskDir": "A String", # The location on the worker for task-specific subdirectories.
2335 "vmId": "A String", # The ID string of the VM.
2336 "taskGroup": "A String", # The UNIX group ID on the worker VM to use for tasks launched by
2337 # taskrunner; e.g. "wheel".
2338 "taskUser": "A String", # The UNIX user ID on the worker VM to use for tasks launched by
2339 # taskrunner; e.g. "root".
2340 "oauthScopes": [ # The OAuth2 scopes to be requested by the taskrunner in order to
2341 # access the Cloud Dataflow API.
2342 "A String",
2343 ],
2344 "languageHint": "A String", # The suggested backend language.
2345 "logToSerialconsole": True or False, # Whether to send taskrunner log info to Google Compute Engine VM serial
2346 # console.
2347 "streamingWorkerMainClass": "A String", # The streaming worker main class name.
2348 "logDir": "A String", # The directory on the VM to store logs.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07002349 "parallelWorkerSettings": { # Provides data to pass through to the worker harness. # The settings to pass to the parallel worker harness.
2350 "reportingEnabled": True or False, # Whether to send work progress updates to the service.
2351 "shuffleServicePath": "A String", # The Shuffle service path relative to the root URL, for example,
2352 # "shuffle/v1beta1".
2353 "workerId": "A String", # The ID of the worker running this pipeline.
2354 "baseUrl": "A String", # The base URL for accessing Google Cloud APIs.
Sai Cheemalapatic30d2b52017-03-13 12:12:03 -04002355 #
2356 # When workers access Google Cloud APIs, they logically do so via
2357 # relative URLs. If this field is specified, it supplies the base
2358 # URL to use for resolving these relative URLs. The normative
2359 # algorithm used is defined by RFC 1808, "Relative Uniform Resource
2360 # Locators".
2361 #
2362 # If not specified, the default value is "http://www.googleapis.com/"
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07002363 "servicePath": "A String", # The Cloud Dataflow service path relative to the root URL, for example,
2364 # "dataflow/v1b3/projects".
2365 "tempStoragePrefix": "A String", # The prefix of the resources the system should use for temporary
2366 # storage.
Sai Cheemalapatic30d2b52017-03-13 12:12:03 -04002367 #
Sai Cheemalapatie833b792017-03-24 15:06:46 -07002368 # The supported resource type is:
2369 #
2370 # Google Cloud Storage:
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07002371 #
Sai Cheemalapatie833b792017-03-24 15:06:46 -07002372 # storage.googleapis.com/{bucket}/{object}
2373 # bucket.storage.googleapis.com/{object}
Jon Wayne Parrott692617a2017-01-06 09:58:29 -08002374 },
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07002375 "dataflowApiVersion": "A String", # The API version of endpoint, e.g. "v1b3"
2376 "harnessCommand": "A String", # The command to launch the worker harness.
2377 "tempStoragePrefix": "A String", # The prefix of the resources the taskrunner should use for
2378 # temporary storage.
2379 #
2380 # The supported resource type is:
2381 #
2382 # Google Cloud Storage:
2383 # storage.googleapis.com/{bucket}/{object}
2384 # bucket.storage.googleapis.com/{object}
Dan O'Mearadd494642020-05-01 07:42:23 -07002385 "baseUrl": "A String", # The base URL for the taskrunner to use when accessing Google Cloud APIs.
2386 #
2387 # When workers access Google Cloud APIs, they logically do so via
2388 # relative URLs. If this field is specified, it supplies the base
2389 # URL to use for resolving these relative URLs. The normative
2390 # algorithm used is defined by RFC 1808, "Relative Uniform Resource
2391 # Locators".
2392 #
2393 # If not specified, the default value is "http://www.googleapis.com/"
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07002394 },
Dan O'Mearadd494642020-05-01 07:42:23 -07002395 "numThreadsPerWorker": 42, # The number of threads per worker harness. If empty or unspecified, the
2396 # service will choose a number of threads (according to the number of cores
2397 # on the selected machine type for batch, or 1 by convention for streaming).
2398 "poolArgs": { # Extra arguments for this worker pool.
2399 "a_key": "", # Properties of the object. Contains field @type with type URL.
2400 },
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07002401 "packages": [ # Packages to be installed on workers.
2402 { # The packages that must be installed in order for a worker to run the
2403 # steps of the Cloud Dataflow job that will be assigned to its worker
2404 # pool.
2405 #
2406 # This is the mechanism by which the Cloud Dataflow SDK causes code to
2407 # be loaded onto the workers. For example, the Cloud Dataflow Java SDK
2408 # might use this to install jars containing the user's code and all of the
2409 # various dependencies (libraries, data files, etc.) required in order
2410 # for that code to run.
2411 "location": "A String", # The resource to read the package from. The supported resource type is:
Sai Cheemalapati4ba8c232017-06-06 18:46:08 -04002412 #
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07002413 # Google Cloud Storage:
2414 #
2415 # storage.googleapis.com/{bucket}
2416 # bucket.storage.googleapis.com/
2417 "name": "A String", # The name of the package.
Sai Cheemalapati4ba8c232017-06-06 18:46:08 -04002418 },
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07002419 ],
Dan O'Mearadd494642020-05-01 07:42:23 -07002420 "defaultPackageSet": "A String", # The default package set to install. This allows the service to
2421 # select a default set of packages which are useful to worker
2422 # harnesses written in a particular language.
2423 "kind": "A String", # The kind of the worker pool; currently only `harness` and `shuffle`
2424 # are supported.
2425 "diskType": "A String", # Type of root disk for VMs. If empty or unspecified, the service will
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07002426 # attempt to choose a reasonable default.
            "teardownPolicy": "A String", # Sets the policy for determining when to turn down the worker pool.
2428 # Allowed values are: `TEARDOWN_ALWAYS`, `TEARDOWN_ON_SUCCESS`, and
2429 # `TEARDOWN_NEVER`.
2430 # `TEARDOWN_ALWAYS` means workers are always torn down regardless of whether
2431 # the job succeeds. `TEARDOWN_ON_SUCCESS` means workers are torn down
2432 # if the job succeeds. `TEARDOWN_NEVER` means the workers are never torn
2433 # down.
2434 #
2435 # If the workers are not torn down by the service, they will
2436 # continue to run and use Google Compute Engine VM resources in the
2437 # user's project until they are explicitly terminated by the user.
2438 # Because of this, Google recommends using the `TEARDOWN_ALWAYS`
2439 # policy except for small, manually supervised test jobs.
2440 #
2441 # If unknown or unspecified, the service will attempt to choose a reasonable
2442 # default.
Dan O'Mearadd494642020-05-01 07:42:23 -07002443 "diskSizeGb": 42, # Size of root disk for VMs, in GB. If zero or unspecified, the service will
2444 # attempt to choose a reasonable default.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07002445 "numWorkers": 42, # Number of Google Compute Engine workers in this pool needed to
2446 # execute the job. If zero or unspecified, the service will
2447 # attempt to choose a reasonable default.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07002448 "subnetwork": "A String", # Subnetwork to which VMs will be assigned, if desired. Expected to be of
2449 # the form "regions/REGION/subnetworks/SUBNETWORK".
2450 "dataDisks": [ # Data disks that are used by a VM in this workflow.
2451 { # Describes the data disk used by a workflow job.
2452 "mountPoint": "A String", # Directory in a VM where disk is mounted.
2453 "sizeGb": 42, # Size of disk in GB. If zero or unspecified, the service will
2454 # attempt to choose a reasonable default.
2455 "diskType": "A String", # Disk storage type, as defined by Google Compute Engine. This
2456 # must be a disk type appropriate to the project and zone in which
2457 # the workers will run. If unknown or unspecified, the service
2458 # will attempt to choose a reasonable default.
2459 #
2460 # For example, the standard persistent disk type is a resource name
2461 # typically ending in "pd-standard". If SSD persistent disks are
2462 # available, the resource name typically ends with "pd-ssd". The
                # actual valid values are defined by the Google Compute Engine API,
2464 # not by the Cloud Dataflow API; consult the Google Compute Engine
2465 # documentation for more information about determining the set of
2466 # available disk types for a particular project and zone.
2467 #
2468 # Google Compute Engine Disk types are local to a particular
2469 # project in a particular zone, and so the resource name will
2470 # typically look something like this:
2471 #
2472 # compute.googleapis.com/projects/project-id/zones/zone/diskTypes/pd-standard
Sai Cheemalapati4ba8c232017-06-06 18:46:08 -04002473 },
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07002474 ],
Dan O'Mearadd494642020-05-01 07:42:23 -07002475 "sdkHarnessContainerImages": [ # Set of SDK harness containers needed to execute this pipeline. This will
2476 # only be set in the Fn API path. For non-cross-language pipelines this
2477 # should have only one entry. Cross-language pipelines will have two or more
2478 # entries.
              { # Defines an SDK harness container for executing Dataflow pipelines.
2480 "containerImage": "A String", # A docker container image that resides in Google Container Registry.
                "useSingleCorePerContainer": True or False, # If true, recommends that the Dataflow service use only one core per SDK
                    # container instance with this image. If false (or unset), recommends using
                    # more than one core per SDK container instance with this image for
                    # efficiency. Note that the Dataflow service may choose to override this property
                    # if needed.
2486 },
2487 ],
Jon Wayne Parrott692617a2017-01-06 09:58:29 -08002488 },
2489 ],
Dan O'Mearadd494642020-05-01 07:42:23 -07002490 "clusterManagerApiService": "A String", # The type of cluster manager API to use. If unknown or
2491 # unspecified, the service will attempt to choose a reasonable
2492 # default. This should be in the form of the API service name,
2493 # e.g. "compute.googleapis.com".
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07002494 "tempStoragePrefix": "A String", # The prefix of the resources the system should use for temporary
          # storage. The system will append the suffix "/temp-{JOBNAME}" to
2496 # this resource prefix, where {JOBNAME} is the value of the
2497 # job_name field. The resulting bucket and object prefix is used
2498 # as the prefix of the resources used to store temporary data
2499 # needed during the job execution. NOTE: This will override the
2500 # value in taskrunner_settings.
2501 # The supported resource type is:
Sai Cheemalapatic30d2b52017-03-13 12:12:03 -04002502 #
2503 # Google Cloud Storage:
2504 #
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07002505 # storage.googleapis.com/{bucket}/{object}
2506 # bucket.storage.googleapis.com/{object}
2507 },
2508 "location": "A String", # The [regional endpoint]
2509 # (https://cloud.google.com/dataflow/docs/concepts/regional-endpoints) that
2510 # contains this job.
2511 "tempFiles": [ # A set of files the system should be aware of that are used
2512 # for temporary storage. These temporary files will be
2513 # removed on job completion.
2514 # No duplicates are allowed.
2515 # No file patterns are supported.
2516 #
2517 # The supported files are:
2518 #
2519 # Google Cloud Storage:
2520 #
2521 # storage.googleapis.com/{bucket}/{object}
2522 # bucket.storage.googleapis.com/{object}
2523 "A String",
2524 ],
2525 "type": "A String", # The type of Cloud Dataflow job.
2526 "clientRequestId": "A String", # The client's unique identifier of the job, re-used across retried attempts.
2527 # If this field is set, the service will ensure its uniqueness.
2528 # The request to create a job will fail if the service has knowledge of a
2529 # previously submitted job with the same client's ID and job name.
2530 # The caller may use this field to ensure idempotence of job
2531 # creation across retried attempts to create a job.
2532 # By default, the field is empty and, in that case, the service ignores it.
2533 "createdFromSnapshotId": "A String", # If this is specified, the job's initial state is populated from the given
2534 # snapshot.
2535 "stepsLocation": "A String", # The GCS location where the steps are stored.
2536 "currentStateTime": "A String", # The timestamp associated with the current state.
2537 "startTime": "A String", # The timestamp when the job was started (transitioned to JOB_STATE_PENDING).
2538 # Flexible resource scheduling jobs are started with some delay after job
2539 # creation, so start_time is unset before start and is updated when the
2540 # job is started by the Cloud Dataflow service. For other jobs, start_time
        # always equals create_time and is immutable and set by the Cloud Dataflow
2542 # service.
2543 "createTime": "A String", # The timestamp when the job was initially created. Immutable and set by the
2544 # Cloud Dataflow service.
2545 "requestedState": "A String", # The job's requested state.
2546 #
2547 # `UpdateJob` may be used to switch between the `JOB_STATE_STOPPED` and
2548 # `JOB_STATE_RUNNING` states, by setting requested_state. `UpdateJob` may
2549 # also be used to directly set a job's requested state to
2550 # `JOB_STATE_CANCELLED` or `JOB_STATE_DONE`, irrevocably terminating the
2551 # job if it has not already reached a terminal state.
2552 "name": "A String", # The user-specified Cloud Dataflow job name.
2553 #
2554 # Only one Job with a given name may exist in a project at any
2555 # given time. If a caller attempts to create a Job with the same
2556 # name as an already-existing Job, the attempt returns the
2557 # existing Job.
2558 #
2559 # The name must match the regular expression
2560 # `[a-z]([-a-z0-9]{0,38}[a-z0-9])?`
2561 "steps": [ # Exactly one of step or steps_location should be specified.
2562 #
2563 # The top-level steps that constitute the entire job.
2564 { # Defines a particular step within a Cloud Dataflow job.
Sai Cheemalapatic30d2b52017-03-13 12:12:03 -04002565 #
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07002566 # A job consists of multiple steps, each of which performs some
2567 # specific operation as part of the overall job. Data is typically
2568 # passed from one step to another as part of the job.
2569 #
2570 # Here's an example of a sequence of steps which together implement a
2571 # Map-Reduce job:
2572 #
2573 # * Read a collection of data from some source, parsing the
2574 # collection's elements.
2575 #
2576 # * Validate the elements.
2577 #
2578 # * Apply a user-defined function to map each element to some value
2579 # and extract an element-specific key value.
2580 #
2581 # * Group elements with the same key into a single element with
2582 # that key, transforming a multiply-keyed collection into a
2583 # uniquely-keyed collection.
2584 #
2585 # * Write the elements out to some data sink.
2586 #
2587 # Note that the Cloud Dataflow service may be used to run many different
2588 # types of jobs, not just Map-Reduce.
2589 "kind": "A String", # The kind of step in the Cloud Dataflow job.
Dan O'Mearadd494642020-05-01 07:42:23 -07002590 "name": "A String", # The name that identifies the step. This must be unique for each
2591 # step with respect to all other steps in the Cloud Dataflow job.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07002592 "properties": { # Named properties associated with the step. Each kind of
2593 # predefined step has its own required set of properties.
2594 # Must be provided on Create. Only retrieved with JOB_VIEW_ALL.
2595 "a_key": "", # Properties of the object.
2596 },
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07002597 },
2598 ],
2599 "replaceJobId": "A String", # If this job is an update of an existing job, this field is the job ID
2600 # of the job it replaced.
2601 #
2602 # When sending a `CreateJobRequest`, you can update a job by specifying it
2603 # here. The job named here is stopped, and its intermediate state is
2604 # transferred to this job.
2605 "currentState": "A String", # The current state of the job.
2606 #
2607 # Jobs are created in the `JOB_STATE_STOPPED` state unless otherwise
2608 # specified.
2609 #
2610 # A job in the `JOB_STATE_RUNNING` state may asynchronously enter a
2611 # terminal state. After a job has reached a terminal state, no
2612 # further state updates may be made.
2613 #
2614 # This field may be mutated by the Cloud Dataflow service;
2615 # callers cannot mutate it.
2616 "executionInfo": { # Additional information about how a Cloud Dataflow job will be executed that # Deprecated.
2617 # isn't contained in the submitted job.
2618 "stages": { # A mapping from each stage to the information about that stage.
2619 "a_key": { # Contains information about how a particular
2620 # google.dataflow.v1beta3.Step will be executed.
2621 "stepName": [ # The steps associated with the execution stage.
2622 # Note that stages may have several steps, and that a given step
2623 # might be run by more than one stage.
2624 "A String",
2625 ],
Jon Wayne Parrott692617a2017-01-06 09:58:29 -08002626 },
2627 },
2628 },
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07002629 },
Jon Wayne Parrott692617a2017-01-06 09:58:29 -08002630 ],
2631 }</pre>
2632</div>
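<p>A minimal, hypothetical sketch of calling `projects.locations.jobs.list` with the Python client library; the project and region are placeholders, the `filter` value is an assumed example, and Application Default Credentials are assumed.</p>
<pre>
# Hypothetical usage sketch: list jobs in one region, one page at a time.
from googleapiclient.discovery import build

dataflow = build('dataflow', 'v1b3')  # assumes Application Default Credentials

response = dataflow.projects().locations().jobs().list(
    projectId='my-project',    # placeholder
    location='us-central1',    # regional endpoint
    pageSize=25,
    view='JOB_VIEW_SUMMARY').execute()

for job in response.get('jobs', []):
    print(job['id'], job.get('name'), job.get('currentState'))

# response.get('nextPageToken') is set if more results are available.
</pre>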
2633
2634<div class="method">
2635 <code class="details" id="list_next">list_next(previous_request, previous_response)</code>
2636 <pre>Retrieves the next page of results.
2637
2638Args:
2639 previous_request: The request for the previous page. (required)
2640 previous_response: The response from the request for the previous page. (required)
2641
2642Returns:
2643 A request object that you can call 'execute()' on to request the next
2644 page. Returns None if there are no more items in the collection.
2645 </pre>
2646</div>
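<p>A minimal, hypothetical sketch of paging through all results with `list_next`, which builds the follow-up request from the previous request and response; placeholders and Application Default Credentials as above.</p>
<pre>
# Hypothetical usage sketch: iterate over every page of jobs.
from googleapiclient.discovery import build

dataflow = build('dataflow', 'v1b3')  # assumes Application Default Credentials
jobs_api = dataflow.projects().locations().jobs()

request = jobs_api.list(projectId='my-project', location='us-central1')  # placeholders
while request is not None:
    response = request.execute()
    for job in response.get('jobs', []):
        print(job['id'], job.get('currentState'))
    # Returns None when there are no more pages.
    request = jobs_api.list_next(previous_request=request, previous_response=response)
</pre>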
2647
2648<div class="method">
Dan O'Mearadd494642020-05-01 07:42:23 -07002649 <code class="details" id="snapshot">snapshot(projectId, location, jobId, body=None, x__xgafv=None)</code>
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07002650 <pre>Snapshot the state of a streaming job.
2651
2652Args:
2653 projectId: string, The project which owns the job to be snapshotted. (required)
2654 location: string, The location that contains this job. (required)
2655 jobId: string, The job to be snapshotted. (required)
Dan O'Mearadd494642020-05-01 07:42:23 -07002656 body: object, The request body.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07002657 The object takes the form of:
2658
2659{ # Request to create a snapshot of a job.
2660 "location": "A String", # The location that contains this job.
2661 "ttl": "A String", # TTL for the snapshot.
    "description": "A String", # User-specified description of the snapshot. May be empty.
2663 "snapshotSources": True or False, # If true, perform snapshots for sources which support this.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07002664 }
2665
2666 x__xgafv: string, V1 error format.
2667 Allowed values
2668 1 - v1 error format
2669 2 - v2 error format
2670
2671Returns:
2672 An object of the form:
2673
2674 { # Represents a snapshot of a job.
2675 "sourceJobId": "A String", # The job this snapshot was created from.
Dan O'Mearadd494642020-05-01 07:42:23 -07002676 "diskSizeBytes": "A String", # The disk byte size of the snapshot. Only available for snapshots in READY
2677 # state.
      "description": "A String", # User-specified description of the snapshot. May be empty.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07002679 "projectId": "A String", # The project this snapshot belongs to.
2680 "creationTime": "A String", # The time this snapshot was created.
2681 "state": "A String", # State of the snapshot.
2682 "ttl": "A String", # The time after which this snapshot will be automatically deleted.
Dan O'Mearadd494642020-05-01 07:42:23 -07002683 "pubsubMetadata": [ # PubSub snapshot metadata.
2684 { # Represents a Pubsub snapshot.
2685 "expireTime": "A String", # The expire time of the Pubsub snapshot.
2686 "snapshotName": "A String", # The name of the Pubsub snapshot.
2687 "topicName": "A String", # The name of the Pubsub topic.
2688 },
2689 ],
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07002690 "id": "A String", # The unique ID of this snapshot.
2691 }</pre>
2692</div>
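<p>A minimal, hypothetical sketch of calling `projects.locations.jobs.snapshot` with the Python client library; the project, region, and job IDs are placeholders, the TTL and description are example values, and Application Default Credentials are assumed.</p>
<pre>
# Hypothetical usage sketch: snapshot the state of a streaming job.
from googleapiclient.discovery import build

dataflow = build('dataflow', 'v1b3')  # assumes Application Default Credentials

snapshot = dataflow.projects().locations().jobs().snapshot(
    projectId='my-project',       # placeholder
    location='us-central1',       # regional endpoint that owns the job
    jobId='my-streaming-job',     # placeholder: must be a streaming job
    body={
        'ttl': '604800s',                     # keep the snapshot for 7 days
        'description': 'pre-update snapshot', # example description
        'snapshotSources': True,              # also snapshot sources that support it
    }).execute()

print(snapshot['id'], snapshot.get('state'))
</pre>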
2693
2694<div class="method">
Dan O'Mearadd494642020-05-01 07:42:23 -07002695 <code class="details" id="update">update(projectId, location, jobId, body=None, x__xgafv=None)</code>
Sai Cheemalapatic30d2b52017-03-13 12:12:03 -04002696 <pre>Updates the state of an existing Cloud Dataflow job.
Jon Wayne Parrott692617a2017-01-06 09:58:29 -08002697
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07002698To update the state of an existing job, we recommend using
2699`projects.locations.jobs.update` with a [regional endpoint]
2700(https://cloud.google.com/dataflow/docs/concepts/regional-endpoints). Using
2701`projects.jobs.update` is not recommended, as you can only update the state
2702of jobs that are running in `us-central1`.
2703
Jon Wayne Parrott692617a2017-01-06 09:58:29 -08002704Args:
Sai Cheemalapatic30d2b52017-03-13 12:12:03 -04002705 projectId: string, The ID of the Cloud Platform project that the job belongs to. (required)
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07002706 location: string, The [regional endpoint]
2707(https://cloud.google.com/dataflow/docs/concepts/regional-endpoints) that
2708contains this job. (required)
Sai Cheemalapatic30d2b52017-03-13 12:12:03 -04002709 jobId: string, The job ID. (required)
Dan O'Mearadd494642020-05-01 07:42:23 -07002710 body: object, The request body.
Jon Wayne Parrott692617a2017-01-06 09:58:29 -08002711 The object takes the form of:
2712
Sai Cheemalapatic30d2b52017-03-13 12:12:03 -04002713{ # Defines a job to be run by the Cloud Dataflow service.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07002714 "labels": { # User-defined labels for this job.
2715 #
2716 # The labels map can contain no more than 64 entries. Entries of the labels
2717 # map are UTF8 strings that comply with the following restrictions:
2718 #
2719 # * Keys must conform to regexp: \p{Ll}\p{Lo}{0,62}
2720 # * Values must conform to regexp: [\p{Ll}\p{Lo}\p{N}_-]{0,63}
Dan O'Mearadd494642020-05-01 07:42:23 -07002721 # * Both keys and values are additionally constrained to be &lt;= 128 bytes in
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07002722 # size.
2723 "a_key": "A String",
2724 },
2725 "jobMetadata": { # Metadata available primarily for filtering jobs. Will be included in the # This field is populated by the Dataflow service to support filtering jobs
2726 # by the metadata values provided here. Populated for ListJobs and all GetJob
2727 # views SUMMARY and higher.
2728 # ListJob response and Job SUMMARY view.
2729 "sdkVersion": { # The version of the SDK used to run the job. # The SDK version used to run the job.
2730 "versionDisplayName": "A String", # A readable string describing the version of the SDK.
2731 "version": "A String", # The version of the SDK used to run the job.
2732 "sdkSupportStatus": "A String", # The support status for this SDK version.
2733 },
2734 "pubsubDetails": [ # Identification of a PubSub source used in the Dataflow job.
2735 { # Metadata for a PubSub connector used by the job.
2736 "topic": "A String", # Topic accessed in the connection.
2737 "subscription": "A String", # Subscription used in the connection.
2738 },
2739 ],
2740 "datastoreDetails": [ # Identification of a Datastore source used in the Dataflow job.
2741 { # Metadata for a Datastore connector used by the job.
2742 "projectId": "A String", # ProjectId accessed in the connection.
2743 "namespace": "A String", # Namespace used in the connection.
2744 },
2745 ],
2746 "fileDetails": [ # Identification of a File source used in the Dataflow job.
2747 { # Metadata for a File connector used by the job.
2748 "filePattern": "A String", # File Pattern used to access files by the connector.
2749 },
2750 ],
2751 "spannerDetails": [ # Identification of a Spanner source used in the Dataflow job.
2752 { # Metadata for a Spanner connector used by the job.
2753 "instanceId": "A String", # InstanceId accessed in the connection.
2754 "projectId": "A String", # ProjectId accessed in the connection.
2755 "databaseId": "A String", # DatabaseId accessed in the connection.
2756 },
2757 ],
2758 "bigTableDetails": [ # Identification of a BigTable source used in the Dataflow job.
2759 { # Metadata for a BigTable connector used by the job.
2760 "instanceId": "A String", # InstanceId accessed in the connection.
2761 "projectId": "A String", # ProjectId accessed in the connection.
2762 "tableId": "A String", # TableId accessed in the connection.
2763 },
2764 ],
2765 "bigqueryDetails": [ # Identification of a BigQuery source used in the Dataflow job.
2766 { # Metadata for a BigQuery connector used by the job.
2767 "projectId": "A String", # Project accessed in the connection.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07002768 "query": "A String", # Query used to access data in the connection.
Dan O'Mearadd494642020-05-01 07:42:23 -07002769 "table": "A String", # Table accessed in the connection.
2770 "dataset": "A String", # Dataset accessed in the connection.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07002771 },
2772 ],
2773 },
2774 "pipelineDescription": { # A descriptive representation of submitted pipeline as well as the executed # Preliminary field: The format of this data may change at any time.
2775 # A description of the user pipeline and stages through which it is executed.
2776 # Created by Cloud Dataflow service. Only retrieved with
2777 # JOB_VIEW_DESCRIPTION or JOB_VIEW_ALL.
2778 # form. This data is provided by the Dataflow service for ease of visualizing
2779 # the pipeline and interpreting Dataflow provided metrics.
2780 "originalPipelineTransform": [ # Description of each transform in the pipeline and collections between them.
2781 { # Description of the type, names/ids, and input/outputs for a transform.
2782 "kind": "A String", # Type of transform.
2783 "name": "A String", # User provided name for this transform instance.
2784 "inputCollectionName": [ # User names for all collection inputs to this transform.
2785 "A String",
2786 ],
2787 "displayData": [ # Transform-specific display data.
2788 { # Data provided with a pipeline or transform to provide descriptive info.
Dan O'Mearadd494642020-05-01 07:42:23 -07002789 "key": "A String", # The key identifying the display data.
2790 # This is intended to be used as a label for the display data
2791 # when viewed in a dax monitoring system.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07002792 "shortStrValue": "A String", # A possible additional shorter value to display.
2793 # For example a java_class_name_value of com.mypackage.MyDoFn
2794 # will be stored with MyDoFn as the short_str_value and
2795 # com.mypackage.MyDoFn as the java_class_name value.
2796 # short_str_value can be displayed and java_class_name_value
2797 # will be displayed as a tooltip.
Dan O'Mearadd494642020-05-01 07:42:23 -07002798 "timestampValue": "A String", # Contains value if the data is of timestamp type.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07002799 "url": "A String", # An optional full URL.
2800 "floatValue": 3.14, # Contains value if the data is of float type.
2801 "namespace": "A String", # The namespace for the key. This is usually a class name or programming
2802 # language namespace (i.e. python module) which defines the display data.
2803 # This allows a dax monitoring system to specially handle the data
2804 # and perform custom rendering.
2805 "javaClassValue": "A String", # Contains value if the data is of java class type.
2806 "label": "A String", # An optional label to display in a dax UI for the element.
2807 "boolValue": True or False, # Contains value if the data is of a boolean type.
2808 "strValue": "A String", # Contains value if the data is of string type.
Dan O'Mearadd494642020-05-01 07:42:23 -07002809 "durationValue": "A String", # Contains value if the data is of duration type.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07002810 "int64Value": "A String", # Contains value if the data is of int64 type.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07002811 },
2812 ],
2813 "outputCollectionName": [ # User names for all collection outputs to this transform.
2814 "A String",
2815 ],
2816 "id": "A String", # SDK generated id of this transform instance.
2817 },
2818 ],
2819 "executionPipelineStage": [ # Description of each stage of execution of the pipeline.
2820 { # Description of the composing transforms, names/ids, and input/outputs of a
2821 # stage of execution. Some composing transforms and sources may have been
2822 # generated by the Dataflow service during execution planning.
2823 "componentSource": [ # Collections produced and consumed by component transforms of this stage.
2824 { # Description of an interstitial value between transforms in an execution
2825 # stage.
2826 "userName": "A String", # Human-readable name for this transform; may be user or system generated.
2827 "originalTransformOrCollection": "A String", # User name for the original user transform or collection with which this
2828 # source is most closely associated.
2829 "name": "A String", # Dataflow service generated name for this source.
2830 },
2831 ],
          "kind": "A String", # Type of transform this stage is executing.
2833 "name": "A String", # Dataflow service generated name for this stage.
2834 "outputSource": [ # Output sources for this stage.
2835 { # Description of an input or output of an execution stage.
2836 "userName": "A String", # Human-readable name for this source; may be user or system generated.
2837 "sizeBytes": "A String", # Size of the source, if measurable.
2838 "name": "A String", # Dataflow service generated name for this source.
2839 "originalTransformOrCollection": "A String", # User name for the original user transform or collection with which this
2840 # source is most closely associated.
2841 },
2842 ],
2843 "inputSource": [ # Input sources for this stage.
2844 { # Description of an input or output of an execution stage.
2845 "userName": "A String", # Human-readable name for this source; may be user or system generated.
2846 "sizeBytes": "A String", # Size of the source, if measurable.
2847 "name": "A String", # Dataflow service generated name for this source.
2848 "originalTransformOrCollection": "A String", # User name for the original user transform or collection with which this
2849 # source is most closely associated.
2850 },
2851 ],
2852 "componentTransform": [ # Transforms that comprise this execution stage.
2853 { # Description of a transform executed as part of an execution stage.
2854 "userName": "A String", # Human-readable name for this transform; may be user or system generated.
2855 "originalTransform": "A String", # User name for the original user transform with which this transform is
2856 # most closely associated.
2857 "name": "A String", # Dataflow service generated name for this source.
2858 },
2859 ],
2860 "id": "A String", # Dataflow service generated id for this stage.
2861 },
2862 ],
2863 "displayData": [ # Pipeline level display data.
2864 { # Data provided with a pipeline or transform to provide descriptive info.
Dan O'Mearadd494642020-05-01 07:42:23 -07002865 "key": "A String", # The key identifying the display data.
2866 # This is intended to be used as a label for the display data
2867 # when viewed in a dax monitoring system.
2868 "shortStrValue": "A String", # A possible additional shorter value to display.
2869 # For example a java_class_name_value of com.mypackage.MyDoFn
2870 # will be stored with MyDoFn as the short_str_value and
2871 # com.mypackage.MyDoFn as the java_class_name value.
2872 # short_str_value can be displayed and java_class_name_value
2873 # will be displayed as a tooltip.
2874 "timestampValue": "A String", # Contains value if the data is of timestamp type.
2875 "url": "A String", # An optional full URL.
2876 "floatValue": 3.14, # Contains value if the data is of float type.
2877 "namespace": "A String", # The namespace for the key. This is usually a class name or programming
2878 # language namespace (i.e. python module) which defines the display data.
2879 # This allows a dax monitoring system to specially handle the data
2880 # and perform custom rendering.
2881 "javaClassValue": "A String", # Contains value if the data is of java class type.
2882 "label": "A String", # An optional label to display in a dax UI for the element.
2883 "boolValue": True or False, # Contains value if the data is of a boolean type.
2884 "strValue": "A String", # Contains value if the data is of string type.
2885 "durationValue": "A String", # Contains value if the data is of duration type.
2886 "int64Value": "A String", # Contains value if the data is of int64 type.
2887 },
2888 ],
2889 },
2890 "stageStates": [ # This field may be mutated by the Cloud Dataflow service;
2891 # callers cannot mutate it.
2892 { # A message describing the state of a particular execution stage.
2893 "executionStageName": "A String", # The name of the execution stage.
2894 "executionStageState": "A String", # Execution stage states allow the same set of values as JobState.
2895 "currentStateTime": "A String", # The time at which the stage transitioned to this state.
2896 },
2897 ],
2898 "id": "A String", # The unique ID of this job.
2899 #
2900 # This field is set by the Cloud Dataflow service when the Job is
2901 # created, and is immutable for the life of the job.
2902 "replacedByJobId": "A String", # If another job is an update of this job (and thus, this job is in
2903 # `JOB_STATE_UPDATED`), this field contains the ID of that job.
2904 "projectId": "A String", # The ID of the Cloud Platform project that the job belongs to.
2905 "transformNameMapping": { # The map of transform name prefixes of the job to be replaced to the
2906 # corresponding name prefixes of the new job.
2907 "a_key": "A String",
2908 },
2909 "environment": { # Describes the environment in which a Dataflow Job runs. # The environment for the job.
2910 "workerRegion": "A String", # The Compute Engine region
2911 # (https://cloud.google.com/compute/docs/regions-zones/regions-zones) in
2912 # which worker processing should occur, e.g. "us-west1". Mutually exclusive
2913 # with worker_zone. If neither worker_region nor worker_zone is specified,
2914 # default to the control plane's region.
2915 "version": { # A structure describing which components and their versions of the service
2916 # are required in order to run the job.
2917 "a_key": "", # Properties of the object.
2918 },
2919 "flexResourceSchedulingGoal": "A String", # Which Flexible Resource Scheduling mode to run in.
2920 "serviceKmsKeyName": "A String", # If set, contains the Cloud KMS key identifier used to encrypt data
2921 # at rest, AKA a Customer Managed Encryption Key (CMEK).
2922 #
2923 # Format:
2924 # projects/PROJECT_ID/locations/LOCATION/keyRings/KEY_RING/cryptoKeys/KEY
2925 "internalExperiments": { # Experimental settings.
2926 "a_key": "", # Properties of the object. Contains field @type with type URL.
2927 },
2928 "dataset": "A String", # The dataset for the current project where various workflow
2929 # related tables are stored.
2930 #
2931 # The supported resource type is:
2932 #
2933 # Google BigQuery:
2934 # bigquery.googleapis.com/{dataset}
2935 "experiments": [ # The list of experiments to enable.
2936 "A String",
2937 ],
2938 "serviceAccountEmail": "A String", # Identity to run virtual machines as. Defaults to the default account.
2939 "sdkPipelineOptions": { # The Cloud Dataflow SDK pipeline options specified by the user. These
2940 # options are passed through the service and are used to recreate the
2941 # SDK pipeline options on the worker in a language agnostic and platform
2942 # independent way.
2943 "a_key": "", # Properties of the object.
2944 },
2945 "userAgent": { # A description of the process that generated the request.
2946 "a_key": "", # Properties of the object.
2947 },
2948 "workerZone": "A String", # The Compute Engine zone
2949 # (https://cloud.google.com/compute/docs/regions-zones/regions-zones) in
2950 # which worker processing should occur, e.g. "us-west1-a". Mutually exclusive
2951 # with worker_region. If neither worker_region nor worker_zone is specified,
2952 # a zone in the control plane's region is chosen based on available capacity.
2953 "workerPools": [ # The worker pools. At least one "harness" worker pool must be
2954 # specified in order for the job to have workers.
2955 { # Describes one particular pool of Cloud Dataflow workers to be
2956 # instantiated by the Cloud Dataflow service in order to perform the
2957 # computations required by a job. Note that a workflow job may use
2958 # multiple pools, in order to match the various computational
2959 # requirements of the various stages of the job.
2960 "workerHarnessContainerImage": "A String", # Required. Docker container image that executes the Cloud Dataflow worker
2961 # harness, residing in Google Container Registry.
2962 #
2963 # Deprecated for the Fn API path. Use sdk_harness_container_images instead.
2964 "ipConfiguration": "A String", # Configuration for VM IPs.
2965 "autoscalingSettings": { # Settings for WorkerPool autoscaling. # Settings for autoscaling of this WorkerPool.
2966 "maxNumWorkers": 42, # The maximum number of workers to cap scaling at.
2967 "algorithm": "A String", # The algorithm to use for autoscaling.
2968 },
2969 "diskSourceImage": "A String", # Fully qualified source image for disks.
2970 "network": "A String", # Network to which VMs will be assigned. If empty or unspecified,
2971 # the service will use the network "default".
2972 "zone": "A String", # Zone to run the worker pools in. If empty or unspecified, the service
2973 # will attempt to choose a reasonable default.
2974 "metadata": { # Metadata to set on the Google Compute Engine VMs.
2975 "a_key": "A String",
2976 },
2977 "machineType": "A String", # Machine type (e.g. "n1-standard-1"). If empty or unspecified, the
2978 # service will attempt to choose a reasonable default.
2979 "onHostMaintenance": "A String", # The action to take on host maintenance, as defined by the Google
2980 # Compute Engine API.
2981 "taskrunnerSettings": { # Taskrunner configuration settings. # Settings passed through to Google Compute Engine workers when
2982 # using the standard Dataflow task runner. Users should ignore
2983 # this field.
2984 "workflowFileName": "A String", # The file to store the workflow in.
2985 "logUploadLocation": "A String", # Indicates where to put logs. If this is not specified, the logs
2986 # will not be uploaded.
2987 #
2988 # The supported resource type is:
2989 #
2990 # Google Cloud Storage:
2991 # storage.googleapis.com/{bucket}/{object}
2992 # bucket.storage.googleapis.com/{object}
2993 "commandlinesFileName": "A String", # The file to store preprocessing commands in.
2994 "alsologtostderr": True or False, # Whether to also send taskrunner log info to stderr.
2995 "continueOnException": True or False, # Whether to continue taskrunner if an exception is hit.
2996 "baseTaskDir": "A String", # The location on the worker for task-specific subdirectories.
2997 "vmId": "A String", # The ID string of the VM.
2998 "taskGroup": "A String", # The UNIX group ID on the worker VM to use for tasks launched by
2999 # taskrunner; e.g. "wheel".
3000 "taskUser": "A String", # The UNIX user ID on the worker VM to use for tasks launched by
3001 # taskrunner; e.g. "root".
3002 "oauthScopes": [ # The OAuth2 scopes to be requested by the taskrunner in order to
3003 # access the Cloud Dataflow API.
3004 "A String",
3005 ],
3006 "languageHint": "A String", # The suggested backend language.
3007 "logToSerialconsole": True or False, # Whether to send taskrunner log info to Google Compute Engine VM serial
3008 # console.
3009 "streamingWorkerMainClass": "A String", # The streaming worker main class name.
3010 "logDir": "A String", # The directory on the VM to store logs.
3011 "parallelWorkerSettings": { # Provides data to pass through to the worker harness. # The settings to pass to the parallel worker harness.
3012 "reportingEnabled": True or False, # Whether to send work progress updates to the service.
3013 "shuffleServicePath": "A String", # The Shuffle service path relative to the root URL, for example,
3014 # "shuffle/v1beta1".
3015 "workerId": "A String", # The ID of the worker running this pipeline.
3016 "baseUrl": "A String", # The base URL for accessing Google Cloud APIs.
3017 #
3018 # When workers access Google Cloud APIs, they logically do so via
3019 # relative URLs. If this field is specified, it supplies the base
3020 # URL to use for resolving these relative URLs. The normative
3021 # algorithm used is defined by RFC 1808, "Relative Uniform Resource
3022 # Locators".
3023 #
3024 # If not specified, the default value is "http://www.googleapis.com/"
3025 "servicePath": "A String", # The Cloud Dataflow service path relative to the root URL, for example,
3026 # "dataflow/v1b3/projects".
3027 "tempStoragePrefix": "A String", # The prefix of the resources the system should use for temporary
3028 # storage.
3029 #
3030 # The supported resource type is:
3031 #
3032 # Google Cloud Storage:
3033 #
3034 # storage.googleapis.com/{bucket}/{object}
3035 # bucket.storage.googleapis.com/{object}
3036 },
3037 "dataflowApiVersion": "A String", # The API version of the endpoint, e.g. "v1b3".
3038 "harnessCommand": "A String", # The command to launch the worker harness.
3039 "tempStoragePrefix": "A String", # The prefix of the resources the taskrunner should use for
3040 # temporary storage.
3041 #
3042 # The supported resource type is:
3043 #
3044 # Google Cloud Storage:
3045 # storage.googleapis.com/{bucket}/{object}
3046 # bucket.storage.googleapis.com/{object}
3047 "baseUrl": "A String", # The base URL for the taskrunner to use when accessing Google Cloud APIs.
3048 #
3049 # When workers access Google Cloud APIs, they logically do so via
3050 # relative URLs. If this field is specified, it supplies the base
3051 # URL to use for resolving these relative URLs. The normative
3052 # algorithm used is defined by RFC 1808, "Relative Uniform Resource
3053 # Locators".
3054 #
3055 # If not specified, the default value is "http://www.googleapis.com/"
3056 },
3057 "numThreadsPerWorker": 42, # The number of threads per worker harness. If empty or unspecified, the
3058 # service will choose a number of threads (according to the number of cores
3059 # on the selected machine type for batch, or 1 by convention for streaming).
3060 "poolArgs": { # Extra arguments for this worker pool.
3061 "a_key": "", # Properties of the object. Contains field @type with type URL.
3062 },
3063 "packages": [ # Packages to be installed on workers.
3064 { # The packages that must be installed in order for a worker to run the
3065 # steps of the Cloud Dataflow job that will be assigned to its worker
3066 # pool.
3067 #
3068 # This is the mechanism by which the Cloud Dataflow SDK causes code to
3069 # be loaded onto the workers. For example, the Cloud Dataflow Java SDK
3070 # might use this to install jars containing the user's code and all of the
3071 # various dependencies (libraries, data files, etc.) required in order
3072 # for that code to run.
3073 "location": "A String", # The resource to read the package from. The supported resource type is:
3074 #
3075 # Google Cloud Storage:
3076 #
3077 # storage.googleapis.com/{bucket}
3078 # bucket.storage.googleapis.com/
3079 "name": "A String", # The name of the package.
3080 },
3081 ],
3082 "defaultPackageSet": "A String", # The default package set to install. This allows the service to
3083 # select a default set of packages which are useful to worker
3084 # harnesses written in a particular language.
3085 "kind": "A String", # The kind of the worker pool; currently only `harness` and `shuffle`
3086 # are supported.
3087 "diskType": "A String", # Type of root disk for VMs. If empty or unspecified, the service will
3088 # attempt to choose a reasonable default.
3089 "teardownPolicy": "A String", # Sets the policy for determining when to tear down the worker pool.
3090 # Allowed values are: `TEARDOWN_ALWAYS`, `TEARDOWN_ON_SUCCESS`, and
3091 # `TEARDOWN_NEVER`.
3092 # `TEARDOWN_ALWAYS` means workers are always torn down regardless of whether
3093 # the job succeeds. `TEARDOWN_ON_SUCCESS` means workers are torn down
3094 # if the job succeeds. `TEARDOWN_NEVER` means the workers are never torn
3095 # down.
3096 #
3097 # If the workers are not torn down by the service, they will
3098 # continue to run and use Google Compute Engine VM resources in the
3099 # user's project until they are explicitly terminated by the user.
3100 # Because of this, Google recommends using the `TEARDOWN_ALWAYS`
3101 # policy except for small, manually supervised test jobs.
3102 #
3103 # If unknown or unspecified, the service will attempt to choose a reasonable
3104 # default.
3105 "diskSizeGb": 42, # Size of root disk for VMs, in GB. If zero or unspecified, the service will
3106 # attempt to choose a reasonable default.
3107 "numWorkers": 42, # Number of Google Compute Engine workers in this pool needed to
3108 # execute the job. If zero or unspecified, the service will
3109 # attempt to choose a reasonable default.
3110 "subnetwork": "A String", # Subnetwork to which VMs will be assigned, if desired. Expected to be of
3111 # the form "regions/REGION/subnetworks/SUBNETWORK".
3112 "dataDisks": [ # Data disks that are used by a VM in this workflow.
3113 { # Describes the data disk used by a workflow job.
3114 "mountPoint": "A String", # Directory in a VM where disk is mounted.
3115 "sizeGb": 42, # Size of disk in GB. If zero or unspecified, the service will
3116 # attempt to choose a reasonable default.
3117 "diskType": "A String", # Disk storage type, as defined by Google Compute Engine. This
3118 # must be a disk type appropriate to the project and zone in which
3119 # the workers will run. If unknown or unspecified, the service
3120 # will attempt to choose a reasonable default.
3121 #
3122 # For example, the standard persistent disk type is a resource name
3123 # typically ending in "pd-standard". If SSD persistent disks are
3124 # available, the resource name typically ends with "pd-ssd". The
3125 # actual valid values are defined by the Google Compute Engine API,
3126 # not by the Cloud Dataflow API; consult the Google Compute Engine
3127 # documentation for more information about determining the set of
3128 # available disk types for a particular project and zone.
3129 #
3130 # Google Compute Engine Disk types are local to a particular
3131 # project in a particular zone, and so the resource name will
3132 # typically look something like this:
3133 #
3134 # compute.googleapis.com/projects/project-id/zones/zone/diskTypes/pd-standard
3135 },
3136 ],
3137 "sdkHarnessContainerImages": [ # Set of SDK harness containers needed to execute this pipeline. This will
3138 # only be set in the Fn API path. For non-cross-language pipelines this
3139 # should have only one entry. Cross-language pipelines will have two or more
3140 # entries.
3141 { # Defines an SDK harness container for executing Dataflow pipelines.
3142 "containerImage": "A String", # A docker container image that resides in Google Container Registry.
3143 "useSingleCorePerContainer": True or False, # If true, recommends that the Dataflow service use only one core per SDK
3144 # container instance with this image. If false (or unset), recommends using
3145 # more than one core per SDK container instance with this image for
3146 # efficiency. Note that the Dataflow service may choose to override this property
3147 # if needed.
3148 },
3149 ],
3150 },
3151 ],
3152 "clusterManagerApiService": "A String", # The type of cluster manager API to use. If unknown or
3153 # unspecified, the service will attempt to choose a reasonable
3154 # default. This should be in the form of the API service name,
3155 # e.g. "compute.googleapis.com".
3156 "tempStoragePrefix": "A String", # The prefix of the resources the system should use for temporary
3157 # storage. The system will append the suffix "/temp-{JOBNAME}" to
3158 # this resource prefix, where {JOBNAME} is the value of the
3159 # job_name field. The resulting bucket and object prefix is used
3160 # as the prefix of the resources used to store temporary data
3161 # needed during the job execution. NOTE: This will override the
3162 # value in taskrunner_settings.
3163 # The supported resource type is:
3164 #
3165 # Google Cloud Storage:
3166 #
3167 # storage.googleapis.com/{bucket}/{object}
3168 # bucket.storage.googleapis.com/{object}
3169 },
3170 "location": "A String", # The [regional endpoint]
3171 # (https://cloud.google.com/dataflow/docs/concepts/regional-endpoints) that
3172 # contains this job.
3173 "tempFiles": [ # A set of files the system should be aware of that are used
3174 # for temporary storage. These temporary files will be
3175 # removed on job completion.
3176 # No duplicates are allowed.
3177 # No file patterns are supported.
3178 #
3179 # The supported files are:
3180 #
3181 # Google Cloud Storage:
3182 #
3183 # storage.googleapis.com/{bucket}/{object}
3184 # bucket.storage.googleapis.com/{object}
3185 "A String",
3186 ],
3187 "type": "A String", # The type of Cloud Dataflow job.
3188 "clientRequestId": "A String", # The client's unique identifier of the job, re-used across retried attempts.
3189 # If this field is set, the service will ensure its uniqueness.
3190 # The request to create a job will fail if the service has knowledge of a
3191 # previously submitted job with the same client's ID and job name.
3192 # The caller may use this field to ensure idempotence of job
3193 # creation across retried attempts to create a job.
3194 # By default, the field is empty and, in that case, the service ignores it.
3195 "createdFromSnapshotId": "A String", # If this is specified, the job's initial state is populated from the given
3196 # snapshot.
3197 "stepsLocation": "A String", # The GCS location where the steps are stored.
3198 "currentStateTime": "A String", # The timestamp associated with the current state.
3199 "startTime": "A String", # The timestamp when the job was started (transitioned to JOB_STATE_PENDING).
3200 # Flexible resource scheduling jobs are started with some delay after job
3201 # creation, so start_time is unset before start and is updated when the
3202 # job is started by the Cloud Dataflow service. For other jobs, start_time
3203 # always equals create_time and is immutable and set by the Cloud Dataflow
3204 # service.
3205 "createTime": "A String", # The timestamp when the job was initially created. Immutable and set by the
3206 # Cloud Dataflow service.
3207 "requestedState": "A String", # The job's requested state.
3208 #
3209 # `UpdateJob` may be used to switch between the `JOB_STATE_STOPPED` and
3210 # `JOB_STATE_RUNNING` states, by setting requested_state. `UpdateJob` may
3211 # also be used to directly set a job's requested state to
3212 # `JOB_STATE_CANCELLED` or `JOB_STATE_DONE`, irrevocably terminating the
3213 # job if it has not already reached a terminal state.
3214 "name": "A String", # The user-specified Cloud Dataflow job name.
3215 #
3216 # Only one Job with a given name may exist in a project at any
3217 # given time. If a caller attempts to create a Job with the same
3218 # name as an already-existing Job, the attempt returns the
3219 # existing Job.
3220 #
3221 # The name must match the regular expression
3222 # `[a-z]([-a-z0-9]{0,38}[a-z0-9])?`
3223 "steps": [ # Exactly one of step or steps_location should be specified.
3224 #
3225 # The top-level steps that constitute the entire job.
3226 { # Defines a particular step within a Cloud Dataflow job.
3227 #
3228 # A job consists of multiple steps, each of which performs some
3229 # specific operation as part of the overall job. Data is typically
3230 # passed from one step to another as part of the job.
3231 #
3232 # Here's an example of a sequence of steps which together implement a
3233 # Map-Reduce job:
3234 #
3235 # * Read a collection of data from some source, parsing the
3236 # collection's elements.
3237 #
3238 # * Validate the elements.
3239 #
3240 # * Apply a user-defined function to map each element to some value
3241 # and extract an element-specific key value.
3242 #
3243 # * Group elements with the same key into a single element with
3244 # that key, transforming a multiply-keyed collection into a
3245 # uniquely-keyed collection.
3246 #
3247 # * Write the elements out to some data sink.
3248 #
3249 # Note that the Cloud Dataflow service may be used to run many different
3250 # types of jobs, not just Map-Reduce.
3251 "kind": "A String", # The kind of step in the Cloud Dataflow job.
3252 "name": "A String", # The name that identifies the step. This must be unique for each
3253 # step with respect to all other steps in the Cloud Dataflow job.
3254 "properties": { # Named properties associated with the step. Each kind of
3255 # predefined step has its own required set of properties.
3256 # Must be provided on Create. Only retrieved with JOB_VIEW_ALL.
3257 "a_key": "", # Properties of the object.
3258 },
3259 },
3260 ],
3261 "replaceJobId": "A String", # If this job is an update of an existing job, this field is the job ID
3262 # of the job it replaced.
3263 #
3264 # When sending a `CreateJobRequest`, you can update a job by specifying it
3265 # here. The job named here is stopped, and its intermediate state is
3266 # transferred to this job.
3267 "currentState": "A String", # The current state of the job.
3268 #
3269 # Jobs are created in the `JOB_STATE_STOPPED` state unless otherwise
3270 # specified.
3271 #
3272 # A job in the `JOB_STATE_RUNNING` state may asynchronously enter a
3273 # terminal state. After a job has reached a terminal state, no
3274 # further state updates may be made.
3275 #
3276 # This field may be mutated by the Cloud Dataflow service;
3277 # callers cannot mutate it.
3278 "executionInfo": { # Additional information about how a Cloud Dataflow job will be executed that # Deprecated.
3279 # isn't contained in the submitted job.
3280 "stages": { # A mapping from each stage to the information about that stage.
3281 "a_key": { # Contains information about how a particular
3282 # google.dataflow.v1beta3.Step will be executed.
3283 "stepName": [ # The steps associated with the execution stage.
3284 # Note that stages may have several steps, and that a given step
3285 # might be run by more than one stage.
3286 "A String",
3287 ],
3288 },
3289 },
3290 },
3291}
3292
3293 x__xgafv: string, V1 error format.
3294 Allowed values
3295 1 - v1 error format
3296 2 - v2 error format
3297
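  A minimal usage sketch, assuming the google-api-python-client package and
  application default credentials are available; the project, region, bucket,
  and job values below are placeholders, and most fields in the request body
  documented above are optional:

    from googleapiclient.discovery import build

    # Build a Dataflow v1b3 client using application default credentials.
    dataflow = build('dataflow', 'v1b3')

    body = {
        'name': 'example-job',  # must match [a-z]([-a-z0-9]{0,38}[a-z0-9])?
        'type': 'JOB_TYPE_BATCH',
        'environment': {
            # Temporary storage in the storage.googleapis.com/{bucket}/{object} form.
            'tempStoragePrefix': 'storage.googleapis.com/my-bucket/temp',
        },
    }

    response = dataflow.projects().locations().jobs().create(
        projectId='my-project',   # placeholder project ID
        location='us-central1',   # regional endpoint that will own the job
        body=body,
        view='JOB_VIEW_SUMMARY',
    ).execute()

    print(response.get('id'), response.get('currentState'))
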
3298Returns:
3299 An object of the form:
3300
3301 { # Defines a job to be run by the Cloud Dataflow service.
3302 "labels": { # User-defined labels for this job.
3303 #
3304 # The labels map can contain no more than 64 entries. Entries of the labels
3305 # map are UTF8 strings that comply with the following restrictions:
3306 #
3307 # * Keys must conform to regexp: \p{Ll}\p{Lo}{0,62}
3308 # * Values must conform to regexp: [\p{Ll}\p{Lo}\p{N}_-]{0,63}
3309 # * Both keys and values are additionally constrained to be &lt;= 128 bytes in
3310 # size.
3311 "a_key": "A String",
3312 },
3313 "jobMetadata": { # Metadata available primarily for filtering jobs. Will be included in the # This field is populated by the Dataflow service to support filtering jobs
3314 # by the metadata values provided here. Populated for ListJobs and all GetJob
3315 # views SUMMARY and higher.
3316 # ListJob response and Job SUMMARY view.
3317 "sdkVersion": { # The version of the SDK used to run the job. # The SDK version used to run the job.
3318 "versionDisplayName": "A String", # A readable string describing the version of the SDK.
3319 "version": "A String", # The version of the SDK used to run the job.
3320 "sdkSupportStatus": "A String", # The support status for this SDK version.
3321 },
3322 "pubsubDetails": [ # Identification of a PubSub source used in the Dataflow job.
3323 { # Metadata for a PubSub connector used by the job.
3324 "topic": "A String", # Topic accessed in the connection.
3325 "subscription": "A String", # Subscription used in the connection.
3326 },
3327 ],
3328 "datastoreDetails": [ # Identification of a Datastore source used in the Dataflow job.
3329 { # Metadata for a Datastore connector used by the job.
3330 "projectId": "A String", # ProjectId accessed in the connection.
3331 "namespace": "A String", # Namespace used in the connection.
3332 },
3333 ],
3334 "fileDetails": [ # Identification of a File source used in the Dataflow job.
3335 { # Metadata for a File connector used by the job.
3336 "filePattern": "A String", # File Pattern used to access files by the connector.
3337 },
3338 ],
3339 "spannerDetails": [ # Identification of a Spanner source used in the Dataflow job.
3340 { # Metadata for a Spanner connector used by the job.
3341 "instanceId": "A String", # InstanceId accessed in the connection.
3342 "projectId": "A String", # ProjectId accessed in the connection.
3343 "databaseId": "A String", # DatabaseId accessed in the connection.
3344 },
3345 ],
3346 "bigTableDetails": [ # Identification of a BigTable source used in the Dataflow job.
3347 { # Metadata for a BigTable connector used by the job.
3348 "instanceId": "A String", # InstanceId accessed in the connection.
3349 "projectId": "A String", # ProjectId accessed in the connection.
3350 "tableId": "A String", # TableId accessed in the connection.
3351 },
3352 ],
3353 "bigqueryDetails": [ # Identification of a BigQuery source used in the Dataflow job.
3354 { # Metadata for a BigQuery connector used by the job.
3355 "projectId": "A String", # Project accessed in the connection.
3356 "query": "A String", # Query used to access data in the connection.
3357 "table": "A String", # Table accessed in the connection.
3358 "dataset": "A String", # Dataset accessed in the connection.
3359 },
3360 ],
3361 },
3362 "pipelineDescription": { # A descriptive representation of submitted pipeline as well as the executed # Preliminary field: The format of this data may change at any time.
3363 # A description of the user pipeline and stages through which it is executed.
3364 # Created by Cloud Dataflow service. Only retrieved with
3365 # JOB_VIEW_DESCRIPTION or JOB_VIEW_ALL.
3366 # form. This data is provided by the Dataflow service for ease of visualizing
3367 # the pipeline and interpreting Dataflow provided metrics.
3368 "originalPipelineTransform": [ # Description of each transform in the pipeline and collections between them.
3369 { # Description of the type, names/ids, and input/outputs for a transform.
3370 "kind": "A String", # Type of transform.
3371 "name": "A String", # User provided name for this transform instance.
3372 "inputCollectionName": [ # User names for all collection inputs to this transform.
3373 "A String",
3374 ],
3375 "displayData": [ # Transform-specific display data.
3376 { # Data provided with a pipeline or transform to provide descriptive info.
3377 "key": "A String", # The key identifying the display data.
3378 # This is intended to be used as a label for the display data
3379 # when viewed in a dax monitoring system.
3380 "shortStrValue": "A String", # A possible additional shorter value to display.
3381 # For example a java_class_name_value of com.mypackage.MyDoFn
3382 # will be stored with MyDoFn as the short_str_value and
3383 # com.mypackage.MyDoFn as the java_class_name value.
3384 # short_str_value can be displayed and java_class_name_value
3385 # will be displayed as a tooltip.
3386 "timestampValue": "A String", # Contains value if the data is of timestamp type.
3387 "url": "A String", # An optional full URL.
3388 "floatValue": 3.14, # Contains value if the data is of float type.
3389 "namespace": "A String", # The namespace for the key. This is usually a class name or programming
3390 # language namespace (i.e. python module) which defines the display data.
3391 # This allows a dax monitoring system to specially handle the data
3392 # and perform custom rendering.
3393 "javaClassValue": "A String", # Contains value if the data is of java class type.
3394 "label": "A String", # An optional label to display in a dax UI for the element.
3395 "boolValue": True or False, # Contains value if the data is of a boolean type.
3396 "strValue": "A String", # Contains value if the data is of string type.
3397 "durationValue": "A String", # Contains value if the data is of duration type.
3398 "int64Value": "A String", # Contains value if the data is of int64 type.
3399 },
3400 ],
3401 "outputCollectionName": [ # User names for all collection outputs to this transform.
3402 "A String",
3403 ],
3404 "id": "A String", # SDK generated id of this transform instance.
3405 },
3406 ],
3407 "executionPipelineStage": [ # Description of each stage of execution of the pipeline.
3408 { # Description of the composing transforms, names/ids, and input/outputs of a
3409 # stage of execution. Some composing transforms and sources may have been
3410 # generated by the Dataflow service during execution planning.
3411 "componentSource": [ # Collections produced and consumed by component transforms of this stage.
3412 { # Description of an interstitial value between transforms in an execution
3413 # stage.
3414 "userName": "A String", # Human-readable name for this transform; may be user or system generated.
3415 "originalTransformOrCollection": "A String", # User name for the original user transform or collection with which this
3416 # source is most closely associated.
3417 "name": "A String", # Dataflow service generated name for this source.
3418 },
3419 ],
3420 "kind": "A String", # Type of transform this stage is executing.
3421 "name": "A String", # Dataflow service generated name for this stage.
3422 "outputSource": [ # Output sources for this stage.
3423 { # Description of an input or output of an execution stage.
3424 "userName": "A String", # Human-readable name for this source; may be user or system generated.
3425 "sizeBytes": "A String", # Size of the source, if measurable.
3426 "name": "A String", # Dataflow service generated name for this source.
3427 "originalTransformOrCollection": "A String", # User name for the original user transform or collection with which this
3428 # source is most closely associated.
3429 },
3430 ],
3431 "inputSource": [ # Input sources for this stage.
3432 { # Description of an input or output of an execution stage.
3433 "userName": "A String", # Human-readable name for this source; may be user or system generated.
3434 "sizeBytes": "A String", # Size of the source, if measurable.
3435 "name": "A String", # Dataflow service generated name for this source.
3436 "originalTransformOrCollection": "A String", # User name for the original user transform or collection with which this
3437 # source is most closely associated.
3438 },
3439 ],
3440 "componentTransform": [ # Transforms that comprise this execution stage.
3441 { # Description of a transform executed as part of an execution stage.
3442 "userName": "A String", # Human-readable name for this transform; may be user or system generated.
3443 "originalTransform": "A String", # User name for the original user transform with which this transform is
3444 # most closely associated.
3445 "name": "A String", # Dataflow service generated name for this source.
3446 },
3447 ],
3448 "id": "A String", # Dataflow service generated id for this stage.
3449 },
3450 ],
3451 "displayData": [ # Pipeline level display data.
3452 { # Data provided with a pipeline or transform to provide descriptive info.
3453 "key": "A String", # The key identifying the display data.
3454 # This is intended to be used as a label for the display data
3455 # when viewed in a dax monitoring system.
3456 "shortStrValue": "A String", # A possible additional shorter value to display.
3457 # For example a java_class_name_value of com.mypackage.MyDoFn
3458 # will be stored with MyDoFn as the short_str_value and
3459 # com.mypackage.MyDoFn as the java_class_name value.
3460 # short_str_value can be displayed and java_class_name_value
3461 # will be displayed as a tooltip.
3462 "timestampValue": "A String", # Contains value if the data is of timestamp type.
3463 "url": "A String", # An optional full URL.
3464 "floatValue": 3.14, # Contains value if the data is of float type.
3465 "namespace": "A String", # The namespace for the key. This is usually a class name or programming
3466 # language namespace (i.e. python module) which defines the display data.
3467 # This allows a dax monitoring system to specially handle the data
3468 # and perform custom rendering.
3469 "javaClassValue": "A String", # Contains value if the data is of java class type.
3470 "label": "A String", # An optional label to display in a dax UI for the element.
3471 "boolValue": True or False, # Contains value if the data is of a boolean type.
3472 "strValue": "A String", # Contains value if the data is of string type.
3473 "durationValue": "A String", # Contains value if the data is of duration type.
3474 "int64Value": "A String", # Contains value if the data is of int64 type.
3475 },
3476 ],
3477 },
3478 "stageStates": [ # This field may be mutated by the Cloud Dataflow service;
3479 # callers cannot mutate it.
3480 { # A message describing the state of a particular execution stage.
3481 "executionStageName": "A String", # The name of the execution stage.
3482 "executionStageState": "A String", # Execution stage states allow the same set of values as JobState.
3483 "currentStateTime": "A String", # The time at which the stage transitioned to this state.
3484 },
3485 ],
3486 "id": "A String", # The unique ID of this job.
3487 #
3488 # This field is set by the Cloud Dataflow service when the Job is
3489 # created, and is immutable for the life of the job.
3490 "replacedByJobId": "A String", # If another job is an update of this job (and thus, this job is in
3491 # `JOB_STATE_UPDATED`), this field contains the ID of that job.
3492 "projectId": "A String", # The ID of the Cloud Platform project that the job belongs to.
3493 "transformNameMapping": { # The map of transform name prefixes of the job to be replaced to the
3494 # corresponding name prefixes of the new job.
3495 "a_key": "A String",
3496 },
3497 "environment": { # Describes the environment in which a Dataflow Job runs. # The environment for the job.
3498 "workerRegion": "A String", # The Compute Engine region
3499 # (https://cloud.google.com/compute/docs/regions-zones/regions-zones) in
3500 # which worker processing should occur, e.g. "us-west1". Mutually exclusive
3501 # with worker_zone. If neither worker_region nor worker_zone is specified,
3502 # default to the control plane's region.
3503 "version": { # A structure describing which components and their versions of the service
3504 # are required in order to run the job.
3505 "a_key": "", # Properties of the object.
3506 },
3507 "flexResourceSchedulingGoal": "A String", # Which Flexible Resource Scheduling mode to run in.
3508 "serviceKmsKeyName": "A String", # If set, contains the Cloud KMS key identifier used to encrypt data
3509 # at rest, AKA a Customer Managed Encryption Key (CMEK).
3510 #
3511 # Format:
3512 # projects/PROJECT_ID/locations/LOCATION/keyRings/KEY_RING/cryptoKeys/KEY
3513 "internalExperiments": { # Experimental settings.
3514 "a_key": "", # Properties of the object. Contains field @type with type URL.
3515 },
3516 "dataset": "A String", # The dataset for the current project where various workflow
3517 # related tables are stored.
3518 #
3519 # The supported resource type is:
3520 #
3521 # Google BigQuery:
3522 # bigquery.googleapis.com/{dataset}
3523 "experiments": [ # The list of experiments to enable.
3524 "A String",
3525 ],
3526 "serviceAccountEmail": "A String", # Identity to run virtual machines as. Defaults to the default account.
3527 "sdkPipelineOptions": { # The Cloud Dataflow SDK pipeline options specified by the user. These
3528 # options are passed through the service and are used to recreate the
3529 # SDK pipeline options on the worker in a language agnostic and platform
3530 # independent way.
3531 "a_key": "", # Properties of the object.
3532 },
3533 "userAgent": { # A description of the process that generated the request.
3534 "a_key": "", # Properties of the object.
3535 },
3536 "workerZone": "A String", # The Compute Engine zone
3537 # (https://cloud.google.com/compute/docs/regions-zones/regions-zones) in
3538 # which worker processing should occur, e.g. "us-west1-a". Mutually exclusive
3539 # with worker_region. If neither worker_region nor worker_zone is specified,
3540 # a zone in the control plane's region is chosen based on available capacity.
3541 "workerPools": [ # The worker pools. At least one "harness" worker pool must be
3542 # specified in order for the job to have workers.
3543 { # Describes one particular pool of Cloud Dataflow workers to be
3544 # instantiated by the Cloud Dataflow service in order to perform the
3545 # computations required by a job. Note that a workflow job may use
3546 # multiple pools, in order to match the various computational
3547 # requirements of the various stages of the job.
3548 "workerHarnessContainerImage": "A String", # Required. Docker container image that executes the Cloud Dataflow worker
3549 # harness, residing in Google Container Registry.
3550 #
3551 # Deprecated for the Fn API path. Use sdk_harness_container_images instead.
3552 "ipConfiguration": "A String", # Configuration for VM IPs.
3553 "autoscalingSettings": { # Settings for WorkerPool autoscaling. # Settings for autoscaling of this WorkerPool.
3554 "maxNumWorkers": 42, # The maximum number of workers to cap scaling at.
3555 "algorithm": "A String", # The algorithm to use for autoscaling.
3556 },
3557 "diskSourceImage": "A String", # Fully qualified source image for disks.
3558 "network": "A String", # Network to which VMs will be assigned. If empty or unspecified,
3559 # the service will use the network "default".
3560 "zone": "A String", # Zone to run the worker pools in. If empty or unspecified, the service
3561 # will attempt to choose a reasonable default.
3562 "metadata": { # Metadata to set on the Google Compute Engine VMs.
3563 "a_key": "A String",
3564 },
3565 "machineType": "A String", # Machine type (e.g. "n1-standard-1"). If empty or unspecified, the
3566 # service will attempt to choose a reasonable default.
3567 "onHostMaintenance": "A String", # The action to take on host maintenance, as defined by the Google
3568 # Compute Engine API.
3569 "taskrunnerSettings": { # Taskrunner configuration settings. # Settings passed through to Google Compute Engine workers when
3570 # using the standard Dataflow task runner. Users should ignore
3571 # this field.
3572 "workflowFileName": "A String", # The file to store the workflow in.
3573 "logUploadLocation": "A String", # Indicates where to put logs. If this is not specified, the logs
3574 # will not be uploaded.
3575 #
3576 # The supported resource type is:
3577 #
3578 # Google Cloud Storage:
3579 # storage.googleapis.com/{bucket}/{object}
3580 # bucket.storage.googleapis.com/{object}
3581 "commandlinesFileName": "A String", # The file to store preprocessing commands in.
3582 "alsologtostderr": True or False, # Whether to also send taskrunner log info to stderr.
3583 "continueOnException": True or False, # Whether to continue taskrunner if an exception is hit.
3584 "baseTaskDir": "A String", # The location on the worker for task-specific subdirectories.
3585 "vmId": "A String", # The ID string of the VM.
3586 "taskGroup": "A String", # The UNIX group ID on the worker VM to use for tasks launched by
3587 # taskrunner; e.g. "wheel".
3588 "taskUser": "A String", # The UNIX user ID on the worker VM to use for tasks launched by
3589 # taskrunner; e.g. "root".
3590 "oauthScopes": [ # The OAuth2 scopes to be requested by the taskrunner in order to
3591 # access the Cloud Dataflow API.
3592 "A String",
3593 ],
3594 "languageHint": "A String", # The suggested backend language.
3595 "logToSerialconsole": True or False, # Whether to send taskrunner log info to Google Compute Engine VM serial
3596 # console.
3597 "streamingWorkerMainClass": "A String", # The streaming worker main class name.
3598 "logDir": "A String", # The directory on the VM to store logs.
3599 "parallelWorkerSettings": { # Provides data to pass through to the worker harness. # The settings to pass to the parallel worker harness.
3600 "reportingEnabled": True or False, # Whether to send work progress updates to the service.
3601 "shuffleServicePath": "A String", # The Shuffle service path relative to the root URL, for example,
3602 # "shuffle/v1beta1".
3603 "workerId": "A String", # The ID of the worker running this pipeline.
3604 "baseUrl": "A String", # The base URL for accessing Google Cloud APIs.
3605 #
3606 # When workers access Google Cloud APIs, they logically do so via
3607 # relative URLs. If this field is specified, it supplies the base
3608 # URL to use for resolving these relative URLs. The normative
3609 # algorithm used is defined by RFC 1808, "Relative Uniform Resource
3610 # Locators".
3611 #
3612 # If not specified, the default value is "http://www.googleapis.com/"
3613 "servicePath": "A String", # The Cloud Dataflow service path relative to the root URL, for example,
3614 # "dataflow/v1b3/projects".
3615 "tempStoragePrefix": "A String", # The prefix of the resources the system should use for temporary
3616 # storage.
3617 #
3618 # The supported resource type is:
3619 #
3620 # Google Cloud Storage:
3621 #
3622 # storage.googleapis.com/{bucket}/{object}
3623 # bucket.storage.googleapis.com/{object}
3624 },
3625 "dataflowApiVersion": "A String", # The API version of the endpoint, e.g. "v1b3".
3626 "harnessCommand": "A String", # The command to launch the worker harness.
3627 "tempStoragePrefix": "A String", # The prefix of the resources the taskrunner should use for
3628 # temporary storage.
3629 #
3630 # The supported resource type is:
3631 #
3632 # Google Cloud Storage:
3633 # storage.googleapis.com/{bucket}/{object}
3634 # bucket.storage.googleapis.com/{object}
3635 "baseUrl": "A String", # The base URL for the taskrunner to use when accessing Google Cloud APIs.
3636 #
3637 # When workers access Google Cloud APIs, they logically do so via
3638 # relative URLs. If this field is specified, it supplies the base
3639 # URL to use for resolving these relative URLs. The normative
3640 # algorithm used is defined by RFC 1808, "Relative Uniform Resource
3641 # Locators".
3642 #
3643 # If not specified, the default value is "http://www.googleapis.com/"
3644 },
3645 "numThreadsPerWorker": 42, # The number of threads per worker harness. If empty or unspecified, the
3646 # service will choose a number of threads (according to the number of cores
3647 # on the selected machine type for batch, or 1 by convention for streaming).
3648 "poolArgs": { # Extra arguments for this worker pool.
3649 "a_key": "", # Properties of the object. Contains field @type with type URL.
3650 },
3651 "packages": [ # Packages to be installed on workers.
3652 { # The packages that must be installed in order for a worker to run the
3653 # steps of the Cloud Dataflow job that will be assigned to its worker
3654 # pool.
3655 #
3656 # This is the mechanism by which the Cloud Dataflow SDK causes code to
3657 # be loaded onto the workers. For example, the Cloud Dataflow Java SDK
3658 # might use this to install jars containing the user's code and all of the
3659 # various dependencies (libraries, data files, etc.) required in order
3660 # for that code to run.
3661 "location": "A String", # The resource to read the package from. The supported resource type is:
3662 #
3663 # Google Cloud Storage:
3664 #
3665 # storage.googleapis.com/{bucket}
3666 # bucket.storage.googleapis.com/
3667 "name": "A String", # The name of the package.
3668 },
3669 ],
3670 "defaultPackageSet": "A String", # The default package set to install. This allows the service to
3671 # select a default set of packages which are useful to worker
3672 # harnesses written in a particular language.
3673 "kind": "A String", # The kind of the worker pool; currently only `harness` and `shuffle`
3674 # are supported.
3675 "diskType": "A String", # Type of root disk for VMs. If empty or unspecified, the service will
3676 # attempt to choose a reasonable default.
3677 "teardownPolicy": "A String", # Sets the policy for determining when to tear down the worker pool.
3678 # Allowed values are: `TEARDOWN_ALWAYS`, `TEARDOWN_ON_SUCCESS`, and
3679 # `TEARDOWN_NEVER`.
3680 # `TEARDOWN_ALWAYS` means workers are always torn down regardless of whether
3681 # the job succeeds. `TEARDOWN_ON_SUCCESS` means workers are torn down
3682 # if the job succeeds. `TEARDOWN_NEVER` means the workers are never torn
3683 # down.
3684 #
3685 # If the workers are not torn down by the service, they will
3686 # continue to run and use Google Compute Engine VM resources in the
3687 # user's project until they are explicitly terminated by the user.
3688 # Because of this, Google recommends using the `TEARDOWN_ALWAYS`
3689 # policy except for small, manually supervised test jobs.
3690 #
3691 # If unknown or unspecified, the service will attempt to choose a reasonable
3692 # default.
3693 "diskSizeGb": 42, # Size of root disk for VMs, in GB. If zero or unspecified, the service will
3694 # attempt to choose a reasonable default.
3695 "numWorkers": 42, # Number of Google Compute Engine workers in this pool needed to
3696 # execute the job. If zero or unspecified, the service will
3697 # attempt to choose a reasonable default.
3698 "subnetwork": "A String", # Subnetwork to which VMs will be assigned, if desired. Expected to be of
3699 # the form "regions/REGION/subnetworks/SUBNETWORK".
3700 "dataDisks": [ # Data disks that are used by a VM in this workflow.
3701 { # Describes the data disk used by a workflow job.
3702 "mountPoint": "A String", # Directory in a VM where disk is mounted.
3703 "sizeGb": 42, # Size of disk in GB. If zero or unspecified, the service will
3704 # attempt to choose a reasonable default.
3705 "diskType": "A String", # Disk storage type, as defined by Google Compute Engine. This
3706 # must be a disk type appropriate to the project and zone in which
3707 # the workers will run. If unknown or unspecified, the service
3708 # will attempt to choose a reasonable default.
3709 #
3710 # For example, the standard persistent disk type is a resource name
3711 # typically ending in "pd-standard". If SSD persistent disks are
3712 # available, the resource name typically ends with "pd-ssd". The
3713 # actual valid values are defined by the Google Compute Engine API,
3714 # not by the Cloud Dataflow API; consult the Google Compute Engine
3715 # documentation for more information about determining the set of
3716 # available disk types for a particular project and zone.
3717 #
3718 # Google Compute Engine Disk types are local to a particular
3719 # project in a particular zone, and so the resource name will
3720 # typically look something like this:
3721 #
3722 # compute.googleapis.com/projects/project-id/zones/zone/diskTypes/pd-standard
3723 },
3724 ],
3725 "sdkHarnessContainerImages": [ # Set of SDK harness containers needed to execute this pipeline. This will
3726 # only be set in the Fn API path. For non-cross-language pipelines this
3727 # should have only one entry. Cross-language pipelines will have two or more
3728 # entries.
3729 { # Defines an SDK harness container for executing Dataflow pipelines.
3730 "containerImage": "A String", # A docker container image that resides in Google Container Registry.
3731 "useSingleCorePerContainer": True or False, # If true, recommends that the Dataflow service use only one core per SDK
3732 # container instance with this image. If false (or unset), recommends using
3733 # more than one core per SDK container instance with this image for
3734 # efficiency. Note that the Dataflow service may choose to override this property
3735 # if needed.
3736 },
3737 ],
3738 },
3739 ],
3740 "clusterManagerApiService": "A String", # The type of cluster manager API to use. If unknown or
3741 # unspecified, the service will attempt to choose a reasonable
3742 # default. This should be in the form of the API service name,
3743 # e.g. "compute.googleapis.com".
3744 "tempStoragePrefix": "A String", # The prefix of the resources the system should use for temporary
3745 # storage. The system will append the suffix "/temp-{JOBNAME}" to
3746 # this resource prefix, where {JOBNAME} is the value of the
3747 # job_name field. The resulting bucket and object prefix is used
3748 # as the prefix of the resources used to store temporary data
3749 # needed during the job execution. NOTE: This will override the
3750 # value in taskrunner_settings.
3751 # The supported resource type is:
3752 #
3753 # Google Cloud Storage:
3754 #
3755 # storage.googleapis.com/{bucket}/{object}
3756 # bucket.storage.googleapis.com/{object}
3757 },
3758 "location": "A String", # The [regional endpoint]
3759 # (https://cloud.google.com/dataflow/docs/concepts/regional-endpoints) that
3760 # contains this job.
3761 "tempFiles": [ # A set of files the system should be aware of that are used
3762 # for temporary storage. These temporary files will be
3763 # removed on job completion.
3764 # No duplicates are allowed.
3765 # No file patterns are supported.
3766 #
3767 # The supported files are:
3768 #
3769 # Google Cloud Storage:
3770 #
3771 # storage.googleapis.com/{bucket}/{object}
3772 # bucket.storage.googleapis.com/{object}
3773 "A String",
3774 ],
3775 "type": "A String", # The type of Cloud Dataflow job.
3776 "clientRequestId": "A String", # The client's unique identifier of the job, re-used across retried attempts.
3777 # If this field is set, the service will ensure its uniqueness.
3778 # The request to create a job will fail if the service has knowledge of a
3779 # previously submitted job with the same client's ID and job name.
3780 # The caller may use this field to ensure idempotence of job
3781 # creation across retried attempts to create a job.
3782 # By default, the field is empty and, in that case, the service ignores it.
3783 "createdFromSnapshotId": "A String", # If this is specified, the job's initial state is populated from the given
3784 # snapshot.
3785 "stepsLocation": "A String", # The GCS location where the steps are stored.
3786 "currentStateTime": "A String", # The timestamp associated with the current state.
3787 "startTime": "A String", # The timestamp when the job was started (transitioned to JOB_STATE_PENDING).
3788 # Flexible resource scheduling jobs are started with some delay after job
3789 # creation, so start_time is unset before start and is updated when the
3790 # job is started by the Cloud Dataflow service. For other jobs, start_time
3791 # always equals create_time and is immutable and set by the Cloud Dataflow
        # service.
    "createTime": "A String", # The timestamp when the job was initially created. Immutable and set by the
        # Cloud Dataflow service.
    "requestedState": "A String", # The job's requested state.
        #
        # `UpdateJob` may be used to switch between the `JOB_STATE_STOPPED` and
        # `JOB_STATE_RUNNING` states, by setting requested_state. `UpdateJob` may
        # also be used to directly set a job's requested state to
        # `JOB_STATE_CANCELLED` or `JOB_STATE_DONE`, irrevocably terminating the
        # job if it has not already reached a terminal state.
    "name": "A String", # The user-specified Cloud Dataflow job name.
        #
        # Only one Job with a given name may exist in a project at any
        # given time. If a caller attempts to create a Job with the same
        # name as an already-existing Job, the attempt returns the
        # existing Job.
        #
        # The name must match the regular expression
        # `[a-z]([-a-z0-9]{0,38}[a-z0-9])?`
3811 "steps": [ # Exactly one of step or steps_location should be specified.
        #
        # The top-level steps that constitute the entire job.
      { # Defines a particular step within a Cloud Dataflow job.
          #
          # A job consists of multiple steps, each of which performs some
          # specific operation as part of the overall job. Data is typically
          # passed from one step to another as part of the job.
          #
          # Here's an example of a sequence of steps which together implement a
          # Map-Reduce job:
          #
          # * Read a collection of data from some source, parsing the
          #   collection's elements.
          #
          # * Validate the elements.
          #
          # * Apply a user-defined function to map each element to some value
          #   and extract an element-specific key value.
          #
          # * Group elements with the same key into a single element with
          #   that key, transforming a multiply-keyed collection into a
          #   uniquely-keyed collection.
          #
          # * Write the elements out to some data sink.
          #
          # Note that the Cloud Dataflow service may be used to run many different
          # types of jobs, not just Map-Reduce.
        "kind": "A String", # The kind of step in the Cloud Dataflow job.
        "name": "A String", # The name that identifies the step. This must be unique for each
            # step with respect to all other steps in the Cloud Dataflow job.
        "properties": { # Named properties associated with the step. Each kind of
            # predefined step has its own required set of properties.
            # Must be provided on Create. Only retrieved with JOB_VIEW_ALL.
          "a_key": "", # Properties of the object.
        },
      },
    ],
    "replaceJobId": "A String", # If this job is an update of an existing job, this field is the job ID
        # of the job it replaced.
        #
        # When sending a `CreateJobRequest`, you can update a job by specifying it
        # here. The job named here is stopped, and its intermediate state is
        # transferred to this job.
    "currentState": "A String", # The current state of the job.
        #
        # Jobs are created in the `JOB_STATE_STOPPED` state unless otherwise
        # specified.
        #
        # A job in the `JOB_STATE_RUNNING` state may asynchronously enter a
        # terminal state. After a job has reached a terminal state, no
        # further state updates may be made.
        #
        # This field may be mutated by the Cloud Dataflow service;
        # callers cannot mutate it.
    "executionInfo": { # Deprecated. Additional information about how a Cloud Dataflow job
        # will be executed that isn't contained in the submitted job.
      "stages": { # A mapping from each stage to the information about that stage.
        "a_key": { # Contains information about how a particular
            # google.dataflow.v1beta3.Step will be executed.
          "stepName": [ # The steps associated with the execution stage.
              # Note that stages may have several steps, and that a given step
              # might be run by more than one stage.
            "A String",
          ],
        },
      },
    },
    }</pre>
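<p>The job shown above is returned by this client library as a plain Python dictionary. The following snippet is an illustrative sketch rather than generated reference material: it assumes application default credentials are configured, uses hypothetical project, location, and job ID values, and assumes the jobs resource also exposes an <code>update</code> method for setting <code>requestedState</code> as described above.</p>
<pre>
# Illustrative sketch only; the project, region, and job ID below are hypothetical.
from googleapiclient.discovery import build

# Build a Dataflow API client using application default credentials.
dataflow = build('dataflow', 'v1b3')
jobs = dataflow.projects().locations().jobs()

project_id = 'my-project'                   # hypothetical project ID
location = 'us-central1'                    # regional endpoint that contains the job
job_id = '2020-01-01_00_00_00-1234567890'   # hypothetical job ID

# Fetch the job and read a few of the fields documented above.
job = jobs.get(projectId=project_id, location=location, jobId=job_id).execute()
print(job.get('name'), job.get('currentState'), job.get('startTime'))

# Ask the service to cancel the job by setting requested_state; this only
# terminates the job if it has not already reached a terminal state.
jobs.update(
    projectId=project_id,
    location=location,
    jobId=job_id,
    body={'requestedState': 'JOB_STATE_CANCELLED'},
).execute()
</pre>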
</div>
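<p>A second illustrative sketch, again with hypothetical names: the <code>clientRequestId</code> and <code>replaceJobId</code> fields described above can be used to make job creation idempotent across retries and to replace a running job. The job body here is a deliberately minimal placeholder; a real request would also carry the environment and steps shown above.</p>
<pre>
# Illustrative sketch only; project, region, and job names are hypothetical.
import uuid
from googleapiclient.discovery import build

dataflow = build('dataflow', 'v1b3')
jobs = dataflow.projects().locations().jobs()

# A stable client-side ID makes retried create calls idempotent: the service
# rejects a second job that reuses the same clientRequestId and job name.
request_id = str(uuid.uuid4())

job_body = {
    'name': 'example-job',            # must match [a-z]([-a-z0-9]{0,38}[a-z0-9])?
    'type': 'JOB_TYPE_BATCH',
    'clientRequestId': request_id,
    # 'replaceJobId': 'ID_OF_JOB_TO_REPLACE',  # set when updating an existing job
}

created = jobs.create(
    projectId='my-project',           # hypothetical project ID
    location='us-central1',           # regional endpoint for the new job
    body=job_body,
).execute()
print(created.get('id'), created.get('currentState'))
</pre>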

</body></html>