Blame - docs/dyn/dataflow_v1b3.projects.jobs.html - platform/external/python/google-api-python-client

<h2>Instance Methods</h2>
<p class="toc_element">
  <code><a href="dataflow_v1b3.projects.jobs.debug.html">debug()</a></code>
</p>
<p class="firstline">Returns the debug Resource.</p>

<p class="toc_element">
  <code><a href="dataflow_v1b3.projects.jobs.messages.html">messages()</a></code>
</p>
<p class="firstline">Returns the messages Resource.</p>

<p class="toc_element">
  <code><a href="dataflow_v1b3.projects.jobs.workItems.html">workItems()</a></code>
</p>
<p class="firstline">Returns the workItems Resource.</p>

<p class="toc_element">
  <code><a href="#aggregated">aggregated(projectId, pageSize=None, filter=None, location=None, pageToken=None, view=None, x__xgafv=None)</a></code></p>
<p class="firstline">List the jobs of a project across all regions.</p>
<p class="toc_element">
  <code><a href="#aggregated_next">aggregated_next(previous_request, previous_response)</a></code></p>
<p class="firstline">Retrieves the next page of results.</p>
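The aggregated() and aggregated_next() pair can be combined into a simple paging loop. The sketch below is illustrative only: the helper name iter_aggregated_jobs is hypothetical, and jobs_resource is assumed to be the object returned by projects().jobs() on a Dataflow v1b3 service built with google-api-python-client. It relies on aggregated_next() returning None once there are no further pages, and on the documented behavior that a project without jobs yields an empty response body.

```python
def iter_aggregated_jobs(jobs_resource, project_id, **params):
    """Yield every job from aggregated(), following pagination.

    jobs_resource is assumed to expose aggregated() and aggregated_next()
    as listed above; extra keyword arguments (for example view= or
    pageSize=) are passed through to aggregated() unchanged.
    """
    request = jobs_resource.aggregated(projectId=project_id, **params)
    while request is not None:
        response = request.execute()
        # A project with no jobs returns an empty body ({}), so default to [].
        for job in response.get("jobs", []):
            yield job
        # Returns None when the previous response was the last page.
        request = jobs_resource.aggregated_next(
            previous_request=request, previous_response=response
        )


# Hypothetical usage (requires google-api-python-client and credentials):
#   from googleapiclient.discovery import build
#   jobs = build("dataflow", "v1b3").projects().jobs()
#   for job in iter_aggregated_jobs(jobs, "my-project", view="JOB_VIEW_SUMMARY"):
#       print(job.get("projectId"))
```

Writing the helper against the resource object rather than building the service inside it keeps it easy to exercise with a stub in tests.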

<p class="toc_element">
  <code><a href="#create">create(projectId, body=None, location=None, replaceJobId=None, view=None, x__xgafv=None)</a></code></p>
<p class="firstline">Creates a Cloud Dataflow job.</p>
<p class="toc_element">
  <code><a href="#get">get(projectId, jobId, location=None, view=None, x__xgafv=None)</a></code></p>
<p class="firstline">Gets the state of the specified Cloud Dataflow job.</p>
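As a rough illustration of get(), the sketch below fetches one job and returns its state. The helper name get_job_state is hypothetical, jobs_resource is assumed to be the projects().jobs() resource, and the currentState field name is taken from the Dataflow Job resource rather than from the excerpt shown on this page, so treat it as an assumption.

```python
def get_job_state(jobs_resource, project_id, job_id, location=None):
    """Return the job's state string, or None if the field is absent.

    Only the parameters listed in the get() signature above are used.
    """
    kwargs = {
        "projectId": project_id,
        "jobId": job_id,
        # The summary view is enough when only the state is needed.
        "view": "JOB_VIEW_SUMMARY",
    }
    if location is not None:
        kwargs["location"] = location
    job = jobs_resource.get(**kwargs).execute()
    # "currentState" is assumed here; it is not shown in this excerpt.
    return job.get("currentState")
```

Leaving location unset lets the client omit the parameter entirely instead of sending an explicit null.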

<p class="toc_element">
  <code><a href="#getMetrics">getMetrics(projectId, jobId, startTime=None, location=None, x__xgafv=None)</a></code></p>
<p class="firstline">Request the job status.</p>
<p class="toc_element">
  <code><a href="#list">list(projectId, filter=None, pageSize=None, location=None, view=None, pageToken=None, x__xgafv=None)</a></code></p>
<p class="firstline">List the jobs of a project.</p>
<p class="toc_element">
  <code><a href="#list_next">list_next(previous_request, previous_response)</a></code></p>
<p class="firstline">Retrieves the next page of results.</p>
<p class="toc_element">
  <code><a href="#snapshot">snapshot(projectId, jobId, body=None, x__xgafv=None)</a></code></p>
<p class="firstline">Snapshot the state of a streaming job.</p>
<p class="toc_element">
  <code><a href="#update">update(projectId, jobId, body=None, location=None, x__xgafv=None)</a></code></p>
<p class="firstline">Updates the state of an existing Cloud Dataflow job.</p>
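One common use of update() is asking the service to cancel a running job. The following is a minimal sketch under stated assumptions: the helper name request_job_cancel is hypothetical, jobs_resource is assumed to be the projects().jobs() resource, and the requestedState field with the value JOB_STATE_CANCELLED comes from the Dataflow Job resource rather than from the excerpt shown on this page.

```python
def request_job_cancel(jobs_resource, project_id, job_id, location=None):
    """Ask the service to cancel a job by updating its requested state.

    Returns whatever the update() call returns (the updated job resource).
    """
    # Assumed field and value; neither appears in this excerpt.
    body = {"requestedState": "JOB_STATE_CANCELLED"}
    kwargs = {"projectId": project_id, "jobId": job_id, "body": body}
    if location is not None:
        kwargs["location"] = location
    return jobs_resource.update(**kwargs).execute()
```

The call only requests the transition; the job's actual state would still need to be read back with get().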

<h3>Method Details</h3>
<code class="details" id="aggregated">aggregated(projectId, pageSize=None, filter=None, location=None, pageToken=None, view=None, x__xgafv=None)</code>
<pre>List the jobs of a project across all regions.

Args:
  projectId: string, The project which owns the jobs. (required)
  pageSize: integer, If there are many jobs, limit response to at most this many.
    The actual number of jobs returned will be the lesser of max_responses
    and an unspecified server-defined limit.
  filter: string, The kind of filter to use.
  location: string, The [regional endpoint]
    (https://cloud.google.com/dataflow/docs/concepts/regional-endpoints) that
    contains this job.
  pageToken: string, Set this to the 'next_page_token' field of a previous response
    to request additional results in a long list.
  view: string, Level of information requested in response. Default is `JOB_VIEW_SUMMARY`.
  x__xgafv: string, V1 error format.
    Allowed values
      1 - v1 error format
      2 - v2 error format

Returns:
  An object of the form:

    { # Response to a request to list Cloud Dataflow jobs in a project. This might
        # be a partial response, depending on the page size in the ListJobsRequest.
        # However, if the project does not have any jobs, an instance of
        # ListJobsResponse is not returned and the request's response
        # body is empty {}.
      "jobs": [ # A subset of the requested job information.
        { # Defines a job to be run by the Cloud Dataflow service.
          "pipelineDescription": { # A descriptive representation of the submitted pipeline as well as its executed
              # form. This data is provided by the Dataflow service for ease of visualizing
              # the pipeline and interpreting Dataflow provided metrics.
              #
              # Preliminary field: The format of this data may change at any time.
              # A description of the user pipeline and stages through which it is executed.
              # Created by Cloud Dataflow service. Only retrieved with
              # JOB_VIEW_DESCRIPTION or JOB_VIEW_ALL.
            "displayData": [ # Pipeline level display data.
              { # Data provided with a pipeline or transform to provide descriptive info.
                "url": "A String", # An optional full URL.
                "javaClassValue": "A String", # Contains value if the data is of java class type.
                "timestampValue": "A String", # Contains value if the data is of timestamp type.
                "durationValue": "A String", # Contains value if the data is of duration type.
                "label": "A String", # An optional label to display in a dax UI for the element.
                "key": "A String", # The key identifying the display data.
                    # This is intended to be used as a label for the display data
                    # when viewed in a dax monitoring system.
                "namespace": "A String", # The namespace for the key. This is usually a class name or programming
                    # language namespace (i.e. python module) which defines the display data.
                    # This allows a dax monitoring system to specially handle the data
                    # and perform custom rendering.
                "floatValue": 3.14, # Contains value if the data is of float type.
                "strValue": "A String", # Contains value if the data is of string type.
                "int64Value": "A String", # Contains value if the data is of int64 type.
                "boolValue": True or False, # Contains value if the data is of a boolean type.
                "shortStrValue": "A String", # A possible additional shorter value to display.
                    # For example a java_class_name_value of com.mypackage.MyDoFn
                    # will be stored with MyDoFn as the short_str_value and
                    # com.mypackage.MyDoFn as the java_class_name value.
                    # short_str_value can be displayed and java_class_name_value
                    # will be displayed as a tooltip.
              },
            ],
            "originalPipelineTransform": [ # Description of each transform in the pipeline and collections between them.
              { # Description of the type, names/ids, and input/outputs for a transform.
                "outputCollectionName": [ # User names for all collection outputs to this transform.
                  "A String",
                ],
                "displayData": [ # Transform-specific display data.
                  { # Data provided with a pipeline or transform to provide descriptive info.
                    "url": "A String", # An optional full URL.
                    "javaClassValue": "A String", # Contains value if the data is of java class type.
                    "timestampValue": "A String", # Contains value if the data is of timestamp type.
                    "durationValue": "A String", # Contains value if the data is of duration type.
                    "label": "A String", # An optional label to display in a dax UI for the element.
                    "key": "A String", # The key identifying the display data.
                        # This is intended to be used as a label for the display data
                        # when viewed in a dax monitoring system.
                    "namespace": "A String", # The namespace for the key. This is usually a class name or programming
                        # language namespace (i.e. python module) which defines the display data.
                        # This allows a dax monitoring system to specially handle the data
                        # and perform custom rendering.
                    "floatValue": 3.14, # Contains value if the data is of float type.
                    "strValue": "A String", # Contains value if the data is of string type.
                    "int64Value": "A String", # Contains value if the data is of int64 type.
                    "boolValue": True or False, # Contains value if the data is of a boolean type.
                    "shortStrValue": "A String", # A possible additional shorter value to display.
                        # For example a java_class_name_value of com.mypackage.MyDoFn
                        # will be stored with MyDoFn as the short_str_value and
                        # com.mypackage.MyDoFn as the java_class_name value.
                        # short_str_value can be displayed and java_class_name_value
                        # will be displayed as a tooltip.
                  },
                ],
                "id": "A String", # SDK generated id of this transform instance.
                "inputCollectionName": [ # User names for all collection inputs to this transform.
                  "A String",
                ],
                "name": "A String", # User provided name for this transform instance.
                "kind": "A String", # Type of transform.
              },
            ],
            "executionPipelineStage": [ # Description of each stage of execution of the pipeline.
              { # Description of the composing transforms, names/ids, and input/outputs of a
                  # stage of execution. Some composing transforms and sources may have been
                  # generated by the Dataflow service during execution planning.
                "componentSource": [ # Collections produced and consumed by component transforms of this stage.
                  { # Description of an interstitial value between transforms in an execution
                      # stage.
                    "userName": "A String", # Human-readable name for this transform; may be user or system generated.
                    "name": "A String", # Dataflow service generated name for this source.
                    "originalTransformOrCollection": "A String", # User name for the original user transform or collection with which this
                        # source is most closely associated.
                  },
                ],
                "inputSource": [ # Input sources for this stage.
                  { # Description of an input or output of an execution stage.
                    "userName": "A String", # Human-readable name for this source; may be user or system generated.
                    "originalTransformOrCollection": "A String", # User name for the original user transform or collection with which this
                        # source is most closely associated.
                    "sizeBytes": "A String", # Size of the source, if measurable.
                    "name": "A String", # Dataflow service generated name for this source.
                  },
                ],
                "name": "A String", # Dataflow service generated name for this stage.
                "componentTransform": [ # Transforms that comprise this execution stage.
                  { # Description of a transform executed as part of an execution stage.
                    "name": "A String", # Dataflow service generated name for this source.
                    "userName": "A String", # Human-readable name for this transform; may be user or system generated.
                    "originalTransform": "A String", # User name for the original user transform with which this transform is
                        # most closely associated.
                  },
                ],
                "id": "A String", # Dataflow service generated id for this stage.
                "outputSource": [ # Output sources for this stage.
                  { # Description of an input or output of an execution stage.
                    "userName": "A String", # Human-readable name for this source; may be user or system generated.
                    "originalTransformOrCollection": "A String", # User name for the original user transform or collection with which this
                        # source is most closely associated.
                    "sizeBytes": "A String", # Size of the source, if measurable.
                    "name": "A String", # Dataflow service generated name for this source.
                  },
                ],
                "kind": "A String", # Type of transform this stage is executing.
              },
            ],
          },
          "labels": { # User-defined labels for this job.
              #
              # The labels map can contain no more than 64 entries. Entries of the labels
              # map are UTF8 strings that comply with the following restrictions:
              #
              # * Keys must conform to regexp: \p{Ll}\p{Lo}{0,62}
              # * Values must conform to regexp: [\p{Ll}\p{Lo}\p{N}_-]{0,63}
              # * Both keys and values are additionally constrained to be <= 128 bytes in
              #   size.
            "a_key": "A String",
          },
          "projectId": "A String", # The ID of the Cloud Platform project that the job belongs to.
          "environment": { # Describes the environment in which a Dataflow Job runs. # The environment for the job.
            "flexResourceSchedulingGoal": "A String", # Which Flexible Resource Scheduling mode to run in.
            "workerRegion": "A String", # The Compute Engine region
                # (https://cloud.google.com/compute/docs/regions-zones/regions-zones) in
                # which worker processing should occur, e.g. "us-west1". Mutually exclusive
                # with worker_zone. If neither worker_region nor worker_zone is specified,
                # default to the control plane's region.
            "userAgent": { # A description of the process that generated the request.
              "a_key": "", # Properties of the object.
            },
            "serviceAccountEmail": "A String", # Identity to run virtual machines as. Defaults to the default account.
            "version": { # A structure describing which components and their versions of the service
                # are required in order to run the job.
              "a_key": "", # Properties of the object.
            },
            "serviceKmsKeyName": "A String", # If set, contains the Cloud KMS key identifier used to encrypt data
                # at rest, AKA a Customer Managed Encryption Key (CMEK).
                #
                # Format:
                #   projects/PROJECT_ID/locations/LOCATION/keyRings/KEY_RING/cryptoKeys/KEY
            "experiments": [ # The list of experiments to enable.
              "A String",
            ],
            "workerZone": "A String", # The Compute Engine zone
                # (https://cloud.google.com/compute/docs/regions-zones/regions-zones) in
                # which worker processing should occur, e.g. "us-west1-a". Mutually exclusive
                # with worker_region. If neither worker_region nor worker_zone is specified,
                # a zone in the control plane's region is chosen based on available capacity.
            "workerPools": [ # The worker pools. At least one "harness" worker pool must be
                # specified in order for the job to have workers.
              { # Describes one particular pool of Cloud Dataflow workers to be
                  # instantiated by the Cloud Dataflow service in order to perform the
                  # computations required by a job. Note that a workflow job may use
                  # multiple pools, in order to match the various computational
                  # requirements of the various stages of the job.
                "onHostMaintenance": "A String", # The action to take on host maintenance, as defined by the Google
                    # Compute Engine API.
                "sdkHarnessContainerImages": [ # Set of SDK harness containers needed to execute this pipeline. This will
                    # only be set in the Fn API path. For non-cross-language pipelines this
                    # should have only one entry. Cross-language pipelines will have two or more
                    # entries.
                  { # Defines a SDK harness container for executing Dataflow pipelines.
                    "containerImage": "A String", # A docker container image that resides in Google Container Registry.
                    "useSingleCorePerContainer": True or False, # If true, recommends the Dataflow service to use only one core per SDK
                        # container instance with this image. If false (or unset) recommends using
                        # more than one core per SDK container instance with this image for
                        # efficiency. Note that Dataflow service may choose to override this property
                        # if needed.
                  },
                ],
                "zone": "A String", # Zone to run the worker pools in. If empty or unspecified, the service
                    # will attempt to choose a reasonable default.
                "kind": "A String", # The kind of the worker pool; currently only `harness` and `shuffle`
                    # are supported.
                "metadata": { # Metadata to set on the Google Compute Engine VMs.
                  "a_key": "A String",
                },
                "diskSourceImage": "A String", # Fully qualified source image for disks.
                "dataDisks": [ # Data disks that are used by a VM in this workflow.
                  { # Describes the data disk used by a workflow job.
                    "sizeGb": 42, # Size of disk in GB. If zero or unspecified, the service will
                        # attempt to choose a reasonable default.
                    "diskType": "A String", # Disk storage type, as defined by Google Compute Engine. This
                        # must be a disk type appropriate to the project and zone in which
                        # the workers will run. If unknown or unspecified, the service
                        # will attempt to choose a reasonable default.
                        #
                        # For example, the standard persistent disk type is a resource name
                        # typically ending in "pd-standard". If SSD persistent disks are
                        # available, the resource name typically ends with "pd-ssd". The
                        # actual valid values are defined by the Google Compute Engine API,
                        # not by the Cloud Dataflow API; consult the Google Compute Engine
                        # documentation for more information about determining the set of
                        # available disk types for a particular project and zone.
                        #
                        # Google Compute Engine Disk types are local to a particular
                        # project in a particular zone, and so the resource name will
                        # typically look something like this:
                        #
                        # compute.googleapis.com/projects/project-id/zones/zone/diskTypes/pd-standard
                    "mountPoint": "A String", # Directory in a VM where disk is mounted.
                  },
                ],
                "packages": [ # Packages to be installed on workers.
                  { # The packages that must be installed in order for a worker to run the
                      # steps of the Cloud Dataflow job that will be assigned to its worker
                      # pool.
                      #
                      # This is the mechanism by which the Cloud Dataflow SDK causes code to
                      # be loaded onto the workers. For example, the Cloud Dataflow Java SDK
                      # might use this to install jars containing the user's code and all of the
                      # various dependencies (libraries, data files, etc.) required in order
                      # for that code to run.
                    "name": "A String", # The name of the package.
                    "location": "A String", # The resource to read the package from. The supported resource type is:
                        #
                        # Google Cloud Storage:
                        #
                        #   storage.googleapis.com/{bucket}
                        #   bucket.storage.googleapis.com/
                  },
                ],
                "teardownPolicy": "A String", # Sets the policy for determining when to tear down the worker pool.
                    # Allowed values are: `TEARDOWN_ALWAYS`, `TEARDOWN_ON_SUCCESS`, and
                    # `TEARDOWN_NEVER`.
                    # `TEARDOWN_ALWAYS` means workers are always torn down regardless of whether
                    # the job succeeds. `TEARDOWN_ON_SUCCESS` means workers are torn down
                    # if the job succeeds. `TEARDOWN_NEVER` means the workers are never torn
                    # down.
                    #
                    # If the workers are not torn down by the service, they will
                    # continue to run and use Google Compute Engine VM resources in the
                    # user's project until they are explicitly terminated by the user.
                    # Because of this, Google recommends using the `TEARDOWN_ALWAYS`
                    # policy except for small, manually supervised test jobs.
                    #
                    # If unknown or unspecified, the service will attempt to choose a reasonable
                    # default.
                "network": "A String", # Network to which VMs will be assigned. If empty or unspecified,
                    # the service will use the network "default".
                "ipConfiguration": "A String", # Configuration for VM IPs.
                "diskSizeGb": 42, # Size of root disk for VMs, in GB. If zero or unspecified, the service will
                    # attempt to choose a reasonable default.
                "autoscalingSettings": { # Settings for WorkerPool autoscaling. # Settings for autoscaling of this WorkerPool.
                  "maxNumWorkers": 42, # The maximum number of workers to cap scaling at.
                  "algorithm": "A String", # The algorithm to use for autoscaling.
                },
                "poolArgs": { # Extra arguments for this worker pool.
                  "a_key": "", # Properties of the object. Contains field @type with type URL.
                },
                "subnetwork": "A String", # Subnetwork to which VMs will be assigned, if desired. Expected to be of
                    # the form "regions/REGION/subnetworks/SUBNETWORK".
                "numWorkers": 42, # Number of Google Compute Engine workers in this pool needed to
                    # execute the job. If zero or unspecified, the service will
                    # attempt to choose a reasonable default.
                "numThreadsPerWorker": 42, # The number of threads per worker harness. If empty or unspecified, the
                    # service will choose a number of threads (according to the number of cores
                    # on the selected machine type for batch, or 1 by convention for streaming).
                "workerHarnessContainerImage": "A String", # Required. Docker container image that executes the Cloud Dataflow worker
                    # harness, residing in Google Container Registry.
                    #
                    # Deprecated for the Fn API path. Use sdk_harness_container_images instead.
                "taskrunnerSettings": { # Taskrunner configuration settings. # Settings passed through to Google Compute Engine workers when
                    # using the standard Dataflow task runner. Users should ignore
                    # this field.
                  "dataflowApiVersion": "A String", # The API version of endpoint, e.g. "v1b3"
                  "oauthScopes": [ # The OAuth2 scopes to be requested by the taskrunner in order to
                      # access the Cloud Dataflow API.
                    "A String",
                  ],
                  "baseUrl": "A String", # The base URL for the taskrunner to use when accessing Google Cloud APIs.
                      #
                      # When workers access Google Cloud APIs, they logically do so via
                      # relative URLs. If this field is specified, it supplies the base
                      # URL to use for resolving these relative URLs. The normative
                      # algorithm used is defined by RFC 1808, "Relative Uniform Resource
                      # Locators".
                      #
                      # If not specified, the default value is "http://www.googleapis.com/"
                  "workflowFileName": "A String", # The file to store the workflow in.
                  "logToSerialconsole": True or False, # Whether to send taskrunner log info to Google Compute Engine VM serial
                      # console.
                  "baseTaskDir": "A String", # The location on the worker for task-specific subdirectories.
                  "taskUser": "A String", # The UNIX user ID on the worker VM to use for tasks launched by
                      # taskrunner; e.g. "root".
                  "vmId": "A String", # The ID string of the VM.
                  "alsologtostderr": True or False, # Whether to also send taskrunner log info to stderr.
                  "parallelWorkerSettings": { # Provides data to pass through to the worker harness. # The settings to pass to the parallel worker harness.
                    "shuffleServicePath": "A String", # The Shuffle service path relative to the root URL, for example,
                        # "shuffle/v1beta1".
                    "tempStoragePrefix": "A String", # The prefix of the resources the system should use for temporary
                        # storage.
                        #
                        # The supported resource type is:
                        #
                        # Google Cloud Storage:
                        #
                        #   storage.googleapis.com/{bucket}/{object}
                        #   bucket.storage.googleapis.com/{object}
                    "reportingEnabled": True or False, # Whether to send work progress updates to the service.
                    "servicePath": "A String", # The Cloud Dataflow service path relative to the root URL, for example,
                        # "dataflow/v1b3/projects".
                    "baseUrl": "A String", # The base URL for accessing Google Cloud APIs.
                        #
                        # When workers access Google Cloud APIs, they logically do so via
                        # relative URLs. If this field is specified, it supplies the base
                        # URL to use for resolving these relative URLs. The normative
                        # algorithm used is defined by RFC 1808, "Relative Uniform Resource
                        # Locators".
                        #
                        # If not specified, the default value is "http://www.googleapis.com/"
                    "workerId": "A String", # The ID of the worker running this pipeline.
                  },
                  "harnessCommand": "A String", # The command to launch the worker harness.
                  "logDir": "A String", # The directory on the VM to store logs.
                  "streamingWorkerMainClass": "A String", # The streaming worker main class name.
                  "languageHint": "A String", # The suggested backend language.
                  "taskGroup": "A String", # The UNIX group ID on the worker VM to use for tasks launched by
                      # taskrunner; e.g. "wheel".
                  "logUploadLocation": "A String", # Indicates where to put logs. If this is not specified, the logs
                      # will not be uploaded.
                      #
                      # The supported resource type is:
                      #
                      # Google Cloud Storage:
                      #   storage.googleapis.com/{bucket}/{object}
                      #   bucket.storage.googleapis.com/{object}
                  "commandlinesFileName": "A String", # The file to store preprocessing commands in.
                  "continueOnException": True or False, # Whether to continue taskrunner if an exception is hit.
                  "tempStoragePrefix": "A String", # The prefix of the resources the taskrunner should use for
                      # temporary storage.
                      #
                      # The supported resource type is:
                      #
                      # Google Cloud Storage:
                      #   storage.googleapis.com/{bucket}/{object}
                      #   bucket.storage.googleapis.com/{object}
                },
                "diskType": "A String", # Type of root disk for VMs. If empty or unspecified, the service will
                    # attempt to choose a reasonable default.
                "defaultPackageSet": "A String", # The default package set to install. This allows the service to
                    # select a default set of packages which are useful to worker
                    # harnesses written in a particular language.
                "machineType": "A String", # Machine type (e.g. "n1-standard-1"). If empty or unspecified, the
                    # service will attempt to choose a reasonable default.
              },
            ],
            "tempStoragePrefix": "A String", # The prefix of the resources the system should use for temporary
                # storage. The system will append the suffix "/temp-{JOBNAME}" to
                # this resource prefix, where {JOBNAME} is the value of the
                # job_name field. The resulting bucket and object prefix is used
                # as the prefix of the resources used to store temporary data
                # needed during the job execution. NOTE: This will override the
                # value in taskrunner_settings.
                # The supported resource type is:
                #
                # Google Cloud Storage:
                #
                #   storage.googleapis.com/{bucket}/{object}
                #   bucket.storage.googleapis.com/{object}
            "internalExperiments": { # Experimental settings.
              "a_key": "", # Properties of the object. Contains field @type with type URL.
            },
            "sdkPipelineOptions": { # The Cloud Dataflow SDK pipeline options specified by the user. These
                # options are passed through the service and are used to recreate the
                # SDK pipeline options on the worker in a language agnostic and platform
                # independent way.
              "a_key": "", # Properties of the object.
            },
            "dataset": "A String", # The dataset for the current project where various workflow
                # related tables are stored.
                #
                # The supported resource type is:
                #
                # Google BigQuery:
                #   bigquery.googleapis.com/{dataset}
            "clusterManagerApiService": "A String", # The type of cluster manager API to use. If unknown or
                # unspecified, the service will attempt to choose a reasonable
                # default. This should be in the form of the API service name,
                # e.g. "compute.googleapis.com".
          },
          "stepsLocation": "A String", # The GCS location where the steps are stored.
          "steps": [ # Exactly one of step or steps_location should be specified.
              #
              # The top-level steps that constitute the entire job.
            { # Defines a particular step within a Cloud Dataflow job.
                #
                # A job consists of multiple steps, each of which performs some
                # specific operation as part of the overall job. Data is typically
                # passed from one step to another as part of the job.
                #
                # Here's an example of a sequence of steps which together implement a
                # Map-Reduce job:
                #
                # * Read a collection of data from some source, parsing the
                #   collection's elements.
                #
                # * Validate the elements.
                #
                # * Apply a user-defined function to map each element to some value
                #   and extract an element-specific key value.
                #
                # * Group elements with the same key into a single element with
                #   that key, transforming a multiply-keyed collection into a
                #   uniquely-keyed collection.
                #
                # * Write the elements out to some data sink.
                #
                # Note that the Cloud Dataflow service may be used to run many different
                # types of jobs, not just Map-Reduce.
              "kind": "A String", # The kind of step in the Cloud Dataflow job.
              "properties": { # Named properties associated with the step. Each kind of
                  # predefined step has its own required set of properties.
                  # Must be provided on Create. Only retrieved with JOB_VIEW_ALL.
                "a_key": "", # Properties of the object.
              },
              "name": "A String", # The name that identifies the step. This must be unique for each
                  # step with respect to all other steps in the Cloud Dataflow job.
            },
          ],
          "stageStates": [ # This field may be mutated by the Cloud Dataflow service;
              # callers cannot mutate it.
            { # A message describing the state of a particular execution stage.
              "executionStageState": "A String", # Execution stage states allow the same set of values as JobState.
              "executionStageName": "A String", # The name of the execution stage.
              "currentStateTime": "A String", # The time at which the stage transitioned to this state.
            },
          ],
587

"replacedByJobId": "A String", # If another job is an update of this job (and thus, this job is in

588

# `JOB_STATE_UPDATED`), this field contains the ID of that job.

589

"jobMetadata": { # Metadata available primarily for filtering jobs. Will be included in the

# ListJob response and Job SUMMARY view.

# This field is populated by the Dataflow service to support filtering jobs

# by the metadata values provided here. Populated for ListJobs and all GetJob

# views SUMMARY and higher.

593

"sdkVersion": { # The version of the SDK used to run the job. # The SDK version used to run the job.

594

"sdkSupportStatus": "A String", # The support status for this SDK version.

595

"versionDisplayName": "A String", # A readable string describing the version of the SDK.

596

"version": "A String", # The version of the SDK used to run the job.

597

},

598

"bigTableDetails": [ # Identification of a BigTable source used in the Dataflow job.

599

{ # Metadata for a BigTable connector used by the job.

600

"instanceId": "A String", # InstanceId accessed in the connection.

601

"tableId": "A String", # TableId accessed in the connection.

602

"projectId": "A String", # ProjectId accessed in the connection.

603

},

604

],

605

"pubsubDetails": [ # Identification of a PubSub source used in the Dataflow job.

606

{ # Metadata for a PubSub connector used by the job.

607

"subscription": "A String", # Subscription used in the connection.

608

"topic": "A String", # Topic accessed in the connection.

609

},

610

],

611

"bigqueryDetails": [ # Identification of a BigQuery source used in the Dataflow job.

612

{ # Metadata for a BigQuery connector used by the job.

613

"dataset": "A String", # Dataset accessed in the connection.

614

"projectId": "A String", # Project accessed in the connection.

615

"query": "A String", # Query used to access data in the connection.

616

"table": "A String", # Table accessed in the connection.

617

},

618

],

619

"fileDetails": [ # Identification of a File source used in the Dataflow job.

620

{ # Metadata for a File connector used by the job.

621

"filePattern": "A String", # File Pattern used to access files by the connector.

622

},

623

],

624

"datastoreDetails": [ # Identification of a Datastore source used in the Dataflow job.

625

{ # Metadata for a Datastore connector used by the job.

626

"namespace": "A String", # Namespace used in the connection.

627

"projectId": "A String", # ProjectId accessed in the connection.

628

},

629

],

630

"spannerDetails": [ # Identification of a Spanner source used in the Dataflow job.

631

{ # Metadata for a Spanner connector used by the job.

632

"instanceId": "A String", # InstanceId accessed in the connection.

633

"databaseId": "A String", # DatabaseId accessed in the connection.

634

"projectId": "A String", # ProjectId accessed in the connection.

},

],

},

"location": "A String", # The [regional endpoint]

639

# (https://cloud.google.com/dataflow/docs/concepts/regional-endpoints) that

640

# contains this job.

641

"transformNameMapping": { # The map of transform name prefixes of the job to be replaced to the

642

# corresponding name prefixes of the new job.

643

"a_key": "A String",

644

},

645

"startTime": "A String", # The timestamp when the job was started (transitioned to JOB_STATE_PENDING).

646

# Flexible resource scheduling jobs are started with some delay after job

647

# creation, so start_time is unset before start and is updated when the

648

# job is started by the Cloud Dataflow service. For other jobs, start_time

649

# always equals create_time and is immutable and set by the Cloud Dataflow

650

# service.

651

"clientRequestId": "A String", # The client's unique identifier of the job, re-used across retried attempts.

652

# If this field is set, the service will ensure its uniqueness.

653

# The request to create a job will fail if the service has knowledge of a

654

# previously submitted job with the same client's ID and job name.

655

# The caller may use this field to ensure idempotence of job

656

# creation across retried attempts to create a job.

657

# By default, the field is empty and, in that case, the service ignores it.

658

"executionInfo": { # Additional information about how a Cloud Dataflow job will be executed that

# isn't contained in the submitted job. Deprecated.

660

"stages": { # A mapping from each stage to the information about that stage.

661

"a_key": { # Contains information about how a particular

662

# google.dataflow.v1beta3.Step will be executed.

663

"stepName": [ # The steps associated with the execution stage.

664

# Note that stages may have several steps, and that a given step

665

# might be run by more than one stage.

666

"A String",

667

],

668

},

},

670

},

"type": "A String", # The type of Cloud Dataflow job.

672

"createTime": "A String", # The timestamp when the job was initially created. Immutable and set by the

673

# Cloud Dataflow service.

674

"tempFiles": [ # A set of files the system should be aware of that are used

675

# for temporary storage. These temporary files will be

676

# removed on job completion.

677

# No duplicates are allowed.

678

# No file patterns are supported.

679

#

680

# The supported files are:

681

#

682

# Google Cloud Storage:

683

#

684

# storage.googleapis.com/{bucket}/{object}

685

# bucket.storage.googleapis.com/{object}

686

"A String",

687

],

688

"id": "A String", # The unique ID of this job.

689

#

690

# This field is set by the Cloud Dataflow service when the Job is

691

# created, and is immutable for the life of the job.

692

"requestedState": "A String", # The job's requested state.

693

#

694

# `UpdateJob` may be used to switch between the `JOB_STATE_STOPPED` and

695

# `JOB_STATE_RUNNING` states, by setting requested_state. `UpdateJob` may

696

# also be used to directly set a job's requested state to

697

# `JOB_STATE_CANCELLED` or `JOB_STATE_DONE`, irrevocably terminating the

698

# job if it has not already reached a terminal state.

699

"replaceJobId": "A String", # If this job is an update of an existing job, this field is the job ID

700

# of the job it replaced.

701

#

702

# When sending a `CreateJobRequest`, you can update a job by specifying it

703

# here. The job named here is stopped, and its intermediate state is

704

# transferred to this job.

705

"createdFromSnapshotId": "A String", # If this is specified, the job's initial state is populated from the given

706

# snapshot.

707

"currentState": "A String", # The current state of the job.

708

#

709

# Jobs are created in the `JOB_STATE_STOPPED` state unless otherwise

710

# specified.

711

#

712

# A job in the `JOB_STATE_RUNNING` state may asynchronously enter a

713

# terminal state. After a job has reached a terminal state, no

714

# further state updates may be made.

715

#

716

# This field may be mutated by the Cloud Dataflow service;

717

# callers cannot mutate it.

718

"name": "A String", # The user-specified Cloud Dataflow job name.

719

#

720

# Only one Job with a given name may exist in a project at any

721

# given time. If a caller attempts to create a Job with the same

722

# name as an already-existing Job, the attempt returns the

723

# existing Job.

724

#

725

# The name must match the regular expression

726

# `[a-z]([-a-z0-9]{0,38}[a-z0-9])?`

727

"currentStateTime": "A String", # The timestamp associated with the current state.

},

],

"nextPageToken": "A String", # Set if there may be more results than fit in this response.

"failedLocation": [ # Zero or more messages describing the [regional endpoints]

732

# (https://cloud.google.com/dataflow/docs/concepts/regional-endpoints) that

733

# failed to respond.

734

{ # Indicates which [regional endpoint]

735

# (https://cloud.google.com/dataflow/docs/concepts/regional-endpoints) failed

736

# to respond to a request for data.

737

"name": "A String", # The name of the [regional endpoint]

738

# (https://cloud.google.com/dataflow/docs/concepts/regional-endpoints) that

739

# failed to respond.

740

},

741

],

}</pre>

</div>
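The response described above arrives as a plain Python dictionary. The following is a minimal sketch of reading it, assuming the response carries the documented `jobs`, `nextPageToken`, and `failedLocation` keys. The helper name `summarize_jobs_response` and the sample values are hypothetical and only serve as illustration.

```python
def summarize_jobs_response(response):
    """Return (job summaries, failed endpoint names, next page token).

    `response` is assumed to be a dict shaped like the documented
    aggregated() response. Missing keys are treated as empty.
    """
    jobs = [
        (job.get("id"), job.get("name"), job.get("currentState"))
        for job in response.get("jobs", [])
    ]
    # Each failedLocation entry names a regional endpoint that did not respond.
    failed = [loc.get("name") for loc in response.get("failedLocation", [])]
    return jobs, failed, response.get("nextPageToken")


# Hypothetical sample data, not taken from a real project.
sample_response = {
    "jobs": [
        {"id": "job-1", "name": "wordcount", "currentState": "JOB_STATE_RUNNING"},
    ],
    "failedLocation": [{"name": "europe-west1"}],
}

jobs, failed, token = summarize_jobs_response(sample_response)
```

Checking `failedLocation` matters because an aggregated listing can be incomplete when a regional endpoint fails to respond, so an empty or short `jobs` list does not by itself mean the project has no other jobs.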

<code class="details" id="aggregated_next">aggregated_next(previous_request, previous_response)</code>

<pre>Retrieves the next page of results.

Args:
  previous_request: The request for the previous page. (required)
  previous_response: The response from the request for the previous page. (required)

Returns:
  A request object that you can call 'execute()' on to request the next
  page. Returns None if there are no more items in the collection.

</pre>

</div>
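The `aggregated()` and `aggregated_next()` pair can be combined into a paging loop. The sketch below assumes that `jobs_resource` is the object this page documents (for example, the result of `projects().jobs()` on a Dataflow `v1b3` service object) and that each response lists its jobs under a `jobs` key. The function name `iter_aggregated_jobs` is hypothetical.

```python
def iter_aggregated_jobs(jobs_resource, **kwargs):
    """Yield every job dict across all result pages.

    `kwargs` are passed through to aggregated(), e.g. projectId="my-project".
    """
    request = jobs_resource.aggregated(**kwargs)
    # aggregated_next() returns None once there are no more pages.
    while request is not None:
        response = request.execute()
        for job in response.get("jobs", []):
            yield job
        request = jobs_resource.aggregated_next(
            previous_request=request, previous_response=response
        )
```

A caller would typically write `for job in iter_aggregated_jobs(jobs_resource, projectId="my-project"):`, where the project ID is a placeholder. Using a generator keeps only one page in memory at a time.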

<code class="details" id="create">create(projectId, body=None, location=None, replaceJobId=None, view=None, x__xgafv=None)</code>

<pre>Creates a Cloud Dataflow job.

To create a job, we recommend using `projects.locations.jobs.create` with a
[regional endpoint]
(https://cloud.google.com/dataflow/docs/concepts/regional-endpoints). Using
`projects.jobs.create` is not recommended, as your job will always start
in `us-central1`.

Args:
  projectId: string, The ID of the Cloud Platform project that the job belongs to. (required)
  body: object, The request body.
    The object takes the form of:

{ # Defines a job to be run by the Cloud Dataflow service.

"pipelineDescription": { # A descriptive representation of submitted pipeline as well as the executed

# form. This data is provided by the Dataflow service for ease of visualizing

# the pipeline and interpreting Dataflow provided metrics.

# Preliminary field: The format of this data may change at any time.

# A description of the user pipeline and stages through which it is executed.

# Created by Cloud Dataflow service. Only retrieved with

# JOB_VIEW_DESCRIPTION or JOB_VIEW_ALL.

781

"displayData": [ # Pipeline level display data.

782

{ # Data provided with a pipeline or transform to provide descriptive info.

783

"url": "A String", # An optional full URL.

784

"javaClassValue": "A String", # Contains value if the data is of java class type.

785

"timestampValue": "A String", # Contains value if the data is of timestamp type.

786

"durationValue": "A String", # Contains value if the data is of duration type.

787

"label": "A String", # An optional label to display in a dax UI for the element.

788

"key": "A String", # The key identifying the display data.

789

# This is intended to be used as a label for the display data

790

# when viewed in a dax monitoring system.

791

"namespace": "A String", # The namespace for the key. This is usually a class name or programming

792

# language namespace (e.g. a Python module) which defines the display data.

793

# This allows a dax monitoring system to specially handle the data

794

# and perform custom rendering.

795

"floatValue": 3.14, # Contains value if the data is of float type.

796

"strValue": "A String", # Contains value if the data is of string type.

797

"int64Value": "A String", # Contains value if the data is of int64 type.

798

"boolValue": True or False, # Contains value if the data is of a boolean type.

799

"shortStrValue": "A String", # A possible additional shorter value to display.

800

# For example a java_class_name_value of com.mypackage.MyDoFn

801

# will be stored with MyDoFn as the short_str_value and

802

# com.mypackage.MyDoFn as the java_class_name value.

803

# short_str_value can be displayed and java_class_name_value

804

# will be displayed as a tooltip.

},

],

807

"originalPipelineTransform": [ # Description of each transform in the pipeline and collections between them.

808

{ # Description of the type, names/ids, and input/outputs for a transform.

809

"outputCollectionName": [ # User names for all collection outputs to this transform.

"A String",

811

],

"displayData": [ # Transform-specific display data.

813

{ # Data provided with a pipeline or transform to provide descriptive info.

814

"url": "A String", # An optional full URL.

815

"javaClassValue": "A String", # Contains value if the data is of java class type.

816

"timestampValue": "A String", # Contains value if the data is of timestamp type.

817

"durationValue": "A String", # Contains value if the data is of duration type.

818

"label": "A String", # An optional label to display in a dax UI for the element.

819

"key": "A String", # The key identifying the display data.

820

# This is intended to be used as a label for the display data

821

# when viewed in a dax monitoring system.

822

"namespace": "A String", # The namespace for the key. This is usually a class name or programming

823

# language namespace (e.g. a Python module) which defines the display data.

824

# This allows a dax monitoring system to specially handle the data

825

# and perform custom rendering.

826

"floatValue": 3.14, # Contains value if the data is of float type.

827

"strValue": "A String", # Contains value if the data is of string type.

828

"int64Value": "A String", # Contains value if the data is of int64 type.

829

"boolValue": True or False, # Contains value if the data is of a boolean type.

830

"shortStrValue": "A String", # A possible additional shorter value to display.

831

# For example a java_class_name_value of com.mypackage.MyDoFn

832

# will be stored with MyDoFn as the short_str_value and

833

# com.mypackage.MyDoFn as the java_class_name value.

834

# short_str_value can be displayed and java_class_name_value

835

# will be displayed as a tooltip.

836

},

837

],

838

"id": "A String", # SDK generated id of this transform instance.

839

"inputCollectionName": [ # User names for all collection inputs to this transform.

840

"A String",

841

],

842

"name": "A String", # User provided name for this transform instance.

843

"kind": "A String", # Type of transform.

844

},

845

],

846

"executionPipelineStage": [ # Description of each stage of execution of the pipeline.

847

{ # Description of the composing transforms, names/ids, and input/outputs of a

848

# stage of execution. Some composing transforms and sources may have been

849

# generated by the Dataflow service during execution planning.

850

"componentSource": [ # Collections produced and consumed by component transforms of this stage.

851

{ # Description of an interstitial value between transforms in an execution

852

# stage.

853

"userName": "A String", # Human-readable name for this transform; may be user or system generated.

854

"name": "A String", # Dataflow service generated name for this source.

855

"originalTransformOrCollection": "A String", # User name for the original user transform or collection with which this

856

# source is most closely associated.

857

},

858

],

859

"inputSource": [ # Input sources for this stage.

860

{ # Description of an input or output of an execution stage.

861

"userName": "A String", # Human-readable name for this source; may be user or system generated.

862

"originalTransformOrCollection": "A String", # User name for the original user transform or collection with which this

863

# source is most closely associated.

864

"sizeBytes": "A String", # Size of the source, if measurable.

865

"name": "A String", # Dataflow service generated name for this source.

866

},

867

],

868

"name": "A String", # Dataflow service generated name for this stage.

869

"componentTransform": [ # Transforms that comprise this execution stage.

870

{ # Description of a transform executed as part of an execution stage.

871

"name": "A String", # Dataflow service generated name for this source.

872

"userName": "A String", # Human-readable name for this transform; may be user or system generated.

873

"originalTransform": "A String", # User name for the original user transform with which this transform is

874

# most closely associated.

875

},

876

],

877

"id": "A String", # Dataflow service generated id for this stage.

878

"outputSource": [ # Output sources for this stage.

879

{ # Description of an input or output of an execution stage.

880

"userName": "A String", # Human-readable name for this source; may be user or system generated.

881

"originalTransformOrCollection": "A String", # User name for the original user transform or collection with which this

882

# source is most closely associated.

883

"sizeBytes": "A String", # Size of the source, if measurable.

884

"name": "A String", # Dataflow service generated name for this source.

885

},

886

],

887

"kind": "A String", # Type of transform this stage is executing.

},

],

},

"labels": { # User-defined labels for this job.

892

#

893

# The labels map can contain no more than 64 entries. Entries of the labels

894

# map are UTF8 strings that comply with the following restrictions:

895

#

896

# * Keys must conform to regexp: \p{Ll}\p{Lo}{0,62}

897

# * Values must conform to regexp: [\p{Ll}\p{Lo}\p{N}_-]{0,63}

898

# * Both keys and values are additionally constrained to be <= 128 bytes in

# size.

"a_key": "A String",

},

"projectId": "A String", # The ID of the Cloud Platform project that the job belongs to.

903

"environment": { # Describes the environment in which a Dataflow Job runs. # The environment for the job.

904

"flexResourceSchedulingGoal": "A String", # Which Flexible Resource Scheduling mode to run in.

905

"workerRegion": "A String", # The Compute Engine region

906

# (https://cloud.google.com/compute/docs/regions-zones/regions-zones) in

907

# which worker processing should occur, e.g. "us-west1". Mutually exclusive

908

# with worker_zone. If neither worker_region nor worker_zone is specified,

909

# default to the control plane's region.

910

"userAgent": { # A description of the process that generated the request.

911

"a_key": "", # Properties of the object.

912

},

913

"serviceAccountEmail": "A String", # Identity to run virtual machines as. Defaults to the default account.

914

"version": { # A structure describing which components and their versions of the service

915

# are required in order to run the job.

916

"a_key": "", # Properties of the object.

917

},

918

"serviceKmsKeyName": "A String", # If set, contains the Cloud KMS key identifier used to encrypt data

919

# at rest, AKA a Customer Managed Encryption Key (CMEK).

920

#

921

# Format:

922

# projects/PROJECT_ID/locations/LOCATION/keyRings/KEY_RING/cryptoKeys/KEY

923

"experiments": [ # The list of experiments to enable.

924

"A String",

925

],

926

"workerZone": "A String", # The Compute Engine zone

927

# (https://cloud.google.com/compute/docs/regions-zones/regions-zones) in

928

# which worker processing should occur, e.g. "us-west1-a". Mutually exclusive

929

# with worker_region. If neither worker_region nor worker_zone is specified,

930

# a zone in the control plane's region is chosen based on available capacity.

931

"workerPools": [ # The worker pools. At least one "harness" worker pool must be

932

# specified in order for the job to have workers.

933

{ # Describes one particular pool of Cloud Dataflow workers to be

934

# instantiated by the Cloud Dataflow service in order to perform the

935

# computations required by a job. Note that a workflow job may use

936

# multiple pools, in order to match the various computational

937

# requirements of the various stages of the job.

938

"onHostMaintenance": "A String", # The action to take on host maintenance, as defined by the Google

939

# Compute Engine API.

940

"sdkHarnessContainerImages": [ # Set of SDK harness containers needed to execute this pipeline. This will

941

# only be set in the Fn API path. For non-cross-language pipelines this

942

# should have only one entry. Cross-language pipelines will have two or more

943

# entries.

944

{ # Defines a SDK harness container for executing Dataflow pipelines.

945

"containerImage": "A String", # A docker container image that resides in Google Container Registry.

946

"useSingleCorePerContainer": True or False, # If true, recommends the Dataflow service to use only one core per SDK

947

# container instance with this image. If false (or unset) recommends using

948

# more than one core per SDK container instance with this image for

949

# efficiency. Note that Dataflow service may choose to override this property

# if needed.

},

],

"zone": "A String", # Zone to run the worker pools in. If empty or unspecified, the service

954

# will attempt to choose a reasonable default.

955

"kind": "A String", # The kind of the worker pool; currently only `harness` and `shuffle`

956

# are supported.

957

"metadata": { # Metadata to set on the Google Compute Engine VMs.

958

"a_key": "A String",

959

},

960

"diskSourceImage": "A String", # Fully qualified source image for disks.

961

"dataDisks": [ # Data disks that are used by a VM in this workflow.

962

{ # Describes the data disk used by a workflow job.

963

"sizeGb": 42, # Size of disk in GB. If zero or unspecified, the service will

964

# attempt to choose a reasonable default.

965

"diskType": "A String", # Disk storage type, as defined by Google Compute Engine. This

966

# must be a disk type appropriate to the project and zone in which

967

# the workers will run. If unknown or unspecified, the service

968

# will attempt to choose a reasonable default.

969

#

970

# For example, the standard persistent disk type is a resource name

971

# typically ending in "pd-standard". If SSD persistent disks are

972

# available, the resource name typically ends with "pd-ssd". The

973

# actual valid values are defined by the Google Compute Engine API,

974

# not by the Cloud Dataflow API; consult the Google Compute Engine

975

# documentation for more information about determining the set of

976

# available disk types for a particular project and zone.

977

#

978

# Google Compute Engine Disk types are local to a particular

979

# project in a particular zone, and so the resource name will

980

# typically look something like this:

981

#

982

# compute.googleapis.com/projects/project-id/zones/zone/diskTypes/pd-standard

983

"mountPoint": "A String", # Directory in a VM where disk is mounted.

984

},

985

],

986

"packages": [ # Packages to be installed on workers.

987

{ # The packages that must be installed in order for a worker to run the

988

# steps of the Cloud Dataflow job that will be assigned to its worker

989

# pool.

#

# This is the mechanism by which the Cloud Dataflow SDK causes code to

992

# be loaded onto the workers. For example, the Cloud Dataflow Java SDK

993

# might use this to install jars containing the user's code and all of the

994

# various dependencies (libraries, data files, etc.) required in order

995

# for that code to run.

996

"name": "A String", # The name of the package.

997

"location": "A String", # The resource to read the package from. The supported resource type is:

998

#

999

# Google Cloud Storage:

1000

#

1001

# storage.googleapis.com/{bucket}

1002

# bucket.storage.googleapis.com/

1003

},

1004

],

1005

"teardownPolicy": "A String", # Sets the policy for determining when to tear down the worker pool.

1006

# Allowed values are: `TEARDOWN_ALWAYS`, `TEARDOWN_ON_SUCCESS`, and

1007

# `TEARDOWN_NEVER`.

1008

# `TEARDOWN_ALWAYS` means workers are always torn down regardless of whether

1009

# the job succeeds. `TEARDOWN_ON_SUCCESS` means workers are torn down

1010

# if the job succeeds. `TEARDOWN_NEVER` means the workers are never torn

1011

# down.

1012

#

1013

# If the workers are not torn down by the service, they will

1014

# continue to run and use Google Compute Engine VM resources in the

1015

# user's project until they are explicitly terminated by the user.

1016

# Because of this, Google recommends using the `TEARDOWN_ALWAYS`

1017

# policy except for small, manually supervised test jobs.

1018

#

1019

# If unknown or unspecified, the service will attempt to choose a reasonable

1020

# default.

1021

"network": "A String", # Network to which VMs will be assigned. If empty or unspecified,

1022

# the service will use the network "default".

1023

"ipConfiguration": "A String", # Configuration for VM IPs.

1024

"diskSizeGb": 42, # Size of root disk for VMs, in GB. If zero or unspecified, the service will

1025

# attempt to choose a reasonable default.

1026

"autoscalingSettings": { # Settings for WorkerPool autoscaling. # Settings for autoscaling of this WorkerPool.

1027

"maxNumWorkers": 42, # The maximum number of workers to cap scaling at.

1028

"algorithm": "A String", # The algorithm to use for autoscaling.

1029

},

1030

"poolArgs": { # Extra arguments for this worker pool.

1031

"a_key": "", # Properties of the object. Contains field @type with type URL.

1032

},

1033

"subnetwork": "A String", # Subnetwork to which VMs will be assigned, if desired. Expected to be of

1034

# the form "regions/REGION/subnetworks/SUBNETWORK".

1035

"numWorkers": 42, # Number of Google Compute Engine workers in this pool needed to

1036

# execute the job. If zero or unspecified, the service will

1037

# attempt to choose a reasonable default.

1038

"numThreadsPerWorker": 42, # The number of threads per worker harness. If empty or unspecified, the

1039

# service will choose a number of threads (according to the number of cores

1040

# on the selected machine type for batch, or 1 by convention for streaming).

1041

"workerHarnessContainerImage": "A String", # Required. Docker container image that executes the Cloud Dataflow worker

1042

# harness, residing in Google Container Registry.

1043

#

1044

# Deprecated for the Fn API path. Use sdk_harness_container_images instead.

1045

"taskrunnerSettings": { # Taskrunner configuration settings. # Settings passed through to Google Compute Engine workers when

1046

# using the standard Dataflow task runner. Users should ignore

1047

# this field.

1048

"dataflowApiVersion": "A String", # The API version of endpoint, e.g. "v1b3"

1049

"oauthScopes": [ # The OAuth2 scopes to be requested by the taskrunner in order to

1050

# access the Cloud Dataflow API.

1051

"A String",

1052

],

1053

"baseUrl": "A String", # The base URL for the taskrunner to use when accessing Google Cloud APIs.

#

1055

# When workers access Google Cloud APIs, they logically do so via

1056

# relative URLs. If this field is specified, it supplies the base

1057

# URL to use for resolving these relative URLs. The normative

1058

# algorithm used is defined by RFC 1808, "Relative Uniform Resource

1059

# Locators".

1060

#

1061

# If not specified, the default value is "http://www.googleapis.com/"

"workflowFileName": "A String", # The file to store the workflow in.

1063

"logToSerialconsole": True or False, # Whether to send taskrunner log info to Google Compute Engine VM serial

1064

# console.

1065

"baseTaskDir": "A String", # The location on the worker for task-specific subdirectories.

1066

"taskUser": "A String", # The UNIX user ID on the worker VM to use for tasks launched by

1067

# taskrunner; e.g. "root".

1068

"vmId": "A String", # The ID string of the VM.

1069

"alsologtostderr": True or False, # Whether to also send taskrunner log info to stderr.

1070

"parallelWorkerSettings": { # Provides data to pass through to the worker harness. # The settings to pass to the parallel worker harness.

1071

"shuffleServicePath": "A String", # The Shuffle service path relative to the root URL, for example,

1072

# "shuffle/v1beta1".

1073

"tempStoragePrefix": "A String", # The prefix of the resources the system should use for temporary

1074

# storage.

1075

#

1076

# The supported resource type is:

1077

#

1078

# Google Cloud Storage:

1079

#

1080

# storage.googleapis.com/{bucket}/{object}

1081

# bucket.storage.googleapis.com/{object}

1082

"reportingEnabled": True or False, # Whether to send work progress updates to the service.

1083

"servicePath": "A String", # The Cloud Dataflow service path relative to the root URL, for example,

1084

# "dataflow/v1b3/projects".

1085

"baseUrl": "A String", # The base URL for accessing Google Cloud APIs.

1086

#

1087

# When workers access Google Cloud APIs, they logically do so via

1088

# relative URLs. If this field is specified, it supplies the base

1089

# URL to use for resolving these relative URLs. The normative

1090

# algorithm used is defined by RFC 1808, "Relative Uniform Resource

1091

# Locators".

1092

#

1093

# If not specified, the default value is "http://www.googleapis.com/"

1094

"workerId": "A String", # The ID of the worker running this pipeline.

1095

},

1096

"harnessCommand": "A String", # The command to launch the worker harness.

1097

"logDir": "A String", # The directory on the VM to store logs.

1098

"streamingWorkerMainClass": "A String", # The streaming worker main class name.

1099

"languageHint": "A String", # The suggested backend language.

1100

"taskGroup": "A String", # The UNIX group ID on the worker VM to use for tasks launched by

1101

# taskrunner; e.g. "wheel".

1102

"logUploadLocation": "A String", # Indicates where to put logs. If this is not specified, the logs

1103

# will not be uploaded.

1104

#

1105

# The supported resource type is:

1106

#

1107

# Google Cloud Storage:

1108

# storage.googleapis.com/{bucket}/{object}

1109

# bucket.storage.googleapis.com/{object}

1110

"commandlinesFileName": "A String", # The file to store preprocessing commands in.

1111

"continueOnException": True or False, # Whether to continue taskrunner if an exception is hit.

1112

"tempStoragePrefix": "A String", # The prefix of the resources the taskrunner should use for

1113

# temporary storage.

1114

#

1115

# The supported resource type is:

1116

#

1117

# Google Cloud Storage:

1118

# storage.googleapis.com/{bucket}/{object}

1119

# bucket.storage.googleapis.com/{object}

Bu Sun Kim

2020-05-27 12:20:54 -0700

1120

},

Bu Sun Kim

2020-07-22 17:02:09 -0700

1121

"diskType": "A String", # Type of root disk for VMs. If empty or unspecified, the service will

1122

# attempt to choose a reasonable default.

1123

"defaultPackageSet": "A String", # The default package set to install. This allows the service to

1124

# select a default set of packages which are useful to worker

1125

# harnesses written in a particular language.

1126

"machineType": "A String", # Machine type (e.g. "n1-standard-1"). If empty or unspecified, the

1127

# service will attempt to choose a reasonable default.

Bu Sun Kim

2020-05-27 12:20:54 -0700

1128

},

Bu Sun Kim

2020-07-22 17:02:09 -0700

1129

],

1130

"tempStoragePrefix": "A String", # The prefix of the resources the system should use for temporary

1131

# storage. The system will append the suffix "/temp-{JOBNAME}" to

1132

# this resource prefix, where {JOBNAME} is the value of the

1133

# job_name field. The resulting bucket and object prefix is used

1134

# as the prefix of the resources used to store temporary data

1135

# needed during the job execution. NOTE: This will override the

1136

# value in taskrunner_settings.

1137

# The supported resource type is:

1138

#

1139

# Google Cloud Storage:

1140

#

1141

# storage.googleapis.com/{bucket}/{object}

1142

# bucket.storage.googleapis.com/{object}

1143

"internalExperiments": { # Experimental settings.

1144

"a_key": "", # Properties of the object. Contains field @type with type URL.

Bu Sun Kim

2020-05-27 12:20:54 -0700

1145

},

Bu Sun Kim

2020-07-22 17:02:09 -0700

1146

"sdkPipelineOptions": { # The Cloud Dataflow SDK pipeline options specified by the user. These

1147

# options are passed through the service and are used to recreate the

1148

# SDK pipeline options on the worker in a language agnostic and platform

1149

# independent way.

Bu Sun Kim

2020-05-20 12:08:20 -0700

1150

"a_key": "", # Properties of the object.

Bu Sun Kim

2019-06-14 16:50:42 -0700

1151

},

Bu Sun Kim

2020-07-22 17:02:09 -0700

1152

"dataset": "A String", # The dataset for the current project where various workflow

1153

# related tables are stored.

1154

#

1155

# The supported resource type is:

1156

#

1157

# Google BigQuery:

1158

# bigquery.googleapis.com/{dataset}

1159

"clusterManagerApiService": "A String", # The type of cluster manager API to use. If unknown or

1160

# unspecified, the service will attempt to choose a reasonable

1161

# default. This should be in the form of the API service name,

1162

# e.g. "compute.googleapis.com".

Bu Sun Kim

2019-06-14 16:50:42 -0700

1163

},

Bu Sun Kim

2020-07-22 17:02:09 -0700

1164

"stepsLocation": "A String", # The GCS location where the steps are stored.

1165

"steps": [ # Exactly one of steps or steps_location should be specified.

1166

#

1167

# The top-level steps that constitute the entire job.

1168

{ # Defines a particular step within a Cloud Dataflow job.

1169

#

1170

# A job consists of multiple steps, each of which performs some

1171

# specific operation as part of the overall job. Data is typically

1172

# passed from one step to another as part of the job.

1173

#

1174

# Here's an example of a sequence of steps which together implement a

1175

# Map-Reduce job:

1176

#

1177

# * Read a collection of data from some source, parsing the

1178

# collection's elements.

1179

#

1180

# * Validate the elements.

1181

#

1182

# * Apply a user-defined function to map each element to some value

1183

# and extract an element-specific key value.

1184

#

1185

# * Group elements with the same key into a single element with

1186

# that key, transforming a multiply-keyed collection into a

1187

# uniquely-keyed collection.

1188

#

1189

# * Write the elements out to some data sink.

1190

#

1191

# Note that the Cloud Dataflow service may be used to run many different

1192

# types of jobs, not just Map-Reduce.

1193

"kind": "A String", # The kind of step in the Cloud Dataflow job.

1194

"properties": { # Named properties associated with the step. Each kind of

1195

# predefined step has its own required set of properties.

1196

# Must be provided on Create. Only retrieved with JOB_VIEW_ALL.

1197

"a_key": "", # Properties of the object.

1198

},

1199

"name": "A String", # The name that identifies the step. This must be unique for each

1200

# step with respect to all other steps in the Cloud Dataflow job.

1201

},

1202

],

1203

"stageStates": [ # This field may be mutated by the Cloud Dataflow service;

1204

# callers cannot mutate it.

1205

{ # A message describing the state of a particular execution stage.

1206

"executionStageState": "A String", # Execution stage states allow the same set of values as JobState.

1207

"executionStageName": "A String", # The name of the execution stage.

1208

"currentStateTime": "A String", # The time at which the stage transitioned to this state.

1209

},

1210

],

1211

"replacedByJobId": "A String", # If another job is an update of this job (and thus, this job is in

1212

# `JOB_STATE_UPDATED`), this field contains the ID of that job.

1213

"jobMetadata": { # Metadata available primarily for filtering jobs. Will be included in the

1214

# ListJob response and Job SUMMARY view.

1215

# This field is populated by the Dataflow service to support filtering jobs

1216

# by the metadata values provided here. Populated for ListJobs and all GetJob views SUMMARY and higher.

1217

"sdkVersion": { # The version of the SDK used to run the job.

1218

"sdkSupportStatus": "A String", # The support status for this SDK version.

1219

"versionDisplayName": "A String", # A readable string describing the version of the SDK.

1220

"version": "A String", # The version of the SDK used to run the job.

1221

},

1222

"bigTableDetails": [ # Identification of a BigTable source used in the Dataflow job.

1223

{ # Metadata for a BigTable connector used by the job.

1224

"instanceId": "A String", # InstanceId accessed in the connection.

1225

"tableId": "A String", # TableId accessed in the connection.

1226

"projectId": "A String", # ProjectId accessed in the connection.

1227

},

1228

],

1229

"pubsubDetails": [ # Identification of a PubSub source used in the Dataflow job.

1230

{ # Metadata for a PubSub connector used by the job.

1231

"subscription": "A String", # Subscription used in the connection.

1232

"topic": "A String", # Topic accessed in the connection.

1233

},

1234

],

1235

"bigqueryDetails": [ # Identification of a BigQuery source used in the Dataflow job.

1236

{ # Metadata for a BigQuery connector used by the job.

1237

"dataset": "A String", # Dataset accessed in the connection.

1238

"projectId": "A String", # Project accessed in the connection.

1239

"query": "A String", # Query used to access data in the connection.

1240

"table": "A String", # Table accessed in the connection.

1241

},

1242

],

1243

"fileDetails": [ # Identification of a File source used in the Dataflow job.

1244

{ # Metadata for a File connector used by the job.

1245

"filePattern": "A String", # File Pattern used to access files by the connector.

1246

},

1247

],

1248

"datastoreDetails": [ # Identification of a Datastore source used in the Dataflow job.

1249

{ # Metadata for a Datastore connector used by the job.

1250

"namespace": "A String", # Namespace used in the connection.

1251

"projectId": "A String", # ProjectId accessed in the connection.

1252

},

1253

],

1254

"spannerDetails": [ # Identification of a Spanner source used in the Dataflow job.

1255

{ # Metadata for a Spanner connector used by the job.

1256

"instanceId": "A String", # InstanceId accessed in the connection.

1257

"databaseId": "A String", # DatabaseId accessed in the connection.

1258

"projectId": "A String", # ProjectId accessed in the connection.

},

],

},

"location": "A String", # The [regional endpoint]

1263

# (https://cloud.google.com/dataflow/docs/concepts/regional-endpoints) that

1264

# contains this job.

1265

"transformNameMapping": { # The map of transform name prefixes of the job to be replaced to the

1266

# corresponding name prefixes of the new job.

1267

"a_key": "A String",

1268

},

1269

"startTime": "A String", # The timestamp when the job was started (transitioned to JOB_STATE_PENDING).

1270

# Flexible resource scheduling jobs are started with some delay after job

1271

# creation, so start_time is unset before start and is updated when the

1272

# job is started by the Cloud Dataflow service. For other jobs, start_time

1273

# always equals create_time and is immutable and set by the Cloud Dataflow

1274

# service.

1275

"clientRequestId": "A String", # The client's unique identifier of the job, re-used across retried attempts.

1276

# If this field is set, the service will ensure its uniqueness.

1277

# The request to create a job will fail if the service has knowledge of a

1278

# previously submitted job with the same client's ID and job name.

1279

# The caller may use this field to ensure idempotence of job

1280

# creation across retried attempts to create a job.

1281

# By default, the field is empty and, in that case, the service ignores it.

1282

"executionInfo": { # Deprecated. Additional information about how a Cloud Dataflow job will be executed that

1283

# isn't contained in the submitted job.

1284

"stages": { # A mapping from each stage to the information about that stage.

1285

"a_key": { # Contains information about how a particular

1286

# google.dataflow.v1beta3.Step will be executed.

1287

"stepName": [ # The steps associated with the execution stage.

1288

# Note that stages may have several steps, and that a given step

1289

# might be run by more than one stage.

1290

"A String",

1291

],

1292

},

Bu Sun Kim

2020-05-20 12:08:20 -0700

1293

},

1294

},

Bu Sun Kim

2020-07-22 17:02:09 -0700

1295

"type": "A String", # The type of Cloud Dataflow job.

1296

"createTime": "A String", # The timestamp when the job was initially created. Immutable and set by the

1297

# Cloud Dataflow service.

1298

"tempFiles": [ # A set of files the system should be aware of that are used

1299

# for temporary storage. These temporary files will be

1300

# removed on job completion.

1301

# No duplicates are allowed.

1302

# No file patterns are supported.

1303

#

1304

# The supported files are:

1305

#

1306

# Google Cloud Storage:

1307

#

1308

# storage.googleapis.com/{bucket}/{object}

1309

# bucket.storage.googleapis.com/{object}

1310

"A String",

1311

],

1312

"id": "A String", # The unique ID of this job.

1313

#

1314

# This field is set by the Cloud Dataflow service when the Job is

1315

# created, and is immutable for the life of the job.

1316

"requestedState": "A String", # The job's requested state.

1317

#

1318

# `UpdateJob` may be used to switch between the `JOB_STATE_STOPPED` and

1319

# `JOB_STATE_RUNNING` states, by setting requested_state. `UpdateJob` may

1320

# also be used to directly set a job's requested state to

1321

# `JOB_STATE_CANCELLED` or `JOB_STATE_DONE`, irrevocably terminating the

1322

# job if it has not already reached a terminal state.

1323

"replaceJobId": "A String", # If this job is an update of an existing job, this field is the job ID

1324

# of the job it replaced.

1325

#

1326

# When sending a `CreateJobRequest`, you can update a job by specifying it

1327

# here. The job named here is stopped, and its intermediate state is

1328

# transferred to this job.

1329

"createdFromSnapshotId": "A String", # If this is specified, the job's initial state is populated from the given

1330

# snapshot.

1331

"currentState": "A String", # The current state of the job.

1332

#

1333

# Jobs are created in the `JOB_STATE_STOPPED` state unless otherwise

1334

# specified.

1335

#

1336

# A job in the `JOB_STATE_RUNNING` state may asynchronously enter a

1337

# terminal state. After a job has reached a terminal state, no

1338

# further state updates may be made.

1339

#

1340

# This field may be mutated by the Cloud Dataflow service;

1341

# callers cannot mutate it.

1342

"name": "A String", # The user-specified Cloud Dataflow job name.

1343

#

1344

# Only one Job with a given name may exist in a project at any

1345

# given time. If a caller attempts to create a Job with the same

1346

# name as an already-existing Job, the attempt returns the

1347

# existing Job.

1348

#

1349

# The name must match the regular expression

1350

# `[a-z]([-a-z0-9]{0,38}[a-z0-9])?`

1351

"currentStateTime": "A String", # The timestamp associated with the current state.

1352

}
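The job-name rule in the "name" field above can be checked on the client before a request is sent. The following is a minimal sketch, not part of the generated reference: the helper name is_valid_job_name is hypothetical, and it assumes the documented pattern is meant to match the whole name (hence re.fullmatch).

```python
import re

# Pattern copied verbatim from the "name" field description.
JOB_NAME_PATTERN = re.compile(r"[a-z]([-a-z0-9]{0,38}[a-z0-9])?")


def is_valid_job_name(name: str) -> bool:
    """Return True if the entire string matches the documented pattern.

    Hypothetical helper for illustration; the service performs its own
    validation regardless of what the client checks.
    """
    return JOB_NAME_PATTERN.fullmatch(name) is not None


print(is_valid_job_name("wordcount-2020"))  # True
print(is_valid_job_name("WordCount"))       # False: uppercase letters are not allowed
print(is_valid_job_name("trailing-"))       # False: must end with a letter or digit
```

Read this way, the pattern caps names at 40 characters: one leading letter, up to 38 middle characters, and one trailing letter or digit.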

Bu Sun Kim

2019-06-14 16:50:42 -0700

1353

1354

location: string, The [regional endpoint]

1355

(https://cloud.google.com/dataflow/docs/concepts/regional-endpoints) that

1356

contains this job.

Bu Sun Kim

2020-05-20 12:08:20 -0700

1357

replaceJobId: string, Deprecated. This field is now in the Job message.

1358

view: string, The level of information requested in response.

Bu Sun Kim

2019-06-14 16:50:42 -0700

1359

x__xgafv: string, V1 error format.

1360

Allowed values

1361

1 - v1 error format

1362

2 - v2 error format
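The arguments listed above can be assembled in plain Python before calling create(). This is a sketch under stated assumptions: build_create_kwargs is a hypothetical helper, the project, job and request IDs are placeholders, and the body is deliberately incomplete (a real job also needs fields such as steps and environment). Only the parameter names (projectId, body, location) and body field names (name, clientRequestId) come from this reference.

```python
import uuid


def build_create_kwargs(project_id, job_name, location=None, client_request_id=None):
    """Assemble keyword arguments for projects().jobs().create().

    Hypothetical helper: it only arranges documented parameter and field
    names into a dict and does not contact the service.
    """
    body = {
        "name": job_name,
        # Pass the same clientRequestId on every retry of one logical job so
        # that creation stays idempotent, as the field description explains.
        "clientRequestId": client_request_id or uuid.uuid4().hex,
    }
    kwargs = {"projectId": project_id, "body": body}
    if location is not None:
        kwargs["location"] = location
    return kwargs


# Placeholder values; a retry reuses the request ID chosen for the first attempt.
first = build_create_kwargs("my-project", "wordcount",
                            location="us-central1", client_request_id="req-0001")
retry = build_create_kwargs("my-project", "wordcount",
                            location="us-central1", client_request_id="req-0001")
print(first == retry)  # True
```

With google-api-python-client installed and credentials configured, the resulting dict could be passed as service.projects().jobs().create(**kwargs).execute(), where service comes from googleapiclient.discovery.build("dataflow", "v1b3").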

Bu Sun Kim

2019-06-14 16:50:42 -0700

1363

1364

Returns:

1365

An object of the form:

1366

1367

{ # Defines a job to be run by the Cloud Dataflow service.

Bu Sun Kim

2020-07-22 17:02:09 -0700

1368

"pipelineDescription": { # A descriptive representation of submitted pipeline as well as the executed

1369

# form. This data is provided by the Dataflow service for ease of visualizing

1370

# the pipeline and interpreting Dataflow provided metrics.

1371

# Preliminary field: The format of this data may change at any time.

1372

# A description of the user pipeline and stages through which it is executed.

1373

# Created by Cloud Dataflow service. Only retrieved with JOB_VIEW_DESCRIPTION or JOB_VIEW_ALL.

1374

"displayData": [ # Pipeline level display data.

1375

{ # Data provided with a pipeline or transform to provide descriptive info.

1376

"url": "A String", # An optional full URL.

1377

"javaClassValue": "A String", # Contains value if the data is of java class type.

1378

"timestampValue": "A String", # Contains value if the data is of timestamp type.

1379

"durationValue": "A String", # Contains value if the data is of duration type.

1380

"label": "A String", # An optional label to display in a dax UI for the element.

1381

"key": "A String", # The key identifying the display data.

1382

# This is intended to be used as a label for the display data

1383

# when viewed in a dax monitoring system.

1384

"namespace": "A String", # The namespace for the key. This is usually a class name or programming

1385

# language namespace (e.g. a Python module) which defines the display data.

1386

# This allows a dax monitoring system to specially handle the data

1387

# and perform custom rendering.

1388

"floatValue": 3.14, # Contains value if the data is of float type.

1389

"strValue": "A String", # Contains value if the data is of string type.

1390

"int64Value": "A String", # Contains value if the data is of int64 type.

1391

"boolValue": True or False, # Contains value if the data is of a boolean type.

1392

"shortStrValue": "A String", # A possible additional shorter value to display.

1393

# For example a java_class_name_value of com.mypackage.MyDoFn

1394

# will be stored with MyDoFn as the short_str_value and

1395

# com.mypackage.MyDoFn as the java_class_name_value.

1396

# short_str_value can be displayed and java_class_name_value

1397

# will be displayed as a tooltip.

Bu Sun Kim

2020-05-27 12:20:54 -0700

1398

},

Bu Sun Kim

2020-07-22 17:02:09 -0700

1399

],

1400

"originalPipelineTransform": [ # Description of each transform in the pipeline and collections between them.

1401

{ # Description of the type, names/ids, and input/outputs for a transform.

1402

"outputCollectionName": [ # User names for all collection outputs to this transform.

Bu Sun Kim

2020-05-27 12:20:54 -0700

1403

"A String",

1404

],

Bu Sun Kim

2020-07-22 17:02:09 -0700

1405

"displayData": [ # Transform-specific display data.

1406

{ # Data provided with a pipeline or transform to provide descriptive info.

1407

"url": "A String", # An optional full URL.

1408

"javaClassValue": "A String", # Contains value if the data is of java class type.

1409

"timestampValue": "A String", # Contains value if the data is of timestamp type.

1410

"durationValue": "A String", # Contains value if the data is of duration type.

1411

"label": "A String", # An optional label to display in a dax UI for the element.

1412

"key": "A String", # The key identifying the display data.

1413

# This is intended to be used as a label for the display data

1414

# when viewed in a dax monitoring system.

1415

"namespace": "A String", # The namespace for the key. This is usually a class name or programming

1416

# language namespace (e.g. a Python module) which defines the display data.

1417

# This allows a dax monitoring system to specially handle the data

1418

# and perform custom rendering.

1419

"floatValue": 3.14, # Contains value if the data is of float type.

1420

"strValue": "A String", # Contains value if the data is of string type.

1421

"int64Value": "A String", # Contains value if the data is of int64 type.

1422

"boolValue": True or False, # Contains value if the data is of a boolean type.

1423

"shortStrValue": "A String", # A possible additional shorter value to display.

1424

# For example a java_class_name_value of com.mypackage.MyDoFn

1425

# will be stored with MyDoFn as the short_str_value and

1426

# com.mypackage.MyDoFn as the java_class_name_value.

1427

# short_str_value can be displayed and java_class_name_value

1428

# will be displayed as a tooltip.

1429

},

1430

],

1431

"id": "A String", # SDK generated id of this transform instance.

1432

"inputCollectionName": [ # User names for all collection inputs to this transform.

1433

"A String",

1434

],

1435

"name": "A String", # User provided name for this transform instance.

1436

"kind": "A String", # Type of transform.

1437

},

1438

],

1439

"executionPipelineStage": [ # Description of each stage of execution of the pipeline.

1440

{ # Description of the composing transforms, names/ids, and input/outputs of a

1441

# stage of execution. Some composing transforms and sources may have been

1442

# generated by the Dataflow service during execution planning.

1443

"componentSource": [ # Collections produced and consumed by component transforms of this stage.

1444

{ # Description of an interstitial value between transforms in an execution

1445

# stage.

1446

"userName": "A String", # Human-readable name for this transform; may be user or system generated.

1447

"name": "A String", # Dataflow service generated name for this source.

1448

"originalTransformOrCollection": "A String", # User name for the original user transform or collection with which this

1449

# source is most closely associated.

1450

},

1451

],

1452

"inputSource": [ # Input sources for this stage.

1453

{ # Description of an input or output of an execution stage.

1454

"userName": "A String", # Human-readable name for this source; may be user or system generated.

1455

"originalTransformOrCollection": "A String", # User name for the original user transform or collection with which this

1456

# source is most closely associated.

1457

"sizeBytes": "A String", # Size of the source, if measurable.

1458

"name": "A String", # Dataflow service generated name for this source.

1459

},

1460

],

1461

"name": "A String", # Dataflow service generated name for this stage.

1462

"componentTransform": [ # Transforms that comprise this execution stage.

1463

{ # Description of a transform executed as part of an execution stage.

1464

"name": "A String", # Dataflow service generated name for this source.

1465

"userName": "A String", # Human-readable name for this transform; may be user or system generated.

1466

"originalTransform": "A String", # User name for the original user transform with which this transform is

1467

# most closely associated.

1468

},

1469

],

1470

"id": "A String", # Dataflow service generated id for this stage.

1471

"outputSource": [ # Output sources for this stage.

1472

{ # Description of an input or output of an execution stage.

1473

"userName": "A String", # Human-readable name for this source; may be user or system generated.

1474

"originalTransformOrCollection": "A String", # User name for the original user transform or collection with which this

1475

# source is most closely associated.

1476

"sizeBytes": "A String", # Size of the source, if measurable.

1477

"name": "A String", # Dataflow service generated name for this source.

1478

},

1479

],

1480

"kind": "A String", # Type of transform this stage is executing.

},

],

},

"labels": { # User-defined labels for this job.

1485

#

1486

# The labels map can contain no more than 64 entries. Entries of the labels

1487

# map are UTF8 strings that comply with the following restrictions:

1488

#

1489

# * Keys must conform to regexp: \p{Ll}\p{Lo}{0,62}

1490

# * Values must conform to regexp: [\p{Ll}\p{Lo}\p{N}_-]{0,63}

1491

# * Both keys and values are additionally constrained to be <= 128 bytes in

# size.

"a_key": "A String",

},

"projectId": "A String", # The ID of the Cloud Platform project that the job belongs to.

1496

"environment": { # Describes the environment in which a Dataflow Job runs. # The environment for the job.

1497

"flexResourceSchedulingGoal": "A String", # Which Flexible Resource Scheduling mode to run in.

1498

"workerRegion": "A String", # The Compute Engine region

1499

# (https://cloud.google.com/compute/docs/regions-zones/regions-zones) in

1500

# which worker processing should occur, e.g. "us-west1". Mutually exclusive

1501

# with worker_zone. If neither worker_region nor worker_zone is specified,

1502

# default to the control plane's region.

1503

"userAgent": { # A description of the process that generated the request.

1504

"a_key": "", # Properties of the object.

1505

},

1506

"serviceAccountEmail": "A String", # Identity to run virtual machines as. Defaults to the default account.

1507

"version": { # A structure describing which components and their versions of the service

1508

# are required in order to run the job.

1509

"a_key": "", # Properties of the object.

1510

},

1511

"serviceKmsKeyName": "A String", # If set, contains the Cloud KMS key identifier used to encrypt data

1512

# at rest, AKA a Customer Managed Encryption Key (CMEK).

1513

#

1514

# Format:

1515

# projects/PROJECT_ID/locations/LOCATION/keyRings/KEY_RING/cryptoKeys/KEY

1516

"experiments": [ # The list of experiments to enable.

1517

"A String",

1518

],

1519

"workerZone": "A String", # The Compute Engine zone

1520

# (https://cloud.google.com/compute/docs/regions-zones/regions-zones) in

1521

# which worker processing should occur, e.g. "us-west1-a". Mutually exclusive

1522

# with worker_region. If neither worker_region nor worker_zone is specified,

1523

# a zone in the control plane's region is chosen based on available capacity.

1524

"workerPools": [ # The worker pools. At least one "harness" worker pool must be

1525

# specified in order for the job to have workers.

1526

{ # Describes one particular pool of Cloud Dataflow workers to be

1527

# instantiated by the Cloud Dataflow service in order to perform the

1528

# computations required by a job. Note that a workflow job may use

1529

# multiple pools, in order to match the various computational

1530

# requirements of the various stages of the job.

1531

"onHostMaintenance": "A String", # The action to take on host maintenance, as defined by the Google

1532

# Compute Engine API.

1533

"sdkHarnessContainerImages": [ # Set of SDK harness containers needed to execute this pipeline. This will

1534

# only be set in the Fn API path. For non-cross-language pipelines this

1535

# should have only one entry. Cross-language pipelines will have two or more

1536

# entries.

1537

{ # Defines an SDK harness container for executing Dataflow pipelines.

1538

"containerImage": "A String", # A docker container image that resides in Google Container Registry.

1539

"useSingleCorePerContainer": True or False, # If true, recommends that the Dataflow service use only one core per SDK

1540

# container instance with this image. If false (or unset) recommends using

1541

# more than one core per SDK container instance with this image for

1542

# efficiency. Note that Dataflow service may choose to override this property

# if needed.

},

],

"zone": "A String", # Zone to run the worker pools in. If empty or unspecified, the service

1547

# will attempt to choose a reasonable default.

1548

"kind": "A String", # The kind of the worker pool; currently only `harness` and `shuffle`

1549

# are supported.

1550

"metadata": { # Metadata to set on the Google Compute Engine VMs.

1551

"a_key": "A String",

1552

},

1553

"diskSourceImage": "A String", # Fully qualified source image for disks.

1554

"dataDisks": [ # Data disks that are used by a VM in this workflow.

1555

{ # Describes the data disk used by a workflow job.

1556

"sizeGb": 42, # Size of disk in GB. If zero or unspecified, the service will

1557

# attempt to choose a reasonable default.

1558

"diskType": "A String", # Disk storage type, as defined by Google Compute Engine. This

1559

# must be a disk type appropriate to the project and zone in which

1560

# the workers will run. If unknown or unspecified, the service

1561

# will attempt to choose a reasonable default.

1562

#

1563

# For example, the standard persistent disk type is a resource name

1564

# typically ending in "pd-standard". If SSD persistent disks are

1565

# available, the resource name typically ends with "pd-ssd". The

1566

# actual valid values are defined by the Google Compute Engine API,

1567

# not by the Cloud Dataflow API; consult the Google Compute Engine

1568

# documentation for more information about determining the set of

1569

# available disk types for a particular project and zone.

1570

#

1571

# Google Compute Engine Disk types are local to a particular

1572

# project in a particular zone, and so the resource name will

1573

# typically look something like this:

1574

#

1575

# compute.googleapis.com/projects/project-id/zones/zone/diskTypes/pd-standard

1576

"mountPoint": "A String", # Directory in a VM where disk is mounted.

1577

},

1578

],

1579

"packages": [ # Packages to be installed on workers.

1580

{ # The packages that must be installed in order for a worker to run the

1581

# steps of the Cloud Dataflow job that will be assigned to its worker

1582

# pool.

Bu Sun Kim

2020-05-27 12:20:54 -0700

1583

#

Bu Sun Kim

2020-07-22 17:02:09 -0700

1584

# This is the mechanism by which the Cloud Dataflow SDK causes code to

1585

# be loaded onto the workers. For example, the Cloud Dataflow Java SDK

1586

# might use this to install jars containing the user's code and all of the

1587

# various dependencies (libraries, data files, etc.) required in order

1588

# for that code to run.

1589

"name": "A String", # The name of the package.

1590

"location": "A String", # The resource to read the package from. The supported resource type is:

1591

#

1592

# Google Cloud Storage:

1593

#

1594

# storage.googleapis.com/{bucket}

1595

# bucket.storage.googleapis.com/

1596

},

1597

],

1598

"teardownPolicy": "A String", # Sets the policy for determining when to tear down the worker pool.

1599

# Allowed values are: `TEARDOWN_ALWAYS`, `TEARDOWN_ON_SUCCESS`, and

1600

# `TEARDOWN_NEVER`.

1601

# `TEARDOWN_ALWAYS` means workers are always torn down regardless of whether

1602

# the job succeeds. `TEARDOWN_ON_SUCCESS` means workers are torn down

1603

# if the job succeeds. `TEARDOWN_NEVER` means the workers are never torn

1604

# down.

1605

#

1606

# If the workers are not torn down by the service, they will

1607

# continue to run and use Google Compute Engine VM resources in the

1608

# user's project until they are explicitly terminated by the user.

1609

# Because of this, Google recommends using the `TEARDOWN_ALWAYS`

1610

# policy except for small, manually supervised test jobs.

1611

#

1612

# If unknown or unspecified, the service will attempt to choose a reasonable

1613

# default.

1614

"network": "A String", # Network to which VMs will be assigned. If empty or unspecified,

1615

# the service will use the network "default".

1616

"ipConfiguration": "A String", # Configuration for VM IPs.

1617

"diskSizeGb": 42, # Size of root disk for VMs, in GB. If zero or unspecified, the service will

1618

# attempt to choose a reasonable default.

1619

"autoscalingSettings": { # Settings for WorkerPool autoscaling. # Settings for autoscaling of this WorkerPool.

1620

"maxNumWorkers": 42, # The maximum number of workers to cap scaling at.

1621

"algorithm": "A String", # The algorithm to use for autoscaling.

1622

},

1623

"poolArgs": { # Extra arguments for this worker pool.

1624

"a_key": "", # Properties of the object. Contains field @type with type URL.

1625

},

1626

"subnetwork": "A String", # Subnetwork to which VMs will be assigned, if desired. Expected to be of

1627

# the form "regions/REGION/subnetworks/SUBNETWORK".

1628

"numWorkers": 42, # Number of Google Compute Engine workers in this pool needed to

1629

# execute the job. If zero or unspecified, the service will

1630

# attempt to choose a reasonable default.

1631

"numThreadsPerWorker": 42, # The number of threads per worker harness. If empty or unspecified, the

1632

# service will choose a number of threads (according to the number of cores

1633

# on the selected machine type for batch, or 1 by convention for streaming).

1634

"workerHarnessContainerImage": "A String", # Required. Docker container image that executes the Cloud Dataflow worker

1635

# harness, residing in Google Container Registry.

1636

#

1637

# Deprecated for the Fn API path. Use sdk_harness_container_images instead.

1638

"taskrunnerSettings": { # Taskrunner configuration settings. # Settings passed through to Google Compute Engine workers when

1639

# using the standard Dataflow task runner. Users should ignore

1640

# this field.

1641

"dataflowApiVersion": "A String", # The API version of endpoint, e.g. "v1b3"

1642

"oauthScopes": [ # The OAuth2 scopes to be requested by the taskrunner in order to

1643

# access the Cloud Dataflow API.

1644

"A String",

1645

],

1646

"baseUrl": "A String", # The base URL for the taskrunner to use when accessing Google Cloud APIs.

Bu Sun Kim

2020-05-27 12:20:54 -0700

1647

#

1648

# When workers access Google Cloud APIs, they logically do so via

1649

# relative URLs. If this field is specified, it supplies the base

1650

# URL to use for resolving these relative URLs. The normative

1651

# algorithm used is defined by RFC 1808, "Relative Uniform Resource

1652

# Locators".

1653

#

1654

# If not specified, the default value is "http://www.googleapis.com/"

"workflowFileName": "A String", # The file to store the workflow in.

1656

"logToSerialconsole": True or False, # Whether to send taskrunner log info to Google Compute Engine VM serial

1657

# console.

1658

"baseTaskDir": "A String", # The location on the worker for task-specific subdirectories.

1659

"taskUser": "A String", # The UNIX user ID on the worker VM to use for tasks launched by

1660

# taskrunner; e.g. "root".

1661

"vmId": "A String", # The ID string of the VM.

1662

"alsologtostderr": True or False, # Whether to also send taskrunner log info to stderr.

1663

"parallelWorkerSettings": { # Provides data to pass through to the worker harness. # The settings to pass to the parallel worker harness.

1664

"shuffleServicePath": "A String", # The Shuffle service path relative to the root URL, for example,

1665

# "shuffle/v1beta1".

1666

"tempStoragePrefix": "A String", # The prefix of the resources the system should use for temporary

1667

# storage.

1668

#

1669

# The supported resource type is:

1670

#

1671

# Google Cloud Storage:

1672

#

1673

# storage.googleapis.com/{bucket}/{object}

1674

# bucket.storage.googleapis.com/{object}

1675

"reportingEnabled": True or False, # Whether to send work progress updates to the service.

1676

"servicePath": "A String", # The Cloud Dataflow service path relative to the root URL, for example,

1677

# "dataflow/v1b3/projects".

1678

"baseUrl": "A String", # The base URL for accessing Google Cloud APIs.

1679

#

1680

# When workers access Google Cloud APIs, they logically do so via

1681

# relative URLs. If this field is specified, it supplies the base

1682

# URL to use for resolving these relative URLs. The normative

1683

# algorithm used is defined by RFC 1808, "Relative Uniform Resource

1684

# Locators".

1685

#

1686

# If not specified, the default value is "http://www.googleapis.com/"

1687

"workerId": "A String", # The ID of the worker running this pipeline.

1688

},

1689

"harnessCommand": "A String", # The command to launch the worker harness.

1690

"logDir": "A String", # The directory on the VM to store logs.

1691

"streamingWorkerMainClass": "A String", # The streaming worker main class name.

1692

"languageHint": "A String", # The suggested backend language.

1693

"taskGroup": "A String", # The UNIX group ID on the worker VM to use for tasks launched by

1694

# taskrunner; e.g. "wheel".

1695

"logUploadLocation": "A String", # Indicates where to put logs. If this is not specified, the logs

1696

# will not be uploaded.

1697

#

1698

# The supported resource type is:

1699

#

1700

# Google Cloud Storage:

1701

# storage.googleapis.com/{bucket}/{object}

1702

# bucket.storage.googleapis.com/{object}

1703

"commandlinesFileName": "A String", # The file to store preprocessing commands in.

1704

"continueOnException": True or False, # Whether to continue taskrunner if an exception is hit.

1705

"tempStoragePrefix": "A String", # The prefix of the resources the taskrunner should use for

1706

# temporary storage.

1707

#

1708

# The supported resource type is:

1709

#

1710

# Google Cloud Storage:

1711

# storage.googleapis.com/{bucket}/{object}

1712

# bucket.storage.googleapis.com/{object}

},

"diskType": "A String", # Type of root disk for VMs. If empty or unspecified, the service will

1715

# attempt to choose a reasonable default.

1716

"defaultPackageSet": "A String", # The default package set to install. This allows the service to

1717

# select a default set of packages which are useful to worker

1718

# harnesses written in a particular language.

1719

"machineType": "A String", # Machine type (e.g. "n1-standard-1"). If empty or unspecified, the

1720

# service will attempt to choose a reasonable default.

},

],

1723

"tempStoragePrefix": "A String", # The prefix of the resources the system should use for temporary

1724

# storage. The system will append the suffix "/temp-{JOBNAME}" to

1725

# this resource prefix, where {JOBNAME} is the value of the

1726

# job_name field. The resulting bucket and object prefix is used

1727

# as the prefix of the resources used to store temporary data

1728

# needed during the job execution. NOTE: This will override the

1729

# value in taskrunner_settings.

1730

# The supported resource type is:

1731

#

1732

# Google Cloud Storage:

1733

#

1734

# storage.googleapis.com/{bucket}/{object}

1735

# bucket.storage.googleapis.com/{object}

1736

"internalExperiments": { # Experimental settings.

1737

"a_key": "", # Properties of the object. Contains field @type with type URL.

},

"sdkPipelineOptions": { # The Cloud Dataflow SDK pipeline options specified by the user. These

1740

# options are passed through the service and are used to recreate the

1741

# SDK pipeline options on the worker in a language agnostic and platform

1742

# independent way.

"a_key": "", # Properties of the object.

},

"dataset": "A String", # The dataset for the current project where various workflow

1746

# related tables are stored.

1747

#

1748

# The supported resource type is:

1749

#

1750

# Google BigQuery:

1751

# bigquery.googleapis.com/{dataset}

1752

"clusterManagerApiService": "A String", # The type of cluster manager API to use. If unknown or

1753

# unspecified, the service will attempt to choose a reasonable

1754

# default. This should be in the form of the API service name,

1755

# e.g. "compute.googleapis.com".

},

"stepsLocation": "A String", # The GCS location where the steps are stored.

1758

"steps": [ # Exactly one of steps or steps_location should be specified.

1759

#

1760

# The top-level steps that constitute the entire job.

1761

{ # Defines a particular step within a Cloud Dataflow job.

1762

#

1763

# A job consists of multiple steps, each of which performs some

1764

# specific operation as part of the overall job. Data is typically

1765

# passed from one step to another as part of the job.

1766

#

1767

# Here's an example of a sequence of steps which together implement a

1768

# Map-Reduce job:

1769

#

1770

# * Read a collection of data from some source, parsing the

1771

# collection's elements.

1772

#

1773

# * Validate the elements.

1774

#

1775

# * Apply a user-defined function to map each element to some value

1776

# and extract an element-specific key value.

1777

#

1778

# * Group elements with the same key into a single element with

1779

# that key, transforming a multiply-keyed collection into a

1780

# uniquely-keyed collection.

1781

#

1782

# * Write the elements out to some data sink.

1783

#

1784

# Note that the Cloud Dataflow service may be used to run many different

1785

# types of jobs, not just Map-Reduce.

1786

"kind": "A String", # The kind of step in the Cloud Dataflow job.

1787

"properties": { # Named properties associated with the step. Each kind of

1788

# predefined step has its own required set of properties.

1789

# Must be provided on Create. Only retrieved with JOB_VIEW_ALL.

1790

"a_key": "", # Properties of the object.

1791

},

1792

"name": "A String", # The name that identifies the step. This must be unique for each

1793

# step with respect to all other steps in the Cloud Dataflow job.

1794

},

1795

],

1796

"stageStates": [ # This field may be mutated by the Cloud Dataflow service;

1797

# callers cannot mutate it.

1798

{ # A message describing the state of a particular execution stage.

1799

"executionStageState": "A String", # Execution stage states allow the same set of values as JobState.

1800

"executionStageName": "A String", # The name of the execution stage.

1801

"currentStateTime": "A String", # The time at which the stage transitioned to this state.

1802

},

1803

],

1804

"replacedByJobId": "A String", # If another job is an update of this job (and thus, this job is in

1805

# `JOB_STATE_UPDATED`), this field contains the ID of that job.

1806

"jobMetadata": { # Metadata available primarily for filtering jobs. Will be included in the

# ListJob response and Job SUMMARY view.

#

# This field is populated by the Dataflow service to support filtering jobs

# by the metadata values provided here. Populated for ListJobs and all GetJob

# views SUMMARY and higher.

1810

"sdkVersion": { # The version of the SDK used to run the job. # The SDK version used to run the job.

1811

"sdkSupportStatus": "A String", # The support status for this SDK version.

1812

"versionDisplayName": "A String", # A readable string describing the version of the SDK.

1813

"version": "A String", # The version of the SDK used to run the job.

1814

},

1815

"bigTableDetails": [ # Identification of a BigTable source used in the Dataflow job.

1816

{ # Metadata for a BigTable connector used by the job.

1817

"instanceId": "A String", # InstanceId accessed in the connection.

1818

"tableId": "A String", # TableId accessed in the connection.

1819

"projectId": "A String", # ProjectId accessed in the connection.

1820

},

1821

],

1822

"pubsubDetails": [ # Identification of a PubSub source used in the Dataflow job.

1823

{ # Metadata for a PubSub connector used by the job.

1824

"subscription": "A String", # Subscription used in the connection.

1825

"topic": "A String", # Topic accessed in the connection.

1826

},

1827

],

1828

"bigqueryDetails": [ # Identification of a BigQuery source used in the Dataflow job.

1829

{ # Metadata for a BigQuery connector used by the job.

1830

"dataset": "A String", # Dataset accessed in the connection.

1831

"projectId": "A String", # Project accessed in the connection.

1832

"query": "A String", # Query used to access data in the connection.

1833

"table": "A String", # Table accessed in the connection.

1834

},

1835

],

1836

"fileDetails": [ # Identification of a File source used in the Dataflow job.

1837

{ # Metadata for a File connector used by the job.

1838

"filePattern": "A String", # File Pattern used to access files by the connector.

1839

},

1840

],

1841

"datastoreDetails": [ # Identification of a Datastore source used in the Dataflow job.

1842

{ # Metadata for a Datastore connector used by the job.

1843

"namespace": "A String", # Namespace used in the connection.

1844

"projectId": "A String", # ProjectId accessed in the connection.

1845

},

1846

],

1847

"spannerDetails": [ # Identification of a Spanner source used in the Dataflow job.

1848

{ # Metadata for a Spanner connector used by the job.

1849

"instanceId": "A String", # InstanceId accessed in the connection.

1850

"databaseId": "A String", # DatabaseId accessed in the connection.

1851

"projectId": "A String", # ProjectId accessed in the connection.

},

],

},

"location": "A String", # The [regional endpoint]

1856

# (https://cloud.google.com/dataflow/docs/concepts/regional-endpoints) that

1857

# contains this job.

1858

"transformNameMapping": { # The map of transform name prefixes of the job to be replaced to the

1859

# corresponding name prefixes of the new job.

1860

"a_key": "A String",

1861

},

1862

"startTime": "A String", # The timestamp when the job was started (transitioned to JOB_STATE_PENDING).

1863

# Flexible resource scheduling jobs are started with some delay after job

1864

# creation, so start_time is unset before start and is updated when the

1865

# job is started by the Cloud Dataflow service. For other jobs, start_time

1866

# always equals create_time and is immutable and set by the Cloud Dataflow

1867

# service.

1868

"clientRequestId": "A String", # The client's unique identifier of the job, re-used across retried attempts.

1869

# If this field is set, the service will ensure its uniqueness.

1870

# The request to create a job will fail if the service has knowledge of a

1871

# previously submitted job with the same client's ID and job name.

1872

# The caller may use this field to ensure idempotence of job

1873

# creation across retried attempts to create a job.

1874

# By default, the field is empty and, in that case, the service ignores it.

1875

"executionInfo": { # Additional information about how a Cloud Dataflow job will be executed that # Deprecated.

1876

# isn't contained in the submitted job.

1877

"stages": { # A mapping from each stage to the information about that stage.

1878

"a_key": { # Contains information about how a particular

1879

# google.dataflow.v1beta3.Step will be executed.

1880

"stepName": [ # The steps associated with the execution stage.

1881

# Note that stages may have several steps, and that a given step

1882

# might be run by more than one stage.

1883

"A String",

1884

],

1885

},

},

1887

},

"type": "A String", # The type of Cloud Dataflow job.

1889

"createTime": "A String", # The timestamp when the job was initially created. Immutable and set by the

1890

# Cloud Dataflow service.

1891

"tempFiles": [ # A set of files the system should be aware of that are used

1892

# for temporary storage. These temporary files will be

1893

# removed on job completion.

1894

# No duplicates are allowed.

1895

# No file patterns are supported.

1896

#

1897

# The supported files are:

1898

#

1899

# Google Cloud Storage:

1900

#

1901

# storage.googleapis.com/{bucket}/{object}

1902

# bucket.storage.googleapis.com/{object}

1903

"A String",

1904

],

1905

"id": "A String", # The unique ID of this job.

1906

#

1907

# This field is set by the Cloud Dataflow service when the Job is

1908

# created, and is immutable for the life of the job.

1909

"requestedState": "A String", # The job's requested state.

1910

#

1911

# `UpdateJob` may be used to switch between the `JOB_STATE_STOPPED` and

1912

# `JOB_STATE_RUNNING` states, by setting requested_state. `UpdateJob` may

1913

# also be used to directly set a job's requested state to

1914

# `JOB_STATE_CANCELLED` or `JOB_STATE_DONE`, irrevocably terminating the

1915

# job if it has not already reached a terminal state.

1916

"replaceJobId": "A String", # If this job is an update of an existing job, this field is the job ID

1917

# of the job it replaced.

1918

#

1919

# When sending a `CreateJobRequest`, you can update a job by specifying it

1920

# here. The job named here is stopped, and its intermediate state is

1921

# transferred to this job.

1922

"createdFromSnapshotId": "A String", # If this is specified, the job's initial state is populated from the given

1923

# snapshot.

1924

"currentState": "A String", # The current state of the job.

1925

#

1926

# Jobs are created in the `JOB_STATE_STOPPED` state unless otherwise

1927

# specified.

1928

#

1929

# A job in the `JOB_STATE_RUNNING` state may asynchronously enter a

1930

# terminal state. After a job has reached a terminal state, no

1931

# further state updates may be made.

1932

#

1933

# This field may be mutated by the Cloud Dataflow service;

1934

# callers cannot mutate it.

1935

"name": "A String", # The user-specified Cloud Dataflow job name.

1936

#

1937

# Only one Job with a given name may exist in a project at any

1938

# given time. If a caller attempts to create a Job with the same

1939

# name as an already-existing Job, the attempt returns the

1940

# existing Job.

1941

#

1942

# The name must match the regular expression

1943

# `[a-z]([-a-z0-9]{0,38}[a-z0-9])?`

1944

"currentStateTime": "A String", # The timestamp associated with the current state.

1945

}</pre>

</div>
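The `name` field of the Job object described above must match the regular expression `[a-z]([-a-z0-9]{0,38}[a-z0-9])?`. A minimal sketch of checking a candidate name on the client side before calling `create` might look like the following. The helper name `is_valid_job_name` is hypothetical and not part of the client library, and the service remains the authority on which names it accepts.

```python
import re

# Pattern copied from the documentation of the Job "name" field.
_JOB_NAME_RE = re.compile(r"[a-z]([-a-z0-9]{0,38}[a-z0-9])?")


def is_valid_job_name(name):
    """Return True if `name` fully matches the documented job-name pattern.

    Hypothetical helper for illustration; not part of google-api-python-client.
    """
    return _JOB_NAME_RE.fullmatch(name) is not None
```

Read literally, the pattern requires a leading lowercase letter, forbids a trailing hyphen, and caps the name at 40 characters.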

<code class="details" id="get">get(projectId, jobId, location=None, view=None, x__xgafv=None)</code>
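As a rough illustration of the signature above, the sketch below wraps a `get` call and returns the job's `currentState`. It assumes `service` is a Dataflow `v1b3` client object that the caller has already built, for example with `googleapiclient.discovery.build("dataflow", "v1b3")` and suitable credentials. The wrapper name `get_job_state` is hypothetical.

```python
def get_job_state(service, project_id, job_id, location=None, view=None):
    """Fetch a job via projects.jobs.get and return its currentState (or None).

    Hypothetical wrapper for illustration; `service` is assumed to be a
    Dataflow v1b3 client built by the caller.
    """
    # Only pass the optional arguments that were actually provided.
    params = {"projectId": project_id, "jobId": job_id}
    if location is not None:
        params["location"] = location
    if view is not None:
        params["view"] = view

    # get() builds the request; execute() sends it and returns the Job dict.
    job = service.projects().jobs().get(**params).execute()
    return job.get("currentState")
```

Passing `view="JOB_VIEW_ALL"` should also return fields such as step `properties`, which the Job schema marks as retrieved only with that view.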

<pre>Gets the state of the specified Cloud Dataflow job.

1951

1952

To get the state of a job, we recommend using `projects.locations.jobs.get`

1953

with a [regional endpoint]

1954

(https://cloud.google.com/dataflow/docs/concepts/regional-endpoints). Using

1955

`projects.jobs.get` is not recommended, as you can only get the state of

1956

jobs that are running in `us-central1`.

1957

1958

Args:

1959

projectId: string, The ID of the Cloud Platform project that the job belongs to. (required)

1960

jobId: string, The job ID. (required)

1961

location: string, The [regional endpoint]

1962

(https://cloud.google.com/dataflow/docs/concepts/regional-endpoints) that

1963

contains this job.

view: string, The level of information requested in response.

x__xgafv: string, V1 error format.

1966

Allowed values

1967

1 - v1 error format

1968

2 - v2 error format

1970

Returns:

1971

An object of the form:

1972

1973

{ # Defines a job to be run by the Cloud Dataflow service.

"pipelineDescription": { # A descriptive representation of submitted pipeline as well as the executed

# form. This data is provided by the Dataflow service for ease of visualizing

# the pipeline and interpreting Dataflow provided metrics.

#

# Preliminary field: The format of this data may change at any time.

# A description of the user pipeline and stages through which it is executed.

# Created by Cloud Dataflow service. Only retrieved with

# JOB_VIEW_DESCRIPTION or JOB_VIEW_ALL.

1980

"displayData": [ # Pipeline level display data.

1981

{ # Data provided with a pipeline or transform to provide descriptive info.

1982

"url": "A String", # An optional full URL.

1983

"javaClassValue": "A String", # Contains value if the data is of java class type.

1984

"timestampValue": "A String", # Contains value if the data is of timestamp type.

1985

"durationValue": "A String", # Contains value if the data is of duration type.

1986

"label": "A String", # An optional label to display in a dax UI for the element.

1987

"key": "A String", # The key identifying the display data.

1988

# This is intended to be used as a label for the display data

1989

# when viewed in a dax monitoring system.

1990

"namespace": "A String", # The namespace for the key. This is usually a class name or programming

1991

# language namespace (e.g. a Python module) which defines the display data.

1992

# This allows a dax monitoring system to specially handle the data

1993

# and perform custom rendering.

1994

"floatValue": 3.14, # Contains value if the data is of float type.

1995

"strValue": "A String", # Contains value if the data is of string type.

1996

"int64Value": "A String", # Contains value if the data is of int64 type.

1997

"boolValue": True or False, # Contains value if the data is of a boolean type.

1998

"shortStrValue": "A String", # A possible additional shorter value to display.

1999

# For example a java_class_name_value of com.mypackage.MyDoFn

2000

# will be stored with MyDoFn as the short_str_value and

2001

# com.mypackage.MyDoFn as the java_class_name value.

2002

# short_str_value can be displayed and java_class_name_value

2003

# will be displayed as a tooltip.

},

],

2006

"originalPipelineTransform": [ # Description of each transform in the pipeline and collections between them.

2007

{ # Description of the type, names/ids, and input/outputs for a transform.

2008

"outputCollectionName": [ # User names for all collection outputs to this transform.

"A String",

2010

],

"displayData": [ # Transform-specific display data.

2012

{ # Data provided with a pipeline or transform to provide descriptive info.

2013

"url": "A String", # An optional full URL.

2014

"javaClassValue": "A String", # Contains value if the data is of java class type.

2015

"timestampValue": "A String", # Contains value if the data is of timestamp type.

2016

"durationValue": "A String", # Contains value if the data is of duration type.

2017

"label": "A String", # An optional label to display in a dax UI for the element.

2018

"key": "A String", # The key identifying the display data.

2019

# This is intended to be used as a label for the display data

2020

# when viewed in a dax monitoring system.

2021

"namespace": "A String", # The namespace for the key. This is usually a class name or programming

2022

# language namespace (e.g. a Python module) which defines the display data.

2023

# This allows a dax monitoring system to specially handle the data

2024

# and perform custom rendering.

2025

"floatValue": 3.14, # Contains value if the data is of float type.

2026

"strValue": "A String", # Contains value if the data is of string type.

2027

"int64Value": "A String", # Contains value if the data is of int64 type.

2028

"boolValue": True or False, # Contains value if the data is of a boolean type.

2029

"shortStrValue": "A String", # A possible additional shorter value to display.

2030

# For example a java_class_name_value of com.mypackage.MyDoFn

2031

# will be stored with MyDoFn as the short_str_value and

2032

# com.mypackage.MyDoFn as the java_class_name value.

2033

# short_str_value can be displayed and java_class_name_value

2034

# will be displayed as a tooltip.

2035

},

2036

],

2037

"id": "A String", # SDK generated id of this transform instance.

2038

"inputCollectionName": [ # User names for all collection inputs to this transform.

2039

"A String",

2040

],

2041

"name": "A String", # User provided name for this transform instance.

2042

"kind": "A String", # Type of transform.

2043

},

2044

],

2045

"executionPipelineStage": [ # Description of each stage of execution of the pipeline.

2046

{ # Description of the composing transforms, names/ids, and input/outputs of a

2047

# stage of execution. Some composing transforms and sources may have been

2048

# generated by the Dataflow service during execution planning.

2049

"componentSource": [ # Collections produced and consumed by component transforms of this stage.

2050

{ # Description of an interstitial value between transforms in an execution

2051

# stage.

2052

"userName": "A String", # Human-readable name for this transform; may be user or system generated.

2053

"name": "A String", # Dataflow service generated name for this source.

2054

"originalTransformOrCollection": "A String", # User name for the original user transform or collection with which this

2055

# source is most closely associated.

2056

},

2057

],

2058

"inputSource": [ # Input sources for this stage.

2059

{ # Description of an input or output of an execution stage.

2060

"userName": "A String", # Human-readable name for this source; may be user or system generated.

2061

"originalTransformOrCollection": "A String", # User name for the original user transform or collection with which this

2062

# source is most closely associated.

2063

"sizeBytes": "A String", # Size of the source, if measurable.

2064

"name": "A String", # Dataflow service generated name for this source.

2065

},

2066

],

2067

"name": "A String", # Dataflow service generated name for this stage.

2068

"componentTransform": [ # Transforms that comprise this execution stage.

2069

{ # Description of a transform executed as part of an execution stage.

2070

"name": "A String", # Dataflow service generated name for this source.

2071

"userName": "A String", # Human-readable name for this transform; may be user or system generated.

2072

"originalTransform": "A String", # User name for the original user transform with which this transform is

2073

# most closely associated.

2074

},

2075

],

2076

"id": "A String", # Dataflow service generated id for this stage.

2077

"outputSource": [ # Output sources for this stage.

2078

{ # Description of an input or output of an execution stage.

2079

"userName": "A String", # Human-readable name for this source; may be user or system generated.

2080

"originalTransformOrCollection": "A String", # User name for the original user transform or collection with which this

2081

# source is most closely associated.

2082

"sizeBytes": "A String", # Size of the source, if measurable.

2083

"name": "A String", # Dataflow service generated name for this source.

2084

},

2085

],

2086

"kind": "A String", # Type of transform this stage is executing.

},

],

},

"labels": { # User-defined labels for this job.

2091

#

2092

# The labels map can contain no more than 64 entries. Entries of the labels

2093

# map are UTF8 strings that comply with the following restrictions:

2094

#

2095

# * Keys must conform to regexp: \p{Ll}\p{Lo}{0,62}

2096

# * Values must conform to regexp: [\p{Ll}\p{Lo}\p{N}_-]{0,63}

2097

# * Both keys and values are additionally constrained to be <= 128 bytes in

# size.

"a_key": "A String",

},

"projectId": "A String", # The ID of the Cloud Platform project that the job belongs to.

2102

"environment": { # Describes the environment in which a Dataflow Job runs. # The environment for the job.

2103

"flexResourceSchedulingGoal": "A String", # Which Flexible Resource Scheduling mode to run in.

2104

"workerRegion": "A String", # The Compute Engine region

2105

# (https://cloud.google.com/compute/docs/regions-zones/regions-zones) in

2106

# which worker processing should occur, e.g. "us-west1". Mutually exclusive

2107

# with worker_zone. If neither worker_region nor worker_zone is specified,

2108

# default to the control plane's region.

2109

"userAgent": { # A description of the process that generated the request.

2110

"a_key": "", # Properties of the object.

2111

},

2112

"serviceAccountEmail": "A String", # Identity to run virtual machines as. Defaults to the default account.

2113

"version": { # A structure describing which components and their versions of the service

2114

# are required in order to run the job.

2115

"a_key": "", # Properties of the object.

2116

},

2117

"serviceKmsKeyName": "A String", # If set, contains the Cloud KMS key identifier used to encrypt data

2118

# at rest, AKA a Customer Managed Encryption Key (CMEK).

2119

#

2120

# Format:

2121

# projects/PROJECT_ID/locations/LOCATION/keyRings/KEY_RING/cryptoKeys/KEY

2122

"experiments": [ # The list of experiments to enable.

2123

"A String",

2124

],

2125

"workerZone": "A String", # The Compute Engine zone

2126

# (https://cloud.google.com/compute/docs/regions-zones/regions-zones) in

2127

# which worker processing should occur, e.g. "us-west1-a". Mutually exclusive

2128

# with worker_region. If neither worker_region nor worker_zone is specified,

2129

# a zone in the control plane's region is chosen based on available capacity.

2130

"workerPools": [ # The worker pools. At least one "harness" worker pool must be

2131

# specified in order for the job to have workers.

2132

{ # Describes one particular pool of Cloud Dataflow workers to be

2133

# instantiated by the Cloud Dataflow service in order to perform the

2134

# computations required by a job. Note that a workflow job may use

2135

# multiple pools, in order to match the various computational

2136

# requirements of the various stages of the job.

2137

"onHostMaintenance": "A String", # The action to take on host maintenance, as defined by the Google

2138

# Compute Engine API.

2139

"sdkHarnessContainerImages": [ # Set of SDK harness containers needed to execute this pipeline. This will

2140

# only be set in the Fn API path. For non-cross-language pipelines this

2141

# should have only one entry. Cross-language pipelines will have two or more

2142

# entries.

2143

{ # Defines a SDK harness container for executing Dataflow pipelines.

2144

"containerImage": "A String", # A docker container image that resides in Google Container Registry.

2145

"useSingleCorePerContainer": True or False, # If true, recommends the Dataflow service to use only one core per SDK

2146

# container instance with this image. If false (or unset) recommends using

2147

# more than one core per SDK container instance with this image for

2148

# efficiency. Note that Dataflow service may choose to override this property

# if needed.

},

],

"zone": "A String", # Zone to run the worker pools in. If empty or unspecified, the service

2153

# will attempt to choose a reasonable default.

2154

"kind": "A String", # The kind of the worker pool; currently only `harness` and `shuffle`

2155

# are supported.

2156

"metadata": { # Metadata to set on the Google Compute Engine VMs.

2157

"a_key": "A String",

2158

},

2159

"diskSourceImage": "A String", # Fully qualified source image for disks.

2160

"dataDisks": [ # Data disks that are used by a VM in this workflow.

2161

{ # Describes the data disk used by a workflow job.

2162

"sizeGb": 42, # Size of disk in GB. If zero or unspecified, the service will

2163

# attempt to choose a reasonable default.

2164

"diskType": "A String", # Disk storage type, as defined by Google Compute Engine. This

2165

# must be a disk type appropriate to the project and zone in which

2166

# the workers will run. If unknown or unspecified, the service

2167

# will attempt to choose a reasonable default.

2168

#

2169

# For example, the standard persistent disk type is a resource name

2170

# typically ending in "pd-standard". If SSD persistent disks are

2171

# available, the resource name typically ends with "pd-ssd". The

2172

# actual valid values are defined by the Google Compute Engine API,

2173

# not by the Cloud Dataflow API; consult the Google Compute Engine

2174

# documentation for more information about determining the set of

2175

# available disk types for a particular project and zone.

2176

#

2177

# Google Compute Engine Disk types are local to a particular

2178

# project in a particular zone, and so the resource name will

2179

# typically look something like this:

2180

#

2181

# compute.googleapis.com/projects/project-id/zones/zone/diskTypes/pd-standard

2182

"mountPoint": "A String", # Directory in a VM where disk is mounted.

2183

},

2184

],

2185

"packages": [ # Packages to be installed on workers.

2186

{ # The packages that must be installed in order for a worker to run the

2187

# steps of the Cloud Dataflow job that will be assigned to its worker

2188

# pool.

#

# This is the mechanism by which the Cloud Dataflow SDK causes code to

2191

# be loaded onto the workers. For example, the Cloud Dataflow Java SDK

2192

# might use this to install jars containing the user's code and all of the

2193

# various dependencies (libraries, data files, etc.) required in order

2194

# for that code to run.

2195

"name": "A String", # The name of the package.

2196

"location": "A String", # The resource to read the package from. The supported resource type is:

2197

#

2198

# Google Cloud Storage:

2199

#

2200

# storage.googleapis.com/{bucket}

2201

# bucket.storage.googleapis.com/

2202

},

2203

],

2204

"teardownPolicy": "A String", # Sets the policy for determining when to turndown worker pool.

2205

# Allowed values are: `TEARDOWN_ALWAYS`, `TEARDOWN_ON_SUCCESS`, and

2206

# `TEARDOWN_NEVER`.

2207

# `TEARDOWN_ALWAYS` means workers are always torn down regardless of whether

2208

# the job succeeds. `TEARDOWN_ON_SUCCESS` means workers are torn down

2209

# if the job succeeds. `TEARDOWN_NEVER` means the workers are never torn

2210

# down.

2211

#

2212

# If the workers are not torn down by the service, they will

2213

# continue to run and use Google Compute Engine VM resources in the

2214

# user's project until they are explicitly terminated by the user.

2215

# Because of this, Google recommends using the `TEARDOWN_ALWAYS`

2216

# policy except for small, manually supervised test jobs.

2217

#

2218

# If unknown or unspecified, the service will attempt to choose a reasonable

2219

# default.

2220

"network": "A String", # Network to which VMs will be assigned. If empty or unspecified,

2221

# the service will use the network "default".

2222

"ipConfiguration": "A String", # Configuration for VM IPs.

2223

"diskSizeGb": 42, # Size of root disk for VMs, in GB. If zero or unspecified, the service will

2224

# attempt to choose a reasonable default.

2225

"autoscalingSettings": { # Settings for WorkerPool autoscaling. # Settings for autoscaling of this WorkerPool.

2226

"maxNumWorkers": 42, # The maximum number of workers to cap scaling at.

2227

"algorithm": "A String", # The algorithm to use for autoscaling.

2228

},

2229

"poolArgs": { # Extra arguments for this worker pool.

2230

"a_key": "", # Properties of the object. Contains field @type with type URL.

2231

},

2232

"subnetwork": "A String", # Subnetwork to which VMs will be assigned, if desired. Expected to be of

2233

# the form "regions/REGION/subnetworks/SUBNETWORK".

2234

"numWorkers": 42, # Number of Google Compute Engine workers in this pool needed to

2235

# execute the job. If zero or unspecified, the service will

2236

# attempt to choose a reasonable default.

2237

"numThreadsPerWorker": 42, # The number of threads per worker harness. If empty or unspecified, the

2238

# service will choose a number of threads (according to the number of cores

2239

# on the selected machine type for batch, or 1 by convention for streaming).

2240

"workerHarnessContainerImage": "A String", # Required. Docker container image that executes the Cloud Dataflow worker

2241

# harness, residing in Google Container Registry.

2242

#

2243

# Deprecated for the Fn API path. Use sdk_harness_container_images instead.

2244

"taskrunnerSettings": { # Taskrunner configuration settings. # Settings passed through to Google Compute Engine workers when

2245

# using the standard Dataflow task runner. Users should ignore

2246

# this field.

2247

"dataflowApiVersion": "A String", # The API version of endpoint, e.g. "v1b3"

2248

"oauthScopes": [ # The OAuth2 scopes to be requested by the taskrunner in order to

2249

# access the Cloud Dataflow API.

2250

"A String",

2251

],

2252

"baseUrl": "A String", # The base URL for the taskrunner to use when accessing Google Cloud APIs.


2253

#

2254

# When workers access Google Cloud APIs, they logically do so via

2255

# relative URLs. If this field is specified, it supplies the base

2256

# URL to use for resolving these relative URLs. The normative

2257

# algorithm used is defined by RFC 1808, "Relative Uniform Resource

2258

# Locators".

2259

#

2260

# If not specified, the default value is "http://www.googleapis.com/"


2261

"workflowFileName": "A String", # The file to store the workflow in.

2262

"logToSerialconsole": True or False, # Whether to send taskrunner log info to Google Compute Engine VM serial

2263

# console.

2264

"baseTaskDir": "A String", # The location on the worker for task-specific subdirectories.

2265

"taskUser": "A String", # The UNIX user ID on the worker VM to use for tasks launched by

2266

# taskrunner; e.g. "root".

2267

"vmId": "A String", # The ID string of the VM.

2268

"alsologtostderr": True or False, # Whether to also send taskrunner log info to stderr.

2269

"parallelWorkerSettings": { # Provides data to pass through to the worker harness. # The settings to pass to the parallel worker harness.

2270

"shuffleServicePath": "A String", # The Shuffle service path relative to the root URL, for example,

2271

# "shuffle/v1beta1".

2272

"tempStoragePrefix": "A String", # The prefix of the resources the system should use for temporary

2273

# storage.

2274

#

2275

# The supported resource type is:

2276

#

2277

# Google Cloud Storage:

2278

#

2279

# storage.googleapis.com/{bucket}/{object}

2280

# bucket.storage.googleapis.com/{object}

2281

"reportingEnabled": True or False, # Whether to send work progress updates to the service.

2282

"servicePath": "A String", # The Cloud Dataflow service path relative to the root URL, for example,

2283

# "dataflow/v1b3/projects".

2284

"baseUrl": "A String", # The base URL for accessing Google Cloud APIs.

2285

#

2286

# When workers access Google Cloud APIs, they logically do so via

2287

# relative URLs. If this field is specified, it supplies the base

2288

# URL to use for resolving these relative URLs. The normative

2289

# algorithm used is defined by RFC 1808, "Relative Uniform Resource

2290

# Locators".

2291

#

2292

# If not specified, the default value is "http://www.googleapis.com/"

2293

"workerId": "A String", # The ID of the worker running this pipeline.

2294

},

2295

"harnessCommand": "A String", # The command to launch the worker harness.

2296

"logDir": "A String", # The directory on the VM to store logs.

2297

"streamingWorkerMainClass": "A String", # The streaming worker main class name.

2298

"languageHint": "A String", # The suggested backend language.

2299

"taskGroup": "A String", # The UNIX group ID on the worker VM to use for tasks launched by

2300

# taskrunner; e.g. "wheel".

2301

"logUploadLocation": "A String", # Indicates where to put logs. If this is not specified, the logs

2302

# will not be uploaded.

2303

#

2304

# The supported resource type is:

2305

#

2306

# Google Cloud Storage:

2307

# storage.googleapis.com/{bucket}/{object}

2308

# bucket.storage.googleapis.com/{object}

2309

"commandlinesFileName": "A String", # The file to store preprocessing commands in.

2310

"continueOnException": True or False, # Whether to continue taskrunner if an exception is hit.

2311

"tempStoragePrefix": "A String", # The prefix of the resources the taskrunner should use for

2312

# temporary storage.

2313

#

2314

# The supported resource type is:

2315

#

2316

# Google Cloud Storage:

2317

# storage.googleapis.com/{bucket}/{object}

2318

# bucket.storage.googleapis.com/{object}


2319

},


2320

"diskType": "A String", # Type of root disk for VMs. If empty or unspecified, the service will

2321

# attempt to choose a reasonable default.

2322

"defaultPackageSet": "A String", # The default package set to install. This allows the service to

2323

# select a default set of packages which are useful to worker

2324

# harnesses written in a particular language.

2325

"machineType": "A String", # Machine type (e.g. "n1-standard-1"). If empty or unspecified, the

2326

# service will attempt to choose a reasonable default.


2327

},


2328

],

2329

"tempStoragePrefix": "A String", # The prefix of the resources the system should use for temporary

2330

# storage. The system will append the suffix "/temp-{JOBNAME}" to

2331

# this resource prefix, where {JOBNAME} is the value of the

2332

# job_name field. The resulting bucket and object prefix is used

2333

# as the prefix of the resources used to store temporary data

2334

# needed during the job execution. NOTE: This will override the

2335

# value in taskrunner_settings.

2336

# The supported resource type is:

2337

#

2338

# Google Cloud Storage:

2339

#

2340

# storage.googleapis.com/{bucket}/{object}

2341

# bucket.storage.googleapis.com/{object}

2342

"internalExperiments": { # Experimental settings.

2343

"a_key": "", # Properties of the object. Contains field @type with type URL.


2344

},


2345

"sdkPipelineOptions": { # The Cloud Dataflow SDK pipeline options specified by the user. These

2346

# options are passed through the service and are used to recreate the

2347

# SDK pipeline options on the worker in a language agnostic and platform

2348

# independent way.


2349

"a_key": "", # Properties of the object.


2350

},


2351

"dataset": "A String", # The dataset for the current project where various workflow

2352

# related tables are stored.

2353

#

2354

# The supported resource type is:

2355

#

2356

# Google BigQuery:

2357

# bigquery.googleapis.com/{dataset}

2358

"clusterManagerApiService": "A String", # The type of cluster manager API to use. If unknown or

2359

# unspecified, the service will attempt to choose a reasonable

2360

# default. This should be in the form of the API service name,

2361

# e.g. "compute.googleapis.com".


2362

},


2363

"stepsLocation": "A String", # The GCS location where the steps are stored.

2364

"steps": [ # Exactly one of step or steps_location should be specified.

2365

#

2366

# The top-level steps that constitute the entire job.

2367

{ # Defines a particular step within a Cloud Dataflow job.

2368

#

2369

# A job consists of multiple steps, each of which performs some

2370

# specific operation as part of the overall job. Data is typically

2371

# passed from one step to another as part of the job.

2372

#

2373

# Here's an example of a sequence of steps which together implement a

2374

# Map-Reduce job:

2375

#

2376

# * Read a collection of data from some source, parsing the

2377

# collection's elements.

2378

#

2379

# * Validate the elements.

2380

#

2381

# * Apply a user-defined function to map each element to some value

2382

# and extract an element-specific key value.

2383

#

2384

# * Group elements with the same key into a single element with

2385

# that key, transforming a multiply-keyed collection into a

2386

# uniquely-keyed collection.

2387

#

2388

# * Write the elements out to some data sink.

2389

#

2390

# Note that the Cloud Dataflow service may be used to run many different

2391

# types of jobs, not just Map-Reduce.

2392

"kind": "A String", # The kind of step in the Cloud Dataflow job.

2393

"properties": { # Named properties associated with the step. Each kind of

2394

# predefined step has its own required set of properties.

2395

# Must be provided on Create. Only retrieved with JOB_VIEW_ALL.

2396

"a_key": "", # Properties of the object.

2397

},

2398

"name": "A String", # The name that identifies the step. This must be unique for each

2399

# step with respect to all other steps in the Cloud Dataflow job.

2400

},

2401

],

2402

"stageStates": [ # This field may be mutated by the Cloud Dataflow service;

2403

# callers cannot mutate it.

2404

{ # A message describing the state of a particular execution stage.

2405

"executionStageState": "A String", # Executions stage states allow the same set of values as JobState.

2406

"executionStageName": "A String", # The name of the execution stage.

2407

"currentStateTime": "A String", # The time at which the stage transitioned to this state.

2408

},

2409

],

2410

"replacedByJobId": "A String", # If another job is an update of this job (and thus, this job is in

2411

# `JOB_STATE_UPDATED`), this field contains the ID of that job.

2412

"jobMetadata": { # Metadata available primarily for filtering jobs. Will be included in the # This field is populated by the Dataflow service to support filtering jobs

2413

# by the metadata values provided here. Populated for ListJobs and all GetJob

2414

# views SUMMARY and higher.

2415

# ListJob response and Job SUMMARY view.

2416

"sdkVersion": { # The version of the SDK used to run the job. # The SDK version used to run the job.

2417

"sdkSupportStatus": "A String", # The support status for this SDK version.

2418

"versionDisplayName": "A String", # A readable string describing the version of the SDK.

2419

"version": "A String", # The version of the SDK used to run the job.

2420

},

2421

"bigTableDetails": [ # Identification of a BigTable source used in the Dataflow job.

2422

{ # Metadata for a BigTable connector used by the job.

2423

"instanceId": "A String", # InstanceId accessed in the connection.

2424

"tableId": "A String", # TableId accessed in the connection.

2425

"projectId": "A String", # ProjectId accessed in the connection.

2426

},

2427

],

2428

"pubsubDetails": [ # Identification of a PubSub source used in the Dataflow job.

2429

{ # Metadata for a PubSub connector used by the job.

2430

"subscription": "A String", # Subscription used in the connection.

2431

"topic": "A String", # Topic accessed in the connection.

2432

},

2433

],

2434

"bigqueryDetails": [ # Identification of a BigQuery source used in the Dataflow job.

2435

{ # Metadata for a BigQuery connector used by the job.

2436

"dataset": "A String", # Dataset accessed in the connection.

2437

"projectId": "A String", # Project accessed in the connection.

2438

"query": "A String", # Query used to access data in the connection.

2439

"table": "A String", # Table accessed in the connection.

2440

},

2441

],

2442

"fileDetails": [ # Identification of a File source used in the Dataflow job.

2443

{ # Metadata for a File connector used by the job.

2444

"filePattern": "A String", # File Pattern used to access files by the connector.

2445

},

2446

],

2447

"datastoreDetails": [ # Identification of a Datastore source used in the Dataflow job.

2448

{ # Metadata for a Datastore connector used by the job.

2449

"namespace": "A String", # Namespace used in the connection.

2450

"projectId": "A String", # ProjectId accessed in the connection.

2451

},

2452

],

2453

"spannerDetails": [ # Identification of a Spanner source used in the Dataflow job.

2454

{ # Metadata for a Spanner connector used by the job.

2455

"instanceId": "A String", # InstanceId accessed in the connection.

2456

"databaseId": "A String", # DatabaseId accessed in the connection.

2457

"projectId": "A String", # ProjectId accessed in the connection.

},

],

},

"location": "A String", # The [regional endpoint]

2462

# (https://cloud.google.com/dataflow/docs/concepts/regional-endpoints) that

2463

# contains this job.

2464

"transformNameMapping": { # The map of transform name prefixes of the job to be replaced to the

2465

# corresponding name prefixes of the new job.

2466

"a_key": "A String",

2467

},

2468

"startTime": "A String", # The timestamp when the job was started (transitioned to JOB_STATE_PENDING).

2469

# Flexible resource scheduling jobs are started with some delay after job

2470

# creation, so start_time is unset before start and is updated when the

2471

# job is started by the Cloud Dataflow service. For other jobs, start_time

2472

# always equals create_time and is immutable and set by the Cloud Dataflow

2473

# service.

2474

"clientRequestId": "A String", # The client's unique identifier of the job, re-used across retried attempts.

2475

# If this field is set, the service will ensure its uniqueness.

2476

# The request to create a job will fail if the service has knowledge of a

2477

# previously submitted job with the same client's ID and job name.

2478

# The caller may use this field to ensure idempotence of job

2479

# creation across retried attempts to create a job.

2480

# By default, the field is empty and, in that case, the service ignores it.

2481

"executionInfo": { # Additional information about how a Cloud Dataflow job will be executed that # Deprecated.

2482

# isn't contained in the submitted job.

2483

"stages": { # A mapping from each stage to the information about that stage.

2484

"a_key": { # Contains information about how a particular

2485

# google.dataflow.v1beta3.Step will be executed.

2486

"stepName": [ # The steps associated with the execution stage.

2487

# Note that stages may have several steps, and that a given step

2488

# might be run by more than one stage.

2489

"A String",

2490

],

2491

},


2492

},

2493

},


2494

"type": "A String", # The type of Cloud Dataflow job.

2495

"createTime": "A String", # The timestamp when the job was initially created. Immutable and set by the

2496

# Cloud Dataflow service.

2497

"tempFiles": [ # A set of files the system should be aware of that are used

2498

# for temporary storage. These temporary files will be

2499

# removed on job completion.

2500

# No duplicates are allowed.

2501

# No file patterns are supported.

2502

#

2503

# The supported files are:

2504

#

2505

# Google Cloud Storage:

2506

#

2507

# storage.googleapis.com/{bucket}/{object}

2508

# bucket.storage.googleapis.com/{object}

2509

"A String",

2510

],

2511

"id": "A String", # The unique ID of this job.

2512

#

2513

# This field is set by the Cloud Dataflow service when the Job is

2514

# created, and is immutable for the life of the job.

2515

"requestedState": "A String", # The job's requested state.

2516

#

2517

# `UpdateJob` may be used to switch between the `JOB_STATE_STOPPED` and

2518

# `JOB_STATE_RUNNING` states, by setting requested_state. `UpdateJob` may

2519

# also be used to directly set a job's requested state to

2520

# `JOB_STATE_CANCELLED` or `JOB_STATE_DONE`, irrevocably terminating the

2521

# job if it has not already reached a terminal state.

2522

"replaceJobId": "A String", # If this job is an update of an existing job, this field is the job ID

2523

# of the job it replaced.

2524

#

2525

# When sending a `CreateJobRequest`, you can update a job by specifying it

2526

# here. The job named here is stopped, and its intermediate state is

2527

# transferred to this job.

2528

"createdFromSnapshotId": "A String", # If this is specified, the job's initial state is populated from the given

2529

# snapshot.

2530

"currentState": "A String", # The current state of the job.

2531

#

2532

# Jobs are created in the `JOB_STATE_STOPPED` state unless otherwise

2533

# specified.

2534

#

2535

# A job in the `JOB_STATE_RUNNING` state may asynchronously enter a

2536

# terminal state. After a job has reached a terminal state, no

2537

# further state updates may be made.

2538

#

2539

# This field may be mutated by the Cloud Dataflow service;

2540

# callers cannot mutate it.

2541

"name": "A String", # The user-specified Cloud Dataflow job name.

2542

#

2543

# Only one Job with a given name may exist in a project at any

2544

# given time. If a caller attempts to create a Job with the same

2545

# name as an already-existing Job, the attempt returns the

2546

# existing Job.

2547

#

2548

# The name must match the regular expression

2549

# `[a-z]([-a-z0-9]{0,38}[a-z0-9])?`

2550

"currentStateTime": "A String", # The timestamp associated with the current state.

2551

}</pre>
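The `name` field above must match a documented regular expression. A minimal client-side check might look like the sketch below. The helper name `is_valid_job_name` is hypothetical, and the sketch assumes the pattern is meant to match the whole name (hence `re.fullmatch`); the service remains the authority on what it accepts.

```python
import re

# Pattern copied from the "name" field description in the Job resource.
JOB_NAME_PATTERN = re.compile(r"[a-z]([-a-z0-9]{0,38}[a-z0-9])?")


def is_valid_job_name(name: str) -> bool:
    """Return True if the whole string matches the documented pattern.

    Assumption: the pattern applies to the entire name, not a substring.
    """
    return JOB_NAME_PATTERN.fullmatch(name) is not None


if __name__ == "__main__":
    for candidate in ("wordcount-2020", "a", "Bad_Name", "job-"):
        print(candidate, is_valid_job_name(candidate))
```

Checking the name before calling `create` only saves a round trip; it does not replace the server-side validation.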


</div>
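The `tempStoragePrefix` description in the Job resource says the system appends the suffix "/temp-{JOBNAME}" to the prefix. The sketch below only illustrates that string rule so the resulting location is easy to predict. The function name, bucket, and job name are hypothetical, and the real service may normalize the prefix in ways this does not.

```python
def temp_storage_location(temp_storage_prefix: str, job_name: str) -> str:
    """Append the documented "/temp-{JOBNAME}" suffix to a resource prefix.

    Assumption: the suffix is appended verbatim, with no slash normalization.
    """
    return temp_storage_prefix + "/temp-" + job_name


if __name__ == "__main__":
    # Hypothetical bucket and job name, for illustration only.
    print(temp_storage_location("storage.googleapis.com/my-bucket/staging", "wordcount"))
```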

<code class="details" id="getMetrics">getMetrics(projectId, jobId, startTime=None, location=None, x__xgafv=None)</code>
<pre>Request the job status.

To request the status of a job, we recommend using
`projects.locations.jobs.getMetrics` with a [regional endpoint]
(https://cloud.google.com/dataflow/docs/concepts/regional-endpoints). Using
`projects.jobs.getMetrics` is not recommended, as you can only request the
status of jobs that are running in `us-central1`.

Args:
  projectId: string, A project id. (required)
  jobId: string, The job to get metrics for. (required)
  startTime: string, Return only metric data that has changed since this time.
Default is to return all information about all metrics for the job.
  location: string, The [regional endpoint]
(https://cloud.google.com/dataflow/docs/concepts/regional-endpoints) that
contains the job specified by job_id.
  x__xgafv: string, V1 error format.
    Allowed values
      1 - v1 error format
      2 - v2 error format

Returns:
  An object of the form:

    { # JobMetrics contains a collection of metrics describing the detailed progress
        # of a Dataflow job. Metrics correspond to user-defined and system-defined
        # metrics in the job.
        #
        # This resource captures only the most recent values of each metric;
        # time-series data can be queried for them (under the same metric names)
        # from Cloud Monitoring.
      "metricTime": "A String", # Timestamp as of which metric values are current.
      "metrics": [ # All metrics for this job.
        { # Describes the state of a metric.
          "distribution": "", # A struct value describing properties of a distribution of numeric values.
          "kind": "A String", # Metric aggregation kind. The possible metric aggregation kinds are
              # "Sum", "Max", "Min", "Mean", "Set", "And", "Or", and "Distribution".
              # The specified aggregation kind is case-insensitive.
              #
              # If omitted, this is not an aggregated value but instead
              # a single metric sample value.
          "gauge": "", # A struct value describing properties of a Gauge.
              # Metrics of gauge type show the value of a metric across time, and are
              # aggregated based on the newest value.
          "updateTime": "A String", # Timestamp associated with the metric value. Optional when workers are
              # reporting work progress; it will be filled in responses from the
              # metrics API.
          "scalar": "", # Worker-computed aggregate value for aggregation kinds "Sum", "Max", "Min",
              # "And", and "Or". The possible value types are Long, Double, and Boolean.
          "cumulative": True or False, # True if this metric is reported as the total cumulative aggregate
              # value accumulated since the worker started working on this WorkItem.
              # By default this is false, indicating that this metric is reported
              # as a delta that is not associated with any WorkItem.
          "name": { # Name of the metric. Identifies a metric, by describing the source which generated the
              # metric.
            "context": { # Zero or more labeled fields which identify the part of the job this
                # metric is associated with, such as the name of a step or collection.
                #
                # For example, built-in counters associated with steps will have
                # context['step'] = <step-name>. Counters associated with PCollections
                # in the SDK will have context['pcollection'] = <pcollection-name>.
              "a_key": "A String",
            },
            "name": "A String", # Worker-defined metric name.
            "origin": "A String", # Origin (namespace) of metric name. May be blank for user-defined metrics;
                # will be "dataflow" for metrics defined by the Dataflow service or SDK.
          },
          "meanCount": "", # Worker-computed aggregate value for the "Mean" aggregation kind.
              # This holds the count of the aggregated values and is used in combination
              # with mean_sum below to obtain the actual mean aggregate value.
              # The only possible value type is Long.
          "meanSum": "", # Worker-computed aggregate value for the "Mean" aggregation kind.
              # This holds the sum of the aggregated values and is used in combination
              # with mean_count above to obtain the actual mean aggregate value.
              # The only possible value types are Long and Double.
          "set": "", # Worker-computed aggregate value for the "Set" aggregation kind. The only
              # possible value type is a list of Values whose type can be Long, Double,
              # or String, according to the metric's type. All Values in the list must
              # be of the same type.
          "internal": "", # Worker-computed aggregate value for internal use by the Dataflow
              # service.
        },
      ],


}</pre>
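The JobMetrics description says a "Mean" metric carries a sum and a count that are combined to obtain the mean, and that `context` labels such as `step` identify where a metric came from. The sketch below shows one way to read those fields from a response dictionary. The helper names are hypothetical, and it assumes `meanSum` and `meanCount` arrive as plain numbers or numeric strings, which the schema above does not guarantee.

```python
from typing import Any, Dict, List, Optional


def mean_value(metric: Dict[str, Any]) -> Optional[float]:
    """Return meanSum / meanCount for a "Mean" metric, else None.

    Assumption: both fields are plain numbers or numeric strings.
    """
    # The aggregation kind is documented as case-insensitive.
    if str(metric.get("kind", "")).lower() != "mean":
        return None
    raw_sum = metric.get("meanSum")
    raw_count = metric.get("meanCount")
    if raw_sum is None or raw_count is None:
        return None
    count = float(raw_count)
    if count == 0:
        return None
    return float(raw_sum) / count


def metrics_for_step(response: Dict[str, Any], step_name: str) -> List[Dict[str, Any]]:
    """Select metrics whose name.context['step'] equals step_name."""
    return [
        m for m in response.get("metrics", [])
        if m.get("name", {}).get("context", {}).get("step") == step_name
    ]
```

Both helpers tolerate an empty `{}` response body, since `metrics` may be absent.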

</div>
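A call to `getMetrics` can be wrapped as in the sketch below. The wrapper name `fetch_job_metrics` is hypothetical. It assumes `service` is a Dataflow v1b3 client object, typically obtained with `googleapiclient.discovery.build("dataflow", "v1b3")`, and that the request is sent with the client library's usual `execute()` call. Optional arguments are only passed when set, so the service defaults described above still apply.

```python
from typing import Any, Dict, Optional


def fetch_job_metrics(
    service: Any,
    project_id: str,
    job_id: str,
    location: Optional[str] = None,
    start_time: Optional[str] = None,
) -> Dict[str, Any]:
    """Build and execute a projects.jobs.getMetrics request.

    `service` is assumed to be a Dataflow v1b3 client; it is passed in
    so that this sketch needs no credentials or network access.
    """
    kwargs: Dict[str, Any] = {"projectId": project_id, "jobId": job_id}
    if location is not None:
        kwargs["location"] = location
    if start_time is not None:
        kwargs["startTime"] = start_time
    request = service.projects().jobs().getMetrics(**kwargs)
    return request.execute()
```

Passing `start_time` returns only metric data that has changed since that time, which keeps repeated polling responses small.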

<code class="details" id="list">list(projectId, filter=None, pageSize=None, location=None, view=None, pageToken=None, x__xgafv=None)</code>
<pre>List the jobs of a project.

To list the jobs of a project in a region, we recommend using
`projects.locations.jobs.list` with a [regional endpoint]
(https://cloud.google.com/dataflow/docs/concepts/regional-endpoints). To
list all jobs across all regions, use `projects.jobs.aggregated`. Using
`projects.jobs.list` is not recommended, as you can only get the list of
jobs that are running in `us-central1`.

Args:
  projectId: string, The project which owns the jobs. (required)
  filter: string, The kind of filter to use.
  pageSize: integer, If there are many jobs, limit response to at most this many.
The actual number of jobs returned will be the lesser of pageSize
and an unspecified server-defined limit.
  location: string, The [regional endpoint]
(https://cloud.google.com/dataflow/docs/concepts/regional-endpoints) that
contains this job.
  view: string, Level of information requested in response. Default is `JOB_VIEW_SUMMARY`.
  pageToken: string, Set this to the 'next_page_token' field of a previous response
to request additional results in a long list.
  x__xgafv: string, V1 error format.
    Allowed values
      1 - v1 error format
      2 - v2 error format

Returns:
  An object of the form:

    { # Response to a request to list Cloud Dataflow jobs in a project. This might
        # be a partial response, depending on the page size in the ListJobsRequest.
        # However, if the project does not have any jobs, an instance of
        # ListJobsResponse is not returned and the request's response
        # body is empty {}.


2678

"jobs": [ # A subset of the requested job information.


2679

{ # Defines a job to be run by the Cloud Dataflow service.


2680

"pipelineDescription": { # A descriptive representation of submitted pipeline as well as the executed # Preliminary field: The format of this data may change at any time.

2681

# A description of the user pipeline and stages through which it is executed.

2682

# Created by Cloud Dataflow service. Only retrieved with

2683

# JOB_VIEW_DESCRIPTION or JOB_VIEW_ALL.

2684

# form. This data is provided by the Dataflow service for ease of visualizing

2685

# the pipeline and interpreting Dataflow provided metrics.

2686

"displayData": [ # Pipeline level display data.

2687

{ # Data provided with a pipeline or transform to provide descriptive info.

2688

"url": "A String", # An optional full URL.

2689

"javaClassValue": "A String", # Contains value if the data is of java class type.

2690

"timestampValue": "A String", # Contains value if the data is of timestamp type.

2691

"durationValue": "A String", # Contains value if the data is of duration type.

2692

"label": "A String", # An optional label to display in a dax UI for the element.

2693

"key": "A String", # The key identifying the display data.

2694

# This is intended to be used as a label for the display data

2695

# when viewed in a dax monitoring system.

2696

"namespace": "A String", # The namespace for the key. This is usually a class name or programming

2697

# language namespace (e.g. a Python module) which defines the display data.

2698

# This allows a dax monitoring system to specially handle the data

2699

# and perform custom rendering.

2700

"floatValue": 3.14, # Contains value if the data is of float type.

2701

"strValue": "A String", # Contains value if the data is of string type.

2702

"int64Value": "A String", # Contains value if the data is of int64 type.

2703

"boolValue": True or False, # Contains value if the data is of a boolean type.

2704

"shortStrValue": "A String", # A possible additional shorter value to display.

2705

# For example a java_class_name_value of com.mypackage.MyDoFn

2706

# will be stored with MyDoFn as the short_str_value and

2707

# com.mypackage.MyDoFn as the java_class_name_value.

2708

# short_str_value can be displayed and java_class_name_value

2709

# will be displayed as a tooltip.

},

],

2712

"originalPipelineTransform": [ # Description of each transform in the pipeline and collections between them.

2713

{ # Description of the type, names/ids, and input/outputs for a transform.

2714

"outputCollectionName": [ # User names for all collection outputs to this transform.

"A String",

2716

],

"displayData": [ # Transform-specific display data.

2718

{ # Data provided with a pipeline or transform to provide descriptive info.

2719

"url": "A String", # An optional full URL.

2720

"javaClassValue": "A String", # Contains value if the data is of java class type.

2721

"timestampValue": "A String", # Contains value if the data is of timestamp type.

2722

"durationValue": "A String", # Contains value if the data is of duration type.

2723

"label": "A String", # An optional label to display in a dax UI for the element.

2724

"key": "A String", # The key identifying the display data.

2725

# This is intended to be used as a label for the display data

2726

# when viewed in a dax monitoring system.

2727

"namespace": "A String", # The namespace for the key. This is usually a class name or programming

2728

# language namespace (e.g. a Python module) which defines the display data.

2729

# This allows a dax monitoring system to specially handle the data

2730

# and perform custom rendering.

2731

"floatValue": 3.14, # Contains value if the data is of float type.

2732

"strValue": "A String", # Contains value if the data is of string type.

2733

"int64Value": "A String", # Contains value if the data is of int64 type.

2734

"boolValue": True or False, # Contains value if the data is of a boolean type.

2735

"shortStrValue": "A String", # A possible additional shorter value to display.

2736

# For example a java_class_name_value of com.mypackage.MyDoFn

2737

# will be stored with MyDoFn as the short_str_value and

2738

# com.mypackage.MyDoFn as the java_class_name_value.

2739

# short_str_value can be displayed and java_class_name_value

2740

# will be displayed as a tooltip.

2741

},

2742

],

2743

"id": "A String", # SDK generated id of this transform instance.

2744

"inputCollectionName": [ # User names for all collection inputs to this transform.

2745

"A String",

2746

],

2747

"name": "A String", # User provided name for this transform instance.

2748

"kind": "A String", # Type of transform.

2749

},

2750

],

2751

"executionPipelineStage": [ # Description of each stage of execution of the pipeline.

2752

{ # Description of the composing transforms, names/ids, and input/outputs of a

2753

# stage of execution. Some composing transforms and sources may have been

2754

# generated by the Dataflow service during execution planning.

2755

"componentSource": [ # Collections produced and consumed by component transforms of this stage.

2756

{ # Description of an interstitial value between transforms in an execution

2757

# stage.

2758

"userName": "A String", # Human-readable name for this transform; may be user or system generated.

2759

"name": "A String", # Dataflow service generated name for this source.

2760

"originalTransformOrCollection": "A String", # User name for the original user transform or collection with which this

2761

# source is most closely associated.

2762

},

2763

],

2764

"inputSource": [ # Input sources for this stage.

2765

{ # Description of an input or output of an execution stage.

2766

"userName": "A String", # Human-readable name for this source; may be user or system generated.

2767

"originalTransformOrCollection": "A String", # User name for the original user transform or collection with which this

2768

# source is most closely associated.

2769

"sizeBytes": "A String", # Size of the source, if measurable.

2770

"name": "A String", # Dataflow service generated name for this source.

2771

},

2772

],

2773

"name": "A String", # Dataflow service generated name for this stage.

2774

"componentTransform": [ # Transforms that comprise this execution stage.

2775

{ # Description of a transform executed as part of an execution stage.

2776

"name": "A String", # Dataflow service generated name for this transform.

2777

"userName": "A String", # Human-readable name for this transform; may be user or system generated.

2778

"originalTransform": "A String", # User name for the original user transform with which this transform is

2779

# most closely associated.

2780

},

2781

],

2782

"id": "A String", # Dataflow service generated id for this stage.

2783

"outputSource": [ # Output sources for this stage.

2784

{ # Description of an input or output of an execution stage.

2785

"userName": "A String", # Human-readable name for this source; may be user or system generated.

2786

"originalTransformOrCollection": "A String", # User name for the original user transform or collection with which this

2787

# source is most closely associated.

2788

"sizeBytes": "A String", # Size of the source, if measurable.

2789

"name": "A String", # Dataflow service generated name for this source.

2790

},

2791

],

2792

"kind": "A String", # Type of transform this stage is executing.

},

],

},

"labels": { # User-defined labels for this job.

2797

#

2798

# The labels map can contain no more than 64 entries. Entries of the labels

2799

# map are UTF8 strings that comply with the following restrictions:

2800

#

2801

# * Keys must conform to regexp: \p{Ll}\p{Lo}{0,62}

2802

# * Values must conform to regexp: [\p{Ll}\p{Lo}\p{N}_-]{0,63}

2803

# * Both keys and values are additionally constrained to be <= 128 bytes in

# size.

"a_key": "A String",

},

"projectId": "A String", # The ID of the Cloud Platform project that the job belongs to.

2808

"environment": { # Describes the environment in which a Dataflow Job runs. # The environment for the job.

2809

"flexResourceSchedulingGoal": "A String", # Which Flexible Resource Scheduling mode to run in.

2810

"workerRegion": "A String", # The Compute Engine region

2811

# (https://cloud.google.com/compute/docs/regions-zones/regions-zones) in

2812

# which worker processing should occur, e.g. "us-west1". Mutually exclusive

2813

# with worker_zone. If neither worker_region nor worker_zone is specified,

2814

# default to the control plane's region.

2815

"userAgent": { # A description of the process that generated the request.

2816

"a_key": "", # Properties of the object.

2817

},

2818

"serviceAccountEmail": "A String", # Identity to run virtual machines as. Defaults to the default account.

2819

"version": { # A structure describing which components and their versions of the service

2820

# are required in order to run the job.

2821

"a_key": "", # Properties of the object.

2822

},

2823

"serviceKmsKeyName": "A String", # If set, contains the Cloud KMS key identifier used to encrypt data

2824

# at rest, AKA a Customer Managed Encryption Key (CMEK).

2825

#

2826

# Format:

2827

# projects/PROJECT_ID/locations/LOCATION/keyRings/KEY_RING/cryptoKeys/KEY

2828

"experiments": [ # The list of experiments to enable.

2829

"A String",

2830

],

2831

"workerZone": "A String", # The Compute Engine zone

2832

# (https://cloud.google.com/compute/docs/regions-zones/regions-zones) in

2833

# which worker processing should occur, e.g. "us-west1-a". Mutually exclusive

2834

# with worker_region. If neither worker_region nor worker_zone is specified,

2835

# a zone in the control plane's region is chosen based on available capacity.

2836

"workerPools": [ # The worker pools. At least one "harness" worker pool must be

2837

# specified in order for the job to have workers.

2838

{ # Describes one particular pool of Cloud Dataflow workers to be

2839

# instantiated by the Cloud Dataflow service in order to perform the

2840

# computations required by a job. Note that a workflow job may use

2841

# multiple pools, in order to match the various computational

2842

# requirements of the various stages of the job.

2843

"onHostMaintenance": "A String", # The action to take on host maintenance, as defined by the Google

2844

# Compute Engine API.

2845

"sdkHarnessContainerImages": [ # Set of SDK harness containers needed to execute this pipeline. This will

2846

# only be set in the Fn API path. For non-cross-language pipelines this

2847

# should have only one entry. Cross-language pipelines will have two or more

2848

# entries.

2849

{ # Defines an SDK harness container for executing Dataflow pipelines.

2850

"containerImage": "A String", # A docker container image that resides in Google Container Registry.

2851

"useSingleCorePerContainer": True or False, # If true, recommends the Dataflow service to use only one core per SDK

2852

# container instance with this image. If false (or unset) recommends using

2853

# more than one core per SDK container instance with this image for

2854

# efficiency. Note that Dataflow service may choose to override this property

# if needed.

},

],

"zone": "A String", # Zone to run the worker pools in. If empty or unspecified, the service

2859

# will attempt to choose a reasonable default.

2860

"kind": "A String", # The kind of the worker pool; currently only `harness` and `shuffle`

2861

# are supported.

2862

"metadata": { # Metadata to set on the Google Compute Engine VMs.

2863

"a_key": "A String",

2864

},

2865

"diskSourceImage": "A String", # Fully qualified source image for disks.

2866

"dataDisks": [ # Data disks that are used by a VM in this workflow.

2867

{ # Describes the data disk used by a workflow job.

2868

"sizeGb": 42, # Size of disk in GB. If zero or unspecified, the service will

2869

# attempt to choose a reasonable default.

2870

"diskType": "A String", # Disk storage type, as defined by Google Compute Engine. This

2871

# must be a disk type appropriate to the project and zone in which

2872

# the workers will run. If unknown or unspecified, the service

2873

# will attempt to choose a reasonable default.

2874

#

2875

# For example, the standard persistent disk type is a resource name

2876

# typically ending in "pd-standard". If SSD persistent disks are

2877

# available, the resource name typically ends with "pd-ssd". The

2878

# actual valid values are defined by the Google Compute Engine API,

2879

# not by the Cloud Dataflow API; consult the Google Compute Engine

2880

# documentation for more information about determining the set of

2881

# available disk types for a particular project and zone.

2882

#

2883

# Google Compute Engine Disk types are local to a particular

2884

# project in a particular zone, and so the resource name will

2885

# typically look something like this:

2886

#

2887

# compute.googleapis.com/projects/project-id/zones/zone/diskTypes/pd-standard

2888

"mountPoint": "A String", # Directory in a VM where disk is mounted.

2889

},

2890

],

2891

"packages": [ # Packages to be installed on workers.

2892

{ # The packages that must be installed in order for a worker to run the

2893

# steps of the Cloud Dataflow job that will be assigned to its worker

2894

# pool.

#

# This is the mechanism by which the Cloud Dataflow SDK causes code to

2897

# be loaded onto the workers. For example, the Cloud Dataflow Java SDK

2898

# might use this to install jars containing the user's code and all of the

2899

# various dependencies (libraries, data files, etc.) required in order

2900

# for that code to run.

2901

"name": "A String", # The name of the package.

2902

"location": "A String", # The resource to read the package from. The supported resource type is:

2903

#

2904

# Google Cloud Storage:

2905

#

2906

# storage.googleapis.com/{bucket}

2907

# bucket.storage.googleapis.com/

2908

},

2909

],

2910

"teardownPolicy": "A String", # Sets the policy for determining when to tear down the worker pool.

2911

# Allowed values are: `TEARDOWN_ALWAYS`, `TEARDOWN_ON_SUCCESS`, and

2912

# `TEARDOWN_NEVER`.

2913

# `TEARDOWN_ALWAYS` means workers are always torn down regardless of whether

2914

# the job succeeds. `TEARDOWN_ON_SUCCESS` means workers are torn down

2915

# if the job succeeds. `TEARDOWN_NEVER` means the workers are never torn

2916

# down.

2917

#

2918

# If the workers are not torn down by the service, they will

2919

# continue to run and use Google Compute Engine VM resources in the

2920

# user's project until they are explicitly terminated by the user.

2921

# Because of this, Google recommends using the `TEARDOWN_ALWAYS`

2922

# policy except for small, manually supervised test jobs.

2923

#

2924

# If unknown or unspecified, the service will attempt to choose a reasonable

2925

# default.

2926

"network": "A String", # Network to which VMs will be assigned. If empty or unspecified,

2927

# the service will use the network "default".

2928

"ipConfiguration": "A String", # Configuration for VM IPs.

2929

"diskSizeGb": 42, # Size of root disk for VMs, in GB. If zero or unspecified, the service will

2930

# attempt to choose a reasonable default.

2931

"autoscalingSettings": { # Settings for WorkerPool autoscaling. # Settings for autoscaling of this WorkerPool.

2932

"maxNumWorkers": 42, # The maximum number of workers to cap scaling at.

2933

"algorithm": "A String", # The algorithm to use for autoscaling.

2934

},

2935

"poolArgs": { # Extra arguments for this worker pool.

2936

"a_key": "", # Properties of the object. Contains field @type with type URL.

2937

},

2938

"subnetwork": "A String", # Subnetwork to which VMs will be assigned, if desired. Expected to be of

2939

# the form "regions/REGION/subnetworks/SUBNETWORK".

2940

"numWorkers": 42, # Number of Google Compute Engine workers in this pool needed to

2941

# execute the job. If zero or unspecified, the service will

2942

# attempt to choose a reasonable default.

2943

"numThreadsPerWorker": 42, # The number of threads per worker harness. If empty or unspecified, the

2944

# service will choose a number of threads (according to the number of cores

2945

# on the selected machine type for batch, or 1 by convention for streaming).

2946

"workerHarnessContainerImage": "A String", # Required. Docker container image that executes the Cloud Dataflow worker

2947

# harness, residing in Google Container Registry.

2948

#

2949

# Deprecated for the Fn API path. Use sdk_harness_container_images instead.

2950

"taskrunnerSettings": { # Taskrunner configuration settings. # Settings passed through to Google Compute Engine workers when

2951

# using the standard Dataflow task runner. Users should ignore

2952

# this field.

2953

"dataflowApiVersion": "A String", # The API version of endpoint, e.g. "v1b3"

2954

"oauthScopes": [ # The OAuth2 scopes to be requested by the taskrunner in order to

2955

# access the Cloud Dataflow API.

2956

"A String",

2957

],

2958

"baseUrl": "A String", # The base URL for the taskrunner to use when accessing Google Cloud APIs.

#

2960

# When workers access Google Cloud APIs, they logically do so via

2961

# relative URLs. If this field is specified, it supplies the base

2962

# URL to use for resolving these relative URLs. The normative

2963

# algorithm used is defined by RFC 1808, "Relative Uniform Resource

2964

# Locators".

2965

#

2966

# If not specified, the default value is "http://www.googleapis.com/"

"workflowFileName": "A String", # The file to store the workflow in.

2968

"logToSerialconsole": True or False, # Whether to send taskrunner log info to Google Compute Engine VM serial

2969

# console.

2970

"baseTaskDir": "A String", # The location on the worker for task-specific subdirectories.

2971

"taskUser": "A String", # The UNIX user ID on the worker VM to use for tasks launched by

2972

# taskrunner; e.g. "root".

2973

"vmId": "A String", # The ID string of the VM.

2974

"alsologtostderr": True or False, # Whether to also send taskrunner log info to stderr.

2975

"parallelWorkerSettings": { # Provides data to pass through to the worker harness. # The settings to pass to the parallel worker harness.

2976

"shuffleServicePath": "A String", # The Shuffle service path relative to the root URL, for example,

2977

# "shuffle/v1beta1".

2978

"tempStoragePrefix": "A String", # The prefix of the resources the system should use for temporary

2979

# storage.

2980

#

2981

# The supported resource type is:

2982

#

2983

# Google Cloud Storage:

2984

#

2985

# storage.googleapis.com/{bucket}/{object}

2986

# bucket.storage.googleapis.com/{object}

2987

"reportingEnabled": True or False, # Whether to send work progress updates to the service.

2988

"servicePath": "A String", # The Cloud Dataflow service path relative to the root URL, for example,

2989

# "dataflow/v1b3/projects".

2990

"baseUrl": "A String", # The base URL for accessing Google Cloud APIs.

2991

#

2992

# When workers access Google Cloud APIs, they logically do so via

2993

# relative URLs. If this field is specified, it supplies the base

2994

# URL to use for resolving these relative URLs. The normative

2995

# algorithm used is defined by RFC 1808, "Relative Uniform Resource

2996

# Locators".

2997

#

2998

# If not specified, the default value is "http://www.googleapis.com/"

2999

"workerId": "A String", # The ID of the worker running this pipeline.

3000

},

3001

"harnessCommand": "A String", # The command to launch the worker harness.

3002

"logDir": "A String", # The directory on the VM to store logs.

3003

"streamingWorkerMainClass": "A String", # The streaming worker main class name.

3004

"languageHint": "A String", # The suggested backend language.

3005

"taskGroup": "A String", # The UNIX group ID on the worker VM to use for tasks launched by

3006

# taskrunner; e.g. "wheel".

3007

"logUploadLocation": "A String", # Indicates where to put logs. If this is not specified, the logs

3008

# will not be uploaded.

3009

#

3010

# The supported resource type is:

3011

#

3012

# Google Cloud Storage:

3013

# storage.googleapis.com/{bucket}/{object}

3014

# bucket.storage.googleapis.com/{object}

3015

"commandlinesFileName": "A String", # The file to store preprocessing commands in.

3016

"continueOnException": True or False, # Whether to continue taskrunner if an exception is hit.

3017

"tempStoragePrefix": "A String", # The prefix of the resources the taskrunner should use for

3018

# temporary storage.

3019

#

3020

# The supported resource type is:

3021

#

3022

# Google Cloud Storage:

3023

# storage.googleapis.com/{bucket}/{object}

3024

# bucket.storage.googleapis.com/{object}

},

"diskType": "A String", # Type of root disk for VMs. If empty or unspecified, the service will

3027

# attempt to choose a reasonable default.

3028

"defaultPackageSet": "A String", # The default package set to install. This allows the service to

3029

# select a default set of packages which are useful to worker

3030

# harnesses written in a particular language.

3031

"machineType": "A String", # Machine type (e.g. "n1-standard-1"). If empty or unspecified, the

3032

# service will attempt to choose a reasonable default.

},

],

3035

"tempStoragePrefix": "A String", # The prefix of the resources the system should use for temporary

3036

# storage. The system will append the suffix "/temp-{JOBNAME}" to

3037

# this resource prefix, where {JOBNAME} is the value of the

3038

# job_name field. The resulting bucket and object prefix is used

3039

# as the prefix of the resources used to store temporary data

3040

# needed during the job execution. NOTE: This will override the

3041

# value in taskrunner_settings.

3042

# The supported resource type is:

3043

#

3044

# Google Cloud Storage:

3045

#

3046

# storage.googleapis.com/{bucket}/{object}

3047

# bucket.storage.googleapis.com/{object}

3048

"internalExperiments": { # Experimental settings.

3049

"a_key": "", # Properties of the object. Contains field @type with type URL.

},

"sdkPipelineOptions": { # The Cloud Dataflow SDK pipeline options specified by the user. These

3052

# options are passed through the service and are used to recreate the

3053

# SDK pipeline options on the worker in a language agnostic and platform

3054

# independent way.

"a_key": "", # Properties of the object.

},

"dataset": "A String", # The dataset for the current project where various workflow

3058

# related tables are stored.

3059

#

3060

# The supported resource type is:

3061

#

3062

# Google BigQuery:

3063

# bigquery.googleapis.com/{dataset}

3064

"clusterManagerApiService": "A String", # The type of cluster manager API to use. If unknown or

3065

# unspecified, the service will attempt to choose a reasonable

3066

# default. This should be in the form of the API service name,

3067

# e.g. "compute.googleapis.com".

},

"stepsLocation": "A String", # The GCS location where the steps are stored.

3070

"steps": [ # Exactly one of steps or steps_location should be specified.

3071

#

3072

# The top-level steps that constitute the entire job.

3073

{ # Defines a particular step within a Cloud Dataflow job.

3074

#

3075

# A job consists of multiple steps, each of which performs some

3076

# specific operation as part of the overall job. Data is typically

3077

# passed from one step to another as part of the job.

3078

#

3079

# Here's an example of a sequence of steps which together implement a

3080

# Map-Reduce job:

3081

#

3082

# * Read a collection of data from some source, parsing the

3083

# collection's elements.

3084

#

3085

# * Validate the elements.

3086

#

3087

# * Apply a user-defined function to map each element to some value

3088

# and extract an element-specific key value.

3089

#

3090

# * Group elements with the same key into a single element with

3091

# that key, transforming a multiply-keyed collection into a

3092

# uniquely-keyed collection.

3093

#

3094

# * Write the elements out to some data sink.

3095

#

3096

# Note that the Cloud Dataflow service may be used to run many different

3097

# types of jobs, not just Map-Reduce.

3098

"kind": "A String", # The kind of step in the Cloud Dataflow job.

3099

"properties": { # Named properties associated with the step. Each kind of

3100

# predefined step has its own required set of properties.

3101

# Must be provided on Create. Only retrieved with JOB_VIEW_ALL.

3102

"a_key": "", # Properties of the object.

3103

},

3104

"name": "A String", # The name that identifies the step. This must be unique for each

3105

# step with respect to all other steps in the Cloud Dataflow job.

3106

},

3107

],

3108

"stageStates": [ # This field may be mutated by the Cloud Dataflow service;

3109

# callers cannot mutate it.

3110

{ # A message describing the state of a particular execution stage.

3111

"executionStageState": "A String", # Execution stage states allow the same set of values as JobState.

3112

"executionStageName": "A String", # The name of the execution stage.

3113

"currentStateTime": "A String", # The time at which the stage transitioned to this state.

3114

},

3115

],

3116

"replacedByJobId": "A String", # If another job is an update of this job (and thus, this job is in

3117

# `JOB_STATE_UPDATED`), this field contains the ID of that job.

3118

"jobMetadata": { # Metadata available primarily for filtering jobs. Will be included in the

# ListJob response and Job SUMMARY view.

# This field is populated by the Dataflow service to support filtering jobs

# by the metadata values provided here. Populated for ListJobs and all GetJob

# views SUMMARY and higher.

3122

"sdkVersion": { # The version of the SDK used to run the job. # The SDK version used to run the job.

3123

"sdkSupportStatus": "A String", # The support status for this SDK version.

3124

"versionDisplayName": "A String", # A readable string describing the version of the SDK.

3125

"version": "A String", # The version of the SDK used to run the job.

3126

},

3127

"bigTableDetails": [ # Identification of a BigTable source used in the Dataflow job.

3128

{ # Metadata for a BigTable connector used by the job.

3129

"instanceId": "A String", # InstanceId accessed in the connection.

3130

"tableId": "A String", # TableId accessed in the connection.

3131

"projectId": "A String", # ProjectId accessed in the connection.

3132

},

3133

],

3134

"pubsubDetails": [ # Identification of a PubSub source used in the Dataflow job.

3135

{ # Metadata for a PubSub connector used by the job.

3136

"subscription": "A String", # Subscription used in the connection.

3137

"topic": "A String", # Topic accessed in the connection.

3138

},

3139

],

3140

"bigqueryDetails": [ # Identification of a BigQuery source used in the Dataflow job.

3141

{ # Metadata for a BigQuery connector used by the job.

3142

"dataset": "A String", # Dataset accessed in the connection.

3143

"projectId": "A String", # Project accessed in the connection.

3144

"query": "A String", # Query used to access data in the connection.

3145

"table": "A String", # Table accessed in the connection.

3146

},

3147

],

3148

"fileDetails": [ # Identification of a File source used in the Dataflow job.

3149

{ # Metadata for a File connector used by the job.

3150

"filePattern": "A String", # File Pattern used to access files by the connector.

3151

},

3152

],

3153

"datastoreDetails": [ # Identification of a Datastore source used in the Dataflow job.

3154

{ # Metadata for a Datastore connector used by the job.

3155

"namespace": "A String", # Namespace used in the connection.

3156

"projectId": "A String", # ProjectId accessed in the connection.

3157

},

3158

],

3159

"spannerDetails": [ # Identification of a Spanner source used in the Dataflow job.

3160

{ # Metadata for a Spanner connector used by the job.

3161

"instanceId": "A String", # InstanceId accessed in the connection.

3162

"databaseId": "A String", # DatabaseId accessed in the connection.

3163

"projectId": "A String", # ProjectId accessed in the connection.

},

],

},

"location": "A String", # The [regional endpoint]

3168

# (https://cloud.google.com/dataflow/docs/concepts/regional-endpoints) that

3169

# contains this job.

3170

"transformNameMapping": { # The map of transform name prefixes of the job to be replaced to the

3171

# corresponding name prefixes of the new job.

3172

"a_key": "A String",

3173

},

3174

"startTime": "A String", # The timestamp when the job was started (transitioned to JOB_STATE_PENDING).

3175

# Flexible resource scheduling jobs are started with some delay after job

3176

# creation, so start_time is unset before start and is updated when the

3177

# job is started by the Cloud Dataflow service. For other jobs, start_time

3178

# always equals create_time and is immutable and set by the Cloud Dataflow

3179

# service.

3180

"clientRequestId": "A String", # The client's unique identifier of the job, re-used across retried attempts.

3181

# If this field is set, the service will ensure its uniqueness.

3182

# The request to create a job will fail if the service has knowledge of a

3183

# previously submitted job with the same client's ID and job name.

3184

# The caller may use this field to ensure idempotence of job

3185

# creation across retried attempts to create a job.

3186

# By default, the field is empty and, in that case, the service ignores it.

3187

"executionInfo": { # Deprecated. Additional information about how a Cloud Dataflow job will be executed that

# isn't contained in the submitted job.

3189

"stages": { # A mapping from each stage to the information about that stage.

3190

"a_key": { # Contains information about how a particular

3191

# google.dataflow.v1beta3.Step will be executed.

3192

"stepName": [ # The steps associated with the execution stage.

3193

# Note that stages may have several steps, and that a given step

3194

# might be run by more than one stage.

3195

"A String",

3196

],

3197

},

},

3199

},

"type": "A String", # The type of Cloud Dataflow job.

3201

"createTime": "A String", # The timestamp when the job was initially created. Immutable and set by the

3202

# Cloud Dataflow service.

3203

"tempFiles": [ # A set of files the system should be aware of that are used

3204

# for temporary storage. These temporary files will be

3205

# removed on job completion.

3206

# No duplicates are allowed.

3207

# No file patterns are supported.

3208

#

3209

# The supported files are:

3210

#

3211

# Google Cloud Storage:

3212

#

3213

# storage.googleapis.com/{bucket}/{object}

3214

# bucket.storage.googleapis.com/{object}

3215

"A String",

3216

],

3217

"id": "A String", # The unique ID of this job.

3218

#

3219

# This field is set by the Cloud Dataflow service when the Job is

3220

# created, and is immutable for the life of the job.

3221

"requestedState": "A String", # The job's requested state.

3222

#

3223

# `UpdateJob` may be used to switch between the `JOB_STATE_STOPPED` and

3224

# `JOB_STATE_RUNNING` states, by setting requested_state. `UpdateJob` may

3225

# also be used to directly set a job's requested state to

3226

# `JOB_STATE_CANCELLED` or `JOB_STATE_DONE`, irrevocably terminating the

3227

# job if it has not already reached a terminal state.

3228

"replaceJobId": "A String", # If this job is an update of an existing job, this field is the job ID

3229

# of the job it replaced.

3230

#

3231

# When sending a `CreateJobRequest`, you can update a job by specifying it

3232

# here. The job named here is stopped, and its intermediate state is

3233

# transferred to this job.

3234

"createdFromSnapshotId": "A String", # If this is specified, the job's initial state is populated from the given

3235

# snapshot.

3236

"currentState": "A String", # The current state of the job.

3237

#

3238

# Jobs are created in the `JOB_STATE_STOPPED` state unless otherwise

3239

# specified.

3240

#

3241

# A job in the `JOB_STATE_RUNNING` state may asynchronously enter a

3242

# terminal state. After a job has reached a terminal state, no

3243

# further state updates may be made.

3244

#

3245

# This field may be mutated by the Cloud Dataflow service;

3246

# callers cannot mutate it.

3247

"name": "A String", # The user-specified Cloud Dataflow job name.

3248

#

3249

# Only one Job with a given name may exist in a project at any

3250

# given time. If a caller attempts to create a Job with the same

3251

# name as an already-existing Job, the attempt returns the

3252

# existing Job.

3253

#

3254

# The name must match the regular expression

3255

# `[a-z]([-a-z0-9]{0,38}[a-z0-9])?`

3256

"currentStateTime": "A String", # The timestamp associated with the current state.

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame]

3257

},

Nathaniel Manista

2015-06-15 16:44:50 +0000

[diff] [blame]

3258

],

Bu Sun Kim

2020-07-22 17:02:09 -0700

[diff] [blame]

3259

"nextPageToken": "A String", # Set if there may be more results than fit in this response.

Bu Sun Kim

2020-05-27 12:20:54 -0700

[diff] [blame]

3260

"failedLocation": [ # Zero or more messages describing the [regional endpoints]

3261

# (https://cloud.google.com/dataflow/docs/concepts/regional-endpoints) that

3262

# failed to respond.

3263

{ # Indicates which [regional endpoint]

3264

# (https://cloud.google.com/dataflow/docs/concepts/regional-endpoints) failed

3265

# to respond to a request for data.

3266

"name": "A String", # The name of the [regional endpoint]

3267

# (https://cloud.google.com/dataflow/docs/concepts/regional-endpoints) that

3268

# failed to respond.

3269

},

3270

],

Nathaniel Manista

2015-06-15 16:44:50 +0000

[diff] [blame]

}</pre>

</div>
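The job `name` field above must match the regular expression `[a-z]([-a-z0-9]{0,38}[a-z0-9])?`. A minimal sketch of a client-side check follows; the helper name `is_valid_job_name` is hypothetical and is not part of the client library, and the service remains the authority on what it accepts.

```python
import re

# Pattern copied from the "name" field description above.
_JOB_NAME_RE = re.compile(r"[a-z]([-a-z0-9]{0,38}[a-z0-9])?")


def is_valid_job_name(name):
    """Return True if `name` matches the documented job-name pattern.

    fullmatch() is used so the whole string must match, not just a prefix.
    """
    return _JOB_NAME_RE.fullmatch(name) is not None
```

Checking the name before calling `create()` avoids a round trip for names that cannot be accepted, such as names with uppercase letters or a trailing hyphen.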

<code class="details" id="list_next">list_next(previous_request, previous_response)</code>

3276

<pre>Retrieves the next page of results.

3277

3278

Args:

3279

previous_request: The request for the previous page. (required)

3280

previous_response: The response from the request for the previous page. (required)

3281

3282

Returns:

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame]

3283

A request object that you can call 'execute()' on to request the next

Nathaniel Manista

2015-06-15 16:44:50 +0000

[diff] [blame]

3284

page. Returns None if there are no more items in the collection.

</pre>

</div>
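The pagination contract described above (`list_next` returns a new request, or None when the collection is exhausted) can be sketched as a generator. This is an illustration, not the library's own helper: `iter_jobs` is a hypothetical name, `jobs_resource` is assumed to be the object returned by `service.projects().jobs()`, and the response key `"jobs"` is assumed from the Dataflow `ListJobsResponse` schema.

```python
def iter_jobs(jobs_resource, project_id, location=None):
    """Yield every job dict, following list_next() until it returns None."""
    request = jobs_resource.list(projectId=project_id, location=location)
    while request is not None:
        response = request.execute()
        # "jobs" may be absent on an empty page, so default to an empty list.
        for job in response.get("jobs", []):
            yield job
        # list_next() builds the request for the following page, or returns
        # None once the response carries no further page token.
        request = jobs_resource.list_next(
            previous_request=request, previous_response=response
        )
```

Because the function only relies on `list`, `list_next`, and `execute`, it can be exercised with a small stub in place of a real service object.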

Dan O'Meara

2020-05-01 07:42:23 -0700

[diff] [blame]

3289

<code class="details" id="snapshot">snapshot(projectId, jobId, body=None, x__xgafv=None)</code>

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

3290

<pre>Snapshot the state of a streaming job.

3291

3292

Args:

3293

projectId: string, The project which owns the job to be snapshotted. (required)

3294

jobId: string, The job to be snapshotted. (required)

Dan O'Meara

2020-05-01 07:42:23 -0700

[diff] [blame]

3295

body: object, The request body.

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

3296

The object takes the form of:

3297

3298

{ # Request to create a snapshot of a job.

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame]

3299

"snapshotSources": True or False, # If true, perform snapshots for sources which support this.

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame]

3300

"location": "A String", # The location that contains this job.

Bu Sun Kim

2020-07-22 17:02:09 -0700

[diff] [blame]

3301

"description": "A String", # User-specified description of the snapshot. May be empty.

3302

"ttl": "A String", # TTL for the snapshot.

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

3303

}

3304

3305

x__xgafv: string, V1 error format.

Allowed values

1 - v1 error format

2 - v2 error format

Returns:

An object of the form:

3312

3313

{ # Represents a snapshot of a job.

Bu Sun Kim

2020-07-22 17:02:09 -0700

[diff] [blame]

3314

"ttl": "A String", # The time after which this snapshot will be automatically deleted.

3315

"state": "A String", # State of the snapshot.

3316

"id": "A String", # The unique ID of this snapshot.

3317

"sourceJobId": "A String", # The job this snapshot was created from.

3318

"creationTime": "A String", # The time this snapshot was created.

3319

"description": "A String", # User-specified description of the snapshot. May be empty.

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame]

3320

"pubsubMetadata": [ # PubSub snapshot metadata.

Dan O'Meara

2020-05-01 07:42:23 -0700

[diff] [blame]

3321

{ # Represents a Pubsub snapshot.

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame]

3322

"snapshotName": "A String", # The name of the Pubsub snapshot.

Bu Sun Kim

2020-05-27 12:20:54 -0700

[diff] [blame]

3323

"expireTime": "A String", # The expire time of the Pubsub snapshot.

Bu Sun Kim

2020-07-22 17:02:09 -0700

[diff] [blame]

3324

"topicName": "A String", # The name of the Pubsub topic.

Dan O'Meara

2020-05-01 07:42:23 -0700

[diff] [blame]

3325

},

3326

],

Bu Sun Kim

2020-05-27 12:20:54 -0700

[diff] [blame]

3327

"projectId": "A String", # The project this snapshot belongs to.

Bu Sun Kim

2020-05-27 12:20:54 -0700

[diff] [blame]

3328

"diskSizeBytes": "A String", # The disk byte size of the snapshot. Only available for snapshots in READY

3329

# state.

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

}</pre>

</div>
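A minimal sketch of assembling the request body for `snapshot()` from the fields documented above. The helper name `build_snapshot_body` is hypothetical, and the `ttl` encoding (a decimal number of seconds followed by `s`) is an assumption based on the usual JSON form of a protobuf Duration; the page itself only says the field is a string.

```python
def build_snapshot_body(location, ttl_seconds, description="", snapshot_sources=False):
    """Build the dict passed as `body` to jobs().snapshot()."""
    if ttl_seconds < 0:
        raise ValueError("ttl_seconds must be non-negative")
    return {
        "location": location,                 # The location that contains the job.
        "ttl": f"{ttl_seconds}s",             # Assumed Duration encoding, e.g. "3600s".
        "description": description,           # May be empty.
        "snapshotSources": snapshot_sources,  # Snapshot sources that support it.
    }


# Usage (requires credentials and network access, so shown as comments):
#   from googleapiclient.discovery import build
#   service = build("dataflow", "v1b3")
#   body = build_snapshot_body("us-central1", 3600, "nightly snapshot")
#   service.projects().jobs().snapshot(
#       projectId="my-project", jobId="my-job-id", body=body
#   ).execute()
```

The project and job identifiers in the usage comment are placeholders.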

Dan O'Meara

2020-05-01 07:42:23 -0700

[diff] [blame]

3334

<code class="details" id="update">update(projectId, jobId, body=None, location=None, x__xgafv=None)</code>

Sai Cheemalapati

2017-03-13 12:12:03 -0400

[diff] [blame]

3335

<pre>Updates the state of an existing Cloud Dataflow job.

Nathaniel Manista

2015-06-15 16:44:50 +0000

[diff] [blame]

3336

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

3337

To update the state of an existing job, we recommend using

3338

`projects.locations.jobs.update` with a [regional endpoint]

3339

(https://cloud.google.com/dataflow/docs/concepts/regional-endpoints). Using

3340

`projects.jobs.update` is not recommended, as you can only update the state

3341

of jobs that are running in `us-central1`.

3342

Nathaniel Manista

2015-06-15 16:44:50 +0000

[diff] [blame]

3343

Args:

Sai Cheemalapati

2017-03-13 12:12:03 -0400

[diff] [blame]

3344

projectId: string, The ID of the Cloud Platform project that the job belongs to. (required)

3345

jobId: string, The job ID. (required)

Dan O'Meara

2020-05-01 07:42:23 -0700

[diff] [blame]

3346

body: object, The request body.

Nathaniel Manista

2015-06-15 16:44:50 +0000

[diff] [blame]

3347

The object takes the form of:

3348

Sai Cheemalapati

2017-03-13 12:12:03 -0400

[diff] [blame]

3349

{ # Defines a job to be run by the Cloud Dataflow service.

Bu Sun Kim

2020-07-22 17:02:09 -0700

[diff] [blame]

3350

"pipelineDescription": { # A descriptive representation of the submitted pipeline as well as its executed

3351

# form. This data is provided by the Dataflow service for ease of visualizing

3352

# the pipeline and interpreting Dataflow-provided metrics.

3353

# Preliminary field: The format of this data may change at any time.

3354

# A description of the user pipeline and stages through which it is executed.

3355

# Created by Cloud Dataflow service. Only retrieved with JOB_VIEW_DESCRIPTION or JOB_VIEW_ALL.

3356

"displayData": [ # Pipeline level display data.

3357

{ # Data provided with a pipeline or transform to provide descriptive info.

3358

"url": "A String", # An optional full URL.

3359

"javaClassValue": "A String", # Contains value if the data is of java class type.

3360

"timestampValue": "A String", # Contains value if the data is of timestamp type.

3361

"durationValue": "A String", # Contains value if the data is of duration type.

3362

"label": "A String", # An optional label to display in a dax UI for the element.

3363

"key": "A String", # The key identifying the display data.

3364

# This is intended to be used as a label for the display data

3365

# when viewed in a dax monitoring system.

3366

"namespace": "A String", # The namespace for the key. This is usually a class name or programming

3367

# language namespace (e.g. a Python module) which defines the display data.

3368

# This allows a dax monitoring system to specially handle the data

3369

# and perform custom rendering.

3370

"floatValue": 3.14, # Contains value if the data is of float type.

3371

"strValue": "A String", # Contains value if the data is of string type.

3372

"int64Value": "A String", # Contains value if the data is of int64 type.

3373

"boolValue": True or False, # Contains value if the data is of a boolean type.

3374

"shortStrValue": "A String", # A possible additional shorter value to display.

3375

# For example a java_class_name_value of com.mypackage.MyDoFn

3376

# will be stored with MyDoFn as the short_str_value and

3377

# com.mypackage.MyDoFn as the java_class_name_value.

3378

# short_str_value can be displayed and java_class_name_value

3379

# will be displayed as a tooltip.

Bu Sun Kim

2020-05-27 12:20:54 -0700

[diff] [blame]

3380

},

Bu Sun Kim

2020-07-22 17:02:09 -0700

[diff] [blame]

3381

],

3382

"originalPipelineTransform": [ # Description of each transform in the pipeline and collections between them.

3383

{ # Description of the type, names/ids, and input/outputs for a transform.

3384

"outputCollectionName": [ # User names for all collection outputs to this transform.

Bu Sun Kim

2020-05-27 12:20:54 -0700

[diff] [blame]

3385

"A String",

3386

],

Bu Sun Kim

2020-07-22 17:02:09 -0700

[diff] [blame]

3387

"displayData": [ # Transform-specific display data.

3388

{ # Data provided with a pipeline or transform to provide descriptive info.

3389

"url": "A String", # An optional full URL.

3390

"javaClassValue": "A String", # Contains value if the data is of java class type.

3391

"timestampValue": "A String", # Contains value if the data is of timestamp type.

3392

"durationValue": "A String", # Contains value if the data is of duration type.

3393

"label": "A String", # An optional label to display in a dax UI for the element.

3394

"key": "A String", # The key identifying the display data.

3395

# This is intended to be used as a label for the display data

3396

# when viewed in a dax monitoring system.

3397

"namespace": "A String", # The namespace for the key. This is usually a class name or programming

3398

# language namespace (e.g. a Python module) which defines the display data.

3399

# This allows a dax monitoring system to specially handle the data

3400

# and perform custom rendering.

3401

"floatValue": 3.14, # Contains value if the data is of float type.

3402

"strValue": "A String", # Contains value if the data is of string type.

3403

"int64Value": "A String", # Contains value if the data is of int64 type.

3404

"boolValue": True or False, # Contains value if the data is of a boolean type.

3405

"shortStrValue": "A String", # A possible additional shorter value to display.

3406

# For example a java_class_name_value of com.mypackage.MyDoFn

3407

# will be stored with MyDoFn as the short_str_value and

3408

# com.mypackage.MyDoFn as the java_class_name_value.

3409

# short_str_value can be displayed and java_class_name_value

3410

# will be displayed as a tooltip.

3411

},

3412

],

3413

"id": "A String", # SDK generated id of this transform instance.

3414

"inputCollectionName": [ # User names for all collection inputs to this transform.

3415

"A String",

3416

],

3417

"name": "A String", # User provided name for this transform instance.

3418

"kind": "A String", # Type of transform.

Bu Sun Kim

2020-05-27 12:20:54 -0700

[diff] [blame]

3419

},

Bu Sun Kim

2020-07-22 17:02:09 -0700

[diff] [blame]

3420

],

3421

"executionPipelineStage": [ # Description of each stage of execution of the pipeline.

3422

{ # Description of the composing transforms, names/ids, and input/outputs of a

3423

# stage of execution. Some composing transforms and sources may have been

3424

# generated by the Dataflow service during execution planning.

3425

"componentSource": [ # Collections produced and consumed by component transforms of this stage.

3426

{ # Description of an interstitial value between transforms in an execution

3427

# stage.

3428

"userName": "A String", # Human-readable name for this transform; may be user or system generated.

3429

"name": "A String", # Dataflow service generated name for this source.

3430

"originalTransformOrCollection": "A String", # User name for the original user transform or collection with which this

3431

# source is most closely associated.

3432

},

3433

],

3434

"inputSource": [ # Input sources for this stage.

3435

{ # Description of an input or output of an execution stage.

3436

"userName": "A String", # Human-readable name for this source; may be user or system generated.

3437

"originalTransformOrCollection": "A String", # User name for the original user transform or collection with which this

3438

# source is most closely associated.

3439

"sizeBytes": "A String", # Size of the source, if measurable.

3440

"name": "A String", # Dataflow service generated name for this source.

3441

},

3442

],

3443

"name": "A String", # Dataflow service generated name for this stage.

3444

"componentTransform": [ # Transforms that comprise this execution stage.

3445

{ # Description of a transform executed as part of an execution stage.

3446

"name": "A String", # Dataflow service generated name for this transform.

3447

"userName": "A String", # Human-readable name for this transform; may be user or system generated.

3448

"originalTransform": "A String", # User name for the original user transform with which this transform is

3449

# most closely associated.

3450

},

3451

],

3452

"id": "A String", # Dataflow service generated id for this stage.

3453

"outputSource": [ # Output sources for this stage.

3454

{ # Description of an input or output of an execution stage.

3455

"userName": "A String", # Human-readable name for this source; may be user or system generated.

3456

"originalTransformOrCollection": "A String", # User name for the original user transform or collection with which this

3457

# source is most closely associated.

3458

"sizeBytes": "A String", # Size of the source, if measurable.

3459

"name": "A String", # Dataflow service generated name for this source.

3460

},

3461

],

3462

"kind": "A String", # Type of transform this stage is executing.

Bu Sun Kim

2020-05-27 12:20:54 -0700

[diff] [blame]

3463

},

Bu Sun Kim

2020-07-22 17:02:09 -0700

[diff] [blame]

3464

],

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame]

3465

},

Bu Sun Kim

2020-07-22 17:02:09 -0700

[diff] [blame]

3466

"labels": { # User-defined labels for this job.

3467

#

3468

# The labels map can contain no more than 64 entries. Entries of the labels

3469

# map are UTF8 strings that comply with the following restrictions:

3470

#

3471

# * Keys must conform to regexp: \p{Ll}\p{Lo}{0,62}

3472

# * Values must conform to regexp: [\p{Ll}\p{Lo}\p{N}_-]{0,63}

3473

# * Both keys and values are additionally constrained to be <= 128 bytes in

3474

# size.

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame]

3475

"a_key": "A String",

Nathaniel Manista

2015-06-15 16:44:50 +0000

[diff] [blame]

3476

},

Bu Sun Kim

2020-07-22 17:02:09 -0700

[diff] [blame]

3477

"projectId": "A String", # The ID of the Cloud Platform project that the job belongs to.

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame]

3478

"environment": { # Describes the environment in which a Dataflow Job runs. # The environment for the job.

Bu Sun Kim

2020-07-22 17:02:09 -0700

[diff] [blame]

3479

"flexResourceSchedulingGoal": "A String", # Which Flexible Resource Scheduling mode to run in.

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame]

3480

"workerRegion": "A String", # The Compute Engine region

3481

# (https://cloud.google.com/compute/docs/regions-zones/regions-zones) in

3482

# which worker processing should occur, e.g. "us-west1". Mutually exclusive

3483

# with worker_zone. If neither worker_region nor worker_zone is specified,

3484

# default to the control plane's region.

Bu Sun Kim

2020-07-22 17:02:09 -0700

[diff] [blame]

3485

"userAgent": { # A description of the process that generated the request.

3486

"a_key": "", # Properties of the object.

3487

},

3488

"serviceAccountEmail": "A String", # Identity to run virtual machines as. Defaults to the default account.

3489

"version": { # A structure describing which components and their versions of the service

3490

# are required in order to run the job.

3491

"a_key": "", # Properties of the object.

3492

},

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame]

3493

"serviceKmsKeyName": "A String", # If set, contains the Cloud KMS key identifier used to encrypt data

3494

# at rest, AKA a Customer Managed Encryption Key (CMEK).

3495

#

3496

# Format:

3497

# projects/PROJECT_ID/locations/LOCATION/keyRings/KEY_RING/cryptoKeys/KEY

Bu Sun Kim

2020-07-22 17:02:09 -0700

[diff] [blame]

3498

"experiments": [ # The list of experiments to enable.

3499

"A String",

3500

],

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame]

3501

"workerZone": "A String", # The Compute Engine zone

3502

# (https://cloud.google.com/compute/docs/regions-zones/regions-zones) in

3503

# which worker processing should occur, e.g. "us-west1-a". Mutually exclusive

3504

# with worker_region. If neither worker_region nor worker_zone is specified,

3505

# a zone in the control plane's region is chosen based on available capacity.

Bu Sun Kim

2020-05-27 12:20:54 -0700

[diff] [blame]

3506

"workerPools": [ # The worker pools. At least one "harness" worker pool must be

3507

# specified in order for the job to have workers.

3508

{ # Describes one particular pool of Cloud Dataflow workers to be

3509

# instantiated by the Cloud Dataflow service in order to perform the

3510

# computations required by a job. Note that a workflow job may use

3511

# multiple pools, in order to match the various computational

3512

# requirements of the various stages of the job.

Bu Sun Kim

2020-07-22 17:02:09 -0700

[diff] [blame]

3513

"onHostMaintenance": "A String", # The action to take on host maintenance, as defined by the Google

3514

# Compute Engine API.

3515

"sdkHarnessContainerImages": [ # Set of SDK harness containers needed to execute this pipeline. This will

3516

# only be set in the Fn API path. For non-cross-language pipelines this

3517

# should have only one entry. Cross-language pipelines will have two or more

3518

# entries.

3519

{ # Defines a SDK harness container for executing Dataflow pipelines.

3520

"containerImage": "A String", # A docker container image that resides in Google Container Registry.

3521

"useSingleCorePerContainer": True or False, # If true, recommends the Dataflow service to use only one core per SDK

3522

# container instance with this image. If false (or unset) recommends using

3523

# more than one core per SDK container instance with this image for

3524

# efficiency. Note that Dataflow service may choose to override this property

3525

# if needed.

3526

},

3527

],

Bu Sun Kim

2020-05-27 12:20:54 -0700

[diff] [blame]

3528

"zone": "A String", # Zone to run the worker pools in. If empty or unspecified, the service

3529

# will attempt to choose a reasonable default.

Bu Sun Kim

2020-07-22 17:02:09 -0700

[diff] [blame]

3530

"kind": "A String", # The kind of the worker pool; currently only `harness` and `shuffle`

3531

# are supported.

3532

"metadata": { # Metadata to set on the Google Compute Engine VMs.

3533

"a_key": "A String",

3534

},

Bu Sun Kim

2020-05-27 12:20:54 -0700

[diff] [blame]

3535

"diskSourceImage": "A String", # Fully qualified source image for disks.

Bu Sun Kim

2020-07-22 17:02:09 -0700

[diff] [blame]

3536

"dataDisks": [ # Data disks that are used by a VM in this workflow.

3537

{ # Describes the data disk used by a workflow job.

3538

"sizeGb": 42, # Size of disk in GB. If zero or unspecified, the service will

3539

# attempt to choose a reasonable default.

3540

"diskType": "A String", # Disk storage type, as defined by Google Compute Engine. This

3541

# must be a disk type appropriate to the project and zone in which

3542

# the workers will run. If unknown or unspecified, the service

3543

# will attempt to choose a reasonable default.

3544

#

3545

# For example, the standard persistent disk type is a resource name

3546

# typically ending in "pd-standard". If SSD persistent disks are

3547

# available, the resource name typically ends with "pd-ssd". The

3548

# actual valid values are defined by the Google Compute Engine API,

3549

# not by the Cloud Dataflow API; consult the Google Compute Engine

3550

# documentation for more information about determining the set of

3551

# available disk types for a particular project and zone.

3552

#

3553

# Google Compute Engine Disk types are local to a particular

3554

# project in a particular zone, and so the resource name will

3555

# typically look something like this:

3556

#

3557

# compute.googleapis.com/projects/project-id/zones/zone/diskTypes/pd-standard

3558

"mountPoint": "A String", # Directory in a VM where disk is mounted.

3559

},

3560

],

Bu Sun Kim

2020-05-27 12:20:54 -0700

[diff] [blame]

3561

"packages": [ # Packages to be installed on workers.

3562

{ # The packages that must be installed in order for a worker to run the

3563

# steps of the Cloud Dataflow job that will be assigned to its worker

3564

# pool.

3565

#

3566

# This is the mechanism by which the Cloud Dataflow SDK causes code to

3567

# be loaded onto the workers. For example, the Cloud Dataflow Java SDK

3568

# might use this to install jars containing the user's code and all of the

3569

# various dependencies (libraries, data files, etc.) required in order

3570

# for that code to run.

3571

"name": "A String", # The name of the package.

3572

"location": "A String", # The resource to read the package from. The supported resource type is:

3573

#

3574

# Google Cloud Storage:

3575

#

3576

# storage.googleapis.com/{bucket}

3577

# bucket.storage.googleapis.com/

3578

},

3579

],

3580

"teardownPolicy": "A String", # Sets the policy for determining when to tear down the worker pool.

3581

# Allowed values are: `TEARDOWN_ALWAYS`, `TEARDOWN_ON_SUCCESS`, and

3582

# `TEARDOWN_NEVER`.

3583

# `TEARDOWN_ALWAYS` means workers are always torn down regardless of whether

3584

# the job succeeds. `TEARDOWN_ON_SUCCESS` means workers are torn down

3585

# if the job succeeds. `TEARDOWN_NEVER` means the workers are never torn

3586

# down.

3587

#

3588

# If the workers are not torn down by the service, they will

3589

# continue to run and use Google Compute Engine VM resources in the

3590

# user's project until they are explicitly terminated by the user.

3591

# Because of this, Google recommends using the `TEARDOWN_ALWAYS`

3592

# policy except for small, manually supervised test jobs.

3593

#

3594

# If unknown or unspecified, the service will attempt to choose a reasonable

3595

# default.

Bu Sun Kim

2020-07-22 17:02:09 -0700

[diff] [blame]

3596

"network": "A String", # Network to which VMs will be assigned. If empty or unspecified,

3597

# the service will use the network "default".

3598

"ipConfiguration": "A String", # Configuration for VM IPs.

3599

"diskSizeGb": 42, # Size of root disk for VMs, in GB. If zero or unspecified, the service will

3600

# attempt to choose a reasonable default.

3601

"autoscalingSettings": { # Settings for WorkerPool autoscaling. # Settings for autoscaling of this WorkerPool.

3602

"maxNumWorkers": 42, # The maximum number of workers to cap scaling at.

3603

"algorithm": "A String", # The algorithm to use for autoscaling.

3604

},

Bu Sun Kim

2020-05-27 12:20:54 -0700

[diff] [blame]

3605

"poolArgs": { # Extra arguments for this worker pool.

3606

"a_key": "", # Properties of the object. Contains field @type with type URL.

3607

},

Bu Sun Kim

2020-07-22 17:02:09 -0700

[diff] [blame]

3608

"subnetwork": "A String", # Subnetwork to which VMs will be assigned, if desired. Expected to be of

3609

# the form "regions/REGION/subnetworks/SUBNETWORK".

3610

"numWorkers": 42, # Number of Google Compute Engine workers in this pool needed to

3611

# execute the job. If zero or unspecified, the service will

Bu Sun Kim

2020-05-27 12:20:54 -0700

[diff] [blame]

3612

# attempt to choose a reasonable default.

Bu Sun Kim

2020-07-22 17:02:09 -0700

[diff] [blame]

3613

"numThreadsPerWorker": 42, # The number of threads per worker harness. If empty or unspecified, the

3614

# service will choose a number of threads (according to the number of cores

3615

# on the selected machine type for batch, or 1 by convention for streaming).

Bu Sun Kim

2020-05-27 12:20:54 -0700

[diff] [blame]

3616

"workerHarnessContainerImage": "A String", # Required. Docker container image that executes the Cloud Dataflow worker

3617

# harness, residing in Google Container Registry.

3618

#

3619

# Deprecated for the Fn API path. Use sdk_harness_container_images instead.

Bu Sun Kim

2020-05-27 12:20:54 -0700

[diff] [blame]

3620

"taskrunnerSettings": { # Taskrunner configuration settings. # Settings passed through to Google Compute Engine workers when

3621

# using the standard Dataflow task runner. Users should ignore

3622

# this field.

Bu Sun Kim

2020-07-22 17:02:09 -0700

[diff] [blame]

3623

"dataflowApiVersion": "A String", # The API version of endpoint, e.g. "v1b3"

Bu Sun Kim

2020-05-27 12:20:54 -0700

[diff] [blame]

3624

"oauthScopes": [ # The OAuth2 scopes to be requested by the taskrunner in order to

3625

# access the Cloud Dataflow API.

3626

"A String",

3627

],

Bu Sun Kim

2020-05-27 12:20:54 -0700

[diff] [blame]

3628

"baseUrl": "A String", # The base URL for the taskrunner to use when accessing Google Cloud APIs.

3629

#

3630

# When workers access Google Cloud APIs, they logically do so via

3631

# relative URLs. If this field is specified, it supplies the base

3632

# URL to use for resolving these relative URLs. The normative

3633

# algorithm used is defined by RFC 1808, "Relative Uniform Resource

3634

# Locators".

3635

#

3636

# If not specified, the default value is "http://www.googleapis.com/"

Bu Sun Kim

2020-07-22 17:02:09 -0700

[diff] [blame]

3637

"workflowFileName": "A String", # The file to store the workflow in.

Bu Sun Kim

2020-05-27 12:20:54 -0700

[diff] [blame]

3638

"logToSerialconsole": True or False, # Whether to send taskrunner log info to Google Compute Engine VM serial

3639

# console.

Bu Sun Kim

2020-07-22 17:02:09 -0700

[diff] [blame]

3640

"baseTaskDir": "A String", # The location on the worker for task-specific subdirectories.

3641

"taskUser": "A String", # The UNIX user ID on the worker VM to use for tasks launched by

3642

# taskrunner; e.g. "root".

3643

"vmId": "A String", # The ID string of the VM.

3644

"alsologtostderr": True or False, # Whether to also send taskrunner log info to stderr.

Bu Sun Kim

2020-05-27 12:20:54 -0700

[diff] [blame]

3645

"parallelWorkerSettings": { # Provides data to pass through to the worker harness. # The settings to pass to the parallel worker harness.

Bu Sun Kim

2020-07-22 17:02:09 -0700

[diff] [blame]

3646

"shuffleServicePath": "A String", # The Shuffle service path relative to the root URL, for example,

3647

# "shuffle/v1beta1".

Bu Sun Kim

2020-05-27 12:20:54 -0700

[diff] [blame]

3648

"tempStoragePrefix": "A String", # The prefix of the resources the system should use for temporary

3649

# storage.

3650

#

3651

# The supported resource type is:

3652

#

3653

# Google Cloud Storage:

3654

#

3655

# storage.googleapis.com/{bucket}/{object}

3656

# bucket.storage.googleapis.com/{object}

3657

"reportingEnabled": True or False, # Whether to send work progress updates to the service.

Bu Sun Kim

2020-07-22 17:02:09 -0700

[diff] [blame]

3658

"servicePath": "A String", # The Cloud Dataflow service path relative to the root URL, for example,

3659

# "dataflow/v1b3/projects".

Bu Sun Kim

2020-05-27 12:20:54 -0700

[diff] [blame]

3660

"baseUrl": "A String", # The base URL for accessing Google Cloud APIs.

3661

#

3662

# When workers access Google Cloud APIs, they logically do so via

3663

# relative URLs. If this field is specified, it supplies the base

3664

# URL to use for resolving these relative URLs. The normative

3665

# algorithm used is defined by RFC 1808, "Relative Uniform Resource

3666

# Locators".

3667

#

3668

# If not specified, the default value is "http://www.googleapis.com/"

Bu Sun Kim

2020-05-27 12:20:54 -0700

[diff] [blame]

3669

"workerId": "A String", # The ID of the worker running this pipeline.

3670

},

Bu Sun Kim

2020-07-22 17:02:09 -0700

[diff] [blame]

3671

"harnessCommand": "A String", # The command to launch the worker harness.

3672

"logDir": "A String", # The directory on the VM to store logs.

3673

"streamingWorkerMainClass": "A String", # The streaming worker main class name.

3674

"languageHint": "A String", # The suggested backend language.

3675

"taskGroup": "A String", # The UNIX group ID on the worker VM to use for tasks launched by

3676

# taskrunner; e.g. "wheel".

3677

"logUploadLocation": "A String", # Indicates where to put logs. If this is not specified, the logs

3678

# will not be uploaded.

3679

#

3680

# The supported resource type is:

3681

#

3682

# Google Cloud Storage:

3683

# storage.googleapis.com/{bucket}/{object}

3684

# bucket.storage.googleapis.com/{object}

3685

"commandlinesFileName": "A String", # The file to store preprocessing commands in.

3686

"continueOnException": True or False, # Whether to continue taskrunner if an exception is hit.

3687

"tempStoragePrefix": "A String", # The prefix of the resources the taskrunner should use for

3688

# temporary storage.

3689

#

3690

# The supported resource type is:

3691

#

3692

# Google Cloud Storage:

3693

# storage.googleapis.com/{bucket}/{object}

3694

# bucket.storage.googleapis.com/{object}

Bu Sun Kim

2020-05-27 12:20:54 -0700

[diff] [blame]

3695

},

Bu Sun Kim

2020-07-22 17:02:09 -0700

[diff] [blame]

3696

"diskType": "A String", # Type of root disk for VMs. If empty or unspecified, the service will

3697

# attempt to choose a reasonable default.

Bu Sun Kim

2020-05-27 12:20:54 -0700

[diff] [blame]

3698

"defaultPackageSet": "A String", # The default package set to install. This allows the service to

3699

# select a default set of packages which are useful to worker

3700

# harnesses written in a particular language.

Bu Sun Kim

2020-07-22 17:02:09 -0700

[diff] [blame]

3701

"machineType": "A String", # Machine type (e.g. "n1-standard-1"). If empty or unspecified, the

3702

# service will attempt to choose a reasonable default.

Bu Sun Kim

2020-05-27 12:20:54 -0700

[diff] [blame]

3703

},

3704

],

Bu Sun Kim

2020-07-22 17:02:09 -0700

[diff] [blame]

3705

"tempStoragePrefix": "A String", # The prefix of the resources the system should use for temporary

3706

# storage. The system will append the suffix "/temp-{JOBNAME}" to

3707

# this resource prefix, where {JOBNAME} is the value of the

3708

# job_name field. The resulting bucket and object prefix is used

3709

# as the prefix of the resources used to store temporary data

3710

# needed during the job execution. NOTE: This will override the

3711

# value in taskrunner_settings.

3712

# The supported resource type is:

3713

#

3714

# Google Cloud Storage:

3715

#

3716

# storage.googleapis.com/{bucket}/{object}

3717

# bucket.storage.googleapis.com/{object}

3718

"internalExperiments": { # Experimental settings.

3719

"a_key": "", # Properties of the object. Contains field @type with type URL.

3720

},

3721

"sdkPipelineOptions": { # The Cloud Dataflow SDK pipeline options specified by the user. These

3722

# options are passed through the service and are used to recreate the

3723

# SDK pipeline options on the worker in a language agnostic and platform

3724

# independent way.

3725

"a_key": "", # Properties of the object.

3726

},

"dataset": "A String", # The dataset for the current project where various workflow

3728

# related tables are stored.

3729

#

3730

# The supported resource type is:

3731

#

3732

# Google BigQuery:

3733

# bigquery.googleapis.com/{dataset}

"clusterManagerApiService": "A String", # The type of cluster manager API to use. If unknown or

3735

# unspecified, the service will attempt to choose a reasonable

3736

# default. This should be in the form of the API service name,

3737

# e.g. "compute.googleapis.com".

},

"stepsLocation": "A String", # The GCS location where the steps are stored.

"steps": [ # Exactly one of step or steps_location should be specified.

#

# The top-level steps that constitute the entire job.

{ # Defines a particular step within a Cloud Dataflow job.

3744

#

3745

# A job consists of multiple steps, each of which performs some

3746

# specific operation as part of the overall job. Data is typically

3747

# passed from one step to another as part of the job.

3748

#

# Here's an example of a sequence of steps which together implement a

# Map-Reduce job:

3751

#

3752

# * Read a collection of data from some source, parsing the

# collection's elements.

#

3755

# * Validate the elements.

3756

#

3757

# * Apply a user-defined function to map each element to some value

3758

# and extract an element-specific key value.

3759

#

3760

# * Group elements with the same key into a single element with

3761

# that key, transforming a multiply-keyed collection into a

3762

# uniquely-keyed collection.

3763

#

3764

# * Write the elements out to some data sink.

3765

#

3766

# Note that the Cloud Dataflow service may be used to run many different

3767

# types of jobs, not just Map-Reduce.

"kind": "A String", # The kind of step in the Cloud Dataflow job.

3769

"properties": { # Named properties associated with the step. Each kind of

# predefined step has its own required set of properties.

3771

# Must be provided on Create. Only retrieved with JOB_VIEW_ALL.

"a_key": "", # Properties of the object.

},

"name": "A String", # The name that identifies the step. This must be unique for each

3775

# step with respect to all other steps in the Cloud Dataflow job.

3776

},

3777

],

3778

"stageStates": [ # This field may be mutated by the Cloud Dataflow service;

3779

# callers cannot mutate it.

3780

{ # A message describing the state of a particular execution stage.

3781

"executionStageState": "A String", # Execution stage states allow the same set of values as JobState.

3782

"executionStageName": "A String", # The name of the execution stage.

3783

"currentStateTime": "A String", # The time at which the stage transitioned to this state.

},

3785

],

"replacedByJobId": "A String", # If another job is an update of this job (and thus, this job is in

3787

# `JOB_STATE_UPDATED`), this field contains the ID of that job.

"jobMetadata": { # Metadata available primarily for filtering jobs. Will be included in the

# ListJob response and Job SUMMARY view.

# This field is populated by the Dataflow service to support filtering jobs

# by the metadata values provided here. Populated for ListJobs and all GetJob

# views SUMMARY and higher.

3792

"sdkVersion": { # The version of the SDK used to run the job. # The SDK version used to run the job.

3793

"sdkSupportStatus": "A String", # The support status for this SDK version.

3794

"versionDisplayName": "A String", # A readable string describing the version of the SDK.

3795

"version": "A String", # The version of the SDK used to run the job.

3796

},

3797

"bigTableDetails": [ # Identification of a BigTable source used in the Dataflow job.

3798

{ # Metadata for a BigTable connector used by the job.

3799

"instanceId": "A String", # InstanceId accessed in the connection.

3800

"tableId": "A String", # TableId accessed in the connection.

3801

"projectId": "A String", # ProjectId accessed in the connection.

3802

},

3803

],

3804

"pubsubDetails": [ # Identification of a PubSub source used in the Dataflow job.

3805

{ # Metadata for a PubSub connector used by the job.

3806

"subscription": "A String", # Subscription used in the connection.

3807

"topic": "A String", # Topic accessed in the connection.

3808

},

3809

],

3810

"bigqueryDetails": [ # Identification of a BigQuery source used in the Dataflow job.

3811

{ # Metadata for a BigQuery connector used by the job.

3812

"dataset": "A String", # Dataset accessed in the connection.

3813

"projectId": "A String", # Project accessed in the connection.

3814

"query": "A String", # Query used to access data in the connection.

3815

"table": "A String", # Table accessed in the connection.

3816

},

3817

],

3818

"fileDetails": [ # Identification of a File source used in the Dataflow job.

3819

{ # Metadata for a File connector used by the job.

3820

"filePattern": "A String", # File Pattern used to access files by the connector.

3821

},

3822

],

3823

"datastoreDetails": [ # Identification of a Datastore source used in the Dataflow job.

3824

{ # Metadata for a Datastore connector used by the job.

3825

"namespace": "A String", # Namespace used in the connection.

3826

"projectId": "A String", # ProjectId accessed in the connection.

3827

},

3828

],

3829

"spannerDetails": [ # Identification of a Spanner source used in the Dataflow job.

3830

{ # Metadata for a Spanner connector used by the job.

3831

"instanceId": "A String", # InstanceId accessed in the connection.

3832

"databaseId": "A String", # DatabaseId accessed in the connection.

3833

"projectId": "A String", # ProjectId accessed in the connection.

},

],

},

"location": "A String", # The [regional endpoint]

3838

# (https://cloud.google.com/dataflow/docs/concepts/regional-endpoints) that

3839

# contains this job.

3840

"transformNameMapping": { # The map of transform name prefixes of the job to be replaced to the

3841

# corresponding name prefixes of the new job.

3842

"a_key": "A String",

3843

},

3844

"startTime": "A String", # The timestamp when the job was started (transitioned to JOB_STATE_PENDING).

3845

# Flexible resource scheduling jobs are started with some delay after job

3846

# creation, so start_time is unset before start and is updated when the

3847

# job is started by the Cloud Dataflow service. For other jobs, start_time

3848

# always equals create_time and is immutable and set by the Cloud Dataflow

3849

# service.

3850

"clientRequestId": "A String", # The client's unique identifier of the job, re-used across retried attempts.

3851

# If this field is set, the service will ensure its uniqueness.

3852

# The request to create a job will fail if the service has knowledge of a

3853

# previously submitted job with the same client's ID and job name.

3854

# The caller may use this field to ensure idempotence of job

3855

# creation across retried attempts to create a job.

3856

# By default, the field is empty and, in that case, the service ignores it.

"executionInfo": { # Additional information about how a Cloud Dataflow job will be executed that

# isn't contained in the submitted job.

# Deprecated.

3859

"stages": { # A mapping from each stage to the information about that stage.

3860

"a_key": { # Contains information about how a particular

3861

# google.dataflow.v1beta3.Step will be executed.

3862

"stepName": [ # The steps associated with the execution stage.

3863

# Note that stages may have several steps, and that a given step

3864

# might be run by more than one stage.

"A String",

],

},

},

},

"type": "A String", # The type of Cloud Dataflow job.

"createTime": "A String", # The timestamp when the job was initially created. Immutable and set by the

3872

# Cloud Dataflow service.

"tempFiles": [ # A set of files the system should be aware of that are used

3874

# for temporary storage. These temporary files will be

3875

# removed on job completion.

3876

# No duplicates are allowed.

3877

# No file patterns are supported.

3878

#

3879

# The supported files are:

3880

#

3881

# Google Cloud Storage:

3882

#

3883

# storage.googleapis.com/{bucket}/{object}

3884

# bucket.storage.googleapis.com/{object}

3885

"A String",

3886

],

3887

"id": "A String", # The unique ID of this job.

3888

#

3889

# This field is set by the Cloud Dataflow service when the Job is

3890

# created, and is immutable for the life of the job.

"requestedState": "A String", # The job's requested state.

#

# `UpdateJob` may be used to switch between the `JOB_STATE_STOPPED` and

3894

# `JOB_STATE_RUNNING` states, by setting requested_state. `UpdateJob` may

3895

# also be used to directly set a job's requested state to

3896

# `JOB_STATE_CANCELLED` or `JOB_STATE_DONE`, irrevocably terminating the

3897

# job if it has not already reached a terminal state.

"replaceJobId": "A String", # If this job is an update of an existing job, this field is the job ID

3899

# of the job it replaced.

3900

#

3901

# When sending a `CreateJobRequest`, you can update a job by specifying it

3902

# here. The job named here is stopped, and its intermediate state is

3903

# transferred to this job.

3904

"createdFromSnapshotId": "A String", # If this is specified, the job's initial state is populated from the given

3905

# snapshot.

3906

"currentState": "A String", # The current state of the job.

3907

#

3908

# Jobs are created in the `JOB_STATE_STOPPED` state unless otherwise

3909

# specified.

3910

#

3911

# A job in the `JOB_STATE_RUNNING` state may asynchronously enter a

3912

# terminal state. After a job has reached a terminal state, no

3913

# further state updates may be made.

3914

#

3915

# This field may be mutated by the Cloud Dataflow service;

3916

# callers cannot mutate it.

3917

"name": "A String", # The user-specified Cloud Dataflow job name.

3918

#

3919

# Only one Job with a given name may exist in a project at any

3920

# given time. If a caller attempts to create a Job with the same

3921

# name as an already-existing Job, the attempt returns the

3922

# existing Job.

3923

#

3924

# The name must match the regular expression

3925

# `[a-z]([-a-z0-9]{0,38}[a-z0-9])?`

3926

"currentStateTime": "A String", # The timestamp associated with the current state.

3927

}
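The `name` field above must match the regular expression `[a-z]([-a-z0-9]{0,38}[a-z0-9])?`. A minimal sketch of a client-side check that could run before a job body is submitted, assuming the entire name has to match (hence `re.fullmatch`). The helper name `is_valid_job_name` is hypothetical and is not part of the client library:

```python
import re

# Pattern copied from the "name" field description above.
JOB_NAME_PATTERN = re.compile(r"[a-z]([-a-z0-9]{0,38}[a-z0-9])?")


def is_valid_job_name(name: str) -> bool:
    """Return True if the whole name matches the documented pattern."""
    return JOB_NAME_PATTERN.fullmatch(name) is not None
```

Under this pattern a name starts with a lowercase letter, cannot end with a hyphen, and is at most 40 characters long.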

3928

3929

location: string, The [regional endpoint]

3930

(https://cloud.google.com/dataflow/docs/concepts/regional-endpoints) that

3931

contains this job.

3932

x__xgafv: string, V1 error format.

Allowed values

1 - v1 error format

2 - v2 error format
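The `requestedState` description in the body above says the requested state can be switched between `JOB_STATE_STOPPED` and `JOB_STATE_RUNNING`, or set to `JOB_STATE_CANCELLED` or `JOB_STATE_DONE`. A minimal sketch of building such a body as a plain dict follows. The helper name `build_requested_state_body` is hypothetical, the accepted set is limited to the four states named in that description (the service may define others), and the sketch makes no API call:

```python
# States named in the requestedState description; the service may define more.
REQUESTABLE_STATES = frozenset({
    "JOB_STATE_STOPPED",
    "JOB_STATE_RUNNING",
    "JOB_STATE_CANCELLED",
    "JOB_STATE_DONE",
})


def build_requested_state_body(state: str) -> dict:
    """Build a body dict that sets only requestedState (hypothetical helper)."""
    if state not in REQUESTABLE_STATES:
        raise ValueError(f"unsupported requested state: {state!r}")
    return {"requestedState": state}
```

The returned dict could then be supplied as the `body` argument described above.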

Returns:

An object of the form:

3939

3940

{ # Defines a job to be run by the Cloud Dataflow service.

3941

"pipelineDescription": { # A descriptive representation of submitted pipeline as well as the executed

# form. This data is provided by the Dataflow service for ease of visualizing

# the pipeline and interpreting Dataflow provided metrics.

# Preliminary field: The format of this data may change at any time.

# A description of the user pipeline and stages through which it is executed.

# Created by Cloud Dataflow service. Only retrieved with

# JOB_VIEW_DESCRIPTION or JOB_VIEW_ALL.

3947

"displayData": [ # Pipeline level display data.

3948

{ # Data provided with a pipeline or transform to provide descriptive info.

3949

"url": "A String", # An optional full URL.

3950

"javaClassValue": "A String", # Contains value if the data is of java class type.

3951

"timestampValue": "A String", # Contains value if the data is of timestamp type.

3952

"durationValue": "A String", # Contains value if the data is of duration type.

3953

"label": "A String", # An optional label to display in a dax UI for the element.

3954

"key": "A String", # The key identifying the display data.

3955

# This is intended to be used as a label for the display data

3956

# when viewed in a dax monitoring system.

3957

"namespace": "A String", # The namespace for the key. This is usually a class name or programming

3958

# language namespace (e.g. a Python module) which defines the display data.

3959

# This allows a dax monitoring system to specially handle the data

3960

# and perform custom rendering.

3961

"floatValue": 3.14, # Contains value if the data is of float type.

3962

"strValue": "A String", # Contains value if the data is of string type.

3963

"int64Value": "A String", # Contains value if the data is of int64 type.

3964

"boolValue": True or False, # Contains value if the data is of a boolean type.

3965

"shortStrValue": "A String", # A possible additional shorter value to display.

3966

# For example a java_class_name_value of com.mypackage.MyDoFn

3967

# will be stored with MyDoFn as the short_str_value and

3968

# com.mypackage.MyDoFn as the java_class_name value.

3969

# short_str_value can be displayed and java_class_name_value

3970

# will be displayed as a tooltip.

3971

},

3972

],

3973

"originalPipelineTransform": [ # Description of each transform in the pipeline and collections between them.

3974

{ # Description of the type, names/ids, and input/outputs for a transform.

3975

"outputCollectionName": [ # User names for all collection outputs to this transform.

3976

"A String",

3977

],

3978

"displayData": [ # Transform-specific display data.

3979

{ # Data provided with a pipeline or transform to provide descriptive info.

3980

"url": "A String", # An optional full URL.

3981

"javaClassValue": "A String", # Contains value if the data is of java class type.

3982

"timestampValue": "A String", # Contains value if the data is of timestamp type.

3983

"durationValue": "A String", # Contains value if the data is of duration type.

3984

"label": "A String", # An optional label to display in a dax UI for the element.

3985

"key": "A String", # The key identifying the display data.

3986

# This is intended to be used as a label for the display data

3987

# when viewed in a dax monitoring system.

3988

"namespace": "A String", # The namespace for the key. This is usually a class name or programming

3989

# language namespace (e.g. a Python module) which defines the display data.

3990

# This allows a dax monitoring system to specially handle the data

3991

# and perform custom rendering.

3992

"floatValue": 3.14, # Contains value if the data is of float type.

3993

"strValue": "A String", # Contains value if the data is of string type.

3994

"int64Value": "A String", # Contains value if the data is of int64 type.

3995

"boolValue": True or False, # Contains value if the data is of a boolean type.

3996

"shortStrValue": "A String", # A possible additional shorter value to display.

3997

# For example a java_class_name_value of com.mypackage.MyDoFn

3998

# will be stored with MyDoFn as the short_str_value and

3999

# com.mypackage.MyDoFn as the java_class_name value.

4000

# short_str_value can be displayed and java_class_name_value

4001

# will be displayed as a tooltip.

4002

},

4003

],

4004

"id": "A String", # SDK generated id of this transform instance.

4005

"inputCollectionName": [ # User names for all collection inputs to this transform.

4006

"A String",

4007

],

4008

"name": "A String", # User provided name for this transform instance.

4009

"kind": "A String", # Type of transform.

4010

},

4011

],

4012

"executionPipelineStage": [ # Description of each stage of execution of the pipeline.

4013

{ # Description of the composing transforms, names/ids, and input/outputs of a

4014

# stage of execution. Some composing transforms and sources may have been

4015

# generated by the Dataflow service during execution planning.

4016

"componentSource": [ # Collections produced and consumed by component transforms of this stage.

4017

{ # Description of an interstitial value between transforms in an execution

4018

# stage.

4019

"userName": "A String", # Human-readable name for this transform; may be user or system generated.

4020

"name": "A String", # Dataflow service generated name for this source.

4021

"originalTransformOrCollection": "A String", # User name for the original user transform or collection with which this

4022

# source is most closely associated.

4023

},

4024

],

4025

"inputSource": [ # Input sources for this stage.

4026

{ # Description of an input or output of an execution stage.

4027

"userName": "A String", # Human-readable name for this source; may be user or system generated.

4028

"originalTransformOrCollection": "A String", # User name for the original user transform or collection with which this

4029

# source is most closely associated.

4030

"sizeBytes": "A String", # Size of the source, if measurable.

4031

"name": "A String", # Dataflow service generated name for this source.

4032

},

4033

],

4034

"name": "A String", # Dataflow service generated name for this stage.

4035

"componentTransform": [ # Transforms that comprise this execution stage.

4036

{ # Description of a transform executed as part of an execution stage.

4037

"name": "A String", # Dataflow service generated name for this source.

4038

"userName": "A String", # Human-readable name for this transform; may be user or system generated.

4039

"originalTransform": "A String", # User name for the original user transform with which this transform is

4040

# most closely associated.

4041

},

4042

],

4043

"id": "A String", # Dataflow service generated id for this stage.

4044

"outputSource": [ # Output sources for this stage.

4045

{ # Description of an input or output of an execution stage.

4046

"userName": "A String", # Human-readable name for this source; may be user or system generated.

4047

"originalTransformOrCollection": "A String", # User name for the original user transform or collection with which this

4048

# source is most closely associated.

4049

"sizeBytes": "A String", # Size of the source, if measurable.

4050

"name": "A String", # Dataflow service generated name for this source.

4051

},

4052

],

4053

"kind": "A String", # Type of transform this stage is executing.

},

],

},

"labels": { # User-defined labels for this job.

4058

#

4059

# The labels map can contain no more than 64 entries. Entries of the labels

4060

# map are UTF8 strings that comply with the following restrictions:

4061

#

4062

# * Keys must conform to regexp: \p{Ll}\p{Lo}{0,62}

4063

# * Values must conform to regexp: [\p{Ll}\p{Lo}\p{N}_-]{0,63}

4064

# * Both keys and values are additionally constrained to be <= 128 bytes in

# size.

"a_key": "A String",

},

"projectId": "A String", # The ID of the Cloud Platform project that the job belongs to.

4069

"environment": { # Describes the environment in which a Dataflow Job runs. # The environment for the job.

4070

"flexResourceSchedulingGoal": "A String", # Which Flexible Resource Scheduling mode to run in.

4071

"workerRegion": "A String", # The Compute Engine region

4072

# (https://cloud.google.com/compute/docs/regions-zones/regions-zones) in

4073

# which worker processing should occur, e.g. "us-west1". Mutually exclusive

4074

# with worker_zone. If neither worker_region nor worker_zone is specified,

4075

# default to the control plane's region.

4076

"userAgent": { # A description of the process that generated the request.

4077

"a_key": "", # Properties of the object.

4078

},

4079

"serviceAccountEmail": "A String", # Identity to run virtual machines as. Defaults to the default account.

4080

"version": { # A structure describing which components and their versions of the service

4081

# are required in order to run the job.

4082

"a_key": "", # Properties of the object.

4083

},

4084

"serviceKmsKeyName": "A String", # If set, contains the Cloud KMS key identifier used to encrypt data

4085

# at rest, AKA a Customer Managed Encryption Key (CMEK).

4086

#

4087

# Format:

4088

# projects/PROJECT_ID/locations/LOCATION/keyRings/KEY_RING/cryptoKeys/KEY

4089

"experiments": [ # The list of experiments to enable.

4090

"A String",

4091

],

4092

"workerZone": "A String", # The Compute Engine zone

4093

# (https://cloud.google.com/compute/docs/regions-zones/regions-zones) in

4094

# which worker processing should occur, e.g. "us-west1-a". Mutually exclusive

4095

# with worker_region. If neither worker_region nor worker_zone is specified,

4096

# a zone in the control plane's region is chosen based on available capacity.

4097

"workerPools": [ # The worker pools. At least one "harness" worker pool must be

4098

# specified in order for the job to have workers.

4099

{ # Describes one particular pool of Cloud Dataflow workers to be

4100

# instantiated by the Cloud Dataflow service in order to perform the

4101

# computations required by a job. Note that a workflow job may use

4102

# multiple pools, in order to match the various computational

4103

# requirements of the various stages of the job.

4104

"onHostMaintenance": "A String", # The action to take on host maintenance, as defined by the Google

4105

# Compute Engine API.

4106

"sdkHarnessContainerImages": [ # Set of SDK harness containers needed to execute this pipeline. This will

4107

# only be set in the Fn API path. For non-cross-language pipelines this

4108

# should have only one entry. Cross-language pipelines will have two or more

4109

# entries.

4110

{ # Defines a SDK harness container for executing Dataflow pipelines.

4111

"containerImage": "A String", # A docker container image that resides in Google Container Registry.

4112

"useSingleCorePerContainer": True or False, # If true, recommends the Dataflow service to use only one core per SDK

4113

# container instance with this image. If false (or unset) recommends using

4114

# more than one core per SDK container instance with this image for

4115

# efficiency. Note that Dataflow service may choose to override this property

# if needed.

},

],

"zone": "A String", # Zone to run the worker pools in. If empty or unspecified, the service

4120

# will attempt to choose a reasonable default.

4121

"kind": "A String", # The kind of the worker pool; currently only `harness` and `shuffle`

4122

# are supported.

4123

"metadata": { # Metadata to set on the Google Compute Engine VMs.

4124

"a_key": "A String",

4125

},

4126

"diskSourceImage": "A String", # Fully qualified source image for disks.

4127

"dataDisks": [ # Data disks that are used by a VM in this workflow.

4128

{ # Describes the data disk used by a workflow job.

4129

"sizeGb": 42, # Size of disk in GB. If zero or unspecified, the service will

4130

# attempt to choose a reasonable default.

4131

"diskType": "A String", # Disk storage type, as defined by Google Compute Engine. This

4132

# must be a disk type appropriate to the project and zone in which

4133

# the workers will run. If unknown or unspecified, the service

4134

# will attempt to choose a reasonable default.

4135

#

4136

# For example, the standard persistent disk type is a resource name

4137

# typically ending in "pd-standard". If SSD persistent disks are

4138

# available, the resource name typically ends with "pd-ssd". The

4139

# actual valid values are defined by the Google Compute Engine API,

4140

# not by the Cloud Dataflow API; consult the Google Compute Engine

4141

# documentation for more information about determining the set of

4142

# available disk types for a particular project and zone.

4143

#

4144

# Google Compute Engine Disk types are local to a particular

4145

# project in a particular zone, and so the resource name will

4146

# typically look something like this:

4147

#

4148

# compute.googleapis.com/projects/project-id/zones/zone/diskTypes/pd-standard

4149

"mountPoint": "A String", # Directory in a VM where disk is mounted.

4150

},

4151

],

4152

"packages": [ # Packages to be installed on workers.

4153

{ # The packages that must be installed in order for a worker to run the

4154

# steps of the Cloud Dataflow job that will be assigned to its worker

4155

# pool.

4156

#

4157

# This is the mechanism by which the Cloud Dataflow SDK causes code to

4158

# be loaded onto the workers. For example, the Cloud Dataflow Java SDK

4159

# might use this to install jars containing the user's code and all of the

4160

# various dependencies (libraries, data files, etc.) required in order

4161

# for that code to run.

4162

"name": "A String", # The name of the package.

4163

"location": "A String", # The resource to read the package from. The supported resource type is:

4164

#

4165

# Google Cloud Storage:

4166

#

4167

# storage.googleapis.com/{bucket}

4168

# bucket.storage.googleapis.com/

4169

},

4170

],

4171

"teardownPolicy": "A String", # Sets the policy for determining when to tear down the worker pool.

4172

# Allowed values are: `TEARDOWN_ALWAYS`, `TEARDOWN_ON_SUCCESS`, and

4173

# `TEARDOWN_NEVER`.

4174

# `TEARDOWN_ALWAYS` means workers are always torn down regardless of whether

4175

# the job succeeds. `TEARDOWN_ON_SUCCESS` means workers are torn down

4176

# if the job succeeds. `TEARDOWN_NEVER` means the workers are never torn

4177

# down.

4178

#

4179

# If the workers are not torn down by the service, they will

4180

# continue to run and use Google Compute Engine VM resources in the

4181

# user's project until they are explicitly terminated by the user.

4182

# Because of this, Google recommends using the `TEARDOWN_ALWAYS`

4183

# policy except for small, manually supervised test jobs.

4184

#

4185

# If unknown or unspecified, the service will attempt to choose a reasonable

4186

# default.

4187

"network": "A String", # Network to which VMs will be assigned. If empty or unspecified,

4188

# the service will use the network "default".

4189

"ipConfiguration": "A String", # Configuration for VM IPs.

4190

"diskSizeGb": 42, # Size of root disk for VMs, in GB. If zero or unspecified, the service will

4191

# attempt to choose a reasonable default.

4192

"autoscalingSettings": { # Settings for WorkerPool autoscaling. # Settings for autoscaling of this WorkerPool.

4193

"maxNumWorkers": 42, # The maximum number of workers to cap scaling at.

4194

"algorithm": "A String", # The algorithm to use for autoscaling.

4195

},

4196

"poolArgs": { # Extra arguments for this worker pool.

4197

"a_key": "", # Properties of the object. Contains field @type with type URL.

4198

},

4199

"subnetwork": "A String", # Subnetwork to which VMs will be assigned, if desired. Expected to be of

4200

# the form "regions/REGION/subnetworks/SUBNETWORK".

4201

"numWorkers": 42, # Number of Google Compute Engine workers in this pool needed to

4202

# execute the job. If zero or unspecified, the service will

4203

# attempt to choose a reasonable default.

4204

"numThreadsPerWorker": 42, # The number of threads per worker harness. If empty or unspecified, the

4205

# service will choose a number of threads (according to the number of cores

4206

# on the selected machine type for batch, or 1 by convention for streaming).

4207

"workerHarnessContainerImage": "A String", # Required. Docker container image that executes the Cloud Dataflow worker

4208

# harness, residing in Google Container Registry.

4209

#

4210

# Deprecated for the Fn API path. Use sdk_harness_container_images instead.

4211

"taskrunnerSettings": { # Taskrunner configuration settings. # Settings passed through to Google Compute Engine workers when

4212

# using the standard Dataflow task runner. Users should ignore

4213

# this field.

4214

"dataflowApiVersion": "A String", # The API version of endpoint, e.g. "v1b3"

4215

"oauthScopes": [ # The OAuth2 scopes to be requested by the taskrunner in order to

4216

# access the Cloud Dataflow API.

4217

"A String",

4218

],

4219

"baseUrl": "A String", # The base URL for the taskrunner to use when accessing Google Cloud APIs.

4220

#

4221

# When workers access Google Cloud APIs, they logically do so via

4222

# relative URLs. If this field is specified, it supplies the base

4223

# URL to use for resolving these relative URLs. The normative

4224

# algorithm used is defined by RFC 1808, "Relative Uniform Resource

4225

# Locators".

4226

#

4227

# If not specified, the default value is "http://www.googleapis.com/"

4228

"workflowFileName": "A String", # The file to store the workflow in.

4229

"logToSerialconsole": True or False, # Whether to send taskrunner log info to Google Compute Engine VM serial

4230

# console.

4231

"baseTaskDir": "A String", # The location on the worker for task-specific subdirectories.

4232

"taskUser": "A String", # The UNIX user ID on the worker VM to use for tasks launched by

4233

# taskrunner; e.g. "root".

4234

"vmId": "A String", # The ID string of the VM.

4235

"alsologtostderr": True or False, # Whether to also send taskrunner log info to stderr.

"parallelWorkerSettings": { # The settings to pass through to the parallel worker harness.
"shuffleServicePath": "A String", # The Shuffle service path relative to the root URL, for example,
# "shuffle/v1beta1".
"tempStoragePrefix": "A String", # The prefix of the resources the system should use for temporary
# storage.
#
# The supported resource type is:
#
# Google Cloud Storage:
#
# storage.googleapis.com/{bucket}/{object}
# bucket.storage.googleapis.com/{object}
"reportingEnabled": True or False, # Whether to send work progress updates to the service.
"servicePath": "A String", # The Cloud Dataflow service path relative to the root URL, for example,
# "dataflow/v1b3/projects".
"baseUrl": "A String", # The base URL for accessing Google Cloud APIs.
#
# When workers access Google Cloud APIs, they logically do so via
# relative URLs. If this field is specified, it supplies the base
# URL to use for resolving these relative URLs. The normative
# algorithm used is defined by RFC 1808, "Relative Uniform Resource
# Locators".
#
# If not specified, the default value is "http://www.googleapis.com/"
"workerId": "A String", # The ID of the worker running this pipeline.
},
"harnessCommand": "A String", # The command to launch the worker harness.
"logDir": "A String", # The directory on the VM to store logs.
"streamingWorkerMainClass": "A String", # The streaming worker main class name.
"languageHint": "A String", # The suggested backend language.
"taskGroup": "A String", # The UNIX group ID on the worker VM to use for tasks launched by
# taskrunner; e.g. "wheel".
"logUploadLocation": "A String", # Indicates where to put logs. If this is not specified, the logs
# will not be uploaded.
#
# The supported resource type is:
#
# Google Cloud Storage:
# storage.googleapis.com/{bucket}/{object}
# bucket.storage.googleapis.com/{object}
"commandlinesFileName": "A String", # The file to store preprocessing commands in.
"continueOnException": True or False, # Whether to continue taskrunner if an exception is hit.
"tempStoragePrefix": "A String", # The prefix of the resources the taskrunner should use for
# temporary storage.
#
# The supported resource type is:
#
# Google Cloud Storage:
# storage.googleapis.com/{bucket}/{object}
# bucket.storage.googleapis.com/{object}
},
"diskType": "A String", # Type of root disk for VMs. If empty or unspecified, the service will
# attempt to choose a reasonable default.
"defaultPackageSet": "A String", # The default package set to install. This allows the service to
# select a default set of packages which are useful to worker
# harnesses written in a particular language.
"machineType": "A String", # Machine type (e.g. "n1-standard-1"). If empty or unspecified, the
# service will attempt to choose a reasonable default.
},
],
"tempStoragePrefix": "A String", # The prefix of the resources the system should use for temporary
# storage. The system will append the suffix "/temp-{JOBNAME}" to
# this resource prefix, where {JOBNAME} is the value of the
# job_name field. The resulting bucket and object prefix is used
# as the prefix of the resources used to store temporary data
# needed during the job execution. NOTE: This will override the
# value in taskrunner_settings.
# The supported resource type is:
#
# Google Cloud Storage:
#
# storage.googleapis.com/{bucket}/{object}
# bucket.storage.googleapis.com/{object}
"internalExperiments": { # Experimental settings.
"a_key": "", # Properties of the object. Contains field @type with type URL.
},
"sdkPipelineOptions": { # The Cloud Dataflow SDK pipeline options specified by the user. These
# options are passed through the service and are used to recreate the
# SDK pipeline options on the worker in a language agnostic and platform
# independent way.
"a_key": "", # Properties of the object.
},
"dataset": "A String", # The dataset for the current project where various workflow
# related tables are stored.
#
# The supported resource type is:
#
# Google BigQuery:
# bigquery.googleapis.com/{dataset}
"clusterManagerApiService": "A String", # The type of cluster manager API to use. If unknown or
# unspecified, the service will attempt to choose a reasonable
# default. This should be in the form of the API service name,
# e.g. "compute.googleapis.com".
},
"stepsLocation": "A String", # The GCS location where the steps are stored.
"steps": [ # Exactly one of step or steps_location should be specified.
#
# The top-level steps that constitute the entire job.
{ # Defines a particular step within a Cloud Dataflow job.
#
# A job consists of multiple steps, each of which performs some
# specific operation as part of the overall job. Data is typically
# passed from one step to another as part of the job.
#
# Here's an example of a sequence of steps which together implement a
# Map-Reduce job:
#
# * Read a collection of data from some source, parsing the
# collection's elements.
#
# * Validate the elements.
#
# * Apply a user-defined function to map each element to some value
# and extract an element-specific key value.
#
# * Group elements with the same key into a single element with
# that key, transforming a multiply-keyed collection into a
# uniquely-keyed collection.
#
# * Write the elements out to some data sink.
#
# Note that the Cloud Dataflow service may be used to run many different
# types of jobs, not just Map-Reduce.
"kind": "A String", # The kind of step in the Cloud Dataflow job.
"properties": { # Named properties associated with the step. Each kind of
# predefined step has its own required set of properties.
# Must be provided on Create. Only retrieved with JOB_VIEW_ALL.
"a_key": "", # Properties of the object.
},
"name": "A String", # The name that identifies the step. This must be unique for each
# step with respect to all other steps in the Cloud Dataflow job.
},
],
"stageStates": [ # This field may be mutated by the Cloud Dataflow service;
# callers cannot mutate it.
{ # A message describing the state of a particular execution stage.
"executionStageState": "A String", # Execution stage states allow the same set of values as JobState.
"executionStageName": "A String", # The name of the execution stage.
"currentStateTime": "A String", # The time at which the stage transitioned to this state.
},
],
"replacedByJobId": "A String", # If another job is an update of this job (and thus, this job is in
# `JOB_STATE_UPDATED`), this field contains the ID of that job.
"jobMetadata": { # Metadata available primarily for filtering jobs. Will be included in the
# ListJob response and Job SUMMARY view.
#
# This field is populated by the Dataflow service to support filtering jobs
# by the metadata values provided here. Populated for ListJobs and all GetJob
# views SUMMARY and higher.
"sdkVersion": { # The version of the SDK used to run the job.
"sdkSupportStatus": "A String", # The support status for this SDK version.
"versionDisplayName": "A String", # A readable string describing the version of the SDK.
"version": "A String", # The version of the SDK used to run the job.
},
"bigTableDetails": [ # Identification of a BigTable source used in the Dataflow job.
{ # Metadata for a BigTable connector used by the job.
"instanceId": "A String", # InstanceId accessed in the connection.
"tableId": "A String", # TableId accessed in the connection.
"projectId": "A String", # ProjectId accessed in the connection.
},
],
"pubsubDetails": [ # Identification of a PubSub source used in the Dataflow job.
{ # Metadata for a PubSub connector used by the job.
"subscription": "A String", # Subscription used in the connection.
"topic": "A String", # Topic accessed in the connection.
},
],
"bigqueryDetails": [ # Identification of a BigQuery source used in the Dataflow job.
{ # Metadata for a BigQuery connector used by the job.
"dataset": "A String", # Dataset accessed in the connection.
"projectId": "A String", # Project accessed in the connection.
"query": "A String", # Query used to access data in the connection.
"table": "A String", # Table accessed in the connection.
},
],
"fileDetails": [ # Identification of a File source used in the Dataflow job.
{ # Metadata for a File connector used by the job.
"filePattern": "A String", # File Pattern used to access files by the connector.
},
],
"datastoreDetails": [ # Identification of a Datastore source used in the Dataflow job.
{ # Metadata for a Datastore connector used by the job.
"namespace": "A String", # Namespace used in the connection.
"projectId": "A String", # ProjectId accessed in the connection.
},
],
"spannerDetails": [ # Identification of a Spanner source used in the Dataflow job.
{ # Metadata for a Spanner connector used by the job.
"instanceId": "A String", # InstanceId accessed in the connection.
"databaseId": "A String", # DatabaseId accessed in the connection.
"projectId": "A String", # ProjectId accessed in the connection.
},
],
},
"location": "A String", # The [regional endpoint]
# (https://cloud.google.com/dataflow/docs/concepts/regional-endpoints) that
# contains this job.
"transformNameMapping": { # The map of transform name prefixes of the job to be replaced to the
# corresponding name prefixes of the new job.
"a_key": "A String",
},
"startTime": "A String", # The timestamp when the job was started (transitioned to JOB_STATE_PENDING).
# Flexible resource scheduling jobs are started with some delay after job
# creation, so start_time is unset before start and is updated when the
# job is started by the Cloud Dataflow service. For other jobs, start_time
# always equals create_time and is immutable and set by the Cloud Dataflow
# service.
"clientRequestId": "A String", # The client's unique identifier of the job, re-used across retried attempts.
# If this field is set, the service will ensure its uniqueness.
# The request to create a job will fail if the service has knowledge of a
# previously submitted job with the same client's ID and job name.
# The caller may use this field to ensure idempotence of job
# creation across retried attempts to create a job.
# By default, the field is empty and, in that case, the service ignores it.
"executionInfo": { # Deprecated. Additional information about how a Cloud Dataflow job will be executed that
# isn't contained in the submitted job.
"stages": { # A mapping from each stage to the information about that stage.
"a_key": { # Contains information about how a particular
# google.dataflow.v1beta3.Step will be executed.
"stepName": [ # The steps associated with the execution stage.
# Note that stages may have several steps, and that a given step
# might be run by more than one stage.
"A String",
],
},
},
},
"type": "A String", # The type of Cloud Dataflow job.
"createTime": "A String", # The timestamp when the job was initially created. Immutable and set by the
# Cloud Dataflow service.
"tempFiles": [ # A set of files the system should be aware of that are used
# for temporary storage. These temporary files will be
# removed on job completion.
# No duplicates are allowed.
# No file patterns are supported.
#
# The supported files are:
#
# Google Cloud Storage:
#
# storage.googleapis.com/{bucket}/{object}
# bucket.storage.googleapis.com/{object}
"A String",
],
"id": "A String", # The unique ID of this job.
#
# This field is set by the Cloud Dataflow service when the Job is
# created, and is immutable for the life of the job.
"requestedState": "A String", # The job's requested state.
#
# `UpdateJob` may be used to switch between the `JOB_STATE_STOPPED` and
# `JOB_STATE_RUNNING` states, by setting requested_state. `UpdateJob` may
# also be used to directly set a job's requested state to
# `JOB_STATE_CANCELLED` or `JOB_STATE_DONE`, irrevocably terminating the
# job if it has not already reached a terminal state.
"replaceJobId": "A String", # If this job is an update of an existing job, this field is the job ID
# of the job it replaced.
#
# When sending a `CreateJobRequest`, you can update a job by specifying it
# here. The job named here is stopped, and its intermediate state is
# transferred to this job.
"createdFromSnapshotId": "A String", # If this is specified, the job's initial state is populated from the given
# snapshot.
"currentState": "A String", # The current state of the job.
#
# Jobs are created in the `JOB_STATE_STOPPED` state unless otherwise
# specified.
#
# A job in the `JOB_STATE_RUNNING` state may asynchronously enter a
# terminal state. After a job has reached a terminal state, no
# further state updates may be made.
#
# This field may be mutated by the Cloud Dataflow service;
# callers cannot mutate it.
"name": "A String", # The user-specified Cloud Dataflow job name.
#
# Only one Job with a given name may exist in a project at any
# given time. If a caller attempts to create a Job with the same
# name as an already-existing Job, the attempt returns the
# existing Job.
#
# The name must match the regular expression
# `[a-z]([-a-z0-9]{0,38}[a-z0-9])?`
"currentStateTime": "A String", # The timestamp associated with the current state.
}</pre>
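The `name` field above must match `[a-z]([-a-z0-9]{0,38}[a-z0-9])?`. As a minimal sketch, a caller could check a candidate name locally before submitting a job. The helper `is_valid_job_name` is hypothetical and not part of the client library; the sketch also assumes the pattern is meant to match the whole string, which the description does not state explicitly.

```python
import re

# Pattern copied verbatim from the "name" field description.
# Assumption: the service applies it to the entire name, so fullmatch is used.
JOB_NAME_RE = re.compile(r"[a-z]([-a-z0-9]{0,38}[a-z0-9])?")


def is_valid_job_name(name: str) -> bool:
    """Return True if the whole string matches the documented pattern."""
    return JOB_NAME_RE.fullmatch(name) is not None
```

Under this reading a name starts with a lowercase letter, does not end with a hyphen, and is at most 40 characters long. A check like this only catches malformed names early; the service remains the authority on what it accepts.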
