Blame - docs/dyn/dataflow_v1b3.projects.locations.templates.html - platform/external/python/google-api-python-client

<h2>Instance Methods</h2>
<code><a href="#create">create(projectId, location, body=None, x__xgafv=None)</a></code></p>
<p class="firstline">Creates a Cloud Dataflow job from a template.</p>
<code><a href="#get">get(projectId, location, view=None, gcsPath=None, x__xgafv=None)</a></code></p>
<p class="firstline">Get the template associated with a template.</p>
<code><a href="#launch">launch(projectId, location, body=None, validateOnly=None, gcsPath=None, dynamicTemplate_gcsPath=None, dynamicTemplate_stagingLocation=None, x__xgafv=None)</a></code></p>
<p class="firstline">Launch a template.</p>
<h3>Method Details</h3>
<code class="details" id="create">create(projectId, location, body=None, x__xgafv=None)</code>
<pre>Creates a Cloud Dataflow job from a template.

Args:
projectId: string, Required. The ID of the Cloud Platform project that the job belongs to. (required)
location: string, The [regional endpoint]
(https://cloud.google.com/dataflow/docs/concepts/regional-endpoints) to
which to direct the request. (required)
body: object, The request body.
The object takes the form of:

{ # A request to create a Cloud Dataflow job from a template.
"jobName": "A String", # Required. The job name to use for the created job.
"gcsPath": "A String", # Required. A Cloud Storage path to the template from which to
# create the job.
# Must be a valid Cloud Storage URL, beginning with `gs://`.
"environment": { # The environment values to set at runtime. # The runtime environment for the job.
"machineType": "A String", # The machine type to use for the job. Defaults to the value from the
# template if not specified.
"subnetwork": "A String", # Subnetwork to which VMs will be assigned, if desired. Expected to be of
# the form "regions/REGION/subnetworks/SUBNETWORK".
"ipConfiguration": "A String", # Configuration for VM IPs.
"kmsKeyName": "A String", # Optional. Name for the Cloud KMS key for the job.
# Key format is:
# projects/<project>/locations/<location>/keyRings/<keyring>/cryptoKeys/<key>
"tempLocation": "A String", # The Cloud Storage path to use for temporary files.
# Must be a valid Cloud Storage URL, beginning with `gs://`.
"bypassTempDirValidation": True or False, # Whether to bypass the safety checks for the job's temporary directory.
# Use with caution.
"network": "A String", # Network to which VMs will be assigned. If empty or unspecified,
# the service will use the network "default".
"workerRegion": "A String", # The Compute Engine region
# (https://cloud.google.com/compute/docs/regions-zones/regions-zones) in
# which worker processing should occur, e.g. "us-west1". Mutually exclusive
# with worker_zone. If neither worker_region nor worker_zone is specified,
# default to the control plane's region.
"zone": "A String", # The Compute Engine [availability
# zone](https://cloud.google.com/compute/docs/regions-zones/regions-zones)
# for launching worker instances to run your pipeline.
# In the future, worker_zone will take precedence.
"numWorkers": 42, # The initial number of Google Compute Engine instances for the job.
"workerZone": "A String", # The Compute Engine zone
# (https://cloud.google.com/compute/docs/regions-zones/regions-zones) in
# which worker processing should occur, e.g. "us-west1-a". Mutually exclusive
# with worker_region. If neither worker_region nor worker_zone is specified,
# a zone in the control plane's region is chosen based on available capacity.
# If both `worker_zone` and `zone` are set, `worker_zone` takes precedence.
"additionalUserLabels": { # Additional user labels to be specified for the job.
# Keys and values should follow the restrictions specified in the [labeling
# restrictions](https://cloud.google.com/compute/docs/labeling-resources#restrictions)
# page.
"a_key": "A String",
},
"additionalExperiments": [ # Additional experiment flags for the job.
"A String",
],
"maxWorkers": 42, # The maximum number of Google Compute Engine instances to be made
# available to your pipeline during execution, from 1 to 1000.
"serviceAccountEmail": "A String", # The email address of the service account to run the job as.
},
"location": "A String", # The [regional endpoint]
# (https://cloud.google.com/dataflow/docs/concepts/regional-endpoints) to
# which to direct the request.
"parameters": { # The runtime parameters to pass to the job.
"a_key": "A String",
},
}
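
Example (a minimal sketch, not part of the generated reference): the request
body above can be assembled as a plain Python dict before it is passed to
create(). The helper names build_create_request and create_job_from_template
are hypothetical. The checks only mirror the constraints stated in the field
comments above and run on the client; they make no claim about how the
service validates a request. The service argument is assumed to be a
Dataflow v1b3 client, for example one returned by
googleapiclient.discovery.build("dataflow", "v1b3").

```python
def build_create_request(job_name, gcs_path, temp_location=None,
                         worker_region=None, worker_zone=None,
                         max_workers=None, parameters=None):
    """Assemble a templates.create request body (hypothetical helper)."""
    # gcsPath and tempLocation are documented as Cloud Storage URLs
    # beginning with gs://.
    for label, path in (("gcsPath", gcs_path), ("tempLocation", temp_location)):
        if path is not None and not path.startswith("gs://"):
            raise ValueError(f"{label} must begin with gs://")
    # workerRegion and workerZone are documented as mutually exclusive.
    if worker_region and worker_zone:
        raise ValueError("workerRegion and workerZone are mutually exclusive")
    # maxWorkers is documented as ranging from 1 to 1000.
    if max_workers is not None and max_workers not in range(1, 1001):
        raise ValueError("maxWorkers must be between 1 and 1000")

    # Only include the environment fields the caller actually set.
    candidates = {
        "tempLocation": temp_location,
        "workerRegion": worker_region,
        "workerZone": worker_zone,
        "maxWorkers": max_workers,
    }
    environment = {k: v for k, v in candidates.items() if v is not None}

    body = {"jobName": job_name, "gcsPath": gcs_path}
    if environment:
        body["environment"] = environment
    if parameters:
        body["parameters"] = dict(parameters)
    return body


def create_job_from_template(service, project_id, location, body):
    """Call create() on an already-built Dataflow client (assumed argument)."""
    request = service.projects().locations().templates().create(
        projectId=project_id, location=location, body=body)
    # execute() sends the request and returns the Job resource as a dict.
    return request.execute()
```

A call such as build_create_request("wordcount-1", "gs://my-bucket/templates/wc",
worker_zone="us-west1-a") yields a dict with jobName, gcsPath and an
environment holding only workerZone. The bucket and job names here are
placeholders.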

x__xgafv: string, V1 error format.
Allowed values
1 - v1 error format
2 - v2 error format

Returns:
An object of the form:

{ # Defines a job to be run by the Cloud Dataflow service.
"clientRequestId": "A String", # The client's unique identifier of the job, re-used across retried attempts.
# If this field is set, the service will ensure its uniqueness.
# The request to create a job will fail if the service has knowledge of a
# previously submitted job with the same client's ID and job name.
# The caller may use this field to ensure idempotence of job
# creation across retried attempts to create a job.
# By default, the field is empty and, in that case, the service ignores it.
"id": "A String", # The unique ID of this job.
#
# This field is set by the Cloud Dataflow service when the Job is
# created, and is immutable for the life of the job.
"currentStateTime": "A String", # The timestamp associated with the current state.
"transformNameMapping": { # The map of transform name prefixes of the job to be replaced to the
# corresponding name prefixes of the new job.
"a_key": "A String",
},
"environment": { # Describes the environment in which a Dataflow Job runs. # The environment for the job.
"internalExperiments": { # Experimental settings.
"a_key": "", # Properties of the object. Contains field @type with type URL.
},
"workerRegion": "A String", # The Compute Engine region
# (https://cloud.google.com/compute/docs/regions-zones/regions-zones) in
# which worker processing should occur, e.g. "us-west1". Mutually exclusive
# with worker_zone. If neither worker_region nor worker_zone is specified,
# default to the control plane's region.
"serviceKmsKeyName": "A String", # If set, contains the Cloud KMS key identifier used to encrypt data
# at rest, AKA a Customer Managed Encryption Key (CMEK).
#
# Format:
# projects/PROJECT_ID/locations/LOCATION/keyRings/KEY_RING/cryptoKeys/KEY
"userAgent": { # A description of the process that generated the request.
"a_key": "", # Properties of the object.
},
"workerZone": "A String", # The Compute Engine zone
# (https://cloud.google.com/compute/docs/regions-zones/regions-zones) in
# which worker processing should occur, e.g. "us-west1-a". Mutually exclusive
# with worker_region. If neither worker_region nor worker_zone is specified,
# a zone in the control plane's region is chosen based on available capacity.
"clusterManagerApiService": "A String", # The type of cluster manager API to use. If unknown or
# unspecified, the service will attempt to choose a reasonable
# default. This should be in the form of the API service name,
# e.g. "compute.googleapis.com".
"tempStoragePrefix": "A String", # The prefix of the resources the system should use for temporary
# storage. The system will append the suffix "/temp-{JOBNAME}" to
# this resource prefix, where {JOBNAME} is the value of the
# job_name field. The resulting bucket and object prefix is used
# as the prefix of the resources used to store temporary data
# needed during the job execution. NOTE: This will override the
# value in taskrunner_settings.
# The supported resource type is:
#
# Google Cloud Storage:
#
# storage.googleapis.com/{bucket}/{object}
# bucket.storage.googleapis.com/{object}
"experiments": [ # The list of experiments to enable.
"A String",
],
"version": { # A structure describing which components and their versions of the service
# are required in order to run the job.
"a_key": "", # Properties of the object.
},
"serviceAccountEmail": "A String", # Identity to run virtual machines as. Defaults to the default account.
"sdkPipelineOptions": { # The Cloud Dataflow SDK pipeline options specified by the user. These
# options are passed through the service and are used to recreate the
# SDK pipeline options on the worker in a language agnostic and platform
# independent way.
"a_key": "", # Properties of the object.
},
"flexResourceSchedulingGoal": "A String", # Which Flexible Resource Scheduling mode to run in.
"workerPools": [ # The worker pools. At least one "harness" worker pool must be
# specified in order for the job to have workers.
{ # Describes one particular pool of Cloud Dataflow workers to be
# instantiated by the Cloud Dataflow service in order to perform the
# computations required by a job. Note that a workflow job may use
# multiple pools, in order to match the various computational
# requirements of the various stages of the job.
"numThreadsPerWorker": 42, # The number of threads per worker harness. If empty or unspecified, the
# service will choose a number of threads (according to the number of cores
# on the selected machine type for batch, or 1 by convention for streaming).
"numWorkers": 42, # Number of Google Compute Engine workers in this pool needed to
# execute the job. If zero or unspecified, the service will
# attempt to choose a reasonable default.
"zone": "A String", # Zone to run the worker pools in. If empty or unspecified, the service
# will attempt to choose a reasonable default.
"diskSourceImage": "A String", # Fully qualified source image for disks.
"packages": [ # Packages to be installed on workers.
{ # The packages that must be installed in order for a worker to run the
# steps of the Cloud Dataflow job that will be assigned to its worker
# pool.
#
# This is the mechanism by which the Cloud Dataflow SDK causes code to
# be loaded onto the workers. For example, the Cloud Dataflow Java SDK
# might use this to install jars containing the user's code and all of the
# various dependencies (libraries, data files, etc.) required in order
# for that code to run.
"name": "A String", # The name of the package.
"location": "A String", # The resource to read the package from. The supported resource type is:
#
# Google Cloud Storage:
#
# storage.googleapis.com/{bucket}
# bucket.storage.googleapis.com/
},
],
"teardownPolicy": "A String", # Sets the policy for determining when to tear down the worker pool.
# Allowed values are: `TEARDOWN_ALWAYS`, `TEARDOWN_ON_SUCCESS`, and
# `TEARDOWN_NEVER`.
# `TEARDOWN_ALWAYS` means workers are always torn down regardless of whether
# the job succeeds. `TEARDOWN_ON_SUCCESS` means workers are torn down
# if the job succeeds. `TEARDOWN_NEVER` means the workers are never torn
# down.
#
# If the workers are not torn down by the service, they will
# continue to run and use Google Compute Engine VM resources in the
# user's project until they are explicitly terminated by the user.
# Because of this, Google recommends using the `TEARDOWN_ALWAYS`
# policy except for small, manually supervised test jobs.
#
# If unknown or unspecified, the service will attempt to choose a reasonable
# default.
"onHostMaintenance": "A String", # The action to take on host maintenance, as defined by the Google
# Compute Engine API.
"poolArgs": { # Extra arguments for this worker pool.
"a_key": "", # Properties of the object. Contains field @type with type URL.
},
"diskSizeGb": 42, # Size of root disk for VMs, in GB. If zero or unspecified, the service will
# attempt to choose a reasonable default.
"workerHarnessContainerImage": "A String", # Required. Docker container image that executes the Cloud Dataflow worker
# harness, residing in Google Container Registry.
#
# Deprecated for the Fn API path. Use sdk_harness_container_images instead.
"diskType": "A String", # Type of root disk for VMs. If empty or unspecified, the service will
# attempt to choose a reasonable default.
"machineType": "A String", # Machine type (e.g. "n1-standard-1"). If empty or unspecified, the
# service will attempt to choose a reasonable default.
"kind": "A String", # The kind of the worker pool; currently only `harness` and `shuffle`
# are supported.
"sdkHarnessContainerImages": [ # Set of SDK harness containers needed to execute this pipeline. This will
# only be set in the Fn API path. For non-cross-language pipelines this
# should have only one entry. Cross-language pipelines will have two or more
# entries.
{ # Defines a SDK harness container for executing Dataflow pipelines.
"containerImage": "A String", # A docker container image that resides in Google Container Registry.
"useSingleCorePerContainer": True or False, # If true, recommends the Dataflow service to use only one core per SDK
# container instance with this image. If false (or unset) recommends using
# more than one core per SDK container instance with this image for
# efficiency. Note that Dataflow service may choose to override this property
# if needed.
},
],
"dataDisks": [ # Data disks that are used by a VM in this workflow.
{ # Describes the data disk used by a workflow job.
"diskType": "A String", # Disk storage type, as defined by Google Compute Engine. This
# must be a disk type appropriate to the project and zone in which
# the workers will run. If unknown or unspecified, the service
# will attempt to choose a reasonable default.
#
# For example, the standard persistent disk type is a resource name
# typically ending in "pd-standard". If SSD persistent disks are
# available, the resource name typically ends with "pd-ssd". The
# actual valid values are defined by the Google Compute Engine API,
# not by the Cloud Dataflow API; consult the Google Compute Engine
# documentation for more information about determining the set of
# available disk types for a particular project and zone.
#
# Google Compute Engine Disk types are local to a particular
# project in a particular zone, and so the resource name will
# typically look something like this:
#
# compute.googleapis.com/projects/project-id/zones/zone/diskTypes/pd-standard
"sizeGb": 42, # Size of disk in GB. If zero or unspecified, the service will
# attempt to choose a reasonable default.
"mountPoint": "A String", # Directory in a VM where disk is mounted.
},
],
"subnetwork": "A String", # Subnetwork to which VMs will be assigned, if desired. Expected to be of
# the form "regions/REGION/subnetworks/SUBNETWORK".
"ipConfiguration": "A String", # Configuration for VM IPs.
"taskrunnerSettings": { # Taskrunner configuration settings. # Settings passed through to Google Compute Engine workers when
# using the standard Dataflow task runner. Users should ignore
# this field.
"alsologtostderr": True or False, # Whether to also send taskrunner log info to stderr.
"taskGroup": "A String", # The UNIX group ID on the worker VM to use for tasks launched by
# taskrunner; e.g. "wheel".
"harnessCommand": "A String", # The command to launch the worker harness.
"logDir": "A String", # The directory on the VM to store logs.
"oauthScopes": [ # The OAuth2 scopes to be requested by the taskrunner in order to
# access the Cloud Dataflow API.
"A String",
],
"dataflowApiVersion": "A String", # The API version of endpoint, e.g. "v1b3"
"logUploadLocation": "A String", # Indicates where to put logs. If this is not specified, the logs
# will not be uploaded.
#
# The supported resource type is:
#
# Google Cloud Storage:
# storage.googleapis.com/{bucket}/{object}
# bucket.storage.googleapis.com/{object}
"streamingWorkerMainClass": "A String", # The streaming worker main class name.
"workflowFileName": "A String", # The file to store the workflow in.
"languageHint": "A String", # The suggested backend language.
"commandlinesFileName": "A String", # The file to store preprocessing commands in.
"baseTaskDir": "A String", # The location on the worker for task-specific subdirectories.
"tempStoragePrefix": "A String", # The prefix of the resources the taskrunner should use for
# temporary storage.
#
# The supported resource type is:
#
# Google Cloud Storage:
# storage.googleapis.com/{bucket}/{object}
# bucket.storage.googleapis.com/{object}
"baseUrl": "A String", # The base URL for the taskrunner to use when accessing Google Cloud APIs.
#
# When workers access Google Cloud APIs, they logically do so via
# relative URLs. If this field is specified, it supplies the base
# URL to use for resolving these relative URLs. The normative
# algorithm used is defined by RFC 1808, "Relative Uniform Resource
# Locators".
#
# If not specified, the default value is "http://www.googleapis.com/"
"logToSerialconsole": True or False, # Whether to send taskrunner log info to Google Compute Engine VM serial
# console.
"continueOnException": True or False, # Whether to continue taskrunner if an exception is hit.
"parallelWorkerSettings": { # Provides data to pass through to the worker harness. # The settings to pass to the parallel worker harness.
"tempStoragePrefix": "A String", # The prefix of the resources the system should use for temporary
# storage.
#
# The supported resource type is:
#
# Google Cloud Storage:
#
# storage.googleapis.com/{bucket}/{object}
# bucket.storage.googleapis.com/{object}
"reportingEnabled": True or False, # Whether to send work progress updates to the service.
"baseUrl": "A String", # The base URL for accessing Google Cloud APIs.
#
# When workers access Google Cloud APIs, they logically do so via
# relative URLs. If this field is specified, it supplies the base
# URL to use for resolving these relative URLs. The normative
# algorithm used is defined by RFC 1808, "Relative Uniform Resource
# Locators".
#
# If not specified, the default value is "http://www.googleapis.com/"
"servicePath": "A String", # The Cloud Dataflow service path relative to the root URL, for example,
# "dataflow/v1b3/projects".
"shuffleServicePath": "A String", # The Shuffle service path relative to the root URL, for example,
# "shuffle/v1beta1".
"workerId": "A String", # The ID of the worker running this pipeline.
},
"taskUser": "A String", # The UNIX user ID on the worker VM to use for tasks launched by
# taskrunner; e.g. "root".
"vmId": "A String", # The ID string of the VM.
},
"autoscalingSettings": { # Settings for WorkerPool autoscaling. # Settings for autoscaling of this WorkerPool.
"algorithm": "A String", # The algorithm to use for autoscaling.
"maxNumWorkers": 42, # The maximum number of workers to cap scaling at.
},
"metadata": { # Metadata to set on the Google Compute Engine VMs.
"a_key": "A String",
},
"defaultPackageSet": "A String", # The default package set to install. This allows the service to
# select a default set of packages which are useful to worker
# harnesses written in a particular language.
"network": "A String", # Network to which VMs will be assigned. If empty or unspecified,
# the service will use the network "default".
},
],
"dataset": "A String", # The dataset for the current project where various workflow
# related tables are stored.
#
# The supported resource type is:
#
# Google BigQuery:
# bigquery.googleapis.com/{dataset}
},
"stageStates": [ # This field may be mutated by the Cloud Dataflow service;
# callers cannot mutate it.
{ # A message describing the state of a particular execution stage.
"currentStateTime": "A String", # The time at which the stage transitioned to this state.
"executionStageState": "A String", # Execution stage states allow the same set of values as JobState.
"executionStageName": "A String", # The name of the execution stage.
},
],
"jobMetadata": { # Metadata available primarily for filtering jobs. Will be included in the
# ListJob response and Job SUMMARY view.
# This field is populated by the Dataflow service to support filtering jobs
# by the metadata values provided here. Populated for ListJobs and all GetJob
# views SUMMARY and higher.
"datastoreDetails": [ # Identification of a Datastore source used in the Dataflow job.
{ # Metadata for a Datastore connector used by the job.
"namespace": "A String", # Namespace used in the connection.
"projectId": "A String", # ProjectId accessed in the connection.
},
],
"sdkVersion": { # The version of the SDK used to run the job. # The SDK version used to run the job.
"version": "A String", # The version of the SDK used to run the job.
"sdkSupportStatus": "A String", # The support status for this SDK version.
"versionDisplayName": "A String", # A readable string describing the version of the SDK.
},
"bigqueryDetails": [ # Identification of a BigQuery source used in the Dataflow job.
{ # Metadata for a BigQuery connector used by the job.
"table": "A String", # Table accessed in the connection.
"dataset": "A String", # Dataset accessed in the connection.
"query": "A String", # Query used to access data in the connection.
"projectId": "A String", # Project accessed in the connection.
},
],
"fileDetails": [ # Identification of a File source used in the Dataflow job.
{ # Metadata for a File connector used by the job.
"filePattern": "A String", # File Pattern used to access files by the connector.
},
],
"pubsubDetails": [ # Identification of a PubSub source used in the Dataflow job.
{ # Metadata for a PubSub connector used by the job.
"topic": "A String", # Topic accessed in the connection.
"subscription": "A String", # Subscription used in the connection.
},
],
"bigTableDetails": [ # Identification of a BigTable source used in the Dataflow job.
{ # Metadata for a BigTable connector used by the job.
"projectId": "A String", # ProjectId accessed in the connection.
"instanceId": "A String", # InstanceId accessed in the connection.
"tableId": "A String", # TableId accessed in the connection.
},
],
"spannerDetails": [ # Identification of a Spanner source used in the Dataflow job.
{ # Metadata for a Spanner connector used by the job.
"instanceId": "A String", # InstanceId accessed in the connection.
"projectId": "A String", # ProjectId accessed in the connection.
"databaseId": "A String", # DatabaseId accessed in the connection.
},
],
},
Bu Sun Kim

2020-05-27 12:20:54 -0700

[diff] [blame^]

499

"type": "A String", # The type of Cloud Dataflow job.

500

"projectId": "A String", # The ID of the Cloud Platform project that the job belongs to.

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame]

501

"createdFromSnapshotId": "A String", # If this is specified, the job's initial state is populated from the given

502

# snapshot.

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame]

503

"pipelineDescription": { # A descriptive representation of submitted pipeline as well as the executed # Preliminary field: The format of this data may change at any time.

504

# A description of the user pipeline and stages through which it is executed.

505

# Created by Cloud Dataflow service. Only retrieved with

506

# JOB_VIEW_DESCRIPTION or JOB_VIEW_ALL.

507

# form. This data is provided by the Dataflow service for ease of visualizing

508

# the pipeline and interpreting Dataflow provided metrics.

509

"executionPipelineStage": [ # Description of each stage of execution of the pipeline.

510

{ # Description of the composing transforms, names/ids, and input/outputs of a

511

# stage of execution. Some composing transforms and sources may have been

512

# generated by the Dataflow service during execution planning.

Bu Sun Kim

2020-05-27 12:20:54 -0700

[diff] [blame^]

513

"outputSource": [ # Output sources for this stage.

514

{ # Description of an input or output of an execution stage.

515

"sizeBytes": "A String", # Size of the source, if measurable.

516

"name": "A String", # Dataflow service generated name for this source.

517

"userName": "A String", # Human-readable name for this source; may be user or system generated.

518

"originalTransformOrCollection": "A String", # User name for the original user transform or collection with which this

519

# source is most closely associated.

520

},

521

],

522

"name": "A String", # Dataflow service generated name for this stage.

523

"inputSource": [ # Input sources for this stage.

524

{ # Description of an input or output of an execution stage.

525

"sizeBytes": "A String", # Size of the source, if measurable.

526

"name": "A String", # Dataflow service generated name for this source.

527

"userName": "A String", # Human-readable name for this source; may be user or system generated.

528

"originalTransformOrCollection": "A String", # User name for the original user transform or collection with which this

529

# source is most closely associated.

530

},

531

],

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame]

532

"id": "A String", # Dataflow service generated id for this stage.

533

"componentTransform": [ # Transforms that comprise this execution stage.

534

{ # Description of a transform executed as part of an execution stage.

535

"originalTransform": "A String", # User name for the original user transform with which this transform is

536

# most closely associated.

537

"name": "A String", # Dataflow service generated name for this source.

538

"userName": "A String", # Human-readable name for this transform; may be user or system generated.

539

},

540

],

541

"componentSource": [ # Collections produced and consumed by component transforms of this stage.

542

{ # Description of an interstitial value between transforms in an execution

543

# stage.

544

"name": "A String", # Dataflow service generated name for this source.

545

"userName": "A String", # Human-readable name for this transform; may be user or system generated.

546

"originalTransformOrCollection": "A String", # User name for the original user transform or collection with which this

547

# source is most closely associated.

548

},

549

],

550

"kind": "A String", # Type of transform this stage is executing.

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame]

551

},

552

],

553

"originalPipelineTransform": [ # Description of each transform in the pipeline and collections between them.

554

{ # Description of the type, names/ids, and input/outputs for a transform.

555

"kind": "A String", # Type of transform.

556

"inputCollectionName": [ # User names for all collection inputs to this transform.

557

"A String",

558

],

559

"name": "A String", # User provided name for this transform instance.

560

"id": "A String", # SDK generated id of this transform instance.

561

"displayData": [ # Transform-specific display data.

562

{ # Data provided with a pipeline or transform to provide descriptive info.

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame]

563

"durationValue": "A String", # Contains value if the data is of duration type.

Bu Sun Kim

2020-05-27 12:20:54 -0700

[diff] [blame^]

564

"int64Value": "A String", # Contains value if the data is of int64 type.

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame]

565

"namespace": "A String", # The namespace for the key. This is usually a class name or programming

566

# language namespace (e.g. a Python module) which defines the display data.

567

# This allows a dax monitoring system to specially handle the data

568

# and perform custom rendering.

569

"floatValue": 3.14, # Contains value if the data is of float type.

570

"key": "A String", # The key identifying the display data.

571

# This is intended to be used as a label for the display data

572

# when viewed in a dax monitoring system.

573

"shortStrValue": "A String", # A possible additional shorter value to display.

574

# For example a java_class_name_value of com.mypackage.MyDoFn

575

# will be stored with MyDoFn as the short_str_value and

576

# com.mypackage.MyDoFn as the java_class_name_value.

577

# short_str_value can be displayed and java_class_name_value

578

# will be displayed as a tooltip.

579

"url": "A String", # An optional full URL.

580

"label": "A String", # An optional label to display in a dax UI for the element.

Bu Sun Kim

2020-05-27 12:20:54 -0700

[diff] [blame^]

581

"timestampValue": "A String", # Contains value if the data is of timestamp type.

582

"boolValue": True or False, # Contains value if the data is of a boolean type.

583

"javaClassValue": "A String", # Contains value if the data is of java class type.

584

"strValue": "A String", # Contains value if the data is of string type.

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame]

585

},

586

],

587

"outputCollectionName": [ # User names for all collection outputs to this transform.

"A String",

],

},

],

"displayData": [ # Pipeline level display data.

593

{ # Data provided with a pipeline or transform to provide descriptive info.

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame]

594

"durationValue": "A String", # Contains value if the data is of duration type.

Bu Sun Kim

2020-05-27 12:20:54 -0700

[diff] [blame^]

595

"int64Value": "A String", # Contains value if the data is of int64 type.

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame]

596

"namespace": "A String", # The namespace for the key. This is usually a class name or programming

597

# language namespace (e.g. a Python module) which defines the display data.

598

# This allows a dax monitoring system to specially handle the data

599

# and perform custom rendering.

600

"floatValue": 3.14, # Contains value if the data is of float type.

601

"key": "A String", # The key identifying the display data.

602

# This is intended to be used as a label for the display data

603

# when viewed in a dax monitoring system.

604

"shortStrValue": "A String", # A possible additional shorter value to display.

605

# For example a java_class_name_value of com.mypackage.MyDoFn

606

# will be stored with MyDoFn as the short_str_value and

607

# com.mypackage.MyDoFn as the java_class_name_value.

608

# short_str_value can be displayed and java_class_name_value

609

# will be displayed as a tooltip.

610

"url": "A String", # An optional full URL.

611

"label": "A String", # An optional label to display in a dax UI for the element.

Bu Sun Kim

2020-05-27 12:20:54 -0700

[diff] [blame^]

612

"timestampValue": "A String", # Contains value if the data is of timestamp type.

613

"boolValue": True or False, # Contains value if the data is of a boolean type.

614

"javaClassValue": "A String", # Contains value if the data is of java class type.

615

"strValue": "A String", # Contains value if the data is of string type.

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame]

},

],

},

"replaceJobId": "A String", # If this job is an update of an existing job, this field is the job ID

620

# of the job it replaced.

621

#

622

# When sending a `CreateJobRequest`, you can update a job by specifying it

623

# here. The job named here is stopped, and its intermediate state is

624

# transferred to this job.

625

"tempFiles": [ # A set of files the system should be aware of that are used

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

626

# for temporary storage. These temporary files will be

627

# removed on job completion.

628

# No duplicates are allowed.

629

# No file patterns are supported.

630

#

631

# The supported files are:

632

#

633

# Google Cloud Storage:

634

#

635

# storage.googleapis.com/{bucket}/{object}

636

# bucket.storage.googleapis.com/{object}

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame]

637

"A String",

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

638

],

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame]

639

"name": "A String", # The user-specified Cloud Dataflow job name.

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

640

#

641

# Only one Job with a given name may exist in a project at any

642

# given time. If a caller attempts to create a Job with the same

643

# name as an already-existing Job, the attempt returns the

644

# existing Job.

645

#

646

# The name must match the regular expression

647

# `[a-z]([-a-z0-9]{0,38}[a-z0-9])?`

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame]

648

"steps": [ # Exactly one of step or steps_location should be specified.

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

649

#

650

# The top-level steps that constitute the entire job.

651

{ # Defines a particular step within a Cloud Dataflow job.

Sai Cheemalapati

2017-06-06 18:46:08 -0400

[diff] [blame]

652

#

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

653

# A job consists of multiple steps, each of which performs some

654

# specific operation as part of the overall job. Data is typically

655

# passed from one step to another as part of the job.

656

#

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame]

657

# Here's an example of a sequence of steps which together implement a

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

658

# Map-Reduce job:

659

#

660

# * Read a collection of data from some source, parsing the

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame]

661

# collection's elements.

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

662

#

663

# * Validate the elements.

664

#

665

# * Apply a user-defined function to map each element to some value

666

# and extract an element-specific key value.

667

#

668

# * Group elements with the same key into a single element with

669

# that key, transforming a multiply-keyed collection into a

670

# uniquely-keyed collection.

671

#

672

# * Write the elements out to some data sink.

673

#

674

# Note that the Cloud Dataflow service may be used to run many different

675

# types of jobs, not just Map-Reduce.

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame]

676

"name": "A String", # The name that identifies the step. This must be unique for each

Dan O'Meara

2020-05-01 07:42:23 -0700

[diff] [blame]

677

# step with respect to all other steps in the Cloud Dataflow job.

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame]

678

"kind": "A String", # The kind of step in the Cloud Dataflow job.

679

"properties": { # Named properties associated with the step. Each kind of

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

680

# predefined step has its own required set of properties.

681

# Must be provided on Create. Only retrieved with JOB_VIEW_ALL.

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame]

682

"a_key": "", # Properties of the object.

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

683

},

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

684

},

685

],

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame]

686

"replacedByJobId": "A String", # If another job is an update of this job (and thus, this job is in

687

# `JOB_STATE_UPDATED`), this field contains the ID of that job.

688

"executionInfo": { # Deprecated. Additional information about how a Cloud Dataflow job will be executed that

689

# isn't contained in the submitted job.

690

"stages": { # A mapping from each stage to the information about that stage.

691

"a_key": { # Contains information about how a particular

692

# google.dataflow.v1beta3.Step will be executed.

693

"stepName": [ # The steps associated with the execution stage.

694

# Note that stages may have several steps, and that a given step

695

# might be run by more than one stage.

"A String",

],

},

},

},

"currentState": "A String", # The current state of the job.

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

702

#

703

# Jobs are created in the `JOB_STATE_STOPPED` state unless otherwise

704

# specified.

705

#

706

# A job in the `JOB_STATE_RUNNING` state may asynchronously enter a

707

# terminal state. After a job has reached a terminal state, no

708

# further state updates may be made.

709

#

710

# This field may be mutated by the Cloud Dataflow service;

711

# callers cannot mutate it.

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame]

712

"location": "A String", # The [regional endpoint]

713

# (https://cloud.google.com/dataflow/docs/concepts/regional-endpoints) that

714

# contains this job.

715

"startTime": "A String", # The timestamp when the job was started (transitioned to JOB_STATE_PENDING).

716

# Flexible resource scheduling jobs are started with some delay after job

717

# creation, so start_time is unset before start and is updated when the

718

# job is started by the Cloud Dataflow service. For other jobs, start_time

719

# always equals create_time and is immutable and set by the Cloud Dataflow

720

# service.

721

"stepsLocation": "A String", # The GCS location where the steps are stored.

722

"labels": { # User-defined labels for this job.

723

#

724

# The labels map can contain no more than 64 entries. Entries of the labels

725

# map are UTF8 strings that comply with the following restrictions:

726

#

727

# * Keys must conform to regexp: \p{Ll}\p{Lo}{0,62}

728

# * Values must conform to regexp: [\p{Ll}\p{Lo}\p{N}_-]{0,63}

729

# * Both keys and values are additionally constrained to be <= 128 bytes in

730

# size.

731

"a_key": "A String",

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

732

},

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame]

733

"createTime": "A String", # The timestamp when the job was initially created. Immutable and set by the

734

# Cloud Dataflow service.

735

"requestedState": "A String", # The job's requested state.

736

#

737

# `UpdateJob` may be used to switch between the `JOB_STATE_STOPPED` and

738

# `JOB_STATE_RUNNING` states, by setting requested_state. `UpdateJob` may

739

# also be used to directly set a job's requested state to

740

# `JOB_STATE_CANCELLED` or `JOB_STATE_DONE`, irrevocably terminating the

741

# job if it has not already reached a terminal state.

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

}</pre>

</div>
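The Job resource above documents a regular expression for `name` and size limits for `labels`. The following is a minimal sketch of checking those constraints on the client before calling `create`; it is not part of the client library. The helper names `is_valid_job_name` and `label_problems` are hypothetical, and only the entry-count and byte-size limits on labels are checked, because the standard `re` module does not support the `\p{...}` character classes quoted in the field description.

```python
import re

# Pattern quoted in the "name" field description of the Job resource.
JOB_NAME_RE = re.compile(r"[a-z]([-a-z0-9]{0,38}[a-z0-9])?")

MAX_LABELS = 64        # "no more than 64 entries"
MAX_LABEL_BYTES = 128  # keys and values "<= 128 bytes in size"


def is_valid_job_name(name):
    """Return True if the whole name matches the documented pattern."""
    return JOB_NAME_RE.fullmatch(name) is not None


def label_problems(labels):
    """Return a list of problems found in a labels dict (empty if none).

    Hypothetical helper: checks only the entry count and the UTF-8 byte
    size of each key and value, not the Unicode character-class rules.
    """
    problems = []
    if len(labels) > MAX_LABELS:
        problems.append("too many labels: %d > %d" % (len(labels), MAX_LABELS))
    for key, value in labels.items():
        for kind, text in (("key", key), ("value", value)):
            if len(text.encode("utf-8")) > MAX_LABEL_BYTES:
                problems.append(
                    "label %s %r exceeds %d bytes" % (kind, text, MAX_LABEL_BYTES)
                )
    return problems
```

Checking locally only catches malformed input early; the service remains the authority on what it accepts.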

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame]

746

<code class="details" id="get">get(projectId, location, view=None, gcsPath=None, x__xgafv=None)</code>

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

747

<pre>Get the template associated with a template.

748

749

Args:

750

projectId: string, Required. The ID of the Cloud Platform project that the job belongs to. (required)

751

location: string, The [regional endpoint]

752

(https://cloud.google.com/dataflow/docs/concepts/regional-endpoints) to

753

which to direct the request. (required)

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame]

754

view: string, The view to retrieve. Defaults to METADATA_ONLY.

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

755

gcsPath: string, Required. A Cloud Storage path to the template from which to

756

create the job.

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame]

757

Must be a valid Cloud Storage URL, beginning with 'gs://'.

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

758

x__xgafv: string, V1 error format.

759

Allowed values

760

1 - v1 error format

761

2 - v2 error format

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

762

763

Returns:

764

An object of the form:

765

766

{ # The response to a GetTemplate request.

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame]

767

"metadata": { # Metadata describing a template. # The template metadata describing the template name, available

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

768

# parameters, etc.

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame]

769

"parameters": [ # The parameters for the template.

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

770

{ # Metadata for a specific parameter.

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame]

771

"label": "A String", # Required. The label to display for the parameter.

772

"paramType": "A String", # Optional. The type of the parameter.

Dan O'Meara

2020-05-01 07:42:23 -0700

[diff] [blame]

773

# Used for selecting input picker.

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame]

774

"helpText": "A String", # Required. The help text to display for the parameter.

775

"name": "A String", # Required. The name of the parameter.

776

"regexes": [ # Optional. Regexes that the parameter must match.

777

"A String",

778

],

779

"isOptional": True or False, # Optional. Whether the parameter is optional. Defaults to false.

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

780

},

781

],

Bu Sun Kim

2020-05-27 12:20:54 -0700

[diff] [blame^]

782

"name": "A String", # Required. The name of the template.

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame]

783

"description": "A String", # Optional. A description of the template.

784

},

785

"runtimeMetadata": { # RuntimeMetadata describing a runtime environment. # Describes the runtime metadata with SDKInfo and available parameters.

786

"sdkInfo": { # SDK Information. # SDK Info for the template.

787

"language": "A String", # Required. The SDK Language.

788

"version": "A String", # Optional. The SDK version.

789

},

790

"parameters": [ # The parameters for the template.

791

{ # Metadata for a specific parameter.

792

"label": "A String", # Required. The label to display for the parameter.

793

"paramType": "A String", # Optional. The type of the parameter.

794

# Used for selecting input picker.

795

"helpText": "A String", # Required. The help text to display for the parameter.

796

"name": "A String", # Required. The name of the parameter.

797

"regexes": [ # Optional. Regexes that the parameter must match.

798

"A String",

799

],

800

"isOptional": True or False, # Optional. Whether the parameter is optional. Defaults to false.

801

},

802

],

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

803

},

Bu Sun Kim

2020-05-27 12:20:54 -0700

[diff] [blame^]

804

"templateType": "A String", # Template Type.

805

"status": { # The status of the get template request. Any problems with the

806

# request will be indicated in the error_details.

807

# The `Status` type defines a logical error model that is suitable for different programming environments, including REST APIs and RPC APIs. It is

808

# used by [gRPC](https://github.com/grpc). Each `Status` message contains

809

# three pieces of data: error code, error message, and error details.

810

#

811

# You can find out more about this error model and how to work with it in the

812

# [API Design Guide](https://cloud.google.com/apis/design/errors).

813

"code": 42, # The status code, which should be an enum value of google.rpc.Code.

814

"message": "A String", # A developer-facing error message, which should be in English. Any

815

# user-facing error message should be localized and sent in the

816

# google.rpc.Status.details field, or localized by the client.

817

"details": [ # A list of messages that carry the error details. There is a common set of

818

# message types for APIs to use.

819

{

820

"a_key": "", # Properties of the object. Contains field @type with type URL.

821

},

822

],

823

},

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

}</pre>

</div>
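As an illustration of the `get` method documented above, the sketch below issues the request and lists the names of the parameters a template requires. The function name `required_template_parameters` is hypothetical. It assumes `service` is a Dataflow v1b3 client, for example one returned by `googleapiclient.discovery.build("dataflow", "v1b3")` with working credentials, and it treats a missing `isOptional` key as false, following the documented default.

```python
def required_template_parameters(service, project_id, location, gcs_path):
    """Fetch template metadata and return the names of required parameters.

    Hypothetical helper: `service` is assumed to expose the
    projects().locations().templates() resource chain documented here.
    """
    request = service.projects().locations().templates().get(
        projectId=project_id,
        location=location,
        gcsPath=gcs_path,      # must begin with gs://
        view="METADATA_ONLY",  # documented default, passed explicitly
    )
    response = request.execute()

    # "metadata" or "parameters" may be absent, so fall back to empties.
    parameters = response.get("metadata", {}).get("parameters", [])
    # isOptional defaults to false, so a missing key means "required".
    return [p["name"] for p in parameters if not p.get("isOptional", False)]
```

Passing the client in as an argument keeps the function easy to exercise with a stub object in tests, without network access.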

Bu Sun Kim

2020-05-27 12:20:54 -0700

[diff] [blame^]

828

<code class="details" id="launch">launch(projectId, location, body=None, validateOnly=None, gcsPath=None, dynamicTemplate_gcsPath=None, dynamicTemplate_stagingLocation=None, x__xgafv=None)</code>

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

829

<pre>Launch a template.

830

831

Args:

832

projectId: string, Required. The ID of the Cloud Platform project that the job belongs to. (required)

833

location: string, The [regional endpoint]

834

(https://cloud.google.com/dataflow/docs/concepts/regional-endpoints) to

835

which to direct the request. (required)

Dan O'Meara

2020-05-01 07:42:23 -0700

[diff] [blame]

836

body: object, The request body.

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

837

The object takes the form of:

838

839

{ # Parameters to provide to the template being launched.

Bu Sun Kim

2020-05-27 12:20:54 -0700

[diff] [blame^]

840

"parameters": { # The runtime parameters to pass to the job.

841

"a_key": "A String",

842

},

843

"jobName": "A String", # Required. The job name to use for the created job.

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame]

844

"transformNameMapping": { # Only applicable when updating a pipeline. Map of transform name prefixes of

845

# the job to be replaced to the corresponding name prefixes of the new job.

846

"a_key": "A String",

847

},

848

"environment": { # The environment values to set at runtime. # The runtime environment for the job.

Bu Sun Kim

2020-05-27 12:20:54 -0700

[diff] [blame^]

849

"machineType": "A String", # The machine type to use for the job. Defaults to the value from the

850

# template if not specified.

851

"subnetwork": "A String", # Subnetwork to which VMs will be assigned, if desired. Expected to be of

852

# the form "regions/REGION/subnetworks/SUBNETWORK".

853

"ipConfiguration": "A String", # Configuration for VM IPs.

854

"kmsKeyName": "A String", # Optional. Name for the Cloud KMS key for the job.

855

# Key format is:

856

# projects/<project>/locations/<location>/keyRings/<keyring>/cryptoKeys/<key>

857

"tempLocation": "A String", # The Cloud Storage path to use for temporary files.

858

# Must be a valid Cloud Storage URL, beginning with `gs://`.

859

"bypassTempDirValidation": True or False, # Whether to bypass the safety checks for the job's temporary directory.

860

# Use with caution.

861

"network": "A String", # Network to which VMs will be assigned. If empty or unspecified,

862

# the service will use the network "default".

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame]

863

"workerRegion": "A String", # The Compute Engine region

Dan O'Meara

2020-05-01 07:42:23 -0700

[diff] [blame]

864

# (https://cloud.google.com/compute/docs/regions-zones/regions-zones) in

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame]

865

# which worker processing should occur, e.g. "us-west1". Mutually exclusive

Dan O'Meara

2020-05-01 07:42:23 -0700

[diff] [blame]

866

# with worker_zone. If neither worker_region nor worker_zone is specified,

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame]

867

# default to the control plane's region.

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame]

868

"zone": "A String", # The Compute Engine [availability

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

869

# zone](https://cloud.google.com/compute/docs/regions-zones/regions-zones)

870

# for launching worker instances to run your pipeline.

Dan O'Meara

2020-05-01 07:42:23 -0700

[diff] [blame]

871

# In the future, worker_zone will take precedence.

Bu Sun Kim

2020-05-27 12:20:54 -0700

[diff] [blame^]

872

"numWorkers": 42, # The initial number of Google Compute Engine instances for the job.

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame]

873

"workerZone": "A String", # The Compute Engine zone

874

# (https://cloud.google.com/compute/docs/regions-zones/regions-zones) in

875

# which worker processing should occur, e.g. "us-west1-a". Mutually exclusive

876

# with worker_region. If neither worker_region nor worker_zone is specified,

877

# a zone in the control plane's region is chosen based on available capacity.

878

# If both `worker_zone` and `zone` are set, `worker_zone` takes precedence.

879

"additionalUserLabels": { # Additional user labels to be specified for the job.

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

880

# Keys and values should follow the restrictions specified in the [labeling

881

# restrictions](https://cloud.google.com/compute/docs/labeling-resources#restrictions)

882

# page.

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame]

883

"a_key": "A String",

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

884

},

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame]

885

"additionalExperiments": [ # Additional experiment flags for the job.

886

"A String",

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

887

],

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame]

888

"maxWorkers": 42, # The maximum number of Google Compute Engine instances to be made

889

# available to your pipeline during execution, from 1 to 1000.

890

"serviceAccountEmail": "A String", # The email address of the service account to run the job as.

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

891

},

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame]

892

"update": True or False, # If set, replace the existing pipeline with the name specified by jobName

Dan O'Meara

2020-05-01 07:42:23 -0700

[diff] [blame]

893

# with this pipeline, preserving state.

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

894

}

895

Bu Sun Kim

2020-05-27 12:20:54 -0700

[diff] [blame^]

896

validateOnly: boolean, If true, the request is validated but not actually executed.

897

Defaults to false.

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame]

898

gcsPath: string, A Cloud Storage path to the template from which to create

899

the job.

900

Must be a valid Cloud Storage URL, beginning with 'gs://'.

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

901

dynamicTemplate_gcsPath: string, Path to dynamic template spec file on GCS.

902

The file must be a JSON-serialized DynamicTemplateFieSpec object.

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame]

903

dynamicTemplate_stagingLocation: string, Cloud Storage path for staging dependencies.

904

Must be a valid Cloud Storage URL, beginning with `gs://`.

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

905

x__xgafv: string, V1 error format.

906

Allowed values

907

1 - v1 error format

908

2 - v2 error format

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

909

910

Returns:

911

An object of the form:

912

913

{ # Response to the request to launch a template.

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame]

914

"job": { # Defines a job to be run by the Cloud Dataflow service. # The job that was launched, if the request was not a dry run and

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

915

# the job was successfully launched.

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame]

916

"clientRequestId": "A String", # The client's unique identifier of the job, re-used across retried attempts.

917

# If this field is set, the service will ensure its uniqueness.

918

# The request to create a job will fail if the service has knowledge of a

919

# previously submitted job with the same client's ID and job name.

920

# The caller may use this field to ensure idempotence of job

921

# creation across retried attempts to create a job.

922

# By default, the field is empty and, in that case, the service ignores it.

923

"id": "A String", # The unique ID of this job.

Sai Cheemalapati

2017-06-06 18:46:08 -0400

[diff] [blame]

924

#

925

# This field is set by the Cloud Dataflow service when the Job is

926

# created, and is immutable for the life of the job.

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame]

927

"currentStateTime": "A String", # The timestamp associated with the current state.

928

"transformNameMapping": { # The map of transform name prefixes of the job to be replaced to the

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

929

# corresponding name prefixes of the new job.

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame]

930

"a_key": "A String",

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

931

},

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame]

932

"environment": { # Describes the environment in which a Dataflow Job runs. # The environment for the job.

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame]

933

"internalExperiments": { # Experimental settings.

934

"a_key": "", # Properties of the object. Contains field @type with type URL.

935

},

936

"workerRegion": "A String", # The Compute Engine region

937

# (https://cloud.google.com/compute/docs/regions-zones/regions-zones) in

938

# which worker processing should occur, e.g. "us-west1". Mutually exclusive

939

# with worker_zone. If neither worker_region nor worker_zone is specified,

940

# default to the control plane's region.

941

"serviceKmsKeyName": "A String", # If set, contains the Cloud KMS key identifier used to encrypt data

942

# at rest, AKA a Customer Managed Encryption Key (CMEK).

943

#

944

# Format:

945

# projects/PROJECT_ID/locations/LOCATION/keyRings/KEY_RING/cryptoKeys/KEY

946

"userAgent": { # A description of the process that generated the request.

947

"a_key": "", # Properties of the object.

948

},

949

"workerZone": "A String", # The Compute Engine zone

950

# (https://cloud.google.com/compute/docs/regions-zones/regions-zones) in

951

# which worker processing should occur, e.g. "us-west1-a". Mutually exclusive

952

# with worker_region. If neither worker_region nor worker_zone is specified,

953

# a zone in the control plane's region is chosen based on available capacity.

954

"clusterManagerApiService": "A String", # The type of cluster manager API to use. If unknown or

Dan O'Meara

2020-05-01 07:42:23 -0700

[diff] [blame]

955

# unspecified, the service will attempt to choose a reasonable

956

# default. This should be in the form of the API service name,

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame]

957

# e.g. "compute.googleapis.com".

958

"tempStoragePrefix": "A String", # The prefix of the resources the system should use for temporary

959

# storage. The system will append the suffix "/temp-{JOBNAME}" to

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

960

# this resource prefix, where {JOBNAME} is the value of the

961

# job_name field. The resulting bucket and object prefix is used

962

# as the prefix of the resources used to store temporary data

963

# needed during the job execution. NOTE: This will override the

964

# value in taskrunner_settings.

965

# The supported resource type is:

966

#

967

# Google Cloud Storage:

968

#

969

# storage.googleapis.com/{bucket}/{object}

970

# bucket.storage.googleapis.com/{object}

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame]

971

"experiments": [ # The list of experiments to enable.

972

"A String",

973

],

974

"version": { # A structure describing which components and their versions of the service

975

# are required in order to run the job.

976

"a_key": "", # Properties of the object.

977

},

978

"serviceAccountEmail": "A String", # Identity to run virtual machines as. Defaults to the default account.

Bu Sun Kim

2020-05-27 12:20:54 -0700

[diff] [blame^]

979

"sdkPipelineOptions": { # The Cloud Dataflow SDK pipeline options specified by the user. These

980

# options are passed through the service and are used to recreate the

981

# SDK pipeline options on the worker in a language agnostic and platform

982

# independent way.

983

"a_key": "", # Properties of the object.

984

},

985

"flexResourceSchedulingGoal": "A String", # Which Flexible Resource Scheduling mode to run in.

986

"workerPools": [ # The worker pools. At least one "harness" worker pool must be

987

# specified in order for the job to have workers.

988

{ # Describes one particular pool of Cloud Dataflow workers to be

989

# instantiated by the Cloud Dataflow service in order to perform the

990

# computations required by a job. Note that a workflow job may use

991

# multiple pools, in order to match the various computational

992

# requirements of the various stages of the job.

993

"numThreadsPerWorker": 42, # The number of threads per worker harness. If empty or unspecified, the

994

# service will choose a number of threads (according to the number of cores

995

# on the selected machine type for batch, or 1 by convention for streaming).

996

"numWorkers": 42, # Number of Google Compute Engine workers in this pool needed to

997

# execute the job. If zero or unspecified, the service will

998

# attempt to choose a reasonable default.

999

"zone": "A String", # Zone to run the worker pools in. If empty or unspecified, the service

1000

# will attempt to choose a reasonable default.

1001

"diskSourceImage": "A String", # Fully qualified source image for disks.

1002

"packages": [ # Packages to be installed on workers.

1003

{ # The packages that must be installed in order for a worker to run the

1004

# steps of the Cloud Dataflow job that will be assigned to its worker

1005

# pool.

1006

#

1007

# This is the mechanism by which the Cloud Dataflow SDK causes code to

1008

# be loaded onto the workers. For example, the Cloud Dataflow Java SDK

1009

# might use this to install jars containing the user's code and all of the

1010

# various dependencies (libraries, data files, etc.) required in order

1011

# for that code to run.

1012

"name": "A String", # The name of the package.

1013

"location": "A String", # The resource to read the package from. The supported resource type is:

1014

#

1015

# Google Cloud Storage:

1016

#

1017

# storage.googleapis.com/{bucket}

1018

# bucket.storage.googleapis.com/

1019

},

1020

],

1021

"teardownPolicy": "A String", # Sets the policy for determining when to tear down the worker pool.

1022

# Allowed values are: `TEARDOWN_ALWAYS`, `TEARDOWN_ON_SUCCESS`, and

1023

# `TEARDOWN_NEVER`.

1024

# `TEARDOWN_ALWAYS` means workers are always torn down regardless of whether

1025

# the job succeeds. `TEARDOWN_ON_SUCCESS` means workers are torn down

# if the job succeeds. `TEARDOWN_NEVER` means the workers are never torn

# down.

#

# If the workers are not torn down by the service, they will

# continue to run and use Google Compute Engine VM resources in the

# user's project until they are explicitly terminated by the user.

# Because of this, Google recommends using the `TEARDOWN_ALWAYS`

# policy except for small, manually supervised test jobs.

#

# If unknown or unspecified, the service will attempt to choose a reasonable

# default.

"onHostMaintenance": "A String", # The action to take on host maintenance, as defined by the Google

# Compute Engine API.

"poolArgs": { # Extra arguments for this worker pool.

"a_key": "", # Properties of the object. Contains field @type with type URL.

},

"diskSizeGb": 42, # Size of root disk for VMs, in GB. If zero or unspecified, the service will

# attempt to choose a reasonable default.

"workerHarnessContainerImage": "A String", # Required. Docker container image that executes the Cloud Dataflow worker

# harness, residing in Google Container Registry.

#

# Deprecated for the Fn API path. Use sdk_harness_container_images instead.

"diskType": "A String", # Type of root disk for VMs. If empty or unspecified, the service will

# attempt to choose a reasonable default.

"machineType": "A String", # Machine type (e.g. "n1-standard-1"). If empty or unspecified, the

# service will attempt to choose a reasonable default.

"kind": "A String", # The kind of the worker pool; currently only `harness` and `shuffle`

# are supported.

"sdkHarnessContainerImages": [ # Set of SDK harness containers needed to execute this pipeline. This will

# only be set in the Fn API path. For non-cross-language pipelines this

# should have only one entry. Cross-language pipelines will have two or more

# entries.

{ # Defines a SDK harness container for executing Dataflow pipelines.

"containerImage": "A String", # A docker container image that resides in Google Container Registry.

"useSingleCorePerContainer": True or False, # If true, recommends that the Dataflow service use only one core per SDK

# container instance with this image. If false (or unset) recommends using

# more than one core per SDK container instance with this image for

# efficiency. Note that the Dataflow service may choose to override this property

# if needed.

},

],

"dataDisks": [ # Data disks that are used by a VM in this workflow.

{ # Describes the data disk used by a workflow job.

"diskType": "A String", # Disk storage type, as defined by Google Compute Engine. This

# must be a disk type appropriate to the project and zone in which

# the workers will run. If unknown or unspecified, the service

# will attempt to choose a reasonable default.

#

# For example, the standard persistent disk type is a resource name

# typically ending in "pd-standard". If SSD persistent disks are

# available, the resource name typically ends with "pd-ssd". The

# actual valid values are defined by the Google Compute Engine API,

# not by the Cloud Dataflow API; consult the Google Compute Engine

# documentation for more information about determining the set of

# available disk types for a particular project and zone.

#

# Google Compute Engine Disk types are local to a particular

# project in a particular zone, and so the resource name will

# typically look something like this:

#

# compute.googleapis.com/projects/project-id/zones/zone/diskTypes/pd-standard

"sizeGb": 42, # Size of disk in GB. If zero or unspecified, the service will

# attempt to choose a reasonable default.

"mountPoint": "A String", # Directory in a VM where disk is mounted.

},

],

"subnetwork": "A String", # Subnetwork to which VMs will be assigned, if desired. Expected to be of

# the form "regions/REGION/subnetworks/SUBNETWORK".

"ipConfiguration": "A String", # Configuration for VM IPs.

"taskrunnerSettings": { # Taskrunner configuration settings. # Settings passed through to Google Compute Engine workers when

# using the standard Dataflow task runner. Users should ignore

# this field.

"alsologtostderr": True or False, # Whether to also send taskrunner log info to stderr.

"taskGroup": "A String", # The UNIX group ID on the worker VM to use for tasks launched by

# taskrunner; e.g. "wheel".

"harnessCommand": "A String", # The command to launch the worker harness.

"logDir": "A String", # The directory on the VM to store logs.

"oauthScopes": [ # The OAuth2 scopes to be requested by the taskrunner in order to

# access the Cloud Dataflow API.

"A String",

],

"dataflowApiVersion": "A String", # The API version of the endpoint, e.g. "v1b3".

"logUploadLocation": "A String", # Indicates where to put logs. If this is not specified, the logs

# will not be uploaded.

#

# The supported resource type is:

#

# Google Cloud Storage:

# storage.googleapis.com/{bucket}/{object}

# bucket.storage.googleapis.com/{object}

"streamingWorkerMainClass": "A String", # The streaming worker main class name.

"workflowFileName": "A String", # The file to store the workflow in.

"languageHint": "A String", # The suggested backend language.

"commandlinesFileName": "A String", # The file to store preprocessing commands in.

"baseTaskDir": "A String", # The location on the worker for task-specific subdirectories.

"tempStoragePrefix": "A String", # The prefix of the resources the taskrunner should use for

# temporary storage.

#

# The supported resource type is:

#

# Google Cloud Storage:

# storage.googleapis.com/{bucket}/{object}

# bucket.storage.googleapis.com/{object}

"baseUrl": "A String", # The base URL for the taskrunner to use when accessing Google Cloud APIs.

#

# When workers access Google Cloud APIs, they logically do so via

# relative URLs. If this field is specified, it supplies the base

# URL to use for resolving these relative URLs. The normative

# algorithm used is defined by RFC 1808, "Relative Uniform Resource

# Locators".

#

# If not specified, the default value is "http://www.googleapis.com/"

"logToSerialconsole": True or False, # Whether to send taskrunner log info to Google Compute Engine VM serial

# console.

"continueOnException": True or False, # Whether to continue taskrunner if an exception is hit.

"parallelWorkerSettings": { # Provides data to pass through to the worker harness. # The settings to pass to the parallel worker harness.

"tempStoragePrefix": "A String", # The prefix of the resources the system should use for temporary

# storage.

#

# The supported resource type is:

#

# Google Cloud Storage:

#

# storage.googleapis.com/{bucket}/{object}

# bucket.storage.googleapis.com/{object}

"reportingEnabled": True or False, # Whether to send work progress updates to the service.

"baseUrl": "A String", # The base URL for accessing Google Cloud APIs.

#

# When workers access Google Cloud APIs, they logically do so via

# relative URLs. If this field is specified, it supplies the base

# URL to use for resolving these relative URLs. The normative

# algorithm used is defined by RFC 1808, "Relative Uniform Resource

# Locators".

#

# If not specified, the default value is "http://www.googleapis.com/"

"servicePath": "A String", # The Cloud Dataflow service path relative to the root URL, for example,

# "dataflow/v1b3/projects".

"shuffleServicePath": "A String", # The Shuffle service path relative to the root URL, for example,

# "shuffle/v1beta1".

"workerId": "A String", # The ID of the worker running this pipeline.

},

"taskUser": "A String", # The UNIX user ID on the worker VM to use for tasks launched by

# taskrunner; e.g. "root".

"vmId": "A String", # The ID string of the VM.

},

"autoscalingSettings": { # Settings for WorkerPool autoscaling. # Settings for autoscaling of this WorkerPool.

"algorithm": "A String", # The algorithm to use for autoscaling.

"maxNumWorkers": 42, # The maximum number of workers to cap scaling at.

},

"metadata": { # Metadata to set on the Google Compute Engine VMs.

"a_key": "A String",

},

"defaultPackageSet": "A String", # The default package set to install. This allows the service to

# select a default set of packages which are useful to worker

# harnesses written in a particular language.

"network": "A String", # Network to which VMs will be assigned. If empty or unspecified,

# the service will use the network "default".

},

],

"dataset": "A String", # The dataset for the current project where various workflow

# related tables are stored.

#

# The supported resource type is:

#

# Google BigQuery:

# bigquery.googleapis.com/{dataset}

},

"stageStates": [ # This field may be mutated by the Cloud Dataflow service;

# callers cannot mutate it.

{ # A message describing the state of a particular execution stage.

"currentStateTime": "A String", # The time at which the stage transitioned to this state.

"executionStageState": "A String", # Execution stage states allow the same set of values as JobState.

"executionStageName": "A String", # The name of the execution stage.

},

],

"jobMetadata": { # Metadata available primarily for filtering jobs. Will be included in the

# ListJob response and Job SUMMARY view.

#

# This field is populated by the Dataflow service to support filtering jobs

# by the metadata values provided here. Populated for ListJobs and all GetJob

# views SUMMARY and higher.

"datastoreDetails": [ # Identification of a Datastore source used in the Dataflow job.

{ # Metadata for a Datastore connector used by the job.

"namespace": "A String", # Namespace used in the connection.

"projectId": "A String", # ProjectId accessed in the connection.

},

],

"sdkVersion": { # The version of the SDK used to run the job. # The SDK version used to run the job.

"version": "A String", # The version of the SDK used to run the job.

"sdkSupportStatus": "A String", # The support status for this SDK version.

"versionDisplayName": "A String", # A readable string describing the version of the SDK.

},

"bigqueryDetails": [ # Identification of a BigQuery source used in the Dataflow job.

{ # Metadata for a BigQuery connector used by the job.

"table": "A String", # Table accessed in the connection.

"dataset": "A String", # Dataset accessed in the connection.

"query": "A String", # Query used to access data in the connection.

"projectId": "A String", # Project accessed in the connection.

},

],

"fileDetails": [ # Identification of a File source used in the Dataflow job.

{ # Metadata for a File connector used by the job.

"filePattern": "A String", # File Pattern used to access files by the connector.

},

],

"pubsubDetails": [ # Identification of a PubSub source used in the Dataflow job.

{ # Metadata for a PubSub connector used by the job.

"topic": "A String", # Topic accessed in the connection.

"subscription": "A String", # Subscription used in the connection.

},

],

"bigTableDetails": [ # Identification of a BigTable source used in the Dataflow job.

{ # Metadata for a BigTable connector used by the job.

"projectId": "A String", # ProjectId accessed in the connection.

"instanceId": "A String", # InstanceId accessed in the connection.

"tableId": "A String", # TableId accessed in the connection.

},

],

"spannerDetails": [ # Identification of a Spanner source used in the Dataflow job.

{ # Metadata for a Spanner connector used by the job.

"instanceId": "A String", # InstanceId accessed in the connection.

"projectId": "A String", # ProjectId accessed in the connection.

"databaseId": "A String", # DatabaseId accessed in the connection.

},

],

},

"type": "A String", # The type of Cloud Dataflow job.

"projectId": "A String", # The ID of the Cloud Platform project that the job belongs to.

"createdFromSnapshotId": "A String", # If this is specified, the job's initial state is populated from the given

# snapshot.

"pipelineDescription": { # A descriptive representation of submitted pipeline as well as the executed

# form. This data is provided by the Dataflow service for ease of visualizing

# the pipeline and interpreting Dataflow provided metrics.

#

# Preliminary field: The format of this data may change at any time.

# A description of the user pipeline and stages through which it is executed.

# Created by Cloud Dataflow service. Only retrieved with

# JOB_VIEW_DESCRIPTION or JOB_VIEW_ALL.

"executionPipelineStage": [ # Description of each stage of execution of the pipeline.

{ # Description of the composing transforms, names/ids, and input/outputs of a

# stage of execution. Some composing transforms and sources may have been

# generated by the Dataflow service during execution planning.

"outputSource": [ # Output sources for this stage.

{ # Description of an input or output of an execution stage.

"sizeBytes": "A String", # Size of the source, if measurable.

"name": "A String", # Dataflow service generated name for this source.

"userName": "A String", # Human-readable name for this source; may be user or system generated.

"originalTransformOrCollection": "A String", # User name for the original user transform or collection with which this

# source is most closely associated.

},

],

"name": "A String", # Dataflow service generated name for this stage.

"inputSource": [ # Input sources for this stage.

{ # Description of an input or output of an execution stage.

"sizeBytes": "A String", # Size of the source, if measurable.

"name": "A String", # Dataflow service generated name for this source.

"userName": "A String", # Human-readable name for this source; may be user or system generated.

"originalTransformOrCollection": "A String", # User name for the original user transform or collection with which this

# source is most closely associated.

},

],

"id": "A String", # Dataflow service generated id for this stage.

"componentTransform": [ # Transforms that comprise this execution stage.

{ # Description of a transform executed as part of an execution stage.

"originalTransform": "A String", # User name for the original user transform with which this transform is

# most closely associated.

"name": "A String", # Dataflow service generated name for this transform.

"userName": "A String", # Human-readable name for this transform; may be user or system generated.

},

],

"componentSource": [ # Collections produced and consumed by component transforms of this stage.

{ # Description of an interstitial value between transforms in an execution

# stage.

"name": "A String", # Dataflow service generated name for this source.

"userName": "A String", # Human-readable name for this source; may be user or system generated.

"originalTransformOrCollection": "A String", # User name for the original user transform or collection with which this

# source is most closely associated.

},

],

"kind": "A String", # Type of transform this stage is executing.

},

],

"originalPipelineTransform": [ # Description of each transform in the pipeline and collections between them.

{ # Description of the type, names/ids, and input/outputs for a transform.

"kind": "A String", # Type of transform.

"inputCollectionName": [ # User names for all collection inputs to this transform.

"A String",

],

"name": "A String", # User provided name for this transform instance.

"id": "A String", # SDK generated id of this transform instance.

"displayData": [ # Transform-specific display data.

{ # Data provided with a pipeline or transform to provide descriptive info.

"durationValue": "A String", # Contains value if the data is of duration type.

"int64Value": "A String", # Contains value if the data is of int64 type.

"namespace": "A String", # The namespace for the key. This is usually a class name or programming

# language namespace (e.g. a Python module) which defines the display data.

# This allows a dax monitoring system to specially handle the data

# and perform custom rendering.

"floatValue": 3.14, # Contains value if the data is of float type.

"key": "A String", # The key identifying the display data.

# This is intended to be used as a label for the display data

# when viewed in a dax monitoring system.

"shortStrValue": "A String", # A possible additional shorter value to display.

# For example a java_class_name_value of com.mypackage.MyDoFn

# will be stored with MyDoFn as the short_str_value and

# com.mypackage.MyDoFn as the java_class_name_value.

# short_str_value can be displayed and java_class_name_value

# will be displayed as a tooltip.

"url": "A String", # An optional full URL.

"label": "A String", # An optional label to display in a dax UI for the element.

"timestampValue": "A String", # Contains value if the data is of timestamp type.

"boolValue": True or False, # Contains value if the data is of a boolean type.

"javaClassValue": "A String", # Contains value if the data is of java class type.

"strValue": "A String", # Contains value if the data is of string type.

},

],

"outputCollectionName": [ # User names for all collection outputs to this transform.

"A String",

],

},

],

"displayData": [ # Pipeline level display data.

{ # Data provided with a pipeline or transform to provide descriptive info.

"durationValue": "A String", # Contains value if the data is of duration type.

"int64Value": "A String", # Contains value if the data is of int64 type.

"namespace": "A String", # The namespace for the key. This is usually a class name or programming

# language namespace (e.g. a Python module) which defines the display data.

# This allows a dax monitoring system to specially handle the data

# and perform custom rendering.

"floatValue": 3.14, # Contains value if the data is of float type.

"key": "A String", # The key identifying the display data.

# This is intended to be used as a label for the display data

# when viewed in a dax monitoring system.

"shortStrValue": "A String", # A possible additional shorter value to display.

# For example a java_class_name_value of com.mypackage.MyDoFn

# will be stored with MyDoFn as the short_str_value and

# com.mypackage.MyDoFn as the java_class_name_value.

# short_str_value can be displayed and java_class_name_value

# will be displayed as a tooltip.

"url": "A String", # An optional full URL.

"label": "A String", # An optional label to display in a dax UI for the element.

"timestampValue": "A String", # Contains value if the data is of timestamp type.

"boolValue": True or False, # Contains value if the data is of a boolean type.

"javaClassValue": "A String", # Contains value if the data is of java class type.

"strValue": "A String", # Contains value if the data is of string type.

},

],

},

"replaceJobId": "A String", # If this job is an update of an existing job, this field is the job ID

# of the job it replaced.

#

# When sending a `CreateJobRequest`, you can update a job by specifying it

# here. The job named here is stopped, and its intermediate state is

# transferred to this job.

"tempFiles": [ # A set of files the system should be aware of that are used

# for temporary storage. These temporary files will be

# removed on job completion.

# No duplicates are allowed.

# No file patterns are supported.

#

# The supported files are:

#

# Google Cloud Storage:

#

# storage.googleapis.com/{bucket}/{object}

# bucket.storage.googleapis.com/{object}

"A String",

],

"name": "A String", # The user-specified Cloud Dataflow job name.

#

# Only one Job with a given name may exist in a project at any

# given time. If a caller attempts to create a Job with the same

# name as an already-existing Job, the attempt returns the

# existing Job.

#

# The name must match the regular expression

# `[a-z]([-a-z0-9]{0,38}[a-z0-9])?`

"steps": [ # Exactly one of step or steps_location should be specified.

#

# The top-level steps that constitute the entire job.

{ # Defines a particular step within a Cloud Dataflow job.

#

# A job consists of multiple steps, each of which performs some

# specific operation as part of the overall job. Data is typically

# passed from one step to another as part of the job.

#

# Here's an example of a sequence of steps which together implement a

# Map-Reduce job:

#

# * Read a collection of data from some source, parsing the

# collection's elements.

#

# * Validate the elements.

#

# * Apply a user-defined function to map each element to some value

# and extract an element-specific key value.

#

# * Group elements with the same key into a single element with

# that key, transforming a multiply-keyed collection into a

# uniquely-keyed collection.

#

# * Write the elements out to some data sink.

#

# Note that the Cloud Dataflow service may be used to run many different

# types of jobs, not just Map-Reduce.

"name": "A String", # The name that identifies the step. This must be unique for each

# step with respect to all other steps in the Cloud Dataflow job.

"kind": "A String", # The kind of step in the Cloud Dataflow job.

"properties": { # Named properties associated with the step. Each kind of

# predefined step has its own required set of properties.

# Must be provided on Create. Only retrieved with JOB_VIEW_ALL.

"a_key": "", # Properties of the object.

},

},

],

"replacedByJobId": "A String", # If another job is an update of this job (and thus, this job is in

# `JOB_STATE_UPDATED`), this field contains the ID of that job.

"executionInfo": { # Additional information about how a Cloud Dataflow job will be executed that

# isn't contained in the submitted job.

#

# Deprecated.

"stages": { # A mapping from each stage to the information about that stage.

"a_key": { # Contains information about how a particular

# google.dataflow.v1beta3.Step will be executed.

"stepName": [ # The steps associated with the execution stage.

# Note that stages may have several steps, and that a given step

# might be run by more than one stage.

"A String",

],

},

},

},

"currentState": "A String", # The current state of the job.

#

# Jobs are created in the `JOB_STATE_STOPPED` state unless otherwise

# specified.

#

# A job in the `JOB_STATE_RUNNING` state may asynchronously enter a

# terminal state. After a job has reached a terminal state, no

# further state updates may be made.

#

# This field may be mutated by the Cloud Dataflow service;

# callers cannot mutate it.

"location": "A String", # The [regional endpoint]

# (https://cloud.google.com/dataflow/docs/concepts/regional-endpoints) that

# contains this job.

"startTime": "A String", # The timestamp when the job was started (transitioned to JOB_STATE_PENDING).

# Flexible resource scheduling jobs are started with some delay after job

# creation, so start_time is unset before start and is updated when the

# job is started by the Cloud Dataflow service. For other jobs, start_time

# always equals create_time and is immutable and set by the Cloud Dataflow

# service.

"stepsLocation": "A String", # The GCS location where the steps are stored.

"labels": { # User-defined labels for this job.

#

# The labels map can contain no more than 64 entries. Entries of the labels

# map are UTF8 strings that comply with the following restrictions:

#

# * Keys must conform to regexp: \p{Ll}\p{Lo}{0,62}

# * Values must conform to regexp: [\p{Ll}\p{Lo}\p{N}_-]{0,63}

# * Both keys and values are additionally constrained to be <= 128 bytes in

# size.

"a_key": "A String",

},

"createTime": "A String", # The timestamp when the job was initially created. Immutable and set by the

# Cloud Dataflow service.

"requestedState": "A String", # The job's requested state.

#

# `UpdateJob` may be used to switch between the `JOB_STATE_STOPPED` and

# `JOB_STATE_RUNNING` states, by setting requested_state. `UpdateJob` may

# also be used to directly set a job's requested state to

# `JOB_STATE_CANCELLED` or `JOB_STATE_DONE`, irrevocably terminating the

# job if it has not already reached a terminal state.

},
