<html><body>
<style>

body, h1, h2, h3, div, span, p, pre, a {
  margin: 0;
  padding: 0;
  border: 0;
  font-weight: inherit;
  font-style: inherit;
  font-size: 100%;
  font-family: inherit;
  vertical-align: baseline;
}

body {
  font-size: 13px;
  padding: 1em;
}

h1 {
  font-size: 26px;
  margin-bottom: 1em;
}

h2 {
  font-size: 24px;
  margin-bottom: 1em;
}

h3 {
  font-size: 20px;
  margin-bottom: 1em;
  margin-top: 1em;
}

pre, code {
  line-height: 1.5;
  font-family: Monaco, 'DejaVu Sans Mono', 'Bitstream Vera Sans Mono', 'Lucida Console', monospace;
}

pre {
  margin-top: 0.5em;
}

h1, h2, h3, p {
  font-family: Arial, sans-serif;
}

h1, h2, h3 {
  border-bottom: solid #CCC 1px;
}

.toc_element {
  margin-top: 0.5em;
}

.firstline {
  margin-left: 2em;
}

.method {
  margin-top: 1em;
  border: solid 1px #CCC;
  padding: 1em;
  background: #EEE;
}

.details {
  font-weight: bold;
  font-size: 14px;
}

</style>

<h1><a href="dataflow_v1b3.html">Dataflow API</a> . <a href="dataflow_v1b3.projects.html">projects</a> . <a href="dataflow_v1b3.projects.locations.html">locations</a> . <a href="dataflow_v1b3.projects.locations.templates.html">templates</a></h1>
<h2>Instance Methods</h2>
<p class="toc_element">
  <code><a href="#create">create(projectId, location, body=None, x__xgafv=None)</a></code></p>
<p class="firstline">Creates a Cloud Dataflow job from a template.</p>
<p class="toc_element">
  <code><a href="#get">get(projectId, location, gcsPath=None, view=None, x__xgafv=None)</a></code></p>
<p class="firstline">Get the template associated with a template file.</p>
<p class="toc_element">
  <code><a href="#launch">launch(projectId, location, body=None, dynamicTemplate_gcsPath=None, dynamicTemplate_stagingLocation=None, validateOnly=None, gcsPath=None, x__xgafv=None)</a></code></p>
<p class="firstline">Launch a template.</p>
<h3>Method Details</h3>
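<p>A minimal setup sketch for the examples that follow, assuming the <code>google-api-python-client</code> package is installed and Application Default Credentials are available in the environment:</p>
<pre>
# Sketch: build the Dataflow API client and resolve this resource.
from googleapiclient.discovery import build

dataflow = build(&#x27;dataflow&#x27;, &#x27;v1b3&#x27;)
templates = dataflow.projects().locations().templates()
</pre>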
<div class="method">
    <code class="details" id="create">create(projectId, location, body=None, x__xgafv=None)</code>
  <pre>Creates a Cloud Dataflow job from a template.

Args:
  projectId: string, Required. The ID of the Cloud Platform project that the job belongs to. (required)
  location: string, The [regional endpoint]
(https://cloud.google.com/dataflow/docs/concepts/regional-endpoints) to
which to direct the request. (required)
  body: object, The request body.
    The object takes the form of:

{ # A request to create a Cloud Dataflow job from a template.
    &quot;location&quot;: &quot;A String&quot;, # The [regional endpoint]
        # (https://cloud.google.com/dataflow/docs/concepts/regional-endpoints) to
        # which to direct the request.
    &quot;environment&quot;: { # The environment values to set at runtime. # The runtime environment for the job.
      &quot;bypassTempDirValidation&quot;: True or False, # Whether to bypass the safety checks for the job&#x27;s temporary directory.
          # Use with caution.
      &quot;tempLocation&quot;: &quot;A String&quot;, # The Cloud Storage path to use for temporary files.
          # Must be a valid Cloud Storage URL, beginning with `gs://`.
      &quot;network&quot;: &quot;A String&quot;, # Network to which VMs will be assigned. If empty or unspecified,
          # the service will use the network &quot;default&quot;.
      &quot;subnetwork&quot;: &quot;A String&quot;, # Subnetwork to which VMs will be assigned, if desired. Expected to be of
          # the form &quot;regions/REGION/subnetworks/SUBNETWORK&quot;.
      &quot;workerRegion&quot;: &quot;A String&quot;, # The Compute Engine region
          # (https://cloud.google.com/compute/docs/regions-zones/regions-zones) in
          # which worker processing should occur, e.g. &quot;us-west1&quot;. Mutually exclusive
          # with worker_zone. If neither worker_region nor worker_zone is specified,
          # default to the control plane&#x27;s region.
      &quot;numWorkers&quot;: 42, # The initial number of Google Compute Engine instances for the job.
      &quot;additionalExperiments&quot;: [ # Additional experiment flags for the job.
        &quot;A String&quot;,
      ],
      &quot;zone&quot;: &quot;A String&quot;, # The Compute Engine [availability
          # zone](https://cloud.google.com/compute/docs/regions-zones/regions-zones)
          # for launching worker instances to run your pipeline.
          # In the future, worker_zone will take precedence.
      &quot;serviceAccountEmail&quot;: &quot;A String&quot;, # The email address of the service account to run the job as.
      &quot;maxWorkers&quot;: 42, # The maximum number of Google Compute Engine instances to be made
          # available to your pipeline during execution, from 1 to 1000.
      &quot;workerZone&quot;: &quot;A String&quot;, # The Compute Engine zone
          # (https://cloud.google.com/compute/docs/regions-zones/regions-zones) in
          # which worker processing should occur, e.g. &quot;us-west1-a&quot;. Mutually exclusive
          # with worker_region. If neither worker_region nor worker_zone is specified,
          # a zone in the control plane&#x27;s region is chosen based on available capacity.
          # If both `worker_zone` and `zone` are set, `worker_zone` takes precedence.
      &quot;additionalUserLabels&quot;: { # Additional user labels to be specified for the job.
          # Keys and values should follow the restrictions specified in the [labeling
          # restrictions](https://cloud.google.com/compute/docs/labeling-resources#restrictions)
          # page.
        &quot;a_key&quot;: &quot;A String&quot;,
      },
      &quot;machineType&quot;: &quot;A String&quot;, # The machine type to use for the job. Defaults to the value from the
          # template if not specified.
      &quot;ipConfiguration&quot;: &quot;A String&quot;, # Configuration for VM IPs.
      &quot;kmsKeyName&quot;: &quot;A String&quot;, # Optional. Name for the Cloud KMS key for the job.
          # Key format is:
          # projects/&lt;project&gt;/locations/&lt;location&gt;/keyRings/&lt;keyring&gt;/cryptoKeys/&lt;key&gt;
    },
    &quot;gcsPath&quot;: &quot;A String&quot;, # Required. A Cloud Storage path to the template from which to
        # create the job.
        # Must be a valid Cloud Storage URL, beginning with `gs://`.
    &quot;jobName&quot;: &quot;A String&quot;, # Required. The job name to use for the created job.
    &quot;parameters&quot;: { # The runtime parameters to pass to the job.
      &quot;a_key&quot;: &quot;A String&quot;,
    },
  }

  x__xgafv: string, V1 error format.
    Allowed values
      1 - v1 error format
      2 - v2 error format

Returns:
  An object of the form:

    { # Defines a job to be run by the Cloud Dataflow service.
      &quot;pipelineDescription&quot;: { # A descriptive representation of submitted pipeline as well as the executed # Preliminary field: The format of this data may change at any time.
          # A description of the user pipeline and stages through which it is executed.
          # Created by Cloud Dataflow service. Only retrieved with
          # JOB_VIEW_DESCRIPTION or JOB_VIEW_ALL.
          # form. This data is provided by the Dataflow service for ease of visualizing
          # the pipeline and interpreting Dataflow provided metrics.
        &quot;displayData&quot;: [ # Pipeline level display data.
          { # Data provided with a pipeline or transform to provide descriptive info.
            &quot;url&quot;: &quot;A String&quot;, # An optional full URL.
            &quot;javaClassValue&quot;: &quot;A String&quot;, # Contains value if the data is of java class type.
            &quot;timestampValue&quot;: &quot;A String&quot;, # Contains value if the data is of timestamp type.
            &quot;durationValue&quot;: &quot;A String&quot;, # Contains value if the data is of duration type.
            &quot;label&quot;: &quot;A String&quot;, # An optional label to display in a dax UI for the element.
            &quot;key&quot;: &quot;A String&quot;, # The key identifying the display data.
                # This is intended to be used as a label for the display data
                # when viewed in a dax monitoring system.
            &quot;namespace&quot;: &quot;A String&quot;, # The namespace for the key. This is usually a class name or programming
                # language namespace (i.e. python module) which defines the display data.
                # This allows a dax monitoring system to specially handle the data
                # and perform custom rendering.
            &quot;floatValue&quot;: 3.14, # Contains value if the data is of float type.
            &quot;strValue&quot;: &quot;A String&quot;, # Contains value if the data is of string type.
            &quot;int64Value&quot;: &quot;A String&quot;, # Contains value if the data is of int64 type.
            &quot;boolValue&quot;: True or False, # Contains value if the data is of a boolean type.
            &quot;shortStrValue&quot;: &quot;A String&quot;, # A possible additional shorter value to display.
                # For example a java_class_name_value of com.mypackage.MyDoFn
                # will be stored with MyDoFn as the short_str_value and
                # com.mypackage.MyDoFn as the java_class_name value.
                # short_str_value can be displayed and java_class_name_value
                # will be displayed as a tooltip.
          },
        ],
        &quot;originalPipelineTransform&quot;: [ # Description of each transform in the pipeline and collections between them.
          { # Description of the type, names/ids, and input/outputs for a transform.
            &quot;outputCollectionName&quot;: [ # User names for all collection outputs to this transform.
              &quot;A String&quot;,
            ],
            &quot;displayData&quot;: [ # Transform-specific display data.
              { # Data provided with a pipeline or transform to provide descriptive info.
                &quot;url&quot;: &quot;A String&quot;, # An optional full URL.
                &quot;javaClassValue&quot;: &quot;A String&quot;, # Contains value if the data is of java class type.
                &quot;timestampValue&quot;: &quot;A String&quot;, # Contains value if the data is of timestamp type.
                &quot;durationValue&quot;: &quot;A String&quot;, # Contains value if the data is of duration type.
                &quot;label&quot;: &quot;A String&quot;, # An optional label to display in a dax UI for the element.
                &quot;key&quot;: &quot;A String&quot;, # The key identifying the display data.
                    # This is intended to be used as a label for the display data
                    # when viewed in a dax monitoring system.
                &quot;namespace&quot;: &quot;A String&quot;, # The namespace for the key. This is usually a class name or programming
                    # language namespace (i.e. python module) which defines the display data.
                    # This allows a dax monitoring system to specially handle the data
                    # and perform custom rendering.
                &quot;floatValue&quot;: 3.14, # Contains value if the data is of float type.
                &quot;strValue&quot;: &quot;A String&quot;, # Contains value if the data is of string type.
                &quot;int64Value&quot;: &quot;A String&quot;, # Contains value if the data is of int64 type.
                &quot;boolValue&quot;: True or False, # Contains value if the data is of a boolean type.
                &quot;shortStrValue&quot;: &quot;A String&quot;, # A possible additional shorter value to display.
                    # For example a java_class_name_value of com.mypackage.MyDoFn
                    # will be stored with MyDoFn as the short_str_value and
                    # com.mypackage.MyDoFn as the java_class_name value.
                    # short_str_value can be displayed and java_class_name_value
                    # will be displayed as a tooltip.
              },
            ],
            &quot;id&quot;: &quot;A String&quot;, # SDK generated id of this transform instance.
            &quot;inputCollectionName&quot;: [ # User names for all collection inputs to this transform.
              &quot;A String&quot;,
            ],
            &quot;name&quot;: &quot;A String&quot;, # User provided name for this transform instance.
            &quot;kind&quot;: &quot;A String&quot;, # Type of transform.
          },
        ],
        &quot;executionPipelineStage&quot;: [ # Description of each stage of execution of the pipeline.
          { # Description of the composing transforms, names/ids, and input/outputs of a
              # stage of execution. Some composing transforms and sources may have been
              # generated by the Dataflow service during execution planning.
            &quot;componentSource&quot;: [ # Collections produced and consumed by component transforms of this stage.
              { # Description of an interstitial value between transforms in an execution
                  # stage.
                &quot;userName&quot;: &quot;A String&quot;, # Human-readable name for this transform; may be user or system generated.
                &quot;name&quot;: &quot;A String&quot;, # Dataflow service generated name for this source.
                &quot;originalTransformOrCollection&quot;: &quot;A String&quot;, # User name for the original user transform or collection with which this
                    # source is most closely associated.
              },
            ],
            &quot;inputSource&quot;: [ # Input sources for this stage.
              { # Description of an input or output of an execution stage.
                &quot;userName&quot;: &quot;A String&quot;, # Human-readable name for this source; may be user or system generated.
                &quot;originalTransformOrCollection&quot;: &quot;A String&quot;, # User name for the original user transform or collection with which this
                    # source is most closely associated.
                &quot;sizeBytes&quot;: &quot;A String&quot;, # Size of the source, if measurable.
                &quot;name&quot;: &quot;A String&quot;, # Dataflow service generated name for this source.
              },
            ],
            &quot;name&quot;: &quot;A String&quot;, # Dataflow service generated name for this stage.
            &quot;componentTransform&quot;: [ # Transforms that comprise this execution stage.
              { # Description of a transform executed as part of an execution stage.
                &quot;name&quot;: &quot;A String&quot;, # Dataflow service generated name for this source.
                &quot;userName&quot;: &quot;A String&quot;, # Human-readable name for this transform; may be user or system generated.
                &quot;originalTransform&quot;: &quot;A String&quot;, # User name for the original user transform with which this transform is
                    # most closely associated.
              },
            ],
            &quot;id&quot;: &quot;A String&quot;, # Dataflow service generated id for this stage.
            &quot;outputSource&quot;: [ # Output sources for this stage.
              { # Description of an input or output of an execution stage.
                &quot;userName&quot;: &quot;A String&quot;, # Human-readable name for this source; may be user or system generated.
                &quot;originalTransformOrCollection&quot;: &quot;A String&quot;, # User name for the original user transform or collection with which this
                    # source is most closely associated.
                &quot;sizeBytes&quot;: &quot;A String&quot;, # Size of the source, if measurable.
                &quot;name&quot;: &quot;A String&quot;, # Dataflow service generated name for this source.
              },
            ],
            &quot;kind&quot;: &quot;A String&quot;, # Type of transform this stage is executing.
          },
        ],
      },
      &quot;labels&quot;: { # User-defined labels for this job.
          #
          # The labels map can contain no more than 64 entries. Entries of the labels
          # map are UTF8 strings that comply with the following restrictions:
          #
          # * Keys must conform to regexp: \p{Ll}\p{Lo}{0,62}
          # * Values must conform to regexp: [\p{Ll}\p{Lo}\p{N}_-]{0,63}
          # * Both keys and values are additionally constrained to be &lt;= 128 bytes in
          # size.
        &quot;a_key&quot;: &quot;A String&quot;,
      },
      &quot;projectId&quot;: &quot;A String&quot;, # The ID of the Cloud Platform project that the job belongs to.
      &quot;environment&quot;: { # Describes the environment in which a Dataflow Job runs. # The environment for the job.
        &quot;flexResourceSchedulingGoal&quot;: &quot;A String&quot;, # Which Flexible Resource Scheduling mode to run in.
        &quot;workerRegion&quot;: &quot;A String&quot;, # The Compute Engine region
            # (https://cloud.google.com/compute/docs/regions-zones/regions-zones) in
            # which worker processing should occur, e.g. &quot;us-west1&quot;. Mutually exclusive
            # with worker_zone. If neither worker_region nor worker_zone is specified,
            # default to the control plane&#x27;s region.
        &quot;userAgent&quot;: { # A description of the process that generated the request.
          &quot;a_key&quot;: &quot;&quot;, # Properties of the object.
        },
        &quot;serviceAccountEmail&quot;: &quot;A String&quot;, # Identity to run virtual machines as. Defaults to the default account.
        &quot;version&quot;: { # A structure describing which components and their versions of the service
            # are required in order to run the job.
          &quot;a_key&quot;: &quot;&quot;, # Properties of the object.
        },
        &quot;serviceKmsKeyName&quot;: &quot;A String&quot;, # If set, contains the Cloud KMS key identifier used to encrypt data
            # at rest, AKA a Customer Managed Encryption Key (CMEK).
            #
            # Format:
            # projects/PROJECT_ID/locations/LOCATION/keyRings/KEY_RING/cryptoKeys/KEY
        &quot;experiments&quot;: [ # The list of experiments to enable.
          &quot;A String&quot;,
        ],
        &quot;workerZone&quot;: &quot;A String&quot;, # The Compute Engine zone
            # (https://cloud.google.com/compute/docs/regions-zones/regions-zones) in
            # which worker processing should occur, e.g. &quot;us-west1-a&quot;. Mutually exclusive
            # with worker_region. If neither worker_region nor worker_zone is specified,
            # a zone in the control plane&#x27;s region is chosen based on available capacity.
        &quot;workerPools&quot;: [ # The worker pools. At least one &quot;harness&quot; worker pool must be
            # specified in order for the job to have workers.
          { # Describes one particular pool of Cloud Dataflow workers to be
              # instantiated by the Cloud Dataflow service in order to perform the
              # computations required by a job. Note that a workflow job may use
              # multiple pools, in order to match the various computational
              # requirements of the various stages of the job.
            &quot;onHostMaintenance&quot;: &quot;A String&quot;, # The action to take on host maintenance, as defined by the Google
                # Compute Engine API.
            &quot;sdkHarnessContainerImages&quot;: [ # Set of SDK harness containers needed to execute this pipeline. This will
                # only be set in the Fn API path. For non-cross-language pipelines this
                # should have only one entry. Cross-language pipelines will have two or more
                # entries.
              { # Defines a SDK harness container for executing Dataflow pipelines.
                &quot;containerImage&quot;: &quot;A String&quot;, # A docker container image that resides in Google Container Registry.
                &quot;useSingleCorePerContainer&quot;: True or False, # If true, recommends the Dataflow service to use only one core per SDK
                    # container instance with this image. If false (or unset) recommends using
                    # more than one core per SDK container instance with this image for
                    # efficiency. Note that Dataflow service may choose to override this property
                    # if needed.
              },
            ],
            &quot;zone&quot;: &quot;A String&quot;, # Zone to run the worker pools in. If empty or unspecified, the service
                # will attempt to choose a reasonable default.
            &quot;kind&quot;: &quot;A String&quot;, # The kind of the worker pool; currently only `harness` and `shuffle`
                # are supported.
            &quot;metadata&quot;: { # Metadata to set on the Google Compute Engine VMs.
              &quot;a_key&quot;: &quot;A String&quot;,
            },
            &quot;diskSourceImage&quot;: &quot;A String&quot;, # Fully qualified source image for disks.
            &quot;dataDisks&quot;: [ # Data disks that are used by a VM in this workflow.
              { # Describes the data disk used by a workflow job.
                &quot;sizeGb&quot;: 42, # Size of disk in GB. If zero or unspecified, the service will
                    # attempt to choose a reasonable default.
                &quot;diskType&quot;: &quot;A String&quot;, # Disk storage type, as defined by Google Compute Engine. This
                    # must be a disk type appropriate to the project and zone in which
                    # the workers will run. If unknown or unspecified, the service
                    # will attempt to choose a reasonable default.
                    #
                    # For example, the standard persistent disk type is a resource name
                    # typically ending in &quot;pd-standard&quot;. If SSD persistent disks are
                    # available, the resource name typically ends with &quot;pd-ssd&quot;. The
                    # actual valid values are defined by the Google Compute Engine API,
                    # not by the Cloud Dataflow API; consult the Google Compute Engine
                    # documentation for more information about determining the set of
                    # available disk types for a particular project and zone.
                    #
                    # Google Compute Engine Disk types are local to a particular
                    # project in a particular zone, and so the resource name will
                    # typically look something like this:
                    #
                    # compute.googleapis.com/projects/project-id/zones/zone/diskTypes/pd-standard
                &quot;mountPoint&quot;: &quot;A String&quot;, # Directory in a VM where disk is mounted.
              },
            ],
            &quot;packages&quot;: [ # Packages to be installed on workers.
              { # The packages that must be installed in order for a worker to run the
                  # steps of the Cloud Dataflow job that will be assigned to its worker
                  # pool.
                  #
                  # This is the mechanism by which the Cloud Dataflow SDK causes code to
                  # be loaded onto the workers. For example, the Cloud Dataflow Java SDK
                  # might use this to install jars containing the user&#x27;s code and all of the
                  # various dependencies (libraries, data files, etc.) required in order
                  # for that code to run.
                &quot;name&quot;: &quot;A String&quot;, # The name of the package.
                &quot;location&quot;: &quot;A String&quot;, # The resource to read the package from. The supported resource type is:
                    #
                    # Google Cloud Storage:
                    #
                    # storage.googleapis.com/{bucket}
                    # bucket.storage.googleapis.com/
              },
            ],
            &quot;teardownPolicy&quot;: &quot;A String&quot;, # Sets the policy for determining when to turn down the worker pool.
                # Allowed values are: `TEARDOWN_ALWAYS`, `TEARDOWN_ON_SUCCESS`, and
                # `TEARDOWN_NEVER`.
                # `TEARDOWN_ALWAYS` means workers are always torn down regardless of whether
                # the job succeeds. `TEARDOWN_ON_SUCCESS` means workers are torn down
                # if the job succeeds. `TEARDOWN_NEVER` means the workers are never torn
                # down.
                #
                # If the workers are not torn down by the service, they will
                # continue to run and use Google Compute Engine VM resources in the
                # user&#x27;s project until they are explicitly terminated by the user.
                # Because of this, Google recommends using the `TEARDOWN_ALWAYS`
                # policy except for small, manually supervised test jobs.
                #
                # If unknown or unspecified, the service will attempt to choose a reasonable
                # default.
            &quot;network&quot;: &quot;A String&quot;, # Network to which VMs will be assigned. If empty or unspecified,
                # the service will use the network &quot;default&quot;.
            &quot;ipConfiguration&quot;: &quot;A String&quot;, # Configuration for VM IPs.
            &quot;diskSizeGb&quot;: 42, # Size of root disk for VMs, in GB. If zero or unspecified, the service will
                # attempt to choose a reasonable default.
            &quot;autoscalingSettings&quot;: { # Settings for WorkerPool autoscaling. # Settings for autoscaling of this WorkerPool.
              &quot;maxNumWorkers&quot;: 42, # The maximum number of workers to cap scaling at.
              &quot;algorithm&quot;: &quot;A String&quot;, # The algorithm to use for autoscaling.
            },
            &quot;poolArgs&quot;: { # Extra arguments for this worker pool.
              &quot;a_key&quot;: &quot;&quot;, # Properties of the object. Contains field @type with type URL.
            },
            &quot;subnetwork&quot;: &quot;A String&quot;, # Subnetwork to which VMs will be assigned, if desired. Expected to be of
                # the form &quot;regions/REGION/subnetworks/SUBNETWORK&quot;.
            &quot;numWorkers&quot;: 42, # Number of Google Compute Engine workers in this pool needed to
                # execute the job. If zero or unspecified, the service will
                # attempt to choose a reasonable default.
            &quot;numThreadsPerWorker&quot;: 42, # The number of threads per worker harness. If empty or unspecified, the
                # service will choose a number of threads (according to the number of cores
                # on the selected machine type for batch, or 1 by convention for streaming).
            &quot;workerHarnessContainerImage&quot;: &quot;A String&quot;, # Required. Docker container image that executes the Cloud Dataflow worker
                # harness, residing in Google Container Registry.
                #
                # Deprecated for the Fn API path. Use sdk_harness_container_images instead.
            &quot;taskrunnerSettings&quot;: { # Taskrunner configuration settings. # Settings passed through to Google Compute Engine workers when
                # using the standard Dataflow task runner. Users should ignore
                # this field.
              &quot;dataflowApiVersion&quot;: &quot;A String&quot;, # The API version of endpoint, e.g. &quot;v1b3&quot;
              &quot;oauthScopes&quot;: [ # The OAuth2 scopes to be requested by the taskrunner in order to
                  # access the Cloud Dataflow API.
                &quot;A String&quot;,
              ],
              &quot;baseUrl&quot;: &quot;A String&quot;, # The base URL for the taskrunner to use when accessing Google Cloud APIs.
                  #
                  # When workers access Google Cloud APIs, they logically do so via
                  # relative URLs. If this field is specified, it supplies the base
                  # URL to use for resolving these relative URLs. The normative
                  # algorithm used is defined by RFC 1808, &quot;Relative Uniform Resource
                  # Locators&quot;.
                  #
                  # If not specified, the default value is &quot;http://www.googleapis.com/&quot;
              &quot;workflowFileName&quot;: &quot;A String&quot;, # The file to store the workflow in.
              &quot;logToSerialconsole&quot;: True or False, # Whether to send taskrunner log info to Google Compute Engine VM serial
                  # console.
              &quot;baseTaskDir&quot;: &quot;A String&quot;, # The location on the worker for task-specific subdirectories.
              &quot;taskUser&quot;: &quot;A String&quot;, # The UNIX user ID on the worker VM to use for tasks launched by
                  # taskrunner; e.g. &quot;root&quot;.
              &quot;vmId&quot;: &quot;A String&quot;, # The ID string of the VM.
              &quot;alsologtostderr&quot;: True or False, # Whether to also send taskrunner log info to stderr.
              &quot;parallelWorkerSettings&quot;: { # Provides data to pass through to the worker harness. # The settings to pass to the parallel worker harness.
                &quot;shuffleServicePath&quot;: &quot;A String&quot;, # The Shuffle service path relative to the root URL, for example,
                    # &quot;shuffle/v1beta1&quot;.
                &quot;tempStoragePrefix&quot;: &quot;A String&quot;, # The prefix of the resources the system should use for temporary
                    # storage.
                    #
                    # The supported resource type is:
                    #
                    # Google Cloud Storage:
                    #
                    # storage.googleapis.com/{bucket}/{object}
                    # bucket.storage.googleapis.com/{object}
                &quot;reportingEnabled&quot;: True or False, # Whether to send work progress updates to the service.
                &quot;servicePath&quot;: &quot;A String&quot;, # The Cloud Dataflow service path relative to the root URL, for example,
                    # &quot;dataflow/v1b3/projects&quot;.
                &quot;baseUrl&quot;: &quot;A String&quot;, # The base URL for accessing Google Cloud APIs.
                    #
                    # When workers access Google Cloud APIs, they logically do so via
                    # relative URLs. If this field is specified, it supplies the base
                    # URL to use for resolving these relative URLs. The normative
                    # algorithm used is defined by RFC 1808, &quot;Relative Uniform Resource
                    # Locators&quot;.
                    #
                    # If not specified, the default value is &quot;http://www.googleapis.com/&quot;
                &quot;workerId&quot;: &quot;A String&quot;, # The ID of the worker running this pipeline.
              },
              &quot;harnessCommand&quot;: &quot;A String&quot;, # The command to launch the worker harness.
              &quot;logDir&quot;: &quot;A String&quot;, # The directory on the VM to store logs.
              &quot;streamingWorkerMainClass&quot;: &quot;A String&quot;, # The streaming worker main class name.
              &quot;languageHint&quot;: &quot;A String&quot;, # The suggested backend language.
              &quot;taskGroup&quot;: &quot;A String&quot;, # The UNIX group ID on the worker VM to use for tasks launched by
                  # taskrunner; e.g. &quot;wheel&quot;.
              &quot;logUploadLocation&quot;: &quot;A String&quot;, # Indicates where to put logs. If this is not specified, the logs
                  # will not be uploaded.
                  #
                  # The supported resource type is:
                  #
                  # Google Cloud Storage:
                  # storage.googleapis.com/{bucket}/{object}
                  # bucket.storage.googleapis.com/{object}
              &quot;commandlinesFileName&quot;: &quot;A String&quot;, # The file to store preprocessing commands in.
              &quot;continueOnException&quot;: True or False, # Whether to continue taskrunner if an exception is hit.
              &quot;tempStoragePrefix&quot;: &quot;A String&quot;, # The prefix of the resources the taskrunner should use for
                  # temporary storage.
                  #
                  # The supported resource type is:
                  #
                  # Google Cloud Storage:
                  # storage.googleapis.com/{bucket}/{object}
                  # bucket.storage.googleapis.com/{object}
            },
            &quot;diskType&quot;: &quot;A String&quot;, # Type of root disk for VMs. If empty or unspecified, the service will
                # attempt to choose a reasonable default.
            &quot;defaultPackageSet&quot;: &quot;A String&quot;, # The default package set to install. This allows the service to
                # select a default set of packages which are useful to worker
                # harnesses written in a particular language.
            &quot;machineType&quot;: &quot;A String&quot;, # Machine type (e.g. &quot;n1-standard-1&quot;). If empty or unspecified, the
                # service will attempt to choose a reasonable default.
          },
        ],
        &quot;tempStoragePrefix&quot;: &quot;A String&quot;, # The prefix of the resources the system should use for temporary
            # storage. The system will append the suffix &quot;/temp-{JOBNAME}&quot; to
            # this resource prefix, where {JOBNAME} is the value of the
            # job_name field. The resulting bucket and object prefix is used
            # as the prefix of the resources used to store temporary data
            # needed during the job execution. NOTE: This will override the
            # value in taskrunner_settings.
            # The supported resource type is:
            #
            # Google Cloud Storage:
            #
            # storage.googleapis.com/{bucket}/{object}
            # bucket.storage.googleapis.com/{object}
        &quot;internalExperiments&quot;: { # Experimental settings.
          &quot;a_key&quot;: &quot;&quot;, # Properties of the object. Contains field @type with type URL.
        },
        &quot;sdkPipelineOptions&quot;: { # The Cloud Dataflow SDK pipeline options specified by the user. These
            # options are passed through the service and are used to recreate the
            # SDK pipeline options on the worker in a language agnostic and platform
            # independent way.
          &quot;a_key&quot;: &quot;&quot;, # Properties of the object.
        },
        &quot;dataset&quot;: &quot;A String&quot;, # The dataset for the current project where various workflow
            # related tables are stored.
            #
            # The supported resource type is:
            #
            # Google BigQuery:
            # bigquery.googleapis.com/{dataset}
        &quot;clusterManagerApiService&quot;: &quot;A String&quot;, # The type of cluster manager API to use. If unknown or
            # unspecified, the service will attempt to choose a reasonable
            # default. This should be in the form of the API service name,
            # e.g. &quot;compute.googleapis.com&quot;.
      },
      &quot;stepsLocation&quot;: &quot;A String&quot;, # The GCS location where the steps are stored.
      &quot;steps&quot;: [ # Exactly one of step or steps_location should be specified.
          #
          # The top-level steps that constitute the entire job.
        { # Defines a particular step within a Cloud Dataflow job.
            #
            # A job consists of multiple steps, each of which performs some
            # specific operation as part of the overall job. Data is typically
            # passed from one step to another as part of the job.
            #
            # Here&#x27;s an example of a sequence of steps which together implement a
            # Map-Reduce job:
            #
            # * Read a collection of data from some source, parsing the
            # collection&#x27;s elements.
            #
            # * Validate the elements.
            #
            # * Apply a user-defined function to map each element to some value
            # and extract an element-specific key value.
            #
            # * Group elements with the same key into a single element with
            # that key, transforming a multiply-keyed collection into a
            # uniquely-keyed collection.
            #
            # * Write the elements out to some data sink.
            #
            # Note that the Cloud Dataflow service may be used to run many different
            # types of jobs, not just Map-Reduce.
          &quot;kind&quot;: &quot;A String&quot;, # The kind of step in the Cloud Dataflow job.
          &quot;properties&quot;: { # Named properties associated with the step. Each kind of
              # predefined step has its own required set of properties.
              # Must be provided on Create. Only retrieved with JOB_VIEW_ALL.
            &quot;a_key&quot;: &quot;&quot;, # Properties of the object.
          },
          &quot;name&quot;: &quot;A String&quot;, # The name that identifies the step. This must be unique for each
              # step with respect to all other steps in the Cloud Dataflow job.
        },
      ],
      &quot;stageStates&quot;: [ # This field may be mutated by the Cloud Dataflow service;
          # callers cannot mutate it.
        { # A message describing the state of a particular execution stage.
          &quot;executionStageState&quot;: &quot;A String&quot;, # Execution stage states allow the same set of values as JobState.
          &quot;executionStageName&quot;: &quot;A String&quot;, # The name of the execution stage.
          &quot;currentStateTime&quot;: &quot;A String&quot;, # The time at which the stage transitioned to this state.
        },
      ],
      &quot;replacedByJobId&quot;: &quot;A String&quot;, # If another job is an update of this job (and thus, this job is in
          # `JOB_STATE_UPDATED`), this field contains the ID of that job.
      &quot;jobMetadata&quot;: { # Metadata available primarily for filtering jobs. Will be included in the # This field is populated by the Dataflow service to support filtering jobs
          # by the metadata values provided here. Populated for ListJobs and all GetJob
          # views SUMMARY and higher.
          # ListJob response and Job SUMMARY view.
        &quot;sdkVersion&quot;: { # The version of the SDK used to run the job. # The SDK version used to run the job.
          &quot;sdkSupportStatus&quot;: &quot;A String&quot;, # The support status for this SDK version.
          &quot;versionDisplayName&quot;: &quot;A String&quot;, # A readable string describing the version of the SDK.
          &quot;version&quot;: &quot;A String&quot;, # The version of the SDK used to run the job.
        },
        &quot;bigTableDetails&quot;: [ # Identification of a BigTable source used in the Dataflow job.
          { # Metadata for a BigTable connector used by the job.
            &quot;instanceId&quot;: &quot;A String&quot;, # InstanceId accessed in the connection.
            &quot;tableId&quot;: &quot;A String&quot;, # TableId accessed in the connection.
            &quot;projectId&quot;: &quot;A String&quot;, # ProjectId accessed in the connection.
          },
        ],
        &quot;pubsubDetails&quot;: [ # Identification of a PubSub source used in the Dataflow job.
          { # Metadata for a PubSub connector used by the job.
            &quot;subscription&quot;: &quot;A String&quot;, # Subscription used in the connection.
            &quot;topic&quot;: &quot;A String&quot;, # Topic accessed in the connection.
          },
        ],
        &quot;bigqueryDetails&quot;: [ # Identification of a BigQuery source used in the Dataflow job.
          { # Metadata for a BigQuery connector used by the job.
            &quot;dataset&quot;: &quot;A String&quot;, # Dataset accessed in the connection.
            &quot;projectId&quot;: &quot;A String&quot;, # Project accessed in the connection.
            &quot;query&quot;: &quot;A String&quot;, # Query used to access data in the connection.
            &quot;table&quot;: &quot;A String&quot;, # Table accessed in the connection.
          },
        ],
        &quot;fileDetails&quot;: [ # Identification of a File source used in the Dataflow job.
          { # Metadata for a File connector used by the job.
            &quot;filePattern&quot;: &quot;A String&quot;, # File Pattern used to access files by the connector.
          },
        ],
        &quot;datastoreDetails&quot;: [ # Identification of a Datastore source used in the Dataflow job.
          { # Metadata for a Datastore connector used by the job.
            &quot;namespace&quot;: &quot;A String&quot;, # Namespace used in the connection.
            &quot;projectId&quot;: &quot;A String&quot;, # ProjectId accessed in the connection.
          },
        ],
        &quot;spannerDetails&quot;: [ # Identification of a Spanner source used in the Dataflow job.
          { # Metadata for a Spanner connector used by the job.
            &quot;instanceId&quot;: &quot;A String&quot;, # InstanceId accessed in the connection.
            &quot;databaseId&quot;: &quot;A String&quot;, # DatabaseId accessed in the connection.
            &quot;projectId&quot;: &quot;A String&quot;, # ProjectId accessed in the connection.
          },
        ],
      },
      &quot;location&quot;: &quot;A String&quot;, # The [regional endpoint]
          # (https://cloud.google.com/dataflow/docs/concepts/regional-endpoints) that
          # contains this job.
      &quot;transformNameMapping&quot;: { # The map of transform name prefixes of the job to be replaced to the
          # corresponding name prefixes of the new job.
        &quot;a_key&quot;: &quot;A String&quot;,
      },
      &quot;startTime&quot;: &quot;A String&quot;, # The timestamp when the job was started (transitioned to JOB_STATE_PENDING).
          # Flexible resource scheduling jobs are started with some delay after job
          # creation, so start_time is unset before start and is updated when the
          # job is started by the Cloud Dataflow service. For other jobs, start_time
          # always equals create_time and is immutable and set by the Cloud Dataflow
          # service.
      &quot;clientRequestId&quot;: &quot;A String&quot;, # The client&#x27;s unique identifier of the job, re-used across retried attempts.
          # If this field is set, the service will ensure its uniqueness.
          # The request to create a job will fail if the service has knowledge of a
          # previously submitted job with the same client&#x27;s ID and job name.
          # The caller may use this field to ensure idempotence of job
          # creation across retried attempts to create a job.
          # By default, the field is empty and, in that case, the service ignores it.
      &quot;executionInfo&quot;: { # Additional information about how a Cloud Dataflow job will be executed that # Deprecated.
          # isn&#x27;t contained in the submitted job.
        &quot;stages&quot;: { # A mapping from each stage to the information about that stage.
          &quot;a_key&quot;: { # Contains information about how a particular
              # google.dataflow.v1beta3.Step will be executed.
            &quot;stepName&quot;: [ # The steps associated with the execution stage.
                # Note that stages may have several steps, and that a given step
                # might be run by more than one stage.
              &quot;A String&quot;,
            ],
          },
        },
      },
      &quot;type&quot;: &quot;A String&quot;, # The type of Cloud Dataflow job.
      &quot;createTime&quot;: &quot;A String&quot;, # The timestamp when the job was initially created. Immutable and set by the
          # Cloud Dataflow service.
      &quot;tempFiles&quot;: [ # A set of files the system should be aware of that are used
          # for temporary storage. These temporary files will be
          # removed on job completion.
          # No duplicates are allowed.
          # No file patterns are supported.
          #
          # The supported files are:
          #
          # Google Cloud Storage:
          #
          # storage.googleapis.com/{bucket}/{object}
          # bucket.storage.googleapis.com/{object}
        &quot;A String&quot;,
      ],
      &quot;id&quot;: &quot;A String&quot;, # The unique ID of this job.
          #
          # This field is set by the Cloud Dataflow service when the Job is
          # created, and is immutable for the life of the job.
      &quot;requestedState&quot;: &quot;A String&quot;, # The job&#x27;s requested state.
          #
          # `UpdateJob` may be used to switch between the `JOB_STATE_STOPPED` and
          # `JOB_STATE_RUNNING` states, by setting requested_state. `UpdateJob` may
          # also be used to directly set a job&#x27;s requested state to
          # `JOB_STATE_CANCELLED` or `JOB_STATE_DONE`, irrevocably terminating the
          # job if it has not already reached a terminal state.
      &quot;replaceJobId&quot;: &quot;A String&quot;, # If this job is an update of an existing job, this field is the job ID
          # of the job it replaced.
          #
          # When sending a `CreateJobRequest`, you can update a job by specifying it
          # here. The job named here is stopped, and its intermediate state is
          # transferred to this job.
      &quot;createdFromSnapshotId&quot;: &quot;A String&quot;, # If this is specified, the job&#x27;s initial state is populated from the given
          # snapshot.
      &quot;currentState&quot;: &quot;A String&quot;, # The current state of the job.
          #
          # Jobs are created in the `JOB_STATE_STOPPED` state unless otherwise
          # specified.
          #
          # A job in the `JOB_STATE_RUNNING` state may asynchronously enter a
          # terminal state. After a job has reached a terminal state, no
          # further state updates may be made.
          #
          # This field may be mutated by the Cloud Dataflow service;
          # callers cannot mutate it.
      &quot;name&quot;: &quot;A String&quot;, # The user-specified Cloud Dataflow job name.
          #
          # Only one Job with a given name may exist in a project at any
          # given time. If a caller attempts to create a Job with the same
          # name as an already-existing Job, the attempt returns the
          # existing Job.
          #
          # The name must match the regular expression
          # `[a-z]([-a-z0-9]{0,38}[a-z0-9])?`
      &quot;currentStateTime&quot;: &quot;A String&quot;, # The timestamp associated with the current state.
    }</pre>
</div>
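<p>A minimal sketch of calling <code>create</code> with the <code>templates</code> resource built under Method Details above; every project ID, bucket, and parameter value below is a hypothetical placeholder:</p>
<pre>
# Hypothetical values for illustration; substitute your own project,
# region, template path, and template parameters.
body = {
    &#x27;jobName&#x27;: &#x27;example-wordcount&#x27;,
    &#x27;gcsPath&#x27;: &#x27;gs://example-bucket/templates/wordcount&#x27;,
    &#x27;parameters&#x27;: {&#x27;inputFile&#x27;: &#x27;gs://example-bucket/input.txt&#x27;},
    &#x27;environment&#x27;: {&#x27;tempLocation&#x27;: &#x27;gs://example-bucket/temp&#x27;},
}
job = templates.create(
    projectId=&#x27;example-project&#x27;,
    location=&#x27;us-central1&#x27;,
    body=body,
).execute()
print(job.get(&#x27;id&#x27;), job.get(&#x27;currentState&#x27;))
</pre>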

<div class="method">
    <code class="details" id="get">get(projectId, location, gcsPath=None, view=None, x__xgafv=None)</code>
  <pre>Get the template associated with a template file.

Args:
  projectId: string, Required. The ID of the Cloud Platform project that the job belongs to. (required)
  location: string, The [regional endpoint]
(https://cloud.google.com/dataflow/docs/concepts/regional-endpoints) to
which to direct the request. (required)
  gcsPath: string, Required. A Cloud Storage path to the template from which to
create the job.
Must be a valid Cloud Storage URL, beginning with &#x27;gs://&#x27;.
  view: string, The view to retrieve. Defaults to METADATA_ONLY.
  x__xgafv: string, V1 error format.
    Allowed values
      1 - v1 error format
      2 - v2 error format

Returns:
  An object of the form:

    { # The response to a GetTemplate request.
      &quot;runtimeMetadata&quot;: { # RuntimeMetadata describing a runtime environment. # Describes the runtime metadata with SDKInfo and available parameters.
        &quot;parameters&quot;: [ # The parameters for the template.
          { # Metadata for a specific parameter.
            &quot;label&quot;: &quot;A String&quot;, # Required. The label to display for the parameter.
            &quot;helpText&quot;: &quot;A String&quot;, # Required. The help text to display for the parameter.
            &quot;regexes&quot;: [ # Optional. Regexes that the parameter must match.
              &quot;A String&quot;,
            ],
            &quot;paramType&quot;: &quot;A String&quot;, # Optional. The type of the parameter.
                # Used for selecting input picker.
            &quot;isOptional&quot;: True or False, # Optional. Whether the parameter is optional. Defaults to false.
            &quot;name&quot;: &quot;A String&quot;, # Required. The name of the parameter.
          },
        ],
        &quot;sdkInfo&quot;: { # SDK Information. # SDK Info for the template.
          &quot;language&quot;: &quot;A String&quot;, # Required. The SDK Language.
          &quot;version&quot;: &quot;A String&quot;, # Optional. The SDK version.
        },
      },
      &quot;status&quot;: { # The `Status` type defines a logical error model that is suitable for # The status of the get template request. Any problems with the
          # request will be indicated in the error_details.
          # different programming environments, including REST APIs and RPC APIs. It is
          # used by [gRPC](https://github.com/grpc). Each `Status` message contains
          # three pieces of data: error code, error message, and error details.
          #
          # You can find out more about this error model and how to work with it in the
          # [API Design Guide](https://cloud.google.com/apis/design/errors).
        &quot;details&quot;: [ # A list of messages that carry the error details. There is a common set of
            # message types for APIs to use.
          {
            &quot;a_key&quot;: &quot;&quot;, # Properties of the object. Contains field @type with type URL.
          },
        ],
        &quot;code&quot;: 42, # The status code, which should be an enum value of google.rpc.Code.
        &quot;message&quot;: &quot;A String&quot;, # A developer-facing error message, which should be in English. Any
            # user-facing error message should be localized and sent in the
            # google.rpc.Status.details field, or localized by the client.
      },
      &quot;metadata&quot;: { # Metadata describing a template. # The template metadata describing the template name, available
          # parameters, etc.
        &quot;description&quot;: &quot;A String&quot;, # Optional. A description of the template.
        &quot;parameters&quot;: [ # The parameters for the template.
          { # Metadata for a specific parameter.
            &quot;label&quot;: &quot;A String&quot;, # Required. The label to display for the parameter.
            &quot;helpText&quot;: &quot;A String&quot;, # Required. The help text to display for the parameter.
            &quot;regexes&quot;: [ # Optional. Regexes that the parameter must match.
              &quot;A String&quot;,
            ],
            &quot;paramType&quot;: &quot;A String&quot;, # Optional. The type of the parameter.
                # Used for selecting input picker.
            &quot;isOptional&quot;: True or False, # Optional. Whether the parameter is optional. Defaults to false.
            &quot;name&quot;: &quot;A String&quot;, # Required. The name of the parameter.
          },
        ],
        &quot;name&quot;: &quot;A String&quot;, # Required. The name of the template.
      },
      &quot;templateType&quot;: &quot;A String&quot;, # Template Type.
    }</pre>
</div>
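<p>A minimal sketch of calling <code>get</code> to inspect a template&#x27;s declared parameters before launching it, again with placeholder project and path values:</p>
<pre>
# Fetch template metadata; view defaults to METADATA_ONLY.
template = templates.get(
    projectId=&#x27;example-project&#x27;,
    location=&#x27;us-central1&#x27;,
    gcsPath=&#x27;gs://example-bucket/templates/wordcount&#x27;,
).execute()
for param in template.get(&#x27;metadata&#x27;, {}).get(&#x27;parameters&#x27;, []):
    print(param[&#x27;name&#x27;], param.get(&#x27;helpText&#x27;, &#x27;&#x27;))
</pre>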

<div class="method">
    <code class="details" id="launch">launch(projectId, location, body=None, dynamicTemplate_gcsPath=None, dynamicTemplate_stagingLocation=None, validateOnly=None, gcsPath=None, x__xgafv=None)</code>
  <pre>Launch a template.

Args:
  projectId: string, Required. The ID of the Cloud Platform project that the job belongs to. (required)
  location: string, The [regional endpoint]
(https://cloud.google.com/dataflow/docs/concepts/regional-endpoints) to
which to direct the request. (required)
  body: object, The request body.
    The object takes the form of:

{ # Parameters to provide to the template being launched.
    &quot;environment&quot;: { # The environment values to set at runtime. # The runtime environment for the job.
      &quot;bypassTempDirValidation&quot;: True or False, # Whether to bypass the safety checks for the job&#x27;s temporary directory.
          # Use with caution.
      &quot;tempLocation&quot;: &quot;A String&quot;, # The Cloud Storage path to use for temporary files.
          # Must be a valid Cloud Storage URL, beginning with `gs://`.
      &quot;network&quot;: &quot;A String&quot;, # Network to which VMs will be assigned. If empty or unspecified,
          # the service will use the network &quot;default&quot;.
      &quot;subnetwork&quot;: &quot;A String&quot;, # Subnetwork to which VMs will be assigned, if desired. Expected to be of
          # the form &quot;regions/REGION/subnetworks/SUBNETWORK&quot;.
      &quot;workerRegion&quot;: &quot;A String&quot;, # The Compute Engine region
          # (https://cloud.google.com/compute/docs/regions-zones/regions-zones) in
          # which worker processing should occur, e.g. &quot;us-west1&quot;. Mutually exclusive
          # with worker_zone. If neither worker_region nor worker_zone is specified,
          # default to the control plane&#x27;s region.
      &quot;numWorkers&quot;: 42, # The initial number of Google Compute Engine instances for the job.
      &quot;additionalExperiments&quot;: [ # Additional experiment flags for the job.
        &quot;A String&quot;,
      ],
      &quot;zone&quot;: &quot;A String&quot;, # The Compute Engine [availability
          # zone](https://cloud.google.com/compute/docs/regions-zones/regions-zones)
          # for launching worker instances to run your pipeline.
          # In the future, worker_zone will take precedence.
      &quot;serviceAccountEmail&quot;: &quot;A String&quot;, # The email address of the service account to run the job as.
      &quot;maxWorkers&quot;: 42, # The maximum number of Google Compute Engine instances to be made
          # available to your pipeline during execution, from 1 to 1000.
      &quot;workerZone&quot;: &quot;A String&quot;, # The Compute Engine zone
          # (https://cloud.google.com/compute/docs/regions-zones/regions-zones) in
          # which worker processing should occur, e.g. &quot;us-west1-a&quot;. Mutually exclusive
          # with worker_region. If neither worker_region nor worker_zone is specified,
          # a zone in the control plane&#x27;s region is chosen based on available capacity.
          # If both `worker_zone` and `zone` are set, `worker_zone` takes precedence.
      &quot;additionalUserLabels&quot;: { # Additional user labels to be specified for the job.
          # Keys and values should follow the restrictions specified in the [labeling
          # restrictions](https://cloud.google.com/compute/docs/labeling-resources#restrictions)
          # page.
        &quot;a_key&quot;: &quot;A String&quot;,
      },
      &quot;machineType&quot;: &quot;A String&quot;, # The machine type to use for the job. Defaults to the value from the
          # template if not specified.
      &quot;ipConfiguration&quot;: &quot;A String&quot;, # Configuration for VM IPs.
      &quot;kmsKeyName&quot;: &quot;A String&quot;, # Optional. Name for the Cloud KMS key for the job.
          # Key format is:
          # projects/&lt;project&gt;/locations/&lt;location&gt;/keyRings/&lt;keyring&gt;/cryptoKeys/&lt;key&gt;
    },
    &quot;transformNameMapping&quot;: { # Only applicable when updating a pipeline. Map of transform name prefixes of
        # the job to be replaced to the corresponding name prefixes of the new job.
      &quot;a_key&quot;: &quot;A String&quot;,
    },
    &quot;update&quot;: True or False, # If set, replace the existing pipeline with the name specified by jobName
        # with this pipeline, preserving state.
    &quot;jobName&quot;: &quot;A String&quot;, # Required. The job name to use for the created job.
    &quot;parameters&quot;: { # The runtime parameters to pass to the job.
      &quot;a_key&quot;: &quot;A String&quot;,
    },
  }

  dynamicTemplate_gcsPath: string, Path to dynamic template spec file on GCS.
The file must be a JSON-serialized DynamicTemplateFileSpec object.
  dynamicTemplate_stagingLocation: string, Cloud Storage path for staging dependencies.
Must be a valid Cloud Storage URL, beginning with `gs://`.
  validateOnly: boolean, If true, the request is validated but not actually executed.
Defaults to false.
  gcsPath: string, A Cloud Storage path to the template from which to create
the job.
904Must be a valid Cloud Storage URL, beginning with &#x27;gs://&#x27;.
905 x__xgafv: string, V1 error format.
906 Allowed values
907 1 - v1 error format
908 2 - v2 error format
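
  # --- Example (illustrative sketch, not generated reference) ---
  # Launching a template with the google-api-python-client. Assumes
  # application-default credentials; the project, region, and template path
  # are hypothetical, and &#x27;body&#x27; is the dict sketched above.
  from googleapiclient.discovery import build

  dataflow = build(&quot;dataflow&quot;, &quot;v1b3&quot;)
  response = dataflow.projects().locations().templates().launch(
      projectId=&quot;my-example-project&quot;,
      location=&quot;us-central1&quot;,
      gcsPath=&quot;gs://my-example-bucket/templates/wordcount&quot;,  # or dynamicTemplate_gcsPath, not both
      validateOnly=False,
      body=body,
  ).execute()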
909
910Returns:
911 An object of the form:
912
913 { # Response to the request to launch a template.
914 &quot;job&quot;: { # Defines a job to be run by the Cloud Dataflow service. # The job that was launched, if the request was not a dry run and
915 # the job was successfully launched.
916     &quot;pipelineDescription&quot;: { # Preliminary field: The format of this data may change at any time.
917         # A description of the user pipeline and the stages through which it is
918         # executed, created by the Cloud Dataflow service. Only retrieved with
919         # JOB_VIEW_DESCRIPTION or JOB_VIEW_ALL. It is a descriptive representation
920         # of the submitted pipeline as well as its executed form, provided for ease
921         # of visualizing the pipeline and interpreting Dataflow-provided metrics.
922 &quot;displayData&quot;: [ # Pipeline level display data.
923 { # Data provided with a pipeline or transform to provide descriptive info.
924 &quot;url&quot;: &quot;A String&quot;, # An optional full URL.
925 &quot;javaClassValue&quot;: &quot;A String&quot;, # Contains value if the data is of java class type.
926 &quot;timestampValue&quot;: &quot;A String&quot;, # Contains value if the data is of timestamp type.
927 &quot;durationValue&quot;: &quot;A String&quot;, # Contains value if the data is of duration type.
928 &quot;label&quot;: &quot;A String&quot;, # An optional label to display in a dax UI for the element.
929 &quot;key&quot;: &quot;A String&quot;, # The key identifying the display data.
930 # This is intended to be used as a label for the display data
931 # when viewed in a dax monitoring system.
932 &quot;namespace&quot;: &quot;A String&quot;, # The namespace for the key. This is usually a class name or programming
933           # language namespace (e.g. a Python module) which defines the display data.
934 # This allows a dax monitoring system to specially handle the data
935 # and perform custom rendering.
936 &quot;floatValue&quot;: 3.14, # Contains value if the data is of float type.
937 &quot;strValue&quot;: &quot;A String&quot;, # Contains value if the data is of string type.
938 &quot;int64Value&quot;: &quot;A String&quot;, # Contains value if the data is of int64 type.
939 &quot;boolValue&quot;: True or False, # Contains value if the data is of a boolean type.
940 &quot;shortStrValue&quot;: &quot;A String&quot;, # A possible additional shorter value to display.
941           # For example, a java_class_name_value of com.mypackage.MyDoFn
942           # will be stored with MyDoFn as the short_str_value and
943           # com.mypackage.MyDoFn as the java_class_name_value.
944           # The short_str_value can be displayed, and the java_class_name_value
945           # will be displayed as a tooltip.
946 },
947 ],
948 &quot;originalPipelineTransform&quot;: [ # Description of each transform in the pipeline and collections between them.
949 { # Description of the type, names/ids, and input/outputs for a transform.
950 &quot;outputCollectionName&quot;: [ # User names for all collection outputs to this transform.
951 &quot;A String&quot;,
952 ],
953 &quot;displayData&quot;: [ # Transform-specific display data.
954 { # Data provided with a pipeline or transform to provide descriptive info.
955 &quot;url&quot;: &quot;A String&quot;, # An optional full URL.
956 &quot;javaClassValue&quot;: &quot;A String&quot;, # Contains value if the data is of java class type.
957 &quot;timestampValue&quot;: &quot;A String&quot;, # Contains value if the data is of timestamp type.
958 &quot;durationValue&quot;: &quot;A String&quot;, # Contains value if the data is of duration type.
959 &quot;label&quot;: &quot;A String&quot;, # An optional label to display in a dax UI for the element.
960 &quot;key&quot;: &quot;A String&quot;, # The key identifying the display data.
961 # This is intended to be used as a label for the display data
962 # when viewed in a dax monitoring system.
963 &quot;namespace&quot;: &quot;A String&quot;, # The namespace for the key. This is usually a class name or programming
964               # language namespace (e.g. a Python module) which defines the display data.
965 # This allows a dax monitoring system to specially handle the data
966 # and perform custom rendering.
967 &quot;floatValue&quot;: 3.14, # Contains value if the data is of float type.
968 &quot;strValue&quot;: &quot;A String&quot;, # Contains value if the data is of string type.
969 &quot;int64Value&quot;: &quot;A String&quot;, # Contains value if the data is of int64 type.
970 &quot;boolValue&quot;: True or False, # Contains value if the data is of a boolean type.
971 &quot;shortStrValue&quot;: &quot;A String&quot;, # A possible additional shorter value to display.
972               # For example, a java_class_name_value of com.mypackage.MyDoFn
973               # will be stored with MyDoFn as the short_str_value and
974               # com.mypackage.MyDoFn as the java_class_name_value.
975               # The short_str_value can be displayed, and the java_class_name_value
976               # will be displayed as a tooltip.
977 },
978 ],
979 &quot;id&quot;: &quot;A String&quot;, # SDK generated id of this transform instance.
980 &quot;inputCollectionName&quot;: [ # User names for all collection inputs to this transform.
981 &quot;A String&quot;,
982 ],
983 &quot;name&quot;: &quot;A String&quot;, # User provided name for this transform instance.
984 &quot;kind&quot;: &quot;A String&quot;, # Type of transform.
985 },
986 ],
987 &quot;executionPipelineStage&quot;: [ # Description of each stage of execution of the pipeline.
988 { # Description of the composing transforms, names/ids, and input/outputs of a
989 # stage of execution. Some composing transforms and sources may have been
990 # generated by the Dataflow service during execution planning.
991 &quot;componentSource&quot;: [ # Collections produced and consumed by component transforms of this stage.
992 { # Description of an interstitial value between transforms in an execution
993 # stage.
994 &quot;userName&quot;: &quot;A String&quot;, # Human-readable name for this transform; may be user or system generated.
995 &quot;name&quot;: &quot;A String&quot;, # Dataflow service generated name for this source.
996 &quot;originalTransformOrCollection&quot;: &quot;A String&quot;, # User name for the original user transform or collection with which this
997 # source is most closely associated.
998 },
999 ],
1000 &quot;inputSource&quot;: [ # Input sources for this stage.
1001 { # Description of an input or output of an execution stage.
1002 &quot;userName&quot;: &quot;A String&quot;, # Human-readable name for this source; may be user or system generated.
1003 &quot;originalTransformOrCollection&quot;: &quot;A String&quot;, # User name for the original user transform or collection with which this
1004 # source is most closely associated.
1005 &quot;sizeBytes&quot;: &quot;A String&quot;, # Size of the source, if measurable.
1006 &quot;name&quot;: &quot;A String&quot;, # Dataflow service generated name for this source.
1007 },
1008 ],
1009 &quot;name&quot;: &quot;A String&quot;, # Dataflow service generated name for this stage.
1010 &quot;componentTransform&quot;: [ # Transforms that comprise this execution stage.
1011 { # Description of a transform executed as part of an execution stage.
1012 &quot;name&quot;: &quot;A String&quot;, # Dataflow service generated name for this source.
1013 &quot;userName&quot;: &quot;A String&quot;, # Human-readable name for this transform; may be user or system generated.
1014 &quot;originalTransform&quot;: &quot;A String&quot;, # User name for the original user transform with which this transform is
1015 # most closely associated.
1016 },
1017 ],
1018 &quot;id&quot;: &quot;A String&quot;, # Dataflow service generated id for this stage.
1019 &quot;outputSource&quot;: [ # Output sources for this stage.
1020 { # Description of an input or output of an execution stage.
1021 &quot;userName&quot;: &quot;A String&quot;, # Human-readable name for this source; may be user or system generated.
1022 &quot;originalTransformOrCollection&quot;: &quot;A String&quot;, # User name for the original user transform or collection with which this
1023 # source is most closely associated.
1024 &quot;sizeBytes&quot;: &quot;A String&quot;, # Size of the source, if measurable.
1025 &quot;name&quot;: &quot;A String&quot;, # Dataflow service generated name for this source.
1026 },
1027 ],
1028          &quot;kind&quot;: &quot;A String&quot;, # Type of transform this stage is executing.
1029 },
1030 ],
1031 },
1032 &quot;labels&quot;: { # User-defined labels for this job.
1033 #
1034 # The labels map can contain no more than 64 entries. Entries of the labels
1035 # map are UTF8 strings that comply with the following restrictions:
1036 #
1037 # * Keys must conform to regexp: \p{Ll}\p{Lo}{0,62}
1038 # * Values must conform to regexp: [\p{Ll}\p{Lo}\p{N}_-]{0,63}
1039 # * Both keys and values are additionally constrained to be &lt;= 128 bytes in
1040 # size.
1041 &quot;a_key&quot;: &quot;A String&quot;,
1042 },
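      # --- Example (illustrative sketch): validating one labels entry against
      # --- the constraints above. Uses the third-party &#x27;regex&#x27; module, since
      # --- the stdlib &#x27;re&#x27; module does not support \p{...} Unicode classes.
      #
      #   import regex
      #   KEY_RE = regex.compile(r&quot;^\p{Ll}\p{Lo}{0,62}$&quot;)
      #   VALUE_RE = regex.compile(r&quot;^[\p{Ll}\p{Lo}\p{N}_-]{0,63}$&quot;)
      #   def label_entry_is_valid(key, value):
      #       return (len(key.encode(&quot;utf-8&quot;)) &lt;= 128
      #               and len(value.encode(&quot;utf-8&quot;)) &lt;= 128
      #               and KEY_RE.match(key) is not None
      #               and VALUE_RE.match(value) is not None)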
1043 &quot;projectId&quot;: &quot;A String&quot;, # The ID of the Cloud Platform project that the job belongs to.
1044 &quot;environment&quot;: { # Describes the environment in which a Dataflow Job runs. # The environment for the job.
1045 &quot;flexResourceSchedulingGoal&quot;: &quot;A String&quot;, # Which Flexible Resource Scheduling mode to run in.
1046 &quot;workerRegion&quot;: &quot;A String&quot;, # The Compute Engine region
1047 # (https://cloud.google.com/compute/docs/regions-zones/regions-zones) in
1048 # which worker processing should occur, e.g. &quot;us-west1&quot;. Mutually exclusive
1049 # with worker_zone. If neither worker_region nor worker_zone is specified,
1050 # default to the control plane&#x27;s region.
1051 &quot;userAgent&quot;: { # A description of the process that generated the request.
1052 &quot;a_key&quot;: &quot;&quot;, # Properties of the object.
1053 },
1054 &quot;serviceAccountEmail&quot;: &quot;A String&quot;, # Identity to run virtual machines as. Defaults to the default account.
1055 &quot;version&quot;: { # A structure describing which components and their versions of the service
1056 # are required in order to run the job.
1057 &quot;a_key&quot;: &quot;&quot;, # Properties of the object.
1058 },
1059 &quot;serviceKmsKeyName&quot;: &quot;A String&quot;, # If set, contains the Cloud KMS key identifier used to encrypt data
1060 # at rest, AKA a Customer Managed Encryption Key (CMEK).
1061 #
1062 # Format:
1063 # projects/PROJECT_ID/locations/LOCATION/keyRings/KEY_RING/cryptoKeys/KEY
1064 &quot;experiments&quot;: [ # The list of experiments to enable.
1065 &quot;A String&quot;,
1066 ],
1067 &quot;workerZone&quot;: &quot;A String&quot;, # The Compute Engine zone
1068 # (https://cloud.google.com/compute/docs/regions-zones/regions-zones) in
1069 # which worker processing should occur, e.g. &quot;us-west1-a&quot;. Mutually exclusive
1070 # with worker_region. If neither worker_region nor worker_zone is specified,
1071 # a zone in the control plane&#x27;s region is chosen based on available capacity.
1072 &quot;workerPools&quot;: [ # The worker pools. At least one &quot;harness&quot; worker pool must be
1073 # specified in order for the job to have workers.
1074 { # Describes one particular pool of Cloud Dataflow workers to be
1075 # instantiated by the Cloud Dataflow service in order to perform the
1076 # computations required by a job. Note that a workflow job may use
1077 # multiple pools, in order to match the various computational
1078 # requirements of the various stages of the job.
1079 &quot;onHostMaintenance&quot;: &quot;A String&quot;, # The action to take on host maintenance, as defined by the Google
1080 # Compute Engine API.
1081 &quot;sdkHarnessContainerImages&quot;: [ # Set of SDK harness containers needed to execute this pipeline. This will
1082 # only be set in the Fn API path. For non-cross-language pipelines this
1083 # should have only one entry. Cross-language pipelines will have two or more
1084 # entries.
1085            { # Defines an SDK harness container for executing Dataflow pipelines.
1086 &quot;containerImage&quot;: &quot;A String&quot;, # A docker container image that resides in Google Container Registry.
1087              &quot;useSingleCorePerContainer&quot;: True or False, # If true, recommends that the Dataflow service use only one core per SDK
1088                  # container instance with this image. If false (or unset), recommends using
1089                  # more than one core per SDK container instance with this image for
1090                  # efficiency. Note that the Dataflow service may choose to override this
1091                  # property if needed.
1092 },
1093 ],
1094 &quot;zone&quot;: &quot;A String&quot;, # Zone to run the worker pools in. If empty or unspecified, the service
1095 # will attempt to choose a reasonable default.
1096 &quot;kind&quot;: &quot;A String&quot;, # The kind of the worker pool; currently only `harness` and `shuffle`
1097 # are supported.
1098 &quot;metadata&quot;: { # Metadata to set on the Google Compute Engine VMs.
1099 &quot;a_key&quot;: &quot;A String&quot;,
1100 },
1101 &quot;diskSourceImage&quot;: &quot;A String&quot;, # Fully qualified source image for disks.
1102 &quot;dataDisks&quot;: [ # Data disks that are used by a VM in this workflow.
1103 { # Describes the data disk used by a workflow job.
1104 &quot;sizeGb&quot;: 42, # Size of disk in GB. If zero or unspecified, the service will
1105 # attempt to choose a reasonable default.
1106 &quot;diskType&quot;: &quot;A String&quot;, # Disk storage type, as defined by Google Compute Engine. This
1107 # must be a disk type appropriate to the project and zone in which
1108 # the workers will run. If unknown or unspecified, the service
1109 # will attempt to choose a reasonable default.
1110 #
1111 # For example, the standard persistent disk type is a resource name
1112 # typically ending in &quot;pd-standard&quot;. If SSD persistent disks are
1113 # available, the resource name typically ends with &quot;pd-ssd&quot;. The
1114                # actual valid values are defined by the Google Compute Engine API,
1115 # not by the Cloud Dataflow API; consult the Google Compute Engine
1116 # documentation for more information about determining the set of
1117 # available disk types for a particular project and zone.
1118 #
1119 # Google Compute Engine Disk types are local to a particular
1120 # project in a particular zone, and so the resource name will
1121 # typically look something like this:
1122 #
1123 # compute.googleapis.com/projects/project-id/zones/zone/diskTypes/pd-standard
1124 &quot;mountPoint&quot;: &quot;A String&quot;, # Directory in a VM where disk is mounted.
1125 },
1126 ],
1127 &quot;packages&quot;: [ # Packages to be installed on workers.
1128 { # The packages that must be installed in order for a worker to run the
1129 # steps of the Cloud Dataflow job that will be assigned to its worker
1130 # pool.
1131 #
1132 # This is the mechanism by which the Cloud Dataflow SDK causes code to
1133 # be loaded onto the workers. For example, the Cloud Dataflow Java SDK
1134 # might use this to install jars containing the user&#x27;s code and all of the
1135 # various dependencies (libraries, data files, etc.) required in order
1136 # for that code to run.
1137 &quot;name&quot;: &quot;A String&quot;, # The name of the package.
1138 &quot;location&quot;: &quot;A String&quot;, # The resource to read the package from. The supported resource type is:
1139 #
1140 # Google Cloud Storage:
1141 #
1142 # storage.googleapis.com/{bucket}
1143 # bucket.storage.googleapis.com/
1144 },
1145 ],
1146          &quot;teardownPolicy&quot;: &quot;A String&quot;, # Sets the policy for determining when to turn down the worker pool.
1147 # Allowed values are: `TEARDOWN_ALWAYS`, `TEARDOWN_ON_SUCCESS`, and
1148 # `TEARDOWN_NEVER`.
1149 # `TEARDOWN_ALWAYS` means workers are always torn down regardless of whether
1150 # the job succeeds. `TEARDOWN_ON_SUCCESS` means workers are torn down
1151 # if the job succeeds. `TEARDOWN_NEVER` means the workers are never torn
1152 # down.
1153 #
1154 # If the workers are not torn down by the service, they will
1155 # continue to run and use Google Compute Engine VM resources in the
1156 # user&#x27;s project until they are explicitly terminated by the user.
1157 # Because of this, Google recommends using the `TEARDOWN_ALWAYS`
1158 # policy except for small, manually supervised test jobs.
1159 #
1160 # If unknown or unspecified, the service will attempt to choose a reasonable
1161 # default.
1162 &quot;network&quot;: &quot;A String&quot;, # Network to which VMs will be assigned. If empty or unspecified,
1163 # the service will use the network &quot;default&quot;.
1164 &quot;ipConfiguration&quot;: &quot;A String&quot;, # Configuration for VM IPs.
1165 &quot;diskSizeGb&quot;: 42, # Size of root disk for VMs, in GB. If zero or unspecified, the service will
1166 # attempt to choose a reasonable default.
1167 &quot;autoscalingSettings&quot;: { # Settings for WorkerPool autoscaling. # Settings for autoscaling of this WorkerPool.
1168 &quot;maxNumWorkers&quot;: 42, # The maximum number of workers to cap scaling at.
1169 &quot;algorithm&quot;: &quot;A String&quot;, # The algorithm to use for autoscaling.
1170 },
1171 &quot;poolArgs&quot;: { # Extra arguments for this worker pool.
1172 &quot;a_key&quot;: &quot;&quot;, # Properties of the object. Contains field @type with type URL.
1173 },
1174 &quot;subnetwork&quot;: &quot;A String&quot;, # Subnetwork to which VMs will be assigned, if desired. Expected to be of
1175 # the form &quot;regions/REGION/subnetworks/SUBNETWORK&quot;.
1176 &quot;numWorkers&quot;: 42, # Number of Google Compute Engine workers in this pool needed to
1177 # execute the job. If zero or unspecified, the service will
1178 # attempt to choose a reasonable default.
1179 &quot;numThreadsPerWorker&quot;: 42, # The number of threads per worker harness. If empty or unspecified, the
1180 # service will choose a number of threads (according to the number of cores
1181 # on the selected machine type for batch, or 1 by convention for streaming).
1182 &quot;workerHarnessContainerImage&quot;: &quot;A String&quot;, # Required. Docker container image that executes the Cloud Dataflow worker
1183 # harness, residing in Google Container Registry.
1184 #
1185 # Deprecated for the Fn API path. Use sdk_harness_container_images instead.
1186 &quot;taskrunnerSettings&quot;: { # Taskrunner configuration settings. # Settings passed through to Google Compute Engine workers when
1187 # using the standard Dataflow task runner. Users should ignore
1188 # this field.
1189            &quot;dataflowApiVersion&quot;: &quot;A String&quot;, # The API version of the endpoint, e.g. &quot;v1b3&quot;.
1190 &quot;oauthScopes&quot;: [ # The OAuth2 scopes to be requested by the taskrunner in order to
1191 # access the Cloud Dataflow API.
1192 &quot;A String&quot;,
1193 ],
1194 &quot;baseUrl&quot;: &quot;A String&quot;, # The base URL for the taskrunner to use when accessing Google Cloud APIs.
1195 #
1196 # When workers access Google Cloud APIs, they logically do so via
1197 # relative URLs. If this field is specified, it supplies the base
1198 # URL to use for resolving these relative URLs. The normative
1199 # algorithm used is defined by RFC 1808, &quot;Relative Uniform Resource
1200 # Locators&quot;.
1201 #
1202 # If not specified, the default value is &quot;http://www.googleapis.com/&quot;
1203 &quot;workflowFileName&quot;: &quot;A String&quot;, # The file to store the workflow in.
1204 &quot;logToSerialconsole&quot;: True or False, # Whether to send taskrunner log info to Google Compute Engine VM serial
1205 # console.
1206 &quot;baseTaskDir&quot;: &quot;A String&quot;, # The location on the worker for task-specific subdirectories.
1207 &quot;taskUser&quot;: &quot;A String&quot;, # The UNIX user ID on the worker VM to use for tasks launched by
1208 # taskrunner; e.g. &quot;root&quot;.
1209 &quot;vmId&quot;: &quot;A String&quot;, # The ID string of the VM.
1210 &quot;alsologtostderr&quot;: True or False, # Whether to also send taskrunner log info to stderr.
1211 &quot;parallelWorkerSettings&quot;: { # Provides data to pass through to the worker harness. # The settings to pass to the parallel worker harness.
1212 &quot;shuffleServicePath&quot;: &quot;A String&quot;, # The Shuffle service path relative to the root URL, for example,
1213 # &quot;shuffle/v1beta1&quot;.
1214 &quot;tempStoragePrefix&quot;: &quot;A String&quot;, # The prefix of the resources the system should use for temporary
1215 # storage.
1216 #
1217 # The supported resource type is:
1218 #
1219 # Google Cloud Storage:
1220 #
1221 # storage.googleapis.com/{bucket}/{object}
1222 # bucket.storage.googleapis.com/{object}
1223 &quot;reportingEnabled&quot;: True or False, # Whether to send work progress updates to the service.
1224 &quot;servicePath&quot;: &quot;A String&quot;, # The Cloud Dataflow service path relative to the root URL, for example,
1225 # &quot;dataflow/v1b3/projects&quot;.
1226 &quot;baseUrl&quot;: &quot;A String&quot;, # The base URL for accessing Google Cloud APIs.
1227 #
1228 # When workers access Google Cloud APIs, they logically do so via
1229 # relative URLs. If this field is specified, it supplies the base
1230 # URL to use for resolving these relative URLs. The normative
1231 # algorithm used is defined by RFC 1808, &quot;Relative Uniform Resource
1232 # Locators&quot;.
1233 #
1234 # If not specified, the default value is &quot;http://www.googleapis.com/&quot;
1235 &quot;workerId&quot;: &quot;A String&quot;, # The ID of the worker running this pipeline.
1236 },
1237 &quot;harnessCommand&quot;: &quot;A String&quot;, # The command to launch the worker harness.
1238 &quot;logDir&quot;: &quot;A String&quot;, # The directory on the VM to store logs.
1239 &quot;streamingWorkerMainClass&quot;: &quot;A String&quot;, # The streaming worker main class name.
1240 &quot;languageHint&quot;: &quot;A String&quot;, # The suggested backend language.
1241 &quot;taskGroup&quot;: &quot;A String&quot;, # The UNIX group ID on the worker VM to use for tasks launched by
1242 # taskrunner; e.g. &quot;wheel&quot;.
1243 &quot;logUploadLocation&quot;: &quot;A String&quot;, # Indicates where to put logs. If this is not specified, the logs
1244 # will not be uploaded.
1245 #
1246 # The supported resource type is:
1247 #
1248 # Google Cloud Storage:
1249 # storage.googleapis.com/{bucket}/{object}
1250 # bucket.storage.googleapis.com/{object}
1251 &quot;commandlinesFileName&quot;: &quot;A String&quot;, # The file to store preprocessing commands in.
1252 &quot;continueOnException&quot;: True or False, # Whether to continue taskrunner if an exception is hit.
1253 &quot;tempStoragePrefix&quot;: &quot;A String&quot;, # The prefix of the resources the taskrunner should use for
1254 # temporary storage.
1255 #
1256 # The supported resource type is:
1257 #
1258 # Google Cloud Storage:
1259 # storage.googleapis.com/{bucket}/{object}
1260 # bucket.storage.googleapis.com/{object}
1261 },
1262 &quot;diskType&quot;: &quot;A String&quot;, # Type of root disk for VMs. If empty or unspecified, the service will
1263 # attempt to choose a reasonable default.
1264 &quot;defaultPackageSet&quot;: &quot;A String&quot;, # The default package set to install. This allows the service to
1265 # select a default set of packages which are useful to worker
1266 # harnesses written in a particular language.
1267 &quot;machineType&quot;: &quot;A String&quot;, # Machine type (e.g. &quot;n1-standard-1&quot;). If empty or unspecified, the
1268 # service will attempt to choose a reasonable default.
1269 },
1270 ],
1271 &quot;tempStoragePrefix&quot;: &quot;A String&quot;, # The prefix of the resources the system should use for temporary
1272        # storage. The system will append the suffix &quot;/temp-{JOBNAME}&quot; to
1273 # this resource prefix, where {JOBNAME} is the value of the
1274 # job_name field. The resulting bucket and object prefix is used
1275 # as the prefix of the resources used to store temporary data
1276 # needed during the job execution. NOTE: This will override the
1277 # value in taskrunner_settings.
1278 # The supported resource type is:
1279 #
1280 # Google Cloud Storage:
1281 #
1282 # storage.googleapis.com/{bucket}/{object}
1283 # bucket.storage.googleapis.com/{object}
1284 &quot;internalExperiments&quot;: { # Experimental settings.
1285 &quot;a_key&quot;: &quot;&quot;, # Properties of the object. Contains field @type with type URL.
1286 },
1287 &quot;sdkPipelineOptions&quot;: { # The Cloud Dataflow SDK pipeline options specified by the user. These
1288 # options are passed through the service and are used to recreate the
1289 # SDK pipeline options on the worker in a language agnostic and platform
1290 # independent way.
1291 &quot;a_key&quot;: &quot;&quot;, # Properties of the object.
1292 },
1293 &quot;dataset&quot;: &quot;A String&quot;, # The dataset for the current project where various workflow
1294 # related tables are stored.
1295 #
1296 # The supported resource type is:
1297 #
1298 # Google BigQuery:
1299 # bigquery.googleapis.com/{dataset}
1300 &quot;clusterManagerApiService&quot;: &quot;A String&quot;, # The type of cluster manager API to use. If unknown or
1301 # unspecified, the service will attempt to choose a reasonable
1302 # default. This should be in the form of the API service name,
1303 # e.g. &quot;compute.googleapis.com&quot;.
1304 },
1305 &quot;stepsLocation&quot;: &quot;A String&quot;, # The GCS location where the steps are stored.
1306 &quot;steps&quot;: [ # Exactly one of step or steps_location should be specified.
1307 #
1308 # The top-level steps that constitute the entire job.
1309 { # Defines a particular step within a Cloud Dataflow job.
1310 #
1311 # A job consists of multiple steps, each of which performs some
1312 # specific operation as part of the overall job. Data is typically
1313 # passed from one step to another as part of the job.
1314 #
1315 # Here&#x27;s an example of a sequence of steps which together implement a
1316 # Map-Reduce job:
1317 #
1318 # * Read a collection of data from some source, parsing the
1319 # collection&#x27;s elements.
1320 #
1321 # * Validate the elements.
1322 #
1323 # * Apply a user-defined function to map each element to some value
1324 # and extract an element-specific key value.
1325 #
1326 # * Group elements with the same key into a single element with
1327 # that key, transforming a multiply-keyed collection into a
1328 # uniquely-keyed collection.
1329 #
1330 # * Write the elements out to some data sink.
1331 #
1332 # Note that the Cloud Dataflow service may be used to run many different
1333 # types of jobs, not just Map-Reduce.
1334 &quot;kind&quot;: &quot;A String&quot;, # The kind of step in the Cloud Dataflow job.
1335 &quot;properties&quot;: { # Named properties associated with the step. Each kind of
1336 # predefined step has its own required set of properties.
1337 # Must be provided on Create. Only retrieved with JOB_VIEW_ALL.
1338 &quot;a_key&quot;: &quot;&quot;, # Properties of the object.
1339 },
1340 &quot;name&quot;: &quot;A String&quot;, # The name that identifies the step. This must be unique for each
1341 # step with respect to all other steps in the Cloud Dataflow job.
1342 },
1343 ],
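    # --- Example (illustrative only): the general shape of one entry in
    # --- &#x27;steps&#x27;. Step kinds and their properties are service-defined; the
    # --- values below are hypothetical placeholders, not a real step.
    #
    #   example_step = {
    #       &quot;kind&quot;: &quot;ParallelRead&quot;,
    #       &quot;name&quot;: &quot;s1&quot;,
    #       &quot;properties&quot;: {},  # kind-specific; required on Create
    #   }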
1344 &quot;stageStates&quot;: [ # This field may be mutated by the Cloud Dataflow service;
1345 # callers cannot mutate it.
1346 { # A message describing the state of a particular execution stage.
1347        &quot;executionStageState&quot;: &quot;A String&quot;, # Execution stage states allow the same set of values as JobState.
1348 &quot;executionStageName&quot;: &quot;A String&quot;, # The name of the execution stage.
1349 &quot;currentStateTime&quot;: &quot;A String&quot;, # The time at which the stage transitioned to this state.
1350 },
1351 ],
1352 &quot;replacedByJobId&quot;: &quot;A String&quot;, # If another job is an update of this job (and thus, this job is in
1353 # `JOB_STATE_UPDATED`), this field contains the ID of that job.
1354    &quot;jobMetadata&quot;: { # This field is populated by the Dataflow service to support filtering jobs
1355        # by the metadata values provided here. Populated for ListJobs and all GetJob
1356        # views SUMMARY and higher. The metadata is intended primarily for filtering
1357        # jobs and will be included in the ListJob response and the Job SUMMARY view.
1358 &quot;sdkVersion&quot;: { # The version of the SDK used to run the job. # The SDK version used to run the job.
1359 &quot;sdkSupportStatus&quot;: &quot;A String&quot;, # The support status for this SDK version.
1360 &quot;versionDisplayName&quot;: &quot;A String&quot;, # A readable string describing the version of the SDK.
1361 &quot;version&quot;: &quot;A String&quot;, # The version of the SDK used to run the job.
1362 },
1363 &quot;bigTableDetails&quot;: [ # Identification of a BigTable source used in the Dataflow job.
1364 { # Metadata for a BigTable connector used by the job.
1365 &quot;instanceId&quot;: &quot;A String&quot;, # InstanceId accessed in the connection.
1366 &quot;tableId&quot;: &quot;A String&quot;, # TableId accessed in the connection.
1367 &quot;projectId&quot;: &quot;A String&quot;, # ProjectId accessed in the connection.
1368 },
1369 ],
1370 &quot;pubsubDetails&quot;: [ # Identification of a PubSub source used in the Dataflow job.
1371 { # Metadata for a PubSub connector used by the job.
1372 &quot;subscription&quot;: &quot;A String&quot;, # Subscription used in the connection.
1373 &quot;topic&quot;: &quot;A String&quot;, # Topic accessed in the connection.
1374 },
1375 ],
1376 &quot;bigqueryDetails&quot;: [ # Identification of a BigQuery source used in the Dataflow job.
1377 { # Metadata for a BigQuery connector used by the job.
1378 &quot;dataset&quot;: &quot;A String&quot;, # Dataset accessed in the connection.
1379 &quot;projectId&quot;: &quot;A String&quot;, # Project accessed in the connection.
1380 &quot;query&quot;: &quot;A String&quot;, # Query used to access data in the connection.
1381 &quot;table&quot;: &quot;A String&quot;, # Table accessed in the connection.
1382 },
1383 ],
1384 &quot;fileDetails&quot;: [ # Identification of a File source used in the Dataflow job.
1385 { # Metadata for a File connector used by the job.
1386 &quot;filePattern&quot;: &quot;A String&quot;, # File Pattern used to access files by the connector.
1387 },
1388 ],
1389 &quot;datastoreDetails&quot;: [ # Identification of a Datastore source used in the Dataflow job.
1390 { # Metadata for a Datastore connector used by the job.
1391 &quot;namespace&quot;: &quot;A String&quot;, # Namespace used in the connection.
1392 &quot;projectId&quot;: &quot;A String&quot;, # ProjectId accessed in the connection.
1393 },
1394 ],
1395 &quot;spannerDetails&quot;: [ # Identification of a Spanner source used in the Dataflow job.
1396 { # Metadata for a Spanner connector used by the job.
1397 &quot;instanceId&quot;: &quot;A String&quot;, # InstanceId accessed in the connection.
1398 &quot;databaseId&quot;: &quot;A String&quot;, # DatabaseId accessed in the connection.
1399 &quot;projectId&quot;: &quot;A String&quot;, # ProjectId accessed in the connection.
1400 },
1401 ],
1402 },
1403 &quot;location&quot;: &quot;A String&quot;, # The [regional endpoint]
1404 # (https://cloud.google.com/dataflow/docs/concepts/regional-endpoints) that
1405 # contains this job.
1406 &quot;transformNameMapping&quot;: { # The map of transform name prefixes of the job to be replaced to the
1407 # corresponding name prefixes of the new job.
1408 &quot;a_key&quot;: &quot;A String&quot;,
1409 },
1410 &quot;startTime&quot;: &quot;A String&quot;, # The timestamp when the job was started (transitioned to JOB_STATE_PENDING).
1411 # Flexible resource scheduling jobs are started with some delay after job
1412 # creation, so start_time is unset before start and is updated when the
1413 # job is started by the Cloud Dataflow service. For other jobs, start_time
1414        # always equals create_time and is immutable and set by the Cloud Dataflow
1415 # service.
1416 &quot;clientRequestId&quot;: &quot;A String&quot;, # The client&#x27;s unique identifier of the job, re-used across retried attempts.
1417 # If this field is set, the service will ensure its uniqueness.
1418 # The request to create a job will fail if the service has knowledge of a
1419 # previously submitted job with the same client&#x27;s ID and job name.
1420 # The caller may use this field to ensure idempotence of job
1421 # creation across retried attempts to create a job.
1422 # By default, the field is empty and, in that case, the service ignores it.
1423    &quot;executionInfo&quot;: { # Deprecated. Additional information about how a Cloud Dataflow job
1424        # will be executed that isn&#x27;t contained in the submitted job.
1425 &quot;stages&quot;: { # A mapping from each stage to the information about that stage.
1426 &quot;a_key&quot;: { # Contains information about how a particular
1427 # google.dataflow.v1beta3.Step will be executed.
1428 &quot;stepName&quot;: [ # The steps associated with the execution stage.
1429 # Note that stages may have several steps, and that a given step
1430 # might be run by more than one stage.
1431 &quot;A String&quot;,
1432 ],
1433 },
1434 },
1435 },
1436 &quot;type&quot;: &quot;A String&quot;, # The type of Cloud Dataflow job.
1437 &quot;createTime&quot;: &quot;A String&quot;, # The timestamp when the job was initially created. Immutable and set by the
1438 # Cloud Dataflow service.
1439 &quot;tempFiles&quot;: [ # A set of files the system should be aware of that are used
1440 # for temporary storage. These temporary files will be
1441 # removed on job completion.
1442 # No duplicates are allowed.
1443 # No file patterns are supported.
1444 #
1445 # The supported files are:
1446 #
1447 # Google Cloud Storage:
1448 #
1449 # storage.googleapis.com/{bucket}/{object}
1450 # bucket.storage.googleapis.com/{object}
1451 &quot;A String&quot;,
1452 ],
1453 &quot;id&quot;: &quot;A String&quot;, # The unique ID of this job.
1454 #
1455 # This field is set by the Cloud Dataflow service when the Job is
1456 # created, and is immutable for the life of the job.
1457 &quot;requestedState&quot;: &quot;A String&quot;, # The job&#x27;s requested state.
1458 #
1459 # `UpdateJob` may be used to switch between the `JOB_STATE_STOPPED` and
1460 # `JOB_STATE_RUNNING` states, by setting requested_state. `UpdateJob` may
1461 # also be used to directly set a job&#x27;s requested state to
1462 # `JOB_STATE_CANCELLED` or `JOB_STATE_DONE`, irrevocably terminating the
1463 # job if it has not already reached a terminal state.
1464 &quot;replaceJobId&quot;: &quot;A String&quot;, # If this job is an update of an existing job, this field is the job ID
1465 # of the job it replaced.
1466 #
1467 # When sending a `CreateJobRequest`, you can update a job by specifying it
1468 # here. The job named here is stopped, and its intermediate state is
1469 # transferred to this job.
1470 &quot;createdFromSnapshotId&quot;: &quot;A String&quot;, # If this is specified, the job&#x27;s initial state is populated from the given
1471 # snapshot.
1472 &quot;currentState&quot;: &quot;A String&quot;, # The current state of the job.
1473 #
1474 # Jobs are created in the `JOB_STATE_STOPPED` state unless otherwise
1475 # specified.
1476 #
1477 # A job in the `JOB_STATE_RUNNING` state may asynchronously enter a
1478 # terminal state. After a job has reached a terminal state, no
1479 # further state updates may be made.
1480 #
1481 # This field may be mutated by the Cloud Dataflow service;
1482 # callers cannot mutate it.
1483 &quot;name&quot;: &quot;A String&quot;, # The user-specified Cloud Dataflow job name.
1484 #
1485 # Only one Job with a given name may exist in a project at any
1486 # given time. If a caller attempts to create a Job with the same
1487 # name as an already-existing Job, the attempt returns the
1488 # existing Job.
1489 #
1490 # The name must match the regular expression
1491 # `[a-z]([-a-z0-9]{0,38}[a-z0-9])?`
1492 &quot;currentStateTime&quot;: &quot;A String&quot;, # The timestamp associated with the current state.
1493 },
1494  }</pre>
1495</div>
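<h3>Usage Example</h3>
<div class="method">
    <pre># Illustrative sketch, not generated reference material. It builds on the
# launch call shown above: it inspects the returned job, polls its state via
# projects.locations.jobs.get, and cancels it via projects.locations.jobs.update.
# The project and region are hypothetical; application-default credentials
# are assumed.
import time

from googleapiclient.discovery import build

dataflow = build(&quot;dataflow&quot;, &quot;v1b3&quot;)
jobs = dataflow.projects().locations().jobs()

job = response[&quot;job&quot;]  # absent when the request was a dry run (validateOnly=True)
print(job[&quot;id&quot;], job.get(&quot;currentState&quot;))

# Poll until the job leaves JOB_STATE_PENDING. Pass view=&quot;JOB_VIEW_DESCRIPTION&quot;
# or view=&quot;JOB_VIEW_ALL&quot; if pipelineDescription should also be populated.
while True:
    latest = jobs.get(projectId=&quot;my-example-project&quot;,
                      location=&quot;us-central1&quot;,
                      jobId=job[&quot;id&quot;]).execute()
    if latest.get(&quot;currentState&quot;) != &quot;JOB_STATE_PENDING&quot;:
        break
    time.sleep(10)

# To cancel, set requestedState; this irrevocably terminates the job once it
# takes effect.
jobs.update(projectId=&quot;my-example-project&quot;,
            location=&quot;us-central1&quot;,
            jobId=job[&quot;id&quot;],
            body={&quot;requestedState&quot;: &quot;JOB_STATE_CANCELLED&quot;}).execute()</pre>
</div>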
1496
1497</body></html>