Blame - docs/dyn/dataflow_v1b3.projects.locations.jobs.html - platform/external/python/google-api-python-client

2017-01-06 09:58:29 -0800

76

<h2>Instance Methods</h2>

77

Sai Cheemalapati

4ba8c23

2017-06-06 18:46:08 -0400

78

<code><a href="dataflow_v1b3.projects.locations.jobs.debug.html">debug()</a></code>

79

</p>

80

<p class="firstline">Returns the debug Resource.</p>

81

82

Jon Wayne Parrott

2017-01-06 09:58:29 -0800

83

<code><a href="dataflow_v1b3.projects.locations.jobs.messages.html">messages()</a></code>

84

</p>

85

<p class="firstline">Returns the messages Resource.</p>

86

87

Bu Sun Kim

2019-06-14 16:50:42 -0700

88

<code><a href="dataflow_v1b3.projects.locations.jobs.snapshots.html">snapshots()</a></code>

89

</p>

90

<p class="firstline">Returns the snapshots Resource.</p>

91

92

Jon Wayne Parrott

2017-01-06 09:58:29 -0800

93

<code><a href="dataflow_v1b3.projects.locations.jobs.workItems.html">workItems()</a></code>

94

</p>

95

<p class="firstline">Returns the workItems Resource.</p>

96

97

Bu Sun Kim

2020-07-22 17:02:09 -0700

98

<code><a href="#create">create(projectId, location, body=None, view=None, replaceJobId=None, x__xgafv=None)</a></code></p>

Sai Cheemalapati

2017-03-13 12:12:03 -0400

99

<p class="firstline">Creates a Cloud Dataflow job.</p>

Jon Wayne Parrott

2017-01-06 09:58:29 -0800

100

Bu Sun Kim

2020-05-20 12:08:20 -0700

101

<code><a href="#get">get(projectId, location, jobId, view=None, x__xgafv=None)</a></code></p>

Sai Cheemalapati

2017-03-13 12:12:03 -0400

102

<p class="firstline">Gets the state of the specified Cloud Dataflow job.</p>

Jon Wayne Parrott

2017-01-06 09:58:29 -0800

103

104

<code><a href="#getMetrics">getMetrics(projectId, location, jobId, startTime=None, x__xgafv=None)</a></code></p>

105

<p class="firstline">Request the job status.</p>

106

Bu Sun Kim

2020-07-22 17:02:09 -0700

107

<code><a href="#list">list(projectId, location, pageToken=None, view=None, pageSize=None, filter=None, x__xgafv=None)</a></code></p>

Sai Cheemalapati

2017-03-13 12:12:03 -0400

108

<p class="firstline">List the jobs of a project.</p>

Jon Wayne Parrott

2017-01-06 09:58:29 -0800

109

110

<code><a href="#list_next">list_next(previous_request, previous_response)</a></code></p>

111

<p class="firstline">Retrieves the next page of results.</p>

112

Dan O'Meara

2020-05-01 07:42:23 -0700

113

<code><a href="#snapshot">snapshot(projectId, location, jobId, body=None, x__xgafv=None)</a></code></p>

Bu Sun Kim

2019-06-14 16:50:42 -0700

114

<p class="firstline">Snapshot the state of a streaming job.</p>

115

Dan O'Meara

2020-05-01 07:42:23 -0700

116

<code><a href="#update">update(projectId, location, jobId, body=None, x__xgafv=None)</a></code></p>

Sai Cheemalapati

2017-03-13 12:12:03 -0400

117

<p class="firstline">Updates the state of an existing Cloud Dataflow job.</p>
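<p>The <code>list</code> and <code>list_next</code> methods above are typically used together to page through results. The following is a minimal sketch rather than part of the generated reference: the helper name <code>iter_jobs</code> is hypothetical, and it assumes that the list response carries its results under a <code>jobs</code> key and that <code>list_next</code> returns <code>None</code> once no further page exists.</p>

```python
def iter_jobs(jobs_resource, project_id, location, page_size=50):
    """Yield every job in a project/location, following list_next() page by page.

    `jobs_resource` is assumed to be the object returned by
    service.projects().locations().jobs() on a Dataflow v1b3 client.
    """
    request = jobs_resource.list(
        projectId=project_id, location=location, pageSize=page_size
    )
    while request is not None:
        response = request.execute()
        # Assumption: results live under the "jobs" key; a page may omit it.
        for job in response.get("jobs", []):
            yield job
        # Assumption: list_next() returns None when there is no next page.
        request = jobs_resource.list_next(
            previous_request=request, previous_response=response
        )
```

<p>Here <code>jobs_resource</code> would come from a client created with <code>googleapiclient.discovery.build("dataflow", "v1b3")</code>, by calling <code>.projects().locations().jobs()</code> on it.</p>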

Jon Wayne Parrott

2017-01-06 09:58:29 -0800

118

<h3>Method Details</h3>

119

Bu Sun Kim

2020-07-22 17:02:09 -0700

120

<code class="details" id="create">create(projectId, location, body=None, view=None, replaceJobId=None, x__xgafv=None)</code>

Sai Cheemalapati

2017-03-13 12:12:03 -0400

121

<pre>Creates a Cloud Dataflow job.

Jon Wayne Parrott

2017-01-06 09:58:29 -0800

122

Bu Sun Kim

2019-06-14 16:50:42 -0700

123

To create a job, we recommend using `projects.locations.jobs.create` with a

124

[regional endpoint]

125

(https://cloud.google.com/dataflow/docs/concepts/regional-endpoints). Using

126

`projects.jobs.create` is not recommended, as your job will always start

127

in `us-central1`.
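As an illustration of the recommendation above, the sketch below routes job creation through `projects.locations.jobs.create`. It is not part of the generated reference: the helper name `create_regional_job` is hypothetical, and `service` is assumed to be a client built with `googleapiclient.discovery.build("dataflow", "v1b3")`.

```python
def create_regional_job(service, project_id, region, job_body):
    """Create a Dataflow job through the regional method (illustrative helper)."""
    jobs = service.projects().locations().jobs()
    # `location` names the regional endpoint that will contain the job.
    request = jobs.create(projectId=project_id, location=region, body=job_body)
    # execute() sends the request and returns the parsed response.
    return request.execute()
```

A call might look like `create_regional_job(service, "my-project", "europe-west1", job_body)`, where the project ID and region are placeholder values and `job_body` follows the request body form described below.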

128

Jon Wayne Parrott

2017-01-06 09:58:29 -0800

129

Args:

Sai Cheemalapati

2017-03-13 12:12:03 -0400

130

projectId: string, The ID of the Cloud Platform project that the job belongs to. (required)

Bu Sun Kim

2019-06-14 16:50:42 -0700

131

location: string, The [regional endpoint]

132

(https://cloud.google.com/dataflow/docs/concepts/regional-endpoints) that

133

contains this job. (required)

Dan O'Meara

2020-05-01 07:42:23 -0700

134

body: object, The request body.

Jon Wayne Parrott

2017-01-06 09:58:29 -0800

135

The object takes the form of:

136

Sai Cheemalapati

2017-03-13 12:12:03 -0400

137

{ # Defines a job to be run by the Cloud Dataflow service.

Bu Sun Kim

2020-07-22 17:02:09 -0700

138

"pipelineDescription": { # A descriptive representation of the submitted pipeline as well as its executed form.
# This data is provided by the Dataflow service for ease of visualizing
# the pipeline and interpreting Dataflow-provided metrics.
#
# Preliminary field: The format of this data may change at any time.
# A description of the user pipeline and stages through which it is executed.
# Created by Cloud Dataflow service. Only retrieved with
# JOB_VIEW_DESCRIPTION or JOB_VIEW_ALL.

144

"displayData": [ # Pipeline level display data.

145

{ # Data provided with a pipeline or transform to provide descriptive info.

146

"url": "A String", # An optional full URL.

147

"javaClassValue": "A String", # Contains value if the data is of java class type.

148

"timestampValue": "A String", # Contains value if the data is of timestamp type.

149

"durationValue": "A String", # Contains value if the data is of duration type.

150

"label": "A String", # An optional label to display in a dax UI for the element.

151

"key": "A String", # The key identifying the display data.

152

# This is intended to be used as a label for the display data

153

# when viewed in a dax monitoring system.

154

"namespace": "A String", # The namespace for the key. This is usually a class name or programming

155

# language namespace (e.g. a Python module) that defines the display data.

156

# This allows a dax monitoring system to specially handle the data

157

# and perform custom rendering.

158

"floatValue": 3.14, # Contains value if the data is of float type.

159

"strValue": "A String", # Contains value if the data is of string type.

160

"int64Value": "A String", # Contains value if the data is of int64 type.

161

"boolValue": True or False, # Contains value if the data is of a boolean type.

162

"shortStrValue": "A String", # A possible additional shorter value to display.

163

# For example a java_class_name_value of com.mypackage.MyDoFn

164

# will be stored with MyDoFn as the short_str_value and

165

# com.mypackage.MyDoFn as the java_class_name_value.

166

# short_str_value can be displayed and java_class_name_value

167

# will be displayed as a tooltip.

Bu Sun Kim

2020-05-27 12:20:54 -0700

168

},

Bu Sun Kim

2020-07-22 17:02:09 -0700

169

],

170

"originalPipelineTransform": [ # Description of each transform in the pipeline and collections between them.

171

{ # Description of the type, names/ids, and input/outputs for a transform.

172

"outputCollectionName": [ # User names for all collection outputs to this transform.

Bu Sun Kim

2020-05-27 12:20:54 -0700

173

"A String",

174

],

Bu Sun Kim

2020-07-22 17:02:09 -0700

175

"displayData": [ # Transform-specific display data.

176

{ # Data provided with a pipeline or transform to provide descriptive info.

177

"url": "A String", # An optional full URL.

178

"javaClassValue": "A String", # Contains value if the data is of java class type.

179

"timestampValue": "A String", # Contains value if the data is of timestamp type.

180

"durationValue": "A String", # Contains value if the data is of duration type.

181

"label": "A String", # An optional label to display in a dax UI for the element.

182

"key": "A String", # The key identifying the display data.

183

# This is intended to be used as a label for the display data

184

# when viewed in a dax monitoring system.

185

"namespace": "A String", # The namespace for the key. This is usually a class name or programming

186

# language namespace (e.g. a Python module) that defines the display data.

187

# This allows a dax monitoring system to specially handle the data

188

# and perform custom rendering.

189

"floatValue": 3.14, # Contains value if the data is of float type.

190

"strValue": "A String", # Contains value if the data is of string type.

191

"int64Value": "A String", # Contains value if the data is of int64 type.

192

"boolValue": True or False, # Contains value if the data is of a boolean type.

193

"shortStrValue": "A String", # A possible additional shorter value to display.

194

# For example a java_class_name_value of com.mypackage.MyDoFn

195

# will be stored with MyDoFn as the short_str_value and

196

# com.mypackage.MyDoFn as the java_class_name_value.

197

# short_str_value can be displayed and java_class_name_value

198

# will be displayed as a tooltip.

199

},

200

],

201

"id": "A String", # SDK generated id of this transform instance.

202

"inputCollectionName": [ # User names for all collection inputs to this transform.

203

"A String",

204

],

205

"name": "A String", # User provided name for this transform instance.

206

"kind": "A String", # Type of transform.

Bu Sun Kim

2020-05-27 12:20:54 -0700

207

},

Bu Sun Kim

2020-07-22 17:02:09 -0700

208

],

209

"executionPipelineStage": [ # Description of each stage of execution of the pipeline.

210

{ # Description of the composing transforms, names/ids, and input/outputs of a

211

# stage of execution. Some composing transforms and sources may have been

212

# generated by the Dataflow service during execution planning.

213

"componentSource": [ # Collections produced and consumed by component transforms of this stage.

214

{ # Description of an interstitial value between transforms in an execution

215

# stage.

216

"userName": "A String", # Human-readable name for this transform; may be user or system generated.

217

"name": "A String", # Dataflow service generated name for this source.

218

"originalTransformOrCollection": "A String", # User name for the original user transform or collection with which this

219

# source is most closely associated.

220

},

221

],

222

"inputSource": [ # Input sources for this stage.

223

{ # Description of an input or output of an execution stage.

224

"userName": "A String", # Human-readable name for this source; may be user or system generated.

225

"originalTransformOrCollection": "A String", # User name for the original user transform or collection with which this

226

# source is most closely associated.

227

"sizeBytes": "A String", # Size of the source, if measurable.

228

"name": "A String", # Dataflow service generated name for this source.

229

},

230

],

231

"name": "A String", # Dataflow service generated name for this stage.

232

"componentTransform": [ # Transforms that comprise this execution stage.

233

{ # Description of a transform executed as part of an execution stage.

234

"name": "A String", # Dataflow service generated name for this source.

235

"userName": "A String", # Human-readable name for this transform; may be user or system generated.

236

"originalTransform": "A String", # User name for the original user transform with which this transform is

237

# most closely associated.

238

},

239

],

240

"id": "A String", # Dataflow service generated id for this stage.

241

"outputSource": [ # Output sources for this stage.

242

{ # Description of an input or output of an execution stage.

243

"userName": "A String", # Human-readable name for this source; may be user or system generated.

244

"originalTransformOrCollection": "A String", # User name for the original user transform or collection with which this

245

# source is most closely associated.

246

"sizeBytes": "A String", # Size of the source, if measurable.

247

"name": "A String", # Dataflow service generated name for this source.

248

},

249

],

250

"kind": "A String", # Type of transform this stage is executing.

Bu Sun Kim

2020-05-27 12:20:54 -0700

251

},

Bu Sun Kim

2020-07-22 17:02:09 -0700

252

],

Bu Sun Kim

2020-05-20 12:08:20 -0700

253

},

Bu Sun Kim

2020-07-22 17:02:09 -0700

254

"labels": { # User-defined labels for this job.

255

#

256

# The labels map can contain no more than 64 entries. Entries of the labels

257

# map are UTF8 strings that comply with the following restrictions:

258

#

259

# * Keys must conform to regexp: \p{Ll}\p{Lo}{0,62}

260

# * Values must conform to regexp: [\p{Ll}\p{Lo}\p{N}_-]{0,63}

261

# * Both keys and values are additionally constrained to be <= 128 bytes in

262

# size.

Bu Sun Kim

2020-05-20 12:08:20 -0700

263

"a_key": "A String",

Jon Wayne Parrott

2017-01-06 09:58:29 -0800

264

},

Bu Sun Kim

2020-07-22 17:02:09 -0700

265

"projectId": "A String", # The ID of the Cloud Platform project that the job belongs to.

Bu Sun Kim

2020-05-20 12:08:20 -0700

266

"environment": { # Describes the environment in which a Dataflow Job runs. # The environment for the job.

Bu Sun Kim

2020-07-22 17:02:09 -0700

267

"flexResourceSchedulingGoal": "A String", # Which Flexible Resource Scheduling mode to run in.

Bu Sun Kim

2020-05-20 12:08:20 -0700

268

"workerRegion": "A String", # The Compute Engine region

269

# (https://cloud.google.com/compute/docs/regions-zones/regions-zones) in

270

# which worker processing should occur, e.g. "us-west1". Mutually exclusive

271

# with worker_zone. If neither worker_region nor worker_zone is specified,

272

# default to the control plane's region.

Bu Sun Kim

2020-07-22 17:02:09 -0700

273

"userAgent": { # A description of the process that generated the request.

274

"a_key": "", # Properties of the object.

275

},

276

"serviceAccountEmail": "A String", # Identity to run virtual machines as. Defaults to the default account.

277

"version": { # A structure describing which components and their versions of the service

278

# are required in order to run the job.

279

"a_key": "", # Properties of the object.

280

},

Bu Sun Kim

2020-05-20 12:08:20 -0700

281

"serviceKmsKeyName": "A String", # If set, contains the Cloud KMS key identifier used to encrypt data

282

# at rest, AKA a Customer Managed Encryption Key (CMEK).

283

#

284

# Format:

285

# projects/PROJECT_ID/locations/LOCATION/keyRings/KEY_RING/cryptoKeys/KEY

Bu Sun Kim

2020-07-22 17:02:09 -0700

286

"experiments": [ # The list of experiments to enable.

287

"A String",

288

],

Bu Sun Kim

2020-05-20 12:08:20 -0700

289

"workerZone": "A String", # The Compute Engine zone

290

# (https://cloud.google.com/compute/docs/regions-zones/regions-zones) in

291

# which worker processing should occur, e.g. "us-west1-a". Mutually exclusive

292

# with worker_region. If neither worker_region nor worker_zone is specified,

293

# a zone in the control plane's region is chosen based on available capacity.

Bu Sun Kim

2020-05-27 12:20:54 -0700

294

"workerPools": [ # The worker pools. At least one "harness" worker pool must be

295

# specified in order for the job to have workers.

296

{ # Describes one particular pool of Cloud Dataflow workers to be

297

# instantiated by the Cloud Dataflow service in order to perform the

298

# computations required by a job. Note that a workflow job may use

299

# multiple pools, in order to match the various computational

300

# requirements of the various stages of the job.

Bu Sun Kim

2020-07-22 17:02:09 -0700

301

"onHostMaintenance": "A String", # The action to take on host maintenance, as defined by the Google

302

# Compute Engine API.

303

"sdkHarnessContainerImages": [ # Set of SDK harness containers needed to execute this pipeline. This will

304

# only be set in the Fn API path. For non-cross-language pipelines this

305

# should have only one entry. Cross-language pipelines will have two or more

306

# entries.

307

{ # Defines a SDK harness container for executing Dataflow pipelines.

308

"containerImage": "A String", # A docker container image that resides in Google Container Registry.

309

"useSingleCorePerContainer": True or False, # If true, recommends that the Dataflow service use only one core per SDK

310

# container instance with this image. If false (or unset) recommends using

311

# more than one core per SDK container instance with this image for

312

# efficiency. Note that Dataflow service may choose to override this property

313

# if needed.

314

},

315

],

Bu Sun Kim

2020-05-27 12:20:54 -0700

316

"zone": "A String", # Zone to run the worker pools in. If empty or unspecified, the service

317

# will attempt to choose a reasonable default.

Bu Sun Kim

2020-07-22 17:02:09 -0700

318

"kind": "A String", # The kind of the worker pool; currently only `harness` and `shuffle`

319

# are supported.

320

"metadata": { # Metadata to set on the Google Compute Engine VMs.

321

"a_key": "A String",

322

},

Bu Sun Kim

2020-05-27 12:20:54 -0700

323

"diskSourceImage": "A String", # Fully qualified source image for disks.

Bu Sun Kim

2020-07-22 17:02:09 -0700

324

"dataDisks": [ # Data disks that are used by a VM in this workflow.

325

{ # Describes the data disk used by a workflow job.

326

"sizeGb": 42, # Size of disk in GB. If zero or unspecified, the service will

327

# attempt to choose a reasonable default.

328

"diskType": "A String", # Disk storage type, as defined by Google Compute Engine. This

329

# must be a disk type appropriate to the project and zone in which

330

# the workers will run. If unknown or unspecified, the service

331

# will attempt to choose a reasonable default.

332

#

333

# For example, the standard persistent disk type is a resource name

334

# typically ending in "pd-standard". If SSD persistent disks are

335

# available, the resource name typically ends with "pd-ssd". The

336

# actual valid values are defined by the Google Compute Engine API,

337

# not by the Cloud Dataflow API; consult the Google Compute Engine

338

# documentation for more information about determining the set of

339

# available disk types for a particular project and zone.

340

#

341

# Google Compute Engine Disk types are local to a particular

342

# project in a particular zone, and so the resource name will

343

# typically look something like this:

344

#

345

# compute.googleapis.com/projects/project-id/zones/zone/diskTypes/pd-standard

346

"mountPoint": "A String", # Directory in a VM where disk is mounted.

347

},

348

],

Bu Sun Kim

2020-05-27 12:20:54 -0700

349

"packages": [ # Packages to be installed on workers.

350

{ # The packages that must be installed in order for a worker to run the

351

# steps of the Cloud Dataflow job that will be assigned to its worker

352

# pool.

353

#

354

# This is the mechanism by which the Cloud Dataflow SDK causes code to

355

# be loaded onto the workers. For example, the Cloud Dataflow Java SDK

356

# might use this to install jars containing the user's code and all of the

357

# various dependencies (libraries, data files, etc.) required in order

358

# for that code to run.

359

"name": "A String", # The name of the package.

360

"location": "A String", # The resource to read the package from. The supported resource type is:

361

#

362

# Google Cloud Storage:

363

#

364

# storage.googleapis.com/{bucket}

365

# bucket.storage.googleapis.com/

366

},

367

],

368

"teardownPolicy": "A String", # Sets the policy for determining when to tear down the worker pool.

369

# Allowed values are: `TEARDOWN_ALWAYS`, `TEARDOWN_ON_SUCCESS`, and

370

# `TEARDOWN_NEVER`.

371

# `TEARDOWN_ALWAYS` means workers are always torn down regardless of whether

372

# the job succeeds. `TEARDOWN_ON_SUCCESS` means workers are torn down

373

# if the job succeeds. `TEARDOWN_NEVER` means the workers are never torn

374

# down.

375

#

376

# If the workers are not torn down by the service, they will

377

# continue to run and use Google Compute Engine VM resources in the

378

# user's project until they are explicitly terminated by the user.

379

# Because of this, Google recommends using the `TEARDOWN_ALWAYS`

380

# policy except for small, manually supervised test jobs.

381

#

382

# If unknown or unspecified, the service will attempt to choose a reasonable

383

# default.

Bu Sun Kim

2020-07-22 17:02:09 -0700

384

"network": "A String", # Network to which VMs will be assigned. If empty or unspecified,

385

# the service will use the network "default".

386

"ipConfiguration": "A String", # Configuration for VM IPs.

387

"diskSizeGb": 42, # Size of root disk for VMs, in GB. If zero or unspecified, the service will

388

# attempt to choose a reasonable default.

389

"autoscalingSettings": { # Settings for WorkerPool autoscaling. # Settings for autoscaling of this WorkerPool.

390

"maxNumWorkers": 42, # The maximum number of workers to cap scaling at.

391

"algorithm": "A String", # The algorithm to use for autoscaling.

392

},

Bu Sun Kim

2020-05-27 12:20:54 -0700

393

"poolArgs": { # Extra arguments for this worker pool.

394

"a_key": "", # Properties of the object. Contains field @type with type URL.

395

},

Bu Sun Kim

2020-07-22 17:02:09 -0700

396

"subnetwork": "A String", # Subnetwork to which VMs will be assigned, if desired. Expected to be of

397

# the form "regions/REGION/subnetworks/SUBNETWORK".

398

"numWorkers": 42, # Number of Google Compute Engine workers in this pool needed to

399

# execute the job. If zero or unspecified, the service will

Bu Sun Kim

2020-05-27 12:20:54 -0700

400

# attempt to choose a reasonable default.

Bu Sun Kim

2020-07-22 17:02:09 -0700

401

"numThreadsPerWorker": 42, # The number of threads per worker harness. If empty or unspecified, the

402

# service will choose a number of threads (according to the number of cores

403

# on the selected machine type for batch, or 1 by convention for streaming).

Bu Sun Kim

2020-05-27 12:20:54 -0700

404

"workerHarnessContainerImage": "A String", # Required. Docker container image that executes the Cloud Dataflow worker

405

# harness, residing in Google Container Registry.

406

#

407

# Deprecated for the Fn API path. Use sdk_harness_container_images instead.

Bu Sun Kim

2020-05-27 12:20:54 -0700

408

"taskrunnerSettings": { # Taskrunner configuration settings. # Settings passed through to Google Compute Engine workers when

409

# using the standard Dataflow task runner. Users should ignore

410

# this field.

Bu Sun Kim

2020-07-22 17:02:09 -0700

411

"dataflowApiVersion": "A String", # The API version of endpoint, e.g. "v1b3"

Bu Sun Kim

2020-05-27 12:20:54 -0700

412

"oauthScopes": [ # The OAuth2 scopes to be requested by the taskrunner in order to

413

# access the Cloud Dataflow API.

414

"A String",

415

],

Bu Sun Kim

2020-05-27 12:20:54 -0700

416

"baseUrl": "A String", # The base URL for the taskrunner to use when accessing Google Cloud APIs.

417

#

418

# When workers access Google Cloud APIs, they logically do so via

419

# relative URLs. If this field is specified, it supplies the base

420

# URL to use for resolving these relative URLs. The normative

421

# algorithm used is defined by RFC 1808, "Relative Uniform Resource

422

# Locators".

423

#

424

# If not specified, the default value is "http://www.googleapis.com/"

Bu Sun Kim

2020-07-22 17:02:09 -0700

425

"workflowFileName": "A String", # The file to store the workflow in.

Bu Sun Kim

2020-05-27 12:20:54 -0700

426

"logToSerialconsole": True or False, # Whether to send taskrunner log info to Google Compute Engine VM serial

427

# console.

Bu Sun Kim

2020-07-22 17:02:09 -0700

428

"baseTaskDir": "A String", # The location on the worker for task-specific subdirectories.

429

"taskUser": "A String", # The UNIX user ID on the worker VM to use for tasks launched by

430

# taskrunner; e.g. "root".

431

"vmId": "A String", # The ID string of the VM.

432

"alsologtostderr": True or False, # Whether to also send taskrunner log info to stderr.

Bu Sun Kim

2020-05-27 12:20:54 -0700

433

"parallelWorkerSettings": { # Provides data to pass through to the worker harness. # The settings to pass to the parallel worker harness.

Bu Sun Kim

2020-07-22 17:02:09 -0700

434

"shuffleServicePath": "A String", # The Shuffle service path relative to the root URL, for example,

435

# "shuffle/v1beta1".

Bu Sun Kim

2020-05-27 12:20:54 -0700

436

"tempStoragePrefix": "A String", # The prefix of the resources the system should use for temporary

437

# storage.

438

#

439

# The supported resource type is:

440

#

441

# Google Cloud Storage:

442

#

443

# storage.googleapis.com/{bucket}/{object}

444

# bucket.storage.googleapis.com/{object}

445

"reportingEnabled": True or False, # Whether to send work progress updates to the service.

Bu Sun Kim

2020-07-22 17:02:09 -0700

446

"servicePath": "A String", # The Cloud Dataflow service path relative to the root URL, for example,

447

# "dataflow/v1b3/projects".

Bu Sun Kim

2020-05-27 12:20:54 -0700

448

"baseUrl": "A String", # The base URL for accessing Google Cloud APIs.

449

#

450

# When workers access Google Cloud APIs, they logically do so via

451

# relative URLs. If this field is specified, it supplies the base

452

# URL to use for resolving these relative URLs. The normative

453

# algorithm used is defined by RFC 1808, "Relative Uniform Resource

454

# Locators".

455

#

456

# If not specified, the default value is "http://www.googleapis.com/"

Bu Sun Kim

2020-05-27 12:20:54 -0700

457

"workerId": "A String", # The ID of the worker running this pipeline.

458

},

Bu Sun Kim

2020-07-22 17:02:09 -0700

459

"harnessCommand": "A String", # The command to launch the worker harness.

460

"logDir": "A String", # The directory on the VM to store logs.

461

"streamingWorkerMainClass": "A String", # The streaming worker main class name.

462

"languageHint": "A String", # The suggested backend language.

463

"taskGroup": "A String", # The UNIX group ID on the worker VM to use for tasks launched by

464

# taskrunner; e.g. "wheel".

465

"logUploadLocation": "A String", # Indicates where to put logs. If this is not specified, the logs

466

# will not be uploaded.

467

#

468

# The supported resource type is:

469

#

470

# Google Cloud Storage:

471

# storage.googleapis.com/{bucket}/{object}

472

# bucket.storage.googleapis.com/{object}

473

"commandlinesFileName": "A String", # The file to store preprocessing commands in.

474

"continueOnException": True or False, # Whether to continue taskrunner if an exception is hit.

475

"tempStoragePrefix": "A String", # The prefix of the resources the taskrunner should use for

476

# temporary storage.

477

#

478

# The supported resource type is:

479

#

480

# Google Cloud Storage:

481

# storage.googleapis.com/{bucket}/{object}

482

# bucket.storage.googleapis.com/{object}

Bu Sun Kim

2020-05-27 12:20:54 -0700

483

},

Bu Sun Kim

2020-07-22 17:02:09 -0700

484

"diskType": "A String", # Type of root disk for VMs. If empty or unspecified, the service will

485

# attempt to choose a reasonable default.

Bu Sun Kim

2020-05-27 12:20:54 -0700

486

"defaultPackageSet": "A String", # The default package set to install. This allows the service to

487

# select a default set of packages which are useful to worker

488

# harnesses written in a particular language.

Bu Sun Kim

2020-07-22 17:02:09 -0700

489

"machineType": "A String", # Machine type (e.g. "n1-standard-1"). If empty or unspecified, the

490

# service will attempt to choose a reasonable default.

Bu Sun Kim

2020-05-27 12:20:54 -0700

491

},

492

],

Bu Sun Kim

2020-07-22 17:02:09 -0700

493

"tempStoragePrefix": "A String", # The prefix of the resources the system should use for temporary

494

# storage. The system will append the suffix "/temp-{JOBNAME}" to

495

# this resource prefix, where {JOBNAME} is the value of the

496

# job_name field. The resulting bucket and object prefix is used

497

# as the prefix of the resources used to store temporary data

498

# needed during the job execution. NOTE: This will override the

499

# value in taskrunner_settings.

500

# The supported resource type is:

501

#

502

# Google Cloud Storage:

503

#

504

# storage.googleapis.com/{bucket}/{object}

505

# bucket.storage.googleapis.com/{object}

506

"internalExperiments": { # Experimental settings.

507

"a_key": "", # Properties of the object. Contains field @type with type URL.

508

},

509

"sdkPipelineOptions": { # The Cloud Dataflow SDK pipeline options specified by the user. These

510

# options are passed through the service and are used to recreate the

511

# SDK pipeline options on the worker in a language agnostic and platform

512

# independent way.

513

"a_key": "", # Properties of the object.

514

},

Bu Sun Kim

2020-05-27 12:20:54 -0700

515

"dataset": "A String", # The dataset for the current project where various workflow

516

# related tables are stored.

517

#

518

# The supported resource type is:

519

#

520

# Google BigQuery:

521

# bigquery.googleapis.com/{dataset}

Bu Sun Kim

2020-07-22 17:02:09 -0700

522

"clusterManagerApiService": "A String", # The type of cluster manager API to use. If unknown or

523

# unspecified, the service will attempt to choose a reasonable

524

# default. This should be in the form of the API service name,

525

# e.g. "compute.googleapis.com".

Bu Sun Kim

2019-06-14 16:50:42 -0700

526

},

Bu Sun Kim

2020-07-22 17:02:09 -0700

527

"stepsLocation": "A String", # The GCS location where the steps are stored.

Bu Sun Kim

2020-05-20 12:08:20 -0700

528

"steps": [ # Exactly one of steps or steps_location should be specified.

Bu Sun Kim

2020-07-22 17:02:09 -0700

529

#

Bu Sun Kim

2019-06-14 16:50:42 -0700

530

# The top-level steps that constitute the entire job.

531

{ # Defines a particular step within a Cloud Dataflow job.

532

#

533

# A job consists of multiple steps, each of which performs some

534

# specific operation as part of the overall job. Data is typically

535

# passed from one step to another as part of the job.

536

#

Bu Sun Kim

2020-05-20 12:08:20 -0700

537

# Here's an example of a sequence of steps which together implement a

Bu Sun Kim

2019-06-14 16:50:42 -0700

538

# Map-Reduce job:

539

#

540

# * Read a collection of data from some source, parsing the

Bu Sun Kim

2020-05-20 12:08:20 -0700

541

# collection's elements.

542

#

543

# * Validate the elements.

544

#

545

# * Apply a user-defined function to map each element to some value

546

# and extract an element-specific key value.

547

#

548

# * Group elements with the same key into a single element with

549

# that key, transforming a multiply-keyed collection into a

550

# uniquely-keyed collection.

551

#

552

# * Write the elements out to some data sink.

553

#

554

# Note that the Cloud Dataflow service may be used to run many different

555

# types of jobs, not just Map-Reduce.

556

"kind": "A String", # The kind of step in the Cloud Dataflow job.

557

"properties": { # Named properties associated with the step. Each kind of

558

# predefined step has its own required set of properties.

559

# Must be provided on Create. Only retrieved with JOB_VIEW_ALL.

560

"a_key": "", # Properties of the object.

561

},

562

"name": "A String", # The name that identifies the step. This must be unique for each

563

# step with respect to all other steps in the Cloud Dataflow job.

564

},

565

],
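The step description above requires every step name to be unique within the job. A minimal sketch of a client-side check, assuming the steps are held as plain dictionaries; `find_duplicate_step_names` is a hypothetical helper and the `kind` strings are placeholders, not values taken from this page. The sample list mirrors the Map-Reduce sequence described above.

```python
def find_duplicate_step_names(steps):
    """Return step names that occur more than once, in first-seen order."""
    seen = set()
    duplicates = []
    for step in steps:
        name = step["name"]
        if name in seen and name not in duplicates:
            duplicates.append(name)
        seen.add(name)
    return duplicates


# Placeholder "kind" values; "properties" is left empty for brevity.
steps = [
    {"kind": "ExampleRead", "name": "read-input", "properties": {}},
    {"kind": "ExampleValidate", "name": "validate", "properties": {}},
    {"kind": "ExampleMap", "name": "extract-key", "properties": {}},
    {"kind": "ExampleGroup", "name": "group-by-key", "properties": {}},
    {"kind": "ExampleWrite", "name": "write-output", "properties": {}},
]

duplicates = find_duplicate_step_names(steps)
```

Running the check before submitting a job gives an early, local error instead of a rejected request.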

566

"stageStates": [ # This field may be mutated by the Cloud Dataflow service;

567

# callers cannot mutate it.

568

{ # A message describing the state of a particular execution stage.

569

"executionStageState": "A String", # Execution stage states allow the same set of values as JobState.

570

"executionStageName": "A String", # The name of the execution stage.

571

"currentStateTime": "A String", # The time at which the stage transitioned to this state.

572

},

573

],

574

"replacedByJobId": "A String", # If another job is an update of this job (and thus, this job is in

575

# `JOB_STATE_UPDATED`), this field contains the ID of that job.

576

"jobMetadata": { # Metadata available primarily for filtering jobs. Will be included in the

# ListJob response and Job SUMMARY view.

#

# This field is populated by the Dataflow service to support filtering jobs

# by the metadata values provided here. Populated for ListJobs and all GetJob

# views SUMMARY and higher.

580

"sdkVersion": { # The version of the SDK used to run the job.

581

"sdkSupportStatus": "A String", # The support status for this SDK version.

582

"versionDisplayName": "A String", # A readable string describing the version of the SDK.

583

"version": "A String", # The version of the SDK used to run the job.

584

},

585

"bigTableDetails": [ # Identification of a BigTable source used in the Dataflow job.

586

{ # Metadata for a BigTable connector used by the job.

587

"instanceId": "A String", # InstanceId accessed in the connection.

588

"tableId": "A String", # TableId accessed in the connection.

589

"projectId": "A String", # ProjectId accessed in the connection.

590

},

591

],

592

"pubsubDetails": [ # Identification of a PubSub source used in the Dataflow job.

593

{ # Metadata for a PubSub connector used by the job.

594

"subscription": "A String", # Subscription used in the connection.

595

"topic": "A String", # Topic accessed in the connection.

596

},

597

],

598

"bigqueryDetails": [ # Identification of a BigQuery source used in the Dataflow job.

599

{ # Metadata for a BigQuery connector used by the job.

600

"dataset": "A String", # Dataset accessed in the connection.

601

"projectId": "A String", # Project accessed in the connection.

602

"query": "A String", # Query used to access data in the connection.

603

"table": "A String", # Table accessed in the connection.

604

},

605

],

606

"fileDetails": [ # Identification of a File source used in the Dataflow job.

607

{ # Metadata for a File connector used by the job.

608

"filePattern": "A String", # File Pattern used to access files by the connector.

609

},

610

],

611

"datastoreDetails": [ # Identification of a Datastore source used in the Dataflow job.

612

{ # Metadata for a Datastore connector used by the job.

613

"namespace": "A String", # Namespace used in the connection.

614

"projectId": "A String", # ProjectId accessed in the connection.

615

},

616

],

617

"spannerDetails": [ # Identification of a Spanner source used in the Dataflow job.

618

{ # Metadata for a Spanner connector used by the job.

619

"instanceId": "A String", # InstanceId accessed in the connection.

620

"databaseId": "A String", # DatabaseId accessed in the connection.

621

"projectId": "A String", # ProjectId accessed in the connection.

},

],

},

"location": "A String", # The [regional endpoint]

626

# (https://cloud.google.com/dataflow/docs/concepts/regional-endpoints) that

627

# contains this job.

628

"transformNameMapping": { # The map of transform name prefixes of the job to be replaced to the

629

# corresponding name prefixes of the new job.

630

"a_key": "A String",

631

},

632

"startTime": "A String", # The timestamp when the job was started (transitioned to JOB_STATE_PENDING).

633

# Flexible resource scheduling jobs are started with some delay after job

634

# creation, so start_time is unset before start and is updated when the

635

# job is started by the Cloud Dataflow service. For other jobs, start_time

636

# always equals create_time and is immutable and set by the Cloud Dataflow

637

# service.

638

"clientRequestId": "A String", # The client's unique identifier of the job, re-used across retried attempts.

639

# If this field is set, the service will ensure its uniqueness.

640

# The request to create a job will fail if the service has knowledge of a

641

# previously submitted job with the same client's ID and job name.

642

# The caller may use this field to ensure idempotence of job

643

# creation across retried attempts to create a job.

644

# By default, the field is empty and, in that case, the service ignores it.

645

"executionInfo": { # Deprecated. Additional information about how a Cloud Dataflow job will be executed that

# isn't contained in the submitted job.

647

"stages": { # A mapping from each stage to the information about that stage.

648

"a_key": { # Contains information about how a particular

649

# google.dataflow.v1beta3.Step will be executed.

650

"stepName": [ # The steps associated with the execution stage.

651

# Note that stages may have several steps, and that a given step

652

# might be run by more than one stage.

"A String",

],

},

},

},

658

"type": "A String", # The type of Cloud Dataflow job.

659

"createTime": "A String", # The timestamp when the job was initially created. Immutable and set by the

660

# Cloud Dataflow service.

661

"tempFiles": [ # A set of files the system should be aware of that are used

662

# for temporary storage. These temporary files will be

663

# removed on job completion.

664

# No duplicates are allowed.

665

# No file patterns are supported.

666

#

667

# The supported files are:

668

#

669

# Google Cloud Storage:

670

#

671

# storage.googleapis.com/{bucket}/{object}

672

# bucket.storage.googleapis.com/{object}

673

"A String",

674

],
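The `tempFiles` rules above (no duplicates, no file patterns, Cloud Storage locations only) can be checked before a job is submitted. A minimal sketch, assuming that "file pattern" means glob metacharacters; `check_temp_files` is a hypothetical helper, not part of the client library.

```python
def check_temp_files(temp_files):
    """Return a list of problems found in a tempFiles list (empty if none)."""
    problems = []
    seen = set()
    suffix = ".storage.googleapis.com"
    for path in temp_files:
        if path in seen:
            problems.append(f"duplicate: {path}")
        seen.add(path)
        # Assumption: a "file pattern" is anything containing glob characters.
        if any(ch in path for ch in "*?["):
            problems.append(f"pattern not allowed: {path}")
        host, _, rest = path.partition("/")
        # storage.googleapis.com/{bucket}/{object}
        path_style = host == "storage.googleapis.com" and "/" in rest.strip("/")
        # bucket.storage.googleapis.com/{object}
        bucket_style = host.endswith(suffix) and len(host) > len(suffix) and rest != ""
        if not (path_style or bucket_style):
            problems.append(f"unsupported location: {path}")
    return problems
```

The function collects every problem instead of stopping at the first, so one pass reports everything that needs fixing.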

675

"id": "A String", # The unique ID of this job.

676

#

677

# This field is set by the Cloud Dataflow service when the Job is

678

# created, and is immutable for the life of the job.

679

"requestedState": "A String", # The job's requested state.

680

#

681

# `UpdateJob` may be used to switch between the `JOB_STATE_STOPPED` and

682

# `JOB_STATE_RUNNING` states, by setting requested_state. `UpdateJob` may

683

# also be used to directly set a job's requested state to

684

# `JOB_STATE_CANCELLED` or `JOB_STATE_DONE`, irrevocably terminating the

685

# job if it has not already reached a terminal state.

686

"replaceJobId": "A String", # If this job is an update of an existing job, this field is the job ID

687

# of the job it replaced.

688

#

689

# When sending a `CreateJobRequest`, you can update a job by specifying it

690

# here. The job named here is stopped, and its intermediate state is

691

# transferred to this job.

692

"createdFromSnapshotId": "A String", # If this is specified, the job's initial state is populated from the given

693

# snapshot.

694

"currentState": "A String", # The current state of the job.

695

#

696

# Jobs are created in the `JOB_STATE_STOPPED` state unless otherwise

697

# specified.

698

#

699

# A job in the `JOB_STATE_RUNNING` state may asynchronously enter a

700

# terminal state. After a job has reached a terminal state, no

701

# further state updates may be made.

702

#

703

# This field may be mutated by the Cloud Dataflow service;

704

# callers cannot mutate it.

705

"name": "A String", # The user-specified Cloud Dataflow job name.

706

#

707

# Only one Job with a given name may exist in a project at any

708

# given time. If a caller attempts to create a Job with the same

709

# name as an already-existing Job, the attempt returns the

710

# existing Job.

711

#

712

# The name must match the regular expression

713

# `[a-z]([-a-z0-9]{0,38}[a-z0-9])?`

714

"currentStateTime": "A String", # The timestamp associated with the current state.

715

}
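The `name` field above must match a fixed regular expression. A minimal sketch of checking a candidate job name locally with that pattern; `is_valid_job_name` is a hypothetical helper introduced for illustration.

```python
import re

# Pattern copied from the "name" field description above.
JOB_NAME_RE = re.compile(r"[a-z]([-a-z0-9]{0,38}[a-z0-9])?")


def is_valid_job_name(name):
    """Return True if the whole string matches the documented pattern."""
    return JOB_NAME_RE.fullmatch(name) is not None
```

`fullmatch` is used rather than `match` so that trailing characters outside the pattern are rejected. The pattern allows at most 40 characters: one leading letter, up to 38 middle characters, and one closing letter or digit.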

716

717

view: string, The level of information requested in response.

718

replaceJobId: string, Deprecated. This field is now in the Job message.

719

x__xgafv: string, V1 error format.

Allowed values

1 - v1 error format

2 - v2 error format
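The arguments above map onto keyword arguments of `create()`. A minimal sketch, assuming the usual google-api-python-client calling convention; `build_create_kwargs` is a hypothetical helper, the project, region, and job name are placeholders, and the network call itself appears only as a comment because it needs an authorized client.

```python
def build_create_kwargs(project_id, location, job_name, view=None):
    """Assemble keyword arguments for jobs().create() (hypothetical helper)."""
    kwargs = {
        "projectId": project_id,
        "location": location,
        # Only a few Job fields are set here; the full schema is listed above.
        "body": {
            "name": job_name,
            "projectId": project_id,
            "location": location,
        },
    }
    if view is not None:
        kwargs["view"] = view
    return kwargs


kwargs = build_create_kwargs("my-project", "us-central1", "wordcount-1", view="JOB_VIEW_ALL")

# With an authorized client object (assumed, not shown here):
#   service.projects().locations().jobs().create(**kwargs).execute()
```

The deprecated `replaceJobId` argument is left out on purpose, since the text above says the field now lives in the Job message.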

Returns:

An object of the form:

726

727

{ # Defines a job to be run by the Cloud Dataflow service.

728

"pipelineDescription": { # A descriptive representation of submitted pipeline as well as the executed

# form. This data is provided by the Dataflow service for ease of visualizing

# the pipeline and interpreting Dataflow provided metrics.

#

# Preliminary field: The format of this data may change at any time.

# A description of the user pipeline and stages through which it is executed.

# Created by Cloud Dataflow service. Only retrieved with

# JOB_VIEW_DESCRIPTION or JOB_VIEW_ALL.

734

"displayData": [ # Pipeline level display data.

735

{ # Data provided with a pipeline or transform to provide descriptive info.

736

"url": "A String", # An optional full URL.

737

"javaClassValue": "A String", # Contains value if the data is of java class type.

738

"timestampValue": "A String", # Contains value if the data is of timestamp type.

739

"durationValue": "A String", # Contains value if the data is of duration type.

740

"label": "A String", # An optional label to display in a dax UI for the element.

741

"key": "A String", # The key identifying the display data.

742

# This is intended to be used as a label for the display data

743

# when viewed in a dax monitoring system.

744

"namespace": "A String", # The namespace for the key. This is usually a class name or programming

745

# language namespace (e.g. a Python module) which defines the display data.

746

# This allows a dax monitoring system to specially handle the data

747

# and perform custom rendering.

748

"floatValue": 3.14, # Contains value if the data is of float type.

749

"strValue": "A String", # Contains value if the data is of string type.

750

"int64Value": "A String", # Contains value if the data is of int64 type.

751

"boolValue": True or False, # Contains value if the data is of a boolean type.

752

"shortStrValue": "A String", # A possible additional shorter value to display.

753

# For example a java_class_name_value of com.mypackage.MyDoFn

754

# will be stored with MyDoFn as the short_str_value and

755

# com.mypackage.MyDoFn as the java_class_name_value.

756

# short_str_value can be displayed and java_class_name_value

757

# will be displayed as a tooltip.

758

},

759

],

760

"originalPipelineTransform": [ # Description of each transform in the pipeline and collections between them.

761

{ # Description of the type, names/ids, and input/outputs for a transform.

762

"outputCollectionName": [ # User names for all collection outputs to this transform.

763

"A String",

764

],

765

"displayData": [ # Transform-specific display data.

766

{ # Data provided with a pipeline or transform to provide descriptive info.

767

"url": "A String", # An optional full URL.

768

"javaClassValue": "A String", # Contains value if the data is of java class type.

769

"timestampValue": "A String", # Contains value if the data is of timestamp type.

770

"durationValue": "A String", # Contains value if the data is of duration type.

771

"label": "A String", # An optional label to display in a dax UI for the element.

772

"key": "A String", # The key identifying the display data.

773

# This is intended to be used as a label for the display data

774

# when viewed in a dax monitoring system.

775

"namespace": "A String", # The namespace for the key. This is usually a class name or programming

776

# language namespace (e.g. a Python module) which defines the display data.

777

# This allows a dax monitoring system to specially handle the data

778

# and perform custom rendering.

779

"floatValue": 3.14, # Contains value if the data is of float type.

780

"strValue": "A String", # Contains value if the data is of string type.

781

"int64Value": "A String", # Contains value if the data is of int64 type.

782

"boolValue": True or False, # Contains value if the data is of a boolean type.

783

"shortStrValue": "A String", # A possible additional shorter value to display.

784

# For example a java_class_name_value of com.mypackage.MyDoFn

785

# will be stored with MyDoFn as the short_str_value and

786

# com.mypackage.MyDoFn as the java_class_name_value.

787

# short_str_value can be displayed and java_class_name_value

788

# will be displayed as a tooltip.

789

},

790

],

791

"id": "A String", # SDK generated id of this transform instance.

792

"inputCollectionName": [ # User names for all collection inputs to this transform.

793

"A String",

794

],

795

"name": "A String", # User provided name for this transform instance.

796

"kind": "A String", # Type of transform.

797

},

798

],

799

"executionPipelineStage": [ # Description of each stage of execution of the pipeline.

800

{ # Description of the composing transforms, names/ids, and input/outputs of a

801

# stage of execution. Some composing transforms and sources may have been

802

# generated by the Dataflow service during execution planning.

803

"componentSource": [ # Collections produced and consumed by component transforms of this stage.

804

{ # Description of an interstitial value between transforms in an execution

805

# stage.

806

"userName": "A String", # Human-readable name for this source; may be user or system generated.

807

"name": "A String", # Dataflow service generated name for this source.

808

"originalTransformOrCollection": "A String", # User name for the original user transform or collection with which this

809

# source is most closely associated.

810

},

811

],

812

"inputSource": [ # Input sources for this stage.

813

{ # Description of an input or output of an execution stage.

814

"userName": "A String", # Human-readable name for this source; may be user or system generated.

815

"originalTransformOrCollection": "A String", # User name for the original user transform or collection with which this

816

# source is most closely associated.

817

"sizeBytes": "A String", # Size of the source, if measurable.

818

"name": "A String", # Dataflow service generated name for this source.

819

},

820

],

821

"name": "A String", # Dataflow service generated name for this stage.

822

"componentTransform": [ # Transforms that comprise this execution stage.

823

{ # Description of a transform executed as part of an execution stage.

824

"name": "A String", # Dataflow service generated name for this transform.

825

"userName": "A String", # Human-readable name for this transform; may be user or system generated.

826

"originalTransform": "A String", # User name for the original user transform with which this transform is

827

# most closely associated.

828

},

829

],

830

"id": "A String", # Dataflow service generated id for this stage.

831

"outputSource": [ # Output sources for this stage.

832

{ # Description of an input or output of an execution stage.

833

"userName": "A String", # Human-readable name for this source; may be user or system generated.

834

"originalTransformOrCollection": "A String", # User name for the original user transform or collection with which this

835

# source is most closely associated.

836

"sizeBytes": "A String", # Size of the source, if measurable.

837

"name": "A String", # Dataflow service generated name for this source.

838

},

839

],

840

"kind": "A String", # Type of transform this stage is executing.

},

],

},

"labels": { # User-defined labels for this job.

845

#

846

# The labels map can contain no more than 64 entries. Entries of the labels

847

# map are UTF8 strings that comply with the following restrictions:

848

#

849

# * Keys must conform to regexp: \p{Ll}\p{Lo}{0,62}

850

# * Values must conform to regexp: [\p{Ll}\p{Lo}\p{N}_-]{0,63}

851

# * Both keys and values are additionally constrained to be <= 128 bytes in

# size.

"a_key": "A String",

},
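The label limits above can be checked locally. A minimal sketch covering the entry count, the 128-byte size limit, and the value character classes; the key pattern is deliberately not checked here. Python's `re` module has no `\p{...}` classes, so Unicode categories are read with `unicodedata`. `check_labels` is a hypothetical helper.

```python
import unicodedata

MAX_LABELS = 64        # "no more than 64 entries"
MAX_BYTES = 128        # keys and values are "<= 128 bytes in size"
MAX_VALUE_CHARS = 63   # from the {0,63} quantifier on values


def _value_char_ok(ch):
    """True for \\p{Ll}, \\p{Lo}, any \\p{N} category, "_" or "-"."""
    category = unicodedata.category(ch)
    return category in ("Ll", "Lo") or category.startswith("N") or ch in "_-"


def check_labels(labels):
    """Return a list of problems found in a labels map (empty if none)."""
    problems = []
    if len(labels) > MAX_LABELS:
        problems.append("more than 64 entries")
    for key, value in labels.items():
        too_big = (len(key.encode("utf-8")) > MAX_BYTES
                   or len(value.encode("utf-8")) > MAX_BYTES)
        if too_big:
            problems.append(f"entry too large: {key}")
        if len(value) > MAX_VALUE_CHARS or not all(_value_char_ok(ch) for ch in value):
            problems.append(f"bad value for {key}: {value}")
    return problems
```

Sizes are measured on the UTF-8 encoding because the limit is stated in bytes, while the quantifier limit counts characters.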

"projectId": "A String", # The ID of the Cloud Platform project that the job belongs to.

856

"environment": { # Describes the environment in which a Dataflow Job runs. # The environment for the job.

857

"flexResourceSchedulingGoal": "A String", # Which Flexible Resource Scheduling mode to run in.

858

"workerRegion": "A String", # The Compute Engine region

859

# (https://cloud.google.com/compute/docs/regions-zones/regions-zones) in

860

# which worker processing should occur, e.g. "us-west1". Mutually exclusive

861

# with worker_zone. If neither worker_region nor worker_zone is specified,

862

# default to the control plane's region.

863

"userAgent": { # A description of the process that generated the request.

864

"a_key": "", # Properties of the object.

865

},

866

"serviceAccountEmail": "A String", # Identity to run virtual machines as. Defaults to the default account.

867

"version": { # A structure describing which components and their versions of the service

868

# are required in order to run the job.

869

"a_key": "", # Properties of the object.

870

},

871

"serviceKmsKeyName": "A String", # If set, contains the Cloud KMS key identifier used to encrypt data

872

# at rest, AKA a Customer Managed Encryption Key (CMEK).

873

#

874

# Format:

875

# projects/PROJECT_ID/locations/LOCATION/keyRings/KEY_RING/cryptoKeys/KEY

876

"experiments": [ # The list of experiments to enable.

877

"A String",

878

],

879

"workerZone": "A String", # The Compute Engine zone

880

# (https://cloud.google.com/compute/docs/regions-zones/regions-zones) in

881

# which worker processing should occur, e.g. "us-west1-a". Mutually exclusive

882

# with worker_region. If neither worker_region nor worker_zone is specified,

883

# a zone in the control plane's region is chosen based on available capacity.

884

"workerPools": [ # The worker pools. At least one "harness" worker pool must be

885

# specified in order for the job to have workers.

886

{ # Describes one particular pool of Cloud Dataflow workers to be

887

# instantiated by the Cloud Dataflow service in order to perform the

888

# computations required by a job. Note that a workflow job may use

889

# multiple pools, in order to match the various computational

890

# requirements of the various stages of the job.

891

"onHostMaintenance": "A String", # The action to take on host maintenance, as defined by the Google

892

# Compute Engine API.

893

"sdkHarnessContainerImages": [ # Set of SDK harness containers needed to execute this pipeline. This will

894

# only be set in the Fn API path. For non-cross-language pipelines this

895

# should have only one entry. Cross-language pipelines will have two or more

896

# entries.

897

{ # Defines an SDK harness container for executing Dataflow pipelines.

898

"containerImage": "A String", # A docker container image that resides in Google Container Registry.

899

"useSingleCorePerContainer": True or False, # If true, recommends that the Dataflow service use only one core per SDK

900

# container instance with this image. If false (or unset) recommends using

901

# more than one core per SDK container instance with this image for

902

# efficiency. Note that Dataflow service may choose to override this property

# if needed.

},

],

"zone": "A String", # Zone to run the worker pools in. If empty or unspecified, the service

907

# will attempt to choose a reasonable default.

908

"kind": "A String", # The kind of the worker pool; currently only `harness` and `shuffle`

909

# are supported.

910

"metadata": { # Metadata to set on the Google Compute Engine VMs.

911

"a_key": "A String",

912

},

913

"diskSourceImage": "A String", # Fully qualified source image for disks.

914

"dataDisks": [ # Data disks that are used by a VM in this workflow.

915

{ # Describes the data disk used by a workflow job.

916

"sizeGb": 42, # Size of disk in GB. If zero or unspecified, the service will

917

# attempt to choose a reasonable default.

918

"diskType": "A String", # Disk storage type, as defined by Google Compute Engine. This

919

# must be a disk type appropriate to the project and zone in which

920

# the workers will run. If unknown or unspecified, the service

921

# will attempt to choose a reasonable default.

922

#

923

# For example, the standard persistent disk type is a resource name

924

# typically ending in "pd-standard". If SSD persistent disks are

925

# available, the resource name typically ends with "pd-ssd". The

926

# actual valid values are defined by the Google Compute Engine API,

927

# not by the Cloud Dataflow API; consult the Google Compute Engine

928

# documentation for more information about determining the set of

929

# available disk types for a particular project and zone.

930

#

931

# Google Compute Engine Disk types are local to a particular

932

# project in a particular zone, and so the resource name will

933

# typically look something like this:

934

#

935

# compute.googleapis.com/projects/project-id/zones/zone/diskTypes/pd-standard

936

"mountPoint": "A String", # Directory in a VM where disk is mounted.

937

},

938

],

939

"packages": [ # Packages to be installed on workers.

940

{ # The packages that must be installed in order for a worker to run the

941

# steps of the Cloud Dataflow job that will be assigned to its worker

942

# pool.

943

#

944

# This is the mechanism by which the Cloud Dataflow SDK causes code to

945

# be loaded onto the workers. For example, the Cloud Dataflow Java SDK

946

# might use this to install jars containing the user's code and all of the

947

# various dependencies (libraries, data files, etc.) required in order

948

# for that code to run.

949

"name": "A String", # The name of the package.

950

"location": "A String", # The resource to read the package from. The supported resource type is:

951

#

952

# Google Cloud Storage:

953

#

954

# storage.googleapis.com/{bucket}

955

# bucket.storage.googleapis.com/

956

},

957

],

958

"teardownPolicy": "A String", # Sets the policy for determining when to tear down the worker pool.

959

# Allowed values are: `TEARDOWN_ALWAYS`, `TEARDOWN_ON_SUCCESS`, and

960

# `TEARDOWN_NEVER`.

961

# `TEARDOWN_ALWAYS` means workers are always torn down regardless of whether

962

# the job succeeds. `TEARDOWN_ON_SUCCESS` means workers are torn down

963

# if the job succeeds. `TEARDOWN_NEVER` means the workers are never torn

964

# down.

965

#

966

# If the workers are not torn down by the service, they will

967

# continue to run and use Google Compute Engine VM resources in the

968

# user's project until they are explicitly terminated by the user.

969

# Because of this, Google recommends using the `TEARDOWN_ALWAYS`

970

# policy except for small, manually supervised test jobs.

971

#

972

# If unknown or unspecified, the service will attempt to choose a reasonable

973

# default.
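The three teardown policies described above reduce to a small decision function. A minimal sketch; `should_tear_down` is a hypothetical helper, and because the service default for an unknown or unspecified policy is not documented here, the sketch raises instead of guessing.

```python
def should_tear_down(policy, job_succeeded):
    """Return True if workers are torn down under the given policy."""
    if policy == "TEARDOWN_ALWAYS":
        return True
    if policy == "TEARDOWN_ON_SUCCESS":
        return job_succeeded
    if policy == "TEARDOWN_NEVER":
        return False
    # The service picks its own default here; that default is not documented.
    raise ValueError(f"unknown teardown policy: {policy!r}")
```

A failed job under `TEARDOWN_ON_SUCCESS` leaves the workers running, which is why the text above recommends `TEARDOWN_ALWAYS` outside of supervised test jobs.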

974

"network": "A String", # Network to which VMs will be assigned. If empty or unspecified,

975

# the service will use the network "default".

976

"ipConfiguration": "A String", # Configuration for VM IPs.

977

"diskSizeGb": 42, # Size of root disk for VMs, in GB. If zero or unspecified, the service will

978

# attempt to choose a reasonable default.

979

"autoscalingSettings": { # Settings for autoscaling of this WorkerPool.

980

"maxNumWorkers": 42, # The maximum number of workers to cap scaling at.

981

"algorithm": "A String", # The algorithm to use for autoscaling.

982

},

983

"poolArgs": { # Extra arguments for this worker pool.

984

"a_key": "", # Properties of the object. Contains field @type with type URL.

985

},

986

"subnetwork": "A String", # Subnetwork to which VMs will be assigned, if desired. Expected to be of

987

# the form "regions/REGION/subnetworks/SUBNETWORK".

988

"numWorkers": 42, # Number of Google Compute Engine workers in this pool needed to

989

# execute the job. If zero or unspecified, the service will

990

# attempt to choose a reasonable default.

991

"numThreadsPerWorker": 42, # The number of threads per worker harness. If empty or unspecified, the

992

# service will choose a number of threads (according to the number of cores

993

# on the selected machine type for batch, or 1 by convention for streaming).

994

"workerHarnessContainerImage": "A String", # Required. Docker container image that executes the Cloud Dataflow worker

995

# harness, residing in Google Container Registry.

996

#

997

# Deprecated for the Fn API path. Use sdk_harness_container_images instead.

998

"taskrunnerSettings": { # Taskrunner configuration settings. # Settings passed through to Google Compute Engine workers when

999

# using the standard Dataflow task runner. Users should ignore

1000

# this field.

1001

"dataflowApiVersion": "A String", # The API version of endpoint, e.g. "v1b3"

1002

"oauthScopes": [ # The OAuth2 scopes to be requested by the taskrunner in order to

1003

# access the Cloud Dataflow API.

1004

"A String",

1005

],

1006

"baseUrl": "A String", # The base URL for the taskrunner to use when accessing Google Cloud APIs.

1007

#

1008

# When workers access Google Cloud APIs, they logically do so via

1009

# relative URLs. If this field is specified, it supplies the base

1010

# URL to use for resolving these relative URLs. The normative

1011

# algorithm used is defined by RFC 1808, "Relative Uniform Resource

1012

# Locators".

1013

#

1014

# If not specified, the default value is "http://www.googleapis.com/"
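The `baseUrl` description above says relative URLs are resolved against a base URL. A minimal sketch using the Python standard library; note that `urllib.parse.urljoin` follows the later RFC 3986 rules rather than RFC 1808 itself, which should give the same result for simple relative paths like these. `resolve` is a hypothetical helper and the second base URL in the usage lines is a placeholder.

```python
from urllib.parse import urljoin

# Default taken from the field description above.
DEFAULT_BASE_URL = "http://www.googleapis.com/"


def resolve(relative_url, base_url=None):
    """Resolve a relative API path against base_url, or against the default."""
    return urljoin(base_url or DEFAULT_BASE_URL, relative_url)


default_resolved = resolve("dataflow/v1b3/projects")
custom_resolved = resolve("shuffle/v1beta1", "https://example.test/api/")
```

The trailing slash on the base URL matters: without it, the last path segment of the base is replaced rather than extended.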

1015

"workflowFileName": "A String", # The file to store the workflow in.

1016

"logToSerialconsole": True or False, # Whether to send taskrunner log info to Google Compute Engine VM serial

1017

# console.

1018

"baseTaskDir": "A String", # The location on the worker for task-specific subdirectories.

1019

"taskUser": "A String", # The UNIX user ID on the worker VM to use for tasks launched by

1020

# taskrunner; e.g. "root".

1021

"vmId": "A String", # The ID string of the VM.

1022

"alsologtostderr": True or False, # Whether to also send taskrunner log info to stderr.

1023

"parallelWorkerSettings": { # Provides data to pass through to the worker harness. # The settings to pass to the parallel worker harness.

1024

"shuffleServicePath": "A String", # The Shuffle service path relative to the root URL, for example,

1025

# "shuffle/v1beta1".

1026

"tempStoragePrefix": "A String", # The prefix of the resources the system should use for temporary

1027

# storage.

1028

#

1029

# The supported resource type is:

1030

#

1031

# Google Cloud Storage:

1032

#

1033

# storage.googleapis.com/{bucket}/{object}

1034

# bucket.storage.googleapis.com/{object}

1035

"reportingEnabled": True or False, # Whether to send work progress updates to the service.

1036

"servicePath": "A String", # The Cloud Dataflow service path relative to the root URL, for example,

1037

# "dataflow/v1b3/projects".

1038

"baseUrl": "A String", # The base URL for accessing Google Cloud APIs.

1039

#

1040

# When workers access Google Cloud APIs, they logically do so via

1041

# relative URLs. If this field is specified, it supplies the base

1042

# URL to use for resolving these relative URLs. The normative

1043

# algorithm used is defined by RFC 1808, "Relative Uniform Resource

1044

# Locators".

1045

#

1046

# If not specified, the default value is "http://www.googleapis.com/"

1047

"workerId": "A String", # The ID of the worker running this pipeline.

1048

},

1049

"harnessCommand": "A String", # The command to launch the worker harness.

1050

"logDir": "A String", # The directory on the VM to store logs.

1051

"streamingWorkerMainClass": "A String", # The streaming worker main class name.

1052

"languageHint": "A String", # The suggested backend language.

1053

"taskGroup": "A String", # The UNIX group ID on the worker VM to use for tasks launched by

1054

# taskrunner; e.g. "wheel".

1055

"logUploadLocation": "A String", # Indicates where to put logs. If this is not specified, the logs

1056

# will not be uploaded.

1057

#

1058

# The supported resource type is:

1059

#

1060

# Google Cloud Storage:

1061

# storage.googleapis.com/{bucket}/{object}

1062

# bucket.storage.googleapis.com/{object}

1063

"commandlinesFileName": "A String", # The file to store preprocessing commands in.

1064

"continueOnException": True or False, # Whether to continue taskrunner if an exception is hit.

1065

"tempStoragePrefix": "A String", # The prefix of the resources the taskrunner should use for

1066

# temporary storage.

1067

#

1068

# The supported resource type is:

1069

#

1070

# Google Cloud Storage:

1071

# storage.googleapis.com/{bucket}/{object}

1072

# bucket.storage.googleapis.com/{object}

1073

},

1074

"diskType": "A String", # Type of root disk for VMs. If empty or unspecified, the service will

1075

# attempt to choose a reasonable default.

1076

"defaultPackageSet": "A String", # The default package set to install. This allows the service to

1077

# select a default set of packages which are useful to worker

1078

# harnesses written in a particular language.

1079

"machineType": "A String", # Machine type (e.g. "n1-standard-1"). If empty or unspecified, the

1080

# service will attempt to choose a reasonable default.

1081

},

1082

],

1083

"tempStoragePrefix": "A String", # The prefix of the resources the system should use for temporary

1084

# storage. The system will append the suffix "/temp-{JOBNAME}" to

1085

# this resource prefix, where {JOBNAME} is the value of the

1086

# job_name field. The resulting bucket and object prefix is used

1087

# as the prefix of the resources used to store temporary data

1088

# needed during the job execution. NOTE: This will override the

1089

# value in taskrunner_settings.

1090

# The supported resource type is:

1091

#

1092

# Google Cloud Storage:

1093

#

1094

# storage.googleapis.com/{bucket}/{object}

1095

# bucket.storage.googleapis.com/{object}

1096

"internalExperiments": { # Experimental settings.

1097

"a_key": "", # Properties of the object. Contains field @type with type URL.

1098

},

1099

"sdkPipelineOptions": { # The Cloud Dataflow SDK pipeline options specified by the user. These

1100

# options are passed through the service and are used to recreate the

1101

# SDK pipeline options on the worker in a language agnostic and platform

1102

# independent way.

1103

"a_key": "", # Properties of the object.

1104

},

1105

"dataset": "A String", # The dataset for the current project where various workflow

1106

# related tables are stored.

1107

#

1108

# The supported resource type is:

1109

#

1110

# Google BigQuery:

1111

# bigquery.googleapis.com/{dataset}

1112

"clusterManagerApiService": "A String", # The type of cluster manager API to use. If unknown or

1113

# unspecified, the service will attempt to choose a reasonable

1114

# default. This should be in the form of the API service name,

1115

# e.g. "compute.googleapis.com".

1116

},

1117

"stepsLocation": "A String", # The GCS location where the steps are stored.

1118

"steps": [ # Exactly one of steps or steps_location should be specified.

1119

#

1120

# The top-level steps that constitute the entire job.

1121

{ # Defines a particular step within a Cloud Dataflow job.

1122

#

1123

# A job consists of multiple steps, each of which performs some

1124

# specific operation as part of the overall job. Data is typically

1125

# passed from one step to another as part of the job.

1126

#

1127

# Here's an example of a sequence of steps which together implement a

1128

# Map-Reduce job:

1129

#

1130

# * Read a collection of data from some source, parsing the

1131

# collection's elements.

1132

#

1133

# * Validate the elements.

1134

#

1135

# * Apply a user-defined function to map each element to some value

1136

# and extract an element-specific key value.

1137

#

1138

# * Group elements with the same key into a single element with

1139

# that key, transforming a multiply-keyed collection into a

1140

# uniquely-keyed collection.

1141

#

1142

# * Write the elements out to some data sink.

1143

#

1144

# Note that the Cloud Dataflow service may be used to run many different

1145

# types of jobs, not just Map-Reduce.

1146

"kind": "A String", # The kind of step in the Cloud Dataflow job.

1147

"properties": { # Named properties associated with the step. Each kind of

1148

# predefined step has its own required set of properties.

1149

# Must be provided on Create. Only retrieved with JOB_VIEW_ALL.

1150

"a_key": "", # Properties of the object.

1151

},

1152

"name": "A String", # The name that identifies the step. This must be unique for each

1153

# step with respect to all other steps in the Cloud Dataflow job.

1154

},

1155

],

1156

"stageStates": [ # This field may be mutated by the Cloud Dataflow service;

1157

# callers cannot mutate it.

1158

{ # A message describing the state of a particular execution stage.

1159

"executionStageState": "A String", # Execution stage states allow the same set of values as JobState.

1160

"executionStageName": "A String", # The name of the execution stage.

1161

"currentStateTime": "A String", # The time at which the stage transitioned to this state.

1162

},

1163

],

1164

"replacedByJobId": "A String", # If another job is an update of this job (and thus, this job is in

1165

# `JOB_STATE_UPDATED`), this field contains the ID of that job.

1166

"jobMetadata": { # Metadata available primarily for filtering jobs. Will be included in the

1167

# ListJob response and Job SUMMARY view. This field is populated by the

1168

# Dataflow service to support filtering jobs by the metadata values provided

1169

# here. Populated for ListJobs and all GetJob views SUMMARY and higher.

1170

"sdkVersion": { # The version of the SDK used to run the job. # The SDK version used to run the job.

1171

"sdkSupportStatus": "A String", # The support status for this SDK version.

1172

"versionDisplayName": "A String", # A readable string describing the version of the SDK.

1173

"version": "A String", # The version of the SDK used to run the job.

1174

},

1175

"bigTableDetails": [ # Identification of a BigTable source used in the Dataflow job.

1176

{ # Metadata for a BigTable connector used by the job.

1177

"instanceId": "A String", # InstanceId accessed in the connection.

1178

"tableId": "A String", # TableId accessed in the connection.

1179

"projectId": "A String", # ProjectId accessed in the connection.

1180

},

1181

],

1182

"pubsubDetails": [ # Identification of a PubSub source used in the Dataflow job.

1183

{ # Metadata for a PubSub connector used by the job.

1184

"subscription": "A String", # Subscription used in the connection.

1185

"topic": "A String", # Topic accessed in the connection.

1186

},

1187

],

1188

"bigqueryDetails": [ # Identification of a BigQuery source used in the Dataflow job.

1189

{ # Metadata for a BigQuery connector used by the job.

1190

"dataset": "A String", # Dataset accessed in the connection.

1191

"projectId": "A String", # Project accessed in the connection.

1192

"query": "A String", # Query used to access data in the connection.

1193

"table": "A String", # Table accessed in the connection.

1194

},

1195

],

1196

"fileDetails": [ # Identification of a File source used in the Dataflow job.

1197

{ # Metadata for a File connector used by the job.

1198

"filePattern": "A String", # File Pattern used to access files by the connector.

1199

},

1200

],

1201

"datastoreDetails": [ # Identification of a Datastore source used in the Dataflow job.

1202

{ # Metadata for a Datastore connector used by the job.

1203

"namespace": "A String", # Namespace used in the connection.

1204

"projectId": "A String", # ProjectId accessed in the connection.

1205

},

1206

],

1207

"spannerDetails": [ # Identification of a Spanner source used in the Dataflow job.

1208

{ # Metadata for a Spanner connector used by the job.

1209

"instanceId": "A String", # InstanceId accessed in the connection.

1210

"databaseId": "A String", # DatabaseId accessed in the connection.

1211

"projectId": "A String", # ProjectId accessed in the connection.

},

],

},

"location": "A String", # The [regional endpoint]

1216

# (https://cloud.google.com/dataflow/docs/concepts/regional-endpoints) that

1217

# contains this job.

1218

"transformNameMapping": { # The map of transform name prefixes of the job to be replaced to the

1219

# corresponding name prefixes of the new job.

1220

"a_key": "A String",

1221

},

1222

"startTime": "A String", # The timestamp when the job was started (transitioned to JOB_STATE_PENDING).

1223

# Flexible resource scheduling jobs are started with some delay after job

1224

# creation, so start_time is unset before start and is updated when the

1225

# job is started by the Cloud Dataflow service. For other jobs, start_time

1226

# always equals create_time and is immutable and set by the Cloud Dataflow

1227

# service.

1228

"clientRequestId": "A String", # The client's unique identifier of the job, re-used across retried attempts.

1229

# If this field is set, the service will ensure its uniqueness.

1230

# The request to create a job will fail if the service has knowledge of a

1231

# previously submitted job with the same client's ID and job name.

1232

# The caller may use this field to ensure idempotence of job

1233

# creation across retried attempts to create a job.

1234

# By default, the field is empty and, in that case, the service ignores it.

1235

"executionInfo": { # Deprecated. Additional information about how a Cloud Dataflow job will be executed that

1236

# isn't contained in the submitted job.

1237

"stages": { # A mapping from each stage to the information about that stage.

1238

"a_key": { # Contains information about how a particular

1239

# google.dataflow.v1beta3.Step will be executed.

1240

"stepName": [ # The steps associated with the execution stage.

1241

# Note that stages may have several steps, and that a given step

1242

# might be run by more than one stage.

"A String",

],

},

},

},

"type": "A String", # The type of Cloud Dataflow job.

1249

"createTime": "A String", # The timestamp when the job was initially created. Immutable and set by the

1250

# Cloud Dataflow service.

1251

"tempFiles": [ # A set of files the system should be aware of that are used

1252

# for temporary storage. These temporary files will be

1253

# removed on job completion.

1254

# No duplicates are allowed.

1255

# No file patterns are supported.

1256

#

1257

# The supported files are:

1258

#

1259

# Google Cloud Storage:

1260

#

1261

# storage.googleapis.com/{bucket}/{object}

1262

# bucket.storage.googleapis.com/{object}

1263

"A String",

1264

],

1265

"id": "A String", # The unique ID of this job.

1266

#

1267

# This field is set by the Cloud Dataflow service when the Job is

1268

# created, and is immutable for the life of the job.

1269

"requestedState": "A String", # The job's requested state.

1270

#

1271

# `UpdateJob` may be used to switch between the `JOB_STATE_STOPPED` and

1272

# `JOB_STATE_RUNNING` states, by setting requested_state. `UpdateJob` may

1273

# also be used to directly set a job's requested state to

1274

# `JOB_STATE_CANCELLED` or `JOB_STATE_DONE`, irrevocably terminating the

1275

# job if it has not already reached a terminal state.

1276

"replaceJobId": "A String", # If this job is an update of an existing job, this field is the job ID

1277

# of the job it replaced.

1278

#

1279

# When sending a `CreateJobRequest`, you can update a job by specifying it

1280

# here. The job named here is stopped, and its intermediate state is

1281

# transferred to this job.

1282

"createdFromSnapshotId": "A String", # If this is specified, the job's initial state is populated from the given

1283

# snapshot.

1284

"currentState": "A String", # The current state of the job.

1285

#

1286

# Jobs are created in the `JOB_STATE_STOPPED` state unless otherwise

1287

# specified.

1288

#

1289

# A job in the `JOB_STATE_RUNNING` state may asynchronously enter a

1290

# terminal state. After a job has reached a terminal state, no

1291

# further state updates may be made.

1292

#

1293

# This field may be mutated by the Cloud Dataflow service;

1294

# callers cannot mutate it.

1295

"name": "A String", # The user-specified Cloud Dataflow job name.

1296

#

1297

# Only one Job with a given name may exist in a project at any

1298

# given time. If a caller attempts to create a Job with the same

1299

# name as an already-existing Job, the attempt returns the

1300

# existing Job.

1301

#

1302

# The name must match the regular expression

1303

# `[a-z]([-a-z0-9]{0,38}[a-z0-9])?`

1304

"currentStateTime": "A String", # The timestamp associated with the current state.

1305

}</pre>
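The `name` field above must match the regular expression `[a-z]([-a-z0-9]{0,38}[a-z0-9])?`. A minimal client-side sketch of that check follows; the helper name `is_valid_job_name` is hypothetical and not part of the client library, and the service remains the authority on which names it accepts.

```python
import re

# Pattern quoted from the "name" field description above.
JOB_NAME_RE = re.compile(r"[a-z]([-a-z0-9]{0,38}[a-z0-9])?")


def is_valid_job_name(name: str) -> bool:
    """Return True if the whole string matches the documented pattern.

    Hypothetical helper for illustration; it only mirrors the regular
    expression and does not check uniqueness within a project.
    """
    return JOB_NAME_RE.fullmatch(name) is not None
```

`fullmatch` is used rather than `match` so that trailing characters outside the pattern (for example a final hyphen) are rejected instead of being silently ignored.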

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

</div>
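The `get` method documented below takes `projectId`, `location`, and `jobId`, plus an optional `view`. The following is a rough sketch of a call through this client, assuming `service` is a Dataflow `v1b3` service object (for example one built with `googleapiclient.discovery.build("dataflow", "v1b3")` under default credentials). The wrapper name `get_job_state` and the choice of `JOB_VIEW_SUMMARY` are illustrative assumptions, not requirements of the API.

```python
def get_job_state(service, project_id, location, job_id):
    """Fetch a job from its regional endpoint and return its currentState.

    `service` is assumed to be a Dataflow v1b3 service object; this
    hypothetical helper only chains the resource methods listed on this page.
    """
    request = service.projects().locations().jobs().get(
        projectId=project_id,
        location=location,
        jobId=job_id,
        view="JOB_VIEW_SUMMARY",  # assumed view; other views return more detail
    )
    job = request.execute()
    # currentState is set by the service; callers cannot mutate it.
    return job.get("currentState")
```

Passing `location` explicitly follows the recommendation to use `projects.locations.jobs.get` with a regional endpoint rather than `projects.jobs.get`.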

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame]

1309

<code class="details" id="get">get(projectId, location, jobId, view=None, x__xgafv=None)</code>

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

1310

<pre>Gets the state of the specified Cloud Dataflow job.

1311

1312

To get the state of a job, we recommend using `projects.locations.jobs.get`

1313

with a [regional endpoint]

1314

(https://cloud.google.com/dataflow/docs/concepts/regional-endpoints). Using

1315

`projects.jobs.get` is not recommended, as you can only get the state of

1316

jobs that are running in `us-central1`.

1317

1318

Args:

1319

projectId: string, The ID of the Cloud Platform project that the job belongs to. (required)

1320

location: string, The [regional endpoint]

1321

(https://cloud.google.com/dataflow/docs/concepts/regional-endpoints) that

1322

contains this job. (required)

1323

jobId: string, The job ID. (required)

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame]

1324

view: string, The level of information requested in response.

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

1325

x__xgafv: string, V1 error format.

1326

Allowed values

1327

1 - v1 error format

1328

2 - v2 error format

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

1329

1330

Returns:

1331

An object of the form:

1332

1333

{ # Defines a job to be run by the Cloud Dataflow service.

Bu Sun Kim

2020-07-22 17:02:09 -0700

[diff] [blame]

1334

"pipelineDescription": { # A descriptive representation of submitted pipeline as well as the executed

1335

# form. This data is provided by the Dataflow service for ease of visualizing

1336

# the pipeline and interpreting Dataflow provided metrics.

1337

# Preliminary field: The format of this data may change at any time.

1338

# A description of the user pipeline and stages through which it is executed.

1339

# Created by Cloud Dataflow service. Only retrieved with JOB_VIEW_DESCRIPTION or JOB_VIEW_ALL.

1340

"displayData": [ # Pipeline level display data.

1341

{ # Data provided with a pipeline or transform to provide descriptive info.

1342

"url": "A String", # An optional full URL.

1343

"javaClassValue": "A String", # Contains value if the data is of java class type.

1344

"timestampValue": "A String", # Contains value if the data is of timestamp type.

1345

"durationValue": "A String", # Contains value if the data is of duration type.

1346

"label": "A String", # An optional label to display in a dax UI for the element.

1347

"key": "A String", # The key identifying the display data.

1348

# This is intended to be used as a label for the display data

1349

# when viewed in a dax monitoring system.

1350

"namespace": "A String", # The namespace for the key. This is usually a class name or programming

1351

# language namespace (e.g. a Python module) which defines the display data.

1352

# This allows a dax monitoring system to specially handle the data

1353

# and perform custom rendering.

1354

"floatValue": 3.14, # Contains value if the data is of float type.

1355

"strValue": "A String", # Contains value if the data is of string type.

1356

"int64Value": "A String", # Contains value if the data is of int64 type.

1357

"boolValue": True or False, # Contains value if the data is of a boolean type.

1358

"shortStrValue": "A String", # A possible additional shorter value to display.

1359

# For example a java_class_name_value of com.mypackage.MyDoFn

1360

# will be stored with MyDoFn as the short_str_value and

1361

# com.mypackage.MyDoFn as the java_class_name_value.

1362

# short_str_value can be displayed and java_class_name_value

1363

# will be displayed as a tooltip.

Bu Sun Kim

2020-05-27 12:20:54 -0700

[diff] [blame]

1364

},

Bu Sun Kim

2020-07-22 17:02:09 -0700

[diff] [blame]

1365

],

1366

"originalPipelineTransform": [ # Description of each transform in the pipeline and collections between them.

1367

{ # Description of the type, names/ids, and input/outputs for a transform.

1368

"outputCollectionName": [ # User names for all collection outputs to this transform.

Bu Sun Kim

2020-05-27 12:20:54 -0700

[diff] [blame]

1369

"A String",

1370

],

Bu Sun Kim

2020-07-22 17:02:09 -0700

[diff] [blame]

1371

"displayData": [ # Transform-specific display data.

1372

{ # Data provided with a pipeline or transform to provide descriptive info.

1373

"url": "A String", # An optional full URL.

1374

"javaClassValue": "A String", # Contains value if the data is of java class type.

1375

"timestampValue": "A String", # Contains value if the data is of timestamp type.

1376

"durationValue": "A String", # Contains value if the data is of duration type.

1377

"label": "A String", # An optional label to display in a dax UI for the element.

1378

"key": "A String", # The key identifying the display data.

1379

# This is intended to be used as a label for the display data

1380

# when viewed in a dax monitoring system.

1381

"namespace": "A String", # The namespace for the key. This is usually a class name or programming

1382

# language namespace (e.g. a Python module) which defines the display data.

1383

# This allows a dax monitoring system to specially handle the data

1384

# and perform custom rendering.

1385

"floatValue": 3.14, # Contains value if the data is of float type.

1386

"strValue": "A String", # Contains value if the data is of string type.

1387

"int64Value": "A String", # Contains value if the data is of int64 type.

1388

"boolValue": True or False, # Contains value if the data is of a boolean type.

1389

"shortStrValue": "A String", # A possible additional shorter value to display.

1390

# For example a java_class_name_value of com.mypackage.MyDoFn

1391

# will be stored with MyDoFn as the short_str_value and

1392

# com.mypackage.MyDoFn as the java_class_name_value.

1393

# short_str_value can be displayed and java_class_name_value

1394

# will be displayed as a tooltip.

1395

},

1396

],

1397

"id": "A String", # SDK generated id of this transform instance.

1398

"inputCollectionName": [ # User names for all collection inputs to this transform.

1399

"A String",

1400

],

1401

"name": "A String", # User provided name for this transform instance.

1402

"kind": "A String", # Type of transform.

1403

},

1404

],

1405

"executionPipelineStage": [ # Description of each stage of execution of the pipeline.

1406

{ # Description of the composing transforms, names/ids, and input/outputs of a

1407

# stage of execution. Some composing transforms and sources may have been

1408

# generated by the Dataflow service during execution planning.

1409

"componentSource": [ # Collections produced and consumed by component transforms of this stage.

1410

{ # Description of an interstitial value between transforms in an execution

1411

# stage.

1412

"userName": "A String", # Human-readable name for this transform; may be user or system generated.

1413

"name": "A String", # Dataflow service generated name for this source.

1414

"originalTransformOrCollection": "A String", # User name for the original user transform or collection with which this

1415

# source is most closely associated.

1416

},

1417

],

1418

"inputSource": [ # Input sources for this stage.

1419

{ # Description of an input or output of an execution stage.

1420

"userName": "A String", # Human-readable name for this source; may be user or system generated.

1421

"originalTransformOrCollection": "A String", # User name for the original user transform or collection with which this

1422

# source is most closely associated.

1423

"sizeBytes": "A String", # Size of the source, if measurable.

1424

"name": "A String", # Dataflow service generated name for this source.

1425

},

1426

],

1427

"name": "A String", # Dataflow service generated name for this stage.

1428

"componentTransform": [ # Transforms that comprise this execution stage.

1429

{ # Description of a transform executed as part of an execution stage.

1430

"name": "A String", # Dataflow service generated name for this source.

1431

"userName": "A String", # Human-readable name for this transform; may be user or system generated.

1432

"originalTransform": "A String", # User name for the original user transform with which this transform is

1433

# most closely associated.

1434

},

1435

],

1436

"id": "A String", # Dataflow service generated id for this stage.

1437

"outputSource": [ # Output sources for this stage.

1438

{ # Description of an input or output of an execution stage.

1439

"userName": "A String", # Human-readable name for this source; may be user or system generated.

1440

"originalTransformOrCollection": "A String", # User name for the original user transform or collection with which this

1441

# source is most closely associated.

1442

"sizeBytes": "A String", # Size of the source, if measurable.

1443

"name": "A String", # Dataflow service generated name for this source.

1444

},

1445

],

1446

"kind": "A String", # Type of transform this stage is executing.

},

],

},

"labels": { # User-defined labels for this job.

1451

#

1452

# The labels map can contain no more than 64 entries. Entries of the labels

1453

# map are UTF8 strings that comply with the following restrictions:

1454

#

1455

# * Keys must conform to regexp: \p{Ll}\p{Lo}{0,62}

1456

# * Values must conform to regexp: [\p{Ll}\p{Lo}\p{N}_-]{0,63}

1457

# * Both keys and values are additionally constrained to be <= 128 bytes in

# size.

"a_key": "A String",

},

"projectId": "A String", # The ID of the Cloud Platform project that the job belongs to.

1462

"environment": { # Describes the environment in which a Dataflow Job runs. # The environment for the job.

1463

"flexResourceSchedulingGoal": "A String", # Which Flexible Resource Scheduling mode to run in.

1464

"workerRegion": "A String", # The Compute Engine region

1465

# (https://cloud.google.com/compute/docs/regions-zones/regions-zones) in

1466

# which worker processing should occur, e.g. "us-west1". Mutually exclusive

1467

# with worker_zone. If neither worker_region nor worker_zone is specified,

1468

# default to the control plane's region.

1469

"userAgent": { # A description of the process that generated the request.

1470

"a_key": "", # Properties of the object.

1471

},

1472

"serviceAccountEmail": "A String", # Identity to run virtual machines as. Defaults to the default account.

1473

"version": { # A structure describing which components and their versions of the service

1474

# are required in order to run the job.

1475

"a_key": "", # Properties of the object.

1476

},

1477

"serviceKmsKeyName": "A String", # If set, contains the Cloud KMS key identifier used to encrypt data

1478

# at rest, AKA a Customer Managed Encryption Key (CMEK).

1479

#

1480

# Format:

1481

# projects/PROJECT_ID/locations/LOCATION/keyRings/KEY_RING/cryptoKeys/KEY

1482

"experiments": [ # The list of experiments to enable.

1483

"A String",

1484

],

1485

"workerZone": "A String", # The Compute Engine zone

1486

# (https://cloud.google.com/compute/docs/regions-zones/regions-zones) in

1487

# which worker processing should occur, e.g. "us-west1-a". Mutually exclusive

1488

# with worker_region. If neither worker_region nor worker_zone is specified,

1489

# a zone in the control plane's region is chosen based on available capacity.

1490

"workerPools": [ # The worker pools. At least one "harness" worker pool must be

1491

# specified in order for the job to have workers.

1492

{ # Describes one particular pool of Cloud Dataflow workers to be

1493

# instantiated by the Cloud Dataflow service in order to perform the

1494

# computations required by a job. Note that a workflow job may use

1495

# multiple pools, in order to match the various computational

1496

# requirements of the various stages of the job.

1497

"onHostMaintenance": "A String", # The action to take on host maintenance, as defined by the Google

1498

# Compute Engine API.

1499

"sdkHarnessContainerImages": [ # Set of SDK harness containers needed to execute this pipeline. This will

1500

# only be set in the Fn API path. For non-cross-language pipelines this

1501

# should have only one entry. Cross-language pipelines will have two or more

1502

# entries.

1503

{ # Defines an SDK harness container for executing Dataflow pipelines.

1504

"containerImage": "A String", # A docker container image that resides in Google Container Registry.

1505

"useSingleCorePerContainer": True or False, # If true, recommends the Dataflow service to use only one core per SDK

1506

# container instance with this image. If false (or unset) recommends using

1507

# more than one core per SDK container instance with this image for

1508

# efficiency. Note that Dataflow service may choose to override this property

# if needed.

},

],

"zone": "A String", # Zone to run the worker pools in. If empty or unspecified, the service

1513

# will attempt to choose a reasonable default.

1514

"kind": "A String", # The kind of the worker pool; currently only `harness` and `shuffle`

1515

# are supported.

1516

"metadata": { # Metadata to set on the Google Compute Engine VMs.

1517

"a_key": "A String",

1518

},

1519

"diskSourceImage": "A String", # Fully qualified source image for disks.

1520

"dataDisks": [ # Data disks that are used by a VM in this workflow.

1521

{ # Describes the data disk used by a workflow job.

1522

"sizeGb": 42, # Size of disk in GB. If zero or unspecified, the service will

1523

# attempt to choose a reasonable default.

1524

"diskType": "A String", # Disk storage type, as defined by Google Compute Engine. This

1525

# must be a disk type appropriate to the project and zone in which

1526

# the workers will run. If unknown or unspecified, the service

1527

# will attempt to choose a reasonable default.

1528

#

1529

# For example, the standard persistent disk type is a resource name

1530

# typically ending in "pd-standard". If SSD persistent disks are

1531

# available, the resource name typically ends with "pd-ssd". The

1532

# actual valid values are defined by the Google Compute Engine API,

1533

# not by the Cloud Dataflow API; consult the Google Compute Engine

1534

# documentation for more information about determining the set of

1535

# available disk types for a particular project and zone.

1536

#

1537

# Google Compute Engine Disk types are local to a particular

1538

# project in a particular zone, and so the resource name will

1539

# typically look something like this:

1540

#

1541

# compute.googleapis.com/projects/project-id/zones/zone/diskTypes/pd-standard

1542

"mountPoint": "A String", # Directory in a VM where disk is mounted.

1543

},

1544

],

1545

"packages": [ # Packages to be installed on workers.

1546

{ # The packages that must be installed in order for a worker to run the

1547

# steps of the Cloud Dataflow job that will be assigned to its worker

1548

# pool.

Bu Sun Kim

2020-05-27 12:20:54 -0700

[diff] [blame]

1549

#

Bu Sun Kim

2020-07-22 17:02:09 -0700

[diff] [blame]

1550

# This is the mechanism by which the Cloud Dataflow SDK causes code to

1551

# be loaded onto the workers. For example, the Cloud Dataflow Java SDK

1552

# might use this to install jars containing the user's code and all of the

1553

# various dependencies (libraries, data files, etc.) required in order

1554

# for that code to run.

1555

"name": "A String", # The name of the package.

1556

"location": "A String", # The resource to read the package from. The supported resource type is:

1557

#

1558

# Google Cloud Storage:

1559

#

1560

# storage.googleapis.com/{bucket}

1561

# bucket.storage.googleapis.com/

1562

},

1563

],

1564

"teardownPolicy": "A String", # Sets the policy for determining when to tear down the worker pool.

1565

# Allowed values are: `TEARDOWN_ALWAYS`, `TEARDOWN_ON_SUCCESS`, and

1566

# `TEARDOWN_NEVER`.

1567

# `TEARDOWN_ALWAYS` means workers are always torn down regardless of whether

1568

# the job succeeds. `TEARDOWN_ON_SUCCESS` means workers are torn down

1569

# if the job succeeds. `TEARDOWN_NEVER` means the workers are never torn

1570

# down.

1571

#

1572

# If the workers are not torn down by the service, they will

1573

# continue to run and use Google Compute Engine VM resources in the

1574

# user's project until they are explicitly terminated by the user.

1575

# Because of this, Google recommends using the `TEARDOWN_ALWAYS`

1576

# policy except for small, manually supervised test jobs.

1577

#

1578

# If unknown or unspecified, the service will attempt to choose a reasonable

1579

# default.

1580

"network": "A String", # Network to which VMs will be assigned. If empty or unspecified,

1581

# the service will use the network "default".

1582

"ipConfiguration": "A String", # Configuration for VM IPs.

1583

"diskSizeGb": 42, # Size of root disk for VMs, in GB. If zero or unspecified, the service will

1584

# attempt to choose a reasonable default.

1585

"autoscalingSettings": { # Settings for WorkerPool autoscaling. # Settings for autoscaling of this WorkerPool.

1586

"maxNumWorkers": 42, # The maximum number of workers to cap scaling at.

1587

"algorithm": "A String", # The algorithm to use for autoscaling.

1588

},

1589

"poolArgs": { # Extra arguments for this worker pool.

1590

"a_key": "", # Properties of the object. Contains field @type with type URL.

1591

},

1592

"subnetwork": "A String", # Subnetwork to which VMs will be assigned, if desired. Expected to be of

1593

# the form "regions/REGION/subnetworks/SUBNETWORK".

1594

"numWorkers": 42, # Number of Google Compute Engine workers in this pool needed to

1595

# execute the job. If zero or unspecified, the service will

1596

# attempt to choose a reasonable default.

1597

"numThreadsPerWorker": 42, # The number of threads per worker harness. If empty or unspecified, the

1598

# service will choose a number of threads (according to the number of cores

1599

# on the selected machine type for batch, or 1 by convention for streaming).

1600

"workerHarnessContainerImage": "A String", # Required. Docker container image that executes the Cloud Dataflow worker

1601

# harness, residing in Google Container Registry.

1602

#

1603

# Deprecated for the Fn API path. Use sdk_harness_container_images instead.

1604

"taskrunnerSettings": { # Taskrunner configuration settings. # Settings passed through to Google Compute Engine workers when

1605

# using the standard Dataflow task runner. Users should ignore

1606

# this field.

1607

"dataflowApiVersion": "A String", # The API version of endpoint, e.g. "v1b3"

1608

"oauthScopes": [ # The OAuth2 scopes to be requested by the taskrunner in order to

1609

# access the Cloud Dataflow API.

1610

"A String",

1611

],

1612

"baseUrl": "A String", # The base URL for the taskrunner to use when accessing Google Cloud APIs.

Bu Sun Kim

2020-05-27 12:20:54 -0700

[diff] [blame]

1613

#

1614

# When workers access Google Cloud APIs, they logically do so via

1615

# relative URLs. If this field is specified, it supplies the base

1616

# URL to use for resolving these relative URLs. The normative

1617

# algorithm used is defined by RFC 1808, "Relative Uniform Resource

1618

# Locators".

1619

#

1620

# If not specified, the default value is "http://www.googleapis.com/"

Bu Sun Kim

2020-07-22 17:02:09 -0700

[diff] [blame]

1621

"workflowFileName": "A String", # The file to store the workflow in.

1622

"logToSerialconsole": True or False, # Whether to send taskrunner log info to Google Compute Engine VM serial

1623

# console.

1624

"baseTaskDir": "A String", # The location on the worker for task-specific subdirectories.

1625

"taskUser": "A String", # The UNIX user ID on the worker VM to use for tasks launched by

1626

# taskrunner; e.g. "root".

1627

"vmId": "A String", # The ID string of the VM.

1628

"alsologtostderr": True or False, # Whether to also send taskrunner log info to stderr.

1629

"parallelWorkerSettings": { # Provides data to pass through to the worker harness. # The settings to pass to the parallel worker harness.

1630

"shuffleServicePath": "A String", # The Shuffle service path relative to the root URL, for example,

1631

# "shuffle/v1beta1".

1632

"tempStoragePrefix": "A String", # The prefix of the resources the system should use for temporary

1633

# storage.

1634

#

1635

# The supported resource type is:

1636

#

1637

# Google Cloud Storage:

1638

#

1639

# storage.googleapis.com/{bucket}/{object}

1640

# bucket.storage.googleapis.com/{object}

1641

"reportingEnabled": True or False, # Whether to send work progress updates to the service.

1642

"servicePath": "A String", # The Cloud Dataflow service path relative to the root URL, for example,

1643

# "dataflow/v1b3/projects".

1644

"baseUrl": "A String", # The base URL for accessing Google Cloud APIs.

1645

#

1646

# When workers access Google Cloud APIs, they logically do so via

1647

# relative URLs. If this field is specified, it supplies the base

1648

# URL to use for resolving these relative URLs. The normative

1649

# algorithm used is defined by RFC 1808, "Relative Uniform Resource

1650

# Locators".

1651

#

1652

# If not specified, the default value is "http://www.googleapis.com/"

1653

"workerId": "A String", # The ID of the worker running this pipeline.

1654

},

1655

"harnessCommand": "A String", # The command to launch the worker harness.

1656

"logDir": "A String", # The directory on the VM to store logs.

1657

"streamingWorkerMainClass": "A String", # The streaming worker main class name.

1658

"languageHint": "A String", # The suggested backend language.

1659

"taskGroup": "A String", # The UNIX group ID on the worker VM to use for tasks launched by

1660

# taskrunner; e.g. "wheel".

1661

"logUploadLocation": "A String", # Indicates where to put logs. If this is not specified, the logs

1662

# will not be uploaded.

1663

#

1664

# The supported resource type is:

1665

#

1666

# Google Cloud Storage:

1667

# storage.googleapis.com/{bucket}/{object}

1668

# bucket.storage.googleapis.com/{object}

1669

"commandlinesFileName": "A String", # The file to store preprocessing commands in.

1670

"continueOnException": True or False, # Whether to continue taskrunner if an exception is hit.

1671

"tempStoragePrefix": "A String", # The prefix of the resources the taskrunner should use for

1672

# temporary storage.

1673

#

1674

# The supported resource type is:

1675

#

1676

# Google Cloud Storage:

1677

# storage.googleapis.com/{bucket}/{object}

1678

# bucket.storage.googleapis.com/{object}

},

"diskType": "A String", # Type of root disk for VMs. If empty or unspecified, the service will

1681

# attempt to choose a reasonable default.

1682

"defaultPackageSet": "A String", # The default package set to install. This allows the service to

1683

# select a default set of packages which are useful to worker

1684

# harnesses written in a particular language.

1685

"machineType": "A String", # Machine type (e.g. "n1-standard-1"). If empty or unspecified, the

1686

# service will attempt to choose a reasonable default.

},

],

1689

"tempStoragePrefix": "A String", # The prefix of the resources the system should use for temporary

1690

# storage. The system will append the suffix "/temp-{JOBNAME}" to

1691

# this resource prefix, where {JOBNAME} is the value of the

1692

# job_name field. The resulting bucket and object prefix is used

1693

# as the prefix of the resources used to store temporary data

1694

# needed during the job execution. NOTE: This will override the

1695

# value in taskrunner_settings.

1696

# The supported resource type is:

1697

#

1698

# Google Cloud Storage:

1699

#

1700

# storage.googleapis.com/{bucket}/{object}

1701

# bucket.storage.googleapis.com/{object}

1702

"internalExperiments": { # Experimental settings.

1703

"a_key": "", # Properties of the object. Contains field @type with type URL.

},

"sdkPipelineOptions": { # The Cloud Dataflow SDK pipeline options specified by the user. These

1706

# options are passed through the service and are used to recreate the

1707

# SDK pipeline options on the worker in a language agnostic and platform

1708

# independent way.

"a_key": "", # Properties of the object.

},

"dataset": "A String", # The dataset for the current project where various workflow

1712

# related tables are stored.

1713

#

1714

# The supported resource type is:

1715

#

1716

# Google BigQuery:

1717

# bigquery.googleapis.com/{dataset}

1718

"clusterManagerApiService": "A String", # The type of cluster manager API to use. If unknown or

1719

# unspecified, the service will attempt to choose a reasonable

1720

# default. This should be in the form of the API service name,

1721

# e.g. "compute.googleapis.com".

},

"stepsLocation": "A String", # The GCS location where the steps are stored.

1724

"steps": [ # Exactly one of step or steps_location should be specified.

1725

#

1726

# The top-level steps that constitute the entire job.

1727

{ # Defines a particular step within a Cloud Dataflow job.

1728

#

1729

# A job consists of multiple steps, each of which performs some

1730

# specific operation as part of the overall job. Data is typically

1731

# passed from one step to another as part of the job.

1732

#

1733

# Here's an example of a sequence of steps which together implement a

1734

# Map-Reduce job:

1735

#

1736

# * Read a collection of data from some source, parsing the

1737

# collection's elements.

1738

#

1739

# * Validate the elements.

1740

#

1741

# * Apply a user-defined function to map each element to some value

1742

# and extract an element-specific key value.

1743

#

1744

# * Group elements with the same key into a single element with

1745

# that key, transforming a multiply-keyed collection into a

1746

# uniquely-keyed collection.

1747

#

1748

# * Write the elements out to some data sink.

1749

#

1750

# Note that the Cloud Dataflow service may be used to run many different

1751

# types of jobs, not just Map-Reduce.

1752

"kind": "A String", # The kind of step in the Cloud Dataflow job.

1753

"properties": { # Named properties associated with the step. Each kind of

1754

# predefined step has its own required set of properties.

1755

# Must be provided on Create. Only retrieved with JOB_VIEW_ALL.

1756

"a_key": "", # Properties of the object.

1757

},

1758

"name": "A String", # The name that identifies the step. This must be unique for each

1759

# step with respect to all other steps in the Cloud Dataflow job.

1760

},

1761

],

1762

"stageStates": [ # This field may be mutated by the Cloud Dataflow service;

1763

# callers cannot mutate it.

1764

{ # A message describing the state of a particular execution stage.

1765

"executionStageState": "A String", # Execution stage states allow the same set of values as JobState.

1766

"executionStageName": "A String", # The name of the execution stage.

1767

"currentStateTime": "A String", # The time at which the stage transitioned to this state.

1768

},

1769

],

1770

"replacedByJobId": "A String", # If another job is an update of this job (and thus, this job is in

1771

# `JOB_STATE_UPDATED`), this field contains the ID of that job.

1772

"jobMetadata": { # Metadata available primarily for filtering jobs. Will be included in the

# ListJob response and Job SUMMARY view.

# This field is populated by the Dataflow service to support filtering jobs

# by the metadata values provided here. Populated for ListJobs and all GetJob

# views SUMMARY and higher.

1776

"sdkVersion": { # The version of the SDK used to run the job. # The SDK version used to run the job.

1777

"sdkSupportStatus": "A String", # The support status for this SDK version.

1778

"versionDisplayName": "A String", # A readable string describing the version of the SDK.

1779

"version": "A String", # The version of the SDK used to run the job.

1780

},

1781

"bigTableDetails": [ # Identification of a BigTable source used in the Dataflow job.

1782

{ # Metadata for a BigTable connector used by the job.

1783

"instanceId": "A String", # InstanceId accessed in the connection.

1784

"tableId": "A String", # TableId accessed in the connection.

1785

"projectId": "A String", # ProjectId accessed in the connection.

1786

},

1787

],

1788

"pubsubDetails": [ # Identification of a PubSub source used in the Dataflow job.

1789

{ # Metadata for a PubSub connector used by the job.

1790

"subscription": "A String", # Subscription used in the connection.

1791

"topic": "A String", # Topic accessed in the connection.

1792

},

1793

],

1794

"bigqueryDetails": [ # Identification of a BigQuery source used in the Dataflow job.

1795

{ # Metadata for a BigQuery connector used by the job.

1796

"dataset": "A String", # Dataset accessed in the connection.

1797

"projectId": "A String", # Project accessed in the connection.

1798

"query": "A String", # Query used to access data in the connection.

1799

"table": "A String", # Table accessed in the connection.

1800

},

1801

],

1802

"fileDetails": [ # Identification of a File source used in the Dataflow job.

1803

{ # Metadata for a File connector used by the job.

1804

"filePattern": "A String", # File Pattern used to access files by the connector.

1805

},

1806

],

1807

"datastoreDetails": [ # Identification of a Datastore source used in the Dataflow job.

1808

{ # Metadata for a Datastore connector used by the job.

1809

"namespace": "A String", # Namespace used in the connection.

1810

"projectId": "A String", # ProjectId accessed in the connection.

1811

},

1812

],

1813

"spannerDetails": [ # Identification of a Spanner source used in the Dataflow job.

1814

{ # Metadata for a Spanner connector used by the job.

1815

"instanceId": "A String", # InstanceId accessed in the connection.

1816

"databaseId": "A String", # DatabaseId accessed in the connection.

1817

"projectId": "A String", # ProjectId accessed in the connection.

},

],

},

"location": "A String", # The [regional endpoint]

1822

# (https://cloud.google.com/dataflow/docs/concepts/regional-endpoints) that

1823

# contains this job.

1824

"transformNameMapping": { # The map of transform name prefixes of the job to be replaced to the

1825

# corresponding name prefixes of the new job.

1826

"a_key": "A String",

1827

},

1828

"startTime": "A String", # The timestamp when the job was started (transitioned to JOB_STATE_PENDING).

1829

# Flexible resource scheduling jobs are started with some delay after job

1830

# creation, so start_time is unset before start and is updated when the

1831

# job is started by the Cloud Dataflow service. For other jobs, start_time

1832

# always equals create_time and is immutable and set by the Cloud Dataflow

1833

# service.

1834

"clientRequestId": "A String", # The client's unique identifier of the job, re-used across retried attempts.

1835

# If this field is set, the service will ensure its uniqueness.

1836

# The request to create a job will fail if the service has knowledge of a

1837

# previously submitted job with the same client's ID and job name.

1838

# The caller may use this field to ensure idempotence of job

1839

# creation across retried attempts to create a job.

1840

# By default, the field is empty and, in that case, the service ignores it.

1841

"executionInfo": { # Deprecated. Additional information about how a Cloud Dataflow job will be executed that

# isn't contained in the submitted job.

1843

"stages": { # A mapping from each stage to the information about that stage.

1844

"a_key": { # Contains information about how a particular

1845

# google.dataflow.v1beta3.Step will be executed.

1846

"stepName": [ # The steps associated with the execution stage.

1847

# Note that stages may have several steps, and that a given step

1848

# might be run by more than one stage.

1849

"A String",

1850

],

1851

},

},

1853

},

"type": "A String", # The type of Cloud Dataflow job.

1855

"createTime": "A String", # The timestamp when the job was initially created. Immutable and set by the

1856

# Cloud Dataflow service.

1857

"tempFiles": [ # A set of files the system should be aware of that are used

1858

# for temporary storage. These temporary files will be

1859

# removed on job completion.

1860

# No duplicates are allowed.

1861

# No file patterns are supported.

1862

#

1863

# The supported files are:

1864

#

1865

# Google Cloud Storage:

1866

#

1867

# storage.googleapis.com/{bucket}/{object}

1868

# bucket.storage.googleapis.com/{object}

1869

"A String",

1870

],

1871

"id": "A String", # The unique ID of this job.

1872

#

1873

# This field is set by the Cloud Dataflow service when the Job is

1874

# created, and is immutable for the life of the job.

1875

"requestedState": "A String", # The job's requested state.

1876

#

1877

# `UpdateJob` may be used to switch between the `JOB_STATE_STOPPED` and

1878

# `JOB_STATE_RUNNING` states, by setting requested_state. `UpdateJob` may

1879

# also be used to directly set a job's requested state to

1880

# `JOB_STATE_CANCELLED` or `JOB_STATE_DONE`, irrevocably terminating the

1881

# job if it has not already reached a terminal state.

1882

"replaceJobId": "A String", # If this job is an update of an existing job, this field is the job ID

1883

# of the job it replaced.

1884

#

1885

# When sending a `CreateJobRequest`, you can update a job by specifying it

1886

# here. The job named here is stopped, and its intermediate state is

1887

# transferred to this job.

1888

"createdFromSnapshotId": "A String", # If this is specified, the job's initial state is populated from the given

1889

# snapshot.

1890

"currentState": "A String", # The current state of the job.

1891

#

1892

# Jobs are created in the `JOB_STATE_STOPPED` state unless otherwise

1893

# specified.

1894

#

1895

# A job in the `JOB_STATE_RUNNING` state may asynchronously enter a

1896

# terminal state. After a job has reached a terminal state, no

1897

# further state updates may be made.

1898

#

1899

# This field may be mutated by the Cloud Dataflow service;

1900

# callers cannot mutate it.

1901

"name": "A String", # The user-specified Cloud Dataflow job name.

1902

#

1903

# Only one Job with a given name may exist in a project at any

1904

# given time. If a caller attempts to create a Job with the same

1905

# name as an already-existing Job, the attempt returns the

1906

# existing Job.

1907

#

1908

# The name must match the regular expression

1909

# `[a-z]([-a-z0-9]{0,38}[a-z0-9])?`

1910

"currentStateTime": "A String", # The timestamp associated with the current state.

1911

}</pre>

</div>
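<p>The job-name constraint described for the "name" field of the Job resource can be checked on the client before a job is submitted. The following is a minimal sketch, assuming the regular expression quoted in that field description must match the whole name; the helper name is_valid_job_name is hypothetical and is not part of the client library.</p>

```python
import re

# Pattern quoted from the "name" field description of the Job resource.
JOB_NAME_PATTERN = re.compile(r"[a-z]([-a-z0-9]{0,38}[a-z0-9])?")


def is_valid_job_name(name: str) -> bool:
    """Return True if `name` matches the documented pattern in full.

    Hypothetical helper: the service performs its own validation, so this
    only catches obviously invalid names before a request is sent.
    """
    return JOB_NAME_PATTERN.fullmatch(name) is not None


if __name__ == "__main__":
    for candidate in ("wordcount-2020", "Wordcount", "job-"):
        print(candidate, is_valid_job_name(candidate))
```

<p>Under this reading the pattern allows at most 40 characters, starting with a lowercase letter and ending with a lowercase letter or digit.</p>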

<code class="details" id="getMetrics">getMetrics(projectId, location, jobId, startTime=None, x__xgafv=None)</code>

1916

<pre>Request the job status.

To request the status of a job, we recommend using

1919

`projects.locations.jobs.getMetrics` with a [regional endpoint]

1920

(https://cloud.google.com/dataflow/docs/concepts/regional-endpoints). Using

1921

`projects.jobs.getMetrics` is not recommended, as you can only request the

1922

status of jobs that are running in `us-central1`.

Args:

1925

projectId: string, A project id. (required)

location: string, The [regional endpoint]

1927

(https://cloud.google.com/dataflow/docs/concepts/regional-endpoints) that

1928

contains the job specified by job_id. (required)

jobId: string, The job to get metrics for. (required)

startTime: string, Return only metric data that has changed since this time.

1931

Default is to return all information about all metrics for the job.

x__xgafv: string, V1 error format.

Allowed values

1934

1 - v1 error format

1935

2 - v2 error format

Returns:

1938

An object of the form:

{ # JobMetrics contains a collection of metrics describing the detailed progress

# of a Dataflow job. Metrics correspond to user-defined and system-defined

1942

# metrics in the job.

1943

#

1944

# This resource captures only the most recent values of each metric;

1945

# time-series data can be queried for them (under the same metric names)

1946

# from Cloud Monitoring.

"metricTime": "A String", # Timestamp as of which metric values are current.

"metrics": [ # All metrics for this job.

{ # Describes the state of a metric.

"distribution": "", # A struct value describing properties of a distribution of numeric values.

"kind": "A String", # Metric aggregation kind. The possible metric aggregation kinds are

1952

# "Sum", "Max", "Min", "Mean", "Set", "And", "Or", and "Distribution".

# The specified aggregation kind is case-insensitive.

1954

#

1955

# If omitted, this is not an aggregated value but instead

1956

# a single metric sample value.

"gauge": "", # A struct value describing properties of a Gauge.

1958

# Metrics of gauge type show the value of a metric across time, and are

1959

# aggregated based on the newest value.

"updateTime": "A String", # Timestamp associated with the metric value. Optional when workers are

# reporting work progress; it will be filled in responses from the

1962

# metrics API.

"scalar": "", # Worker-computed aggregate value for aggregation kinds "Sum", "Max", "Min",

1964

# "And", and "Or". The possible value types are Long, Double, and Boolean.

1965

"cumulative": True or False, # True if this metric is reported as the total cumulative aggregate

1966

# value accumulated since the worker started working on this WorkItem.

1967

# By default this is false, indicating that this metric is reported

1968

# as a delta that is not associated with any WorkItem.

"name": { # Name of the metric. Identifies a metric by describing the source which generated the

# metric.

1971

"context": { # Zero or more labeled fields which identify the part of the job this

1972

# metric is associated with, such as the name of a step or collection.

1973

#

1974

# For example, built-in counters associated with steps will have

1975

# context['step'] = <step-name>. Counters associated with PCollections

1976

# in the SDK will have context['pcollection'] = <pcollection-name>.

1977

"a_key": "A String",

1978

},

"name": "A String", # Worker-defined metric name.

1980

"origin": "A String", # Origin (namespace) of metric name. May be blank for user-defined metrics;

1981

# will be "dataflow" for metrics defined by the Dataflow service or SDK.

},

"meanCount": "", # Worker-computed aggregate value for the "Mean" aggregation kind.

1984

# This holds the count of the aggregated values and is used in combination

1985

# with mean_sum to obtain the actual mean aggregate value.

1986

# The only possible value type is Long.

1987

"meanSum": "", # Worker-computed aggregate value for the "Mean" aggregation kind.

1988

# This holds the sum of the aggregated values and is used in combination

1989

# with mean_count to obtain the actual mean aggregate value.

1990

# The only possible value types are Long and Double.

1991

"set": "", # Worker-computed aggregate value for the "Set" aggregation kind. The only

1992

# possible value type is a list of Values whose type can be Long, Double,

1993

# or String, according to the metric's type. All Values in the list must

1994

# be of the same type.

1995

"internal": "", # Worker-computed aggregate value for internal use by the Dataflow

1996

# service.

},

1998

],

}</pre>

</div>
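<p>The JobMetrics object returned by getMetrics can be reduced to one value per metric using the "kind", "scalar", "meanSum" and "meanCount" fields described above. The following is a minimal sketch, assuming the response is the plain dict produced by the client library; the helper names fetch_job_metrics, metric_value and summarize_metrics are hypothetical, and keying each result on (origin, name, step) is one illustrative choice among several.</p>

```python
def fetch_job_metrics(project_id, location, job_id):
    """Call projects.locations.jobs.getMetrics and return the response dict.

    Assumes google-api-python-client is installed and that application
    default credentials are configured in the environment.
    """
    from googleapiclient.discovery import build

    service = build("dataflow", "v1b3")
    request = service.projects().locations().jobs().getMetrics(
        projectId=project_id, location=location, jobId=job_id
    )
    return request.execute()


def metric_value(metric):
    """Return a single value for one metric entry, or None if unavailable."""
    # The aggregation kind is documented as case-insensitive.
    kind = (metric.get("kind") or "").lower()
    if kind == "mean":
        total, count = metric.get("meanSum"), metric.get("meanCount")
        if total is None or count is None or float(count) == 0:
            return None
        # The mean is meanSum divided by meanCount.
        return float(total) / float(count)
    if kind == "set":
        return metric.get("set")
    if kind == "distribution":
        return metric.get("distribution")
    # "Sum", "Max", "Min", "And" and "Or" report their value in "scalar".
    return metric.get("scalar")


def summarize_metrics(job_metrics):
    """Map (origin, name, step) to a value for every metric in the response."""
    summary = {}
    for metric in job_metrics.get("metrics", []):
        name = metric.get("name") or {}
        context = name.get("context") or {}
        key = (name.get("origin"), name.get("name"), context.get("step"))
        summary[key] = metric_value(metric)
    return summary
```

<p>Typical use would be summarize_metrics(fetch_job_metrics(project_id, location, job_id)), where the three arguments are placeholders for your own project, regional endpoint and job ID.</p>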

<code class="details" id="list">list(projectId, location, pageToken=None, view=None, pageSize=None, filter=None, x__xgafv=None)</code>

<pre>List the jobs of a project.

To list the jobs of a project in a region, we recommend using

`projects.locations.jobs.list` with a [regional endpoint]

(https://cloud.google.com/dataflow/docs/concepts/regional-endpoints). To

2009

list all jobs across all regions, use `projects.jobs.aggregated`. Using

2010

`projects.jobs.list` is not recommended, as you can only get the list of

2011

jobs that are running in `us-central1`.

Args:

2014

projectId: string, The project which owns the jobs. (required)

location: string, The [regional endpoint]

2016

(https://cloud.google.com/dataflow/docs/concepts/regional-endpoints) that

2017

contains this job. (required)

pageToken: string, Set this to the 'next_page_token' field of a previous response

2019

to request additional results in a long list.

view: string, Level of information requested in response. Default is `JOB_VIEW_SUMMARY`.

pageSize: integer, If there are many jobs, limit response to at most this many.

2022

The actual number of jobs returned will be the lesser of pageSize

2023

and an unspecified server-defined limit.

filter: string, The kind of filter to use.

x__xgafv: string, V1 error format.

Allowed values

2027

1 - v1 error format

2028

2 - v2 error format

Returns:

2031

An object of the form:

{ # Response to a request to list Cloud Dataflow jobs in a project. This might

2034

# be a partial response, depending on the page size in the ListJobsRequest.

2035

# However, if the project does not have any jobs, an instance of

# ListJobsResponse is not returned and the request's response

# body is empty {}.

"jobs": [ # A subset of the requested job information.

{ # Defines a job to be run by the Cloud Dataflow service.

"pipelineDescription": { # A descriptive representation of the submitted pipeline as well as the executed

# form. This data is provided by the Dataflow service for ease of visualizing

# the pipeline and interpreting Dataflow-provided metrics.

# Preliminary field: The format of this data may change at any time.

# A description of the user pipeline and stages through which it is executed.

# Created by Cloud Dataflow service. Only retrieved with

# JOB_VIEW_DESCRIPTION or JOB_VIEW_ALL.

2046

"displayData": [ # Pipeline level display data.

2047

{ # Data provided with a pipeline or transform to provide descriptive info.

2048

"url": "A String", # An optional full URL.

2049

"javaClassValue": "A String", # Contains value if the data is of java class type.

2050

"timestampValue": "A String", # Contains value if the data is of timestamp type.

2051

"durationValue": "A String", # Contains value if the data is of duration type.

2052

"label": "A String", # An optional label to display in a dax UI for the element.

2053

"key": "A String", # The key identifying the display data.

2054

# This is intended to be used as a label for the display data

2055

# when viewed in a dax monitoring system.

2056

"namespace": "A String", # The namespace for the key. This is usually a class name or programming

2057

# language namespace (e.g. a Python module) which defines the display data.

2058

# This allows a dax monitoring system to specially handle the data

2059

# and perform custom rendering.

2060

"floatValue": 3.14, # Contains value if the data is of float type.

2061

"strValue": "A String", # Contains value if the data is of string type.

2062

"int64Value": "A String", # Contains value if the data is of int64 type.

2063

"boolValue": True or False, # Contains value if the data is of a boolean type.

2064

"shortStrValue": "A String", # A possible additional shorter value to display.

2065

# For example a java_class_name_value of com.mypackage.MyDoFn

2066

# will be stored with MyDoFn as the short_str_value and

2067

# com.mypackage.MyDoFn as the java_class_name value.

2068

# short_str_value can be displayed and java_class_name_value

2069

# will be displayed as a tooltip.

},

],

2072

"originalPipelineTransform": [ # Description of each transform in the pipeline and collections between them.

2073

{ # Description of the type, names/ids, and input/outputs for a transform.

2074

"outputCollectionName": [ # User names for all collection outputs to this transform.

"A String",

2076

],

"displayData": [ # Transform-specific display data.

2078

{ # Data provided with a pipeline or transform to provide descriptive info.

2079

"url": "A String", # An optional full URL.

2080

"javaClassValue": "A String", # Contains value if the data is of java class type.

2081

"timestampValue": "A String", # Contains value if the data is of timestamp type.

2082

"durationValue": "A String", # Contains value if the data is of duration type.

2083

"label": "A String", # An optional label to display in a dax UI for the element.

2084

"key": "A String", # The key identifying the display data.

2085

# This is intended to be used as a label for the display data

2086

# when viewed in a dax monitoring system.

2087

"namespace": "A String", # The namespace for the key. This is usually a class name or programming

2088

# language namespace (e.g. a Python module) which defines the display data.

2089

# This allows a dax monitoring system to specially handle the data

2090

# and perform custom rendering.

2091

"floatValue": 3.14, # Contains value if the data is of float type.

2092

"strValue": "A String", # Contains value if the data is of string type.

2093

"int64Value": "A String", # Contains value if the data is of int64 type.

2094

"boolValue": True or False, # Contains value if the data is of a boolean type.

2095

"shortStrValue": "A String", # A possible additional shorter value to display.

2096

# For example a java_class_name_value of com.mypackage.MyDoFn

2097

# will be stored with MyDoFn as the short_str_value and

2098

# com.mypackage.MyDoFn as the java_class_name value.

2099

# short_str_value can be displayed and java_class_name_value

2100

# will be displayed as a tooltip.

2101

},

2102

],

2103

"id": "A String", # SDK generated id of this transform instance.

2104

"inputCollectionName": [ # User names for all collection inputs to this transform.

2105

"A String",

2106

],

2107

"name": "A String", # User provided name for this transform instance.

2108

"kind": "A String", # Type of transform.

2109

},

2110

],

2111

"executionPipelineStage": [ # Description of each stage of execution of the pipeline.

2112

{ # Description of the composing transforms, names/ids, and input/outputs of a

2113

# stage of execution. Some composing transforms and sources may have been

2114

# generated by the Dataflow service during execution planning.

2115

"componentSource": [ # Collections produced and consumed by component transforms of this stage.

2116

{ # Description of an interstitial value between transforms in an execution

2117

# stage.

2118

"userName": "A String", # Human-readable name for this transform; may be user or system generated.

2119

"name": "A String", # Dataflow service generated name for this source.

2120

"originalTransformOrCollection": "A String", # User name for the original user transform or collection with which this

2121

# source is most closely associated.

2122

},

2123

],

2124

"inputSource": [ # Input sources for this stage.

2125

{ # Description of an input or output of an execution stage.

2126

"userName": "A String", # Human-readable name for this source; may be user or system generated.

2127

"originalTransformOrCollection": "A String", # User name for the original user transform or collection with which this

2128

# source is most closely associated.

2129

"sizeBytes": "A String", # Size of the source, if measurable.

2130

"name": "A String", # Dataflow service generated name for this source.

2131

},

2132

],

2133

"name": "A String", # Dataflow service generated name for this stage.

2134

"componentTransform": [ # Transforms that comprise this execution stage.

2135

{ # Description of a transform executed as part of an execution stage.

2136

"name": "A String", # Dataflow service generated name for this source.

2137

"userName": "A String", # Human-readable name for this transform; may be user or system generated.

2138

"originalTransform": "A String", # User name for the original user transform with which this transform is

2139

# most closely associated.

2140

},

2141

],

2142

"id": "A String", # Dataflow service generated id for this stage.

2143

"outputSource": [ # Output sources for this stage.

2144

{ # Description of an input or output of an execution stage.

2145

"userName": "A String", # Human-readable name for this source; may be user or system generated.

2146

"originalTransformOrCollection": "A String", # User name for the original user transform or collection with which this

2147

# source is most closely associated.

2148

"sizeBytes": "A String", # Size of the source, if measurable.

2149

"name": "A String", # Dataflow service generated name for this source.

2150

},

2151

],

"kind": "A String", # Type of transform this stage is executing.

},

],

},

"labels": { # User-defined labels for this job.

    #
    # The labels map can contain no more than 64 entries. Entries of the labels
    # map are UTF8 strings that comply with the following restrictions:
    #
    # * Keys must conform to regexp: \p{Ll}\p{Lo}{0,62}
    # * Values must conform to regexp: [\p{Ll}\p{Lo}\p{N}_-]{0,63}
    # * Both keys and values are additionally constrained to be <= 128 bytes in
    # size.

"a_key": "A String",

},

"projectId": "A String", # The ID of the Cloud Platform project that the job belongs to.

2168

"environment": { # Describes the environment in which a Dataflow Job runs. # The environment for the job.

2169

"flexResourceSchedulingGoal": "A String", # Which Flexible Resource Scheduling mode to run in.

2170

"workerRegion": "A String", # The Compute Engine region

2171

# (https://cloud.google.com/compute/docs/regions-zones/regions-zones) in

2172

# which worker processing should occur, e.g. "us-west1". Mutually exclusive

2173

# with worker_zone. If neither worker_region nor worker_zone is specified,

2174

# default to the control plane's region.

2175

"userAgent": { # A description of the process that generated the request.

2176

"a_key": "", # Properties of the object.

2177

},

2178

"serviceAccountEmail": "A String", # Identity to run virtual machines as. Defaults to the default account.

2179

"version": { # A structure describing which components and their versions of the service

2180

# are required in order to run the job.

2181

"a_key": "", # Properties of the object.

2182

},

2183

"serviceKmsKeyName": "A String", # If set, contains the Cloud KMS key identifier used to encrypt data

2184

# at rest, AKA a Customer Managed Encryption Key (CMEK).

2185

#

2186

# Format:

2187

# projects/PROJECT_ID/locations/LOCATION/keyRings/KEY_RING/cryptoKeys/KEY

2188

"experiments": [ # The list of experiments to enable.

2189

"A String",

2190

],

2191

"workerZone": "A String", # The Compute Engine zone

2192

# (https://cloud.google.com/compute/docs/regions-zones/regions-zones) in

2193

# which worker processing should occur, e.g. "us-west1-a". Mutually exclusive

2194

# with worker_region. If neither worker_region nor worker_zone is specified,

2195

# a zone in the control plane's region is chosen based on available capacity.

2196

"workerPools": [ # The worker pools. At least one "harness" worker pool must be

2197

# specified in order for the job to have workers.

2198

{ # Describes one particular pool of Cloud Dataflow workers to be

2199

# instantiated by the Cloud Dataflow service in order to perform the

2200

# computations required by a job. Note that a workflow job may use

2201

# multiple pools, in order to match the various computational

2202

# requirements of the various stages of the job.

2203

"onHostMaintenance": "A String", # The action to take on host maintenance, as defined by the Google

2204

# Compute Engine API.

2205

"sdkHarnessContainerImages": [ # Set of SDK harness containers needed to execute this pipeline. This will

2206

# only be set in the Fn API path. For non-cross-language pipelines this

2207

# should have only one entry. Cross-language pipelines will have two or more

2208

# entries.

2209

{ # Defines a SDK harness container for executing Dataflow pipelines.

2210

"containerImage": "A String", # A docker container image that resides in Google Container Registry.

"useSingleCorePerContainer": True or False, # If true, recommends that the Dataflow service use only one core per SDK

2212

# container instance with this image. If false (or unset) recommends using

2213

# more than one core per SDK container instance with this image for

2214

# efficiency. Note that Dataflow service may choose to override this property

# if needed.

},

],

"zone": "A String", # Zone to run the worker pools in. If empty or unspecified, the service

2219

# will attempt to choose a reasonable default.

2220

"kind": "A String", # The kind of the worker pool; currently only `harness` and `shuffle`

2221

# are supported.

2222

"metadata": { # Metadata to set on the Google Compute Engine VMs.

2223

"a_key": "A String",

2224

},

2225

"diskSourceImage": "A String", # Fully qualified source image for disks.

2226

"dataDisks": [ # Data disks that are used by a VM in this workflow.

2227

{ # Describes the data disk used by a workflow job.

2228

"sizeGb": 42, # Size of disk in GB. If zero or unspecified, the service will

2229

# attempt to choose a reasonable default.

"diskType": "A String", # Disk storage type, as defined by Google Compute Engine. This
    # must be a disk type appropriate to the project and zone in which
    # the workers will run. If unknown or unspecified, the service
    # will attempt to choose a reasonable default.
    #
    # For example, the standard persistent disk type is a resource name
    # typically ending in "pd-standard". If SSD persistent disks are
    # available, the resource name typically ends with "pd-ssd". The
    # actual valid values are defined by the Google Compute Engine API,
    # not by the Cloud Dataflow API; consult the Google Compute Engine
    # documentation for more information about determining the set of
    # available disk types for a particular project and zone.
    #
    # Google Compute Engine Disk types are local to a particular
    # project in a particular zone, and so the resource name will
    # typically look something like this:
    #
    # compute.googleapis.com/projects/project-id/zones/zone/diskTypes/pd-standard

2248

"mountPoint": "A String", # Directory in a VM where disk is mounted.

2249

},

2250

],

2251

"packages": [ # Packages to be installed on workers.

2252

{ # The packages that must be installed in order for a worker to run the

2253

# steps of the Cloud Dataflow job that will be assigned to its worker

2254

# pool.


#


# This is the mechanism by which the Cloud Dataflow SDK causes code to

2257

# be loaded onto the workers. For example, the Cloud Dataflow Java SDK

2258

# might use this to install jars containing the user's code and all of the

2259

# various dependencies (libraries, data files, etc.) required in order

2260

# for that code to run.

2261

"name": "A String", # The name of the package.

2262

"location": "A String", # The resource to read the package from. The supported resource type is:

2263

#

2264

# Google Cloud Storage:

2265

#

2266

# storage.googleapis.com/{bucket}

2267

# bucket.storage.googleapis.com/

2268

},

2269

],

"teardownPolicy": "A String", # Sets the policy for determining when to tear down the worker pool.
    # Allowed values are: `TEARDOWN_ALWAYS`, `TEARDOWN_ON_SUCCESS`, and
    # `TEARDOWN_NEVER`.
    # `TEARDOWN_ALWAYS` means workers are always torn down regardless of whether
    # the job succeeds. `TEARDOWN_ON_SUCCESS` means workers are torn down
    # if the job succeeds. `TEARDOWN_NEVER` means the workers are never torn
    # down.
    #
    # If the workers are not torn down by the service, they will
    # continue to run and use Google Compute Engine VM resources in the
    # user's project until they are explicitly terminated by the user.
    # Because of this, Google recommends using the `TEARDOWN_ALWAYS`
    # policy except for small, manually supervised test jobs.
    #
    # If unknown or unspecified, the service will attempt to choose a reasonable
    # default.

2286

"network": "A String", # Network to which VMs will be assigned. If empty or unspecified,

2287

# the service will use the network "default".

2288

"ipConfiguration": "A String", # Configuration for VM IPs.

2289

"diskSizeGb": 42, # Size of root disk for VMs, in GB. If zero or unspecified, the service will

2290

# attempt to choose a reasonable default.

2291

"autoscalingSettings": { # Settings for WorkerPool autoscaling. # Settings for autoscaling of this WorkerPool.

2292

"maxNumWorkers": 42, # The maximum number of workers to cap scaling at.

2293

"algorithm": "A String", # The algorithm to use for autoscaling.

2294

},

2295

"poolArgs": { # Extra arguments for this worker pool.

2296

"a_key": "", # Properties of the object. Contains field @type with type URL.

2297

},

2298

"subnetwork": "A String", # Subnetwork to which VMs will be assigned, if desired. Expected to be of

2299

# the form "regions/REGION/subnetworks/SUBNETWORK".

2300

"numWorkers": 42, # Number of Google Compute Engine workers in this pool needed to

2301

# execute the job. If zero or unspecified, the service will

2302

# attempt to choose a reasonable default.

2303

"numThreadsPerWorker": 42, # The number of threads per worker harness. If empty or unspecified, the

2304

# service will choose a number of threads (according to the number of cores

2305

# on the selected machine type for batch, or 1 by convention for streaming).

2306

"workerHarnessContainerImage": "A String", # Required. Docker container image that executes the Cloud Dataflow worker

2307

# harness, residing in Google Container Registry.

2308

#

2309

# Deprecated for the Fn API path. Use sdk_harness_container_images instead.

2310

"taskrunnerSettings": { # Taskrunner configuration settings. # Settings passed through to Google Compute Engine workers when

2311

# using the standard Dataflow task runner. Users should ignore

2312

# this field.

2313

"dataflowApiVersion": "A String", # The API version of endpoint, e.g. "v1b3"

2314

"oauthScopes": [ # The OAuth2 scopes to be requested by the taskrunner in order to

2315

# access the Cloud Dataflow API.

2316

"A String",

2317

],

2318

"baseUrl": "A String", # The base URL for the taskrunner to use when accessing Google Cloud APIs.


#

2320

# When workers access Google Cloud APIs, they logically do so via

2321

# relative URLs. If this field is specified, it supplies the base

2322

# URL to use for resolving these relative URLs. The normative

2323

# algorithm used is defined by RFC 1808, "Relative Uniform Resource

2324

# Locators".

2325

#

2326

# If not specified, the default value is "http://www.googleapis.com/"


"workflowFileName": "A String", # The file to store the workflow in.

2328

"logToSerialconsole": True or False, # Whether to send taskrunner log info to Google Compute Engine VM serial

2329

# console.

2330

"baseTaskDir": "A String", # The location on the worker for task-specific subdirectories.

2331

"taskUser": "A String", # The UNIX user ID on the worker VM to use for tasks launched by

2332

# taskrunner; e.g. "root".

2333

"vmId": "A String", # The ID string of the VM.

2334

"alsologtostderr": True or False, # Whether to also send taskrunner log info to stderr.

2335

"parallelWorkerSettings": { # Provides data to pass through to the worker harness. # The settings to pass to the parallel worker harness.

2336

"shuffleServicePath": "A String", # The Shuffle service path relative to the root URL, for example,

2337

# "shuffle/v1beta1".

2338

"tempStoragePrefix": "A String", # The prefix of the resources the system should use for temporary

2339

# storage.

2340

#

2341

# The supported resource type is:

2342

#

2343

# Google Cloud Storage:

2344

#

2345

# storage.googleapis.com/{bucket}/{object}

2346

# bucket.storage.googleapis.com/{object}

2347

"reportingEnabled": True or False, # Whether to send work progress updates to the service.

2348

"servicePath": "A String", # The Cloud Dataflow service path relative to the root URL, for example,

2349

# "dataflow/v1b3/projects".

2350

"baseUrl": "A String", # The base URL for accessing Google Cloud APIs.

2351

#

2352

# When workers access Google Cloud APIs, they logically do so via

2353

# relative URLs. If this field is specified, it supplies the base

2354

# URL to use for resolving these relative URLs. The normative

2355

# algorithm used is defined by RFC 1808, "Relative Uniform Resource

2356

# Locators".

2357

#

2358

# If not specified, the default value is "http://www.googleapis.com/"

2359

"workerId": "A String", # The ID of the worker running this pipeline.

2360

},

2361

"harnessCommand": "A String", # The command to launch the worker harness.

2362

"logDir": "A String", # The directory on the VM to store logs.

2363

"streamingWorkerMainClass": "A String", # The streaming worker main class name.

2364

"languageHint": "A String", # The suggested backend language.

2365

"taskGroup": "A String", # The UNIX group ID on the worker VM to use for tasks launched by

2366

# taskrunner; e.g. "wheel".

2367

"logUploadLocation": "A String", # Indicates where to put logs. If this is not specified, the logs

2368

# will not be uploaded.

2369

#

2370

# The supported resource type is:

2371

#

2372

# Google Cloud Storage:

2373

# storage.googleapis.com/{bucket}/{object}

2374

# bucket.storage.googleapis.com/{object}

2375

"commandlinesFileName": "A String", # The file to store preprocessing commands in.

2376

"continueOnException": True or False, # Whether to continue taskrunner if an exception is hit.

2377

"tempStoragePrefix": "A String", # The prefix of the resources the taskrunner should use for

2378

# temporary storage.

2379

#

2380

# The supported resource type is:

2381

#

2382

# Google Cloud Storage:

2383

# storage.googleapis.com/{bucket}/{object}

2384

# bucket.storage.googleapis.com/{object}


},


"diskType": "A String", # Type of root disk for VMs. If empty or unspecified, the service will

2387

# attempt to choose a reasonable default.

2388

"defaultPackageSet": "A String", # The default package set to install. This allows the service to

2389

# select a default set of packages which are useful to worker

2390

# harnesses written in a particular language.

2391

"machineType": "A String", # Machine type (e.g. "n1-standard-1"). If empty or unspecified, the

2392

# service will attempt to choose a reasonable default.


},


],

"tempStoragePrefix": "A String", # The prefix of the resources the system should use for temporary
    # storage. The system will append the suffix "/temp-{JOBNAME}" to
    # this resource prefix, where {JOBNAME} is the value of the
    # job_name field. The resulting bucket and object prefix is used
    # as the prefix of the resources used to store temporary data
    # needed during the job execution. NOTE: This will override the
    # value in taskrunner_settings.
    # The supported resource type is:
    #
    # Google Cloud Storage:
    #
    # storage.googleapis.com/{bucket}/{object}
    # bucket.storage.googleapis.com/{object}

2408

"internalExperiments": { # Experimental settings.

2409

"a_key": "", # Properties of the object. Contains field @type with type URL.


},


"sdkPipelineOptions": { # The Cloud Dataflow SDK pipeline options specified by the user. These

2412

# options are passed through the service and are used to recreate the

2413

# SDK pipeline options on the worker in a language agnostic and platform

2414

# independent way.


"a_key": "", # Properties of the object.


},


"dataset": "A String", # The dataset for the current project where various workflow

2418

# related tables are stored.

2419

#

2420

# The supported resource type is:

2421

#

2422

# Google BigQuery:

2423

# bigquery.googleapis.com/{dataset}

2424

"clusterManagerApiService": "A String", # The type of cluster manager API to use. If unknown or

2425

# unspecified, the service will attempt to choose a reasonable

2426

# default. This should be in the form of the API service name,

2427

# e.g. "compute.googleapis.com".


},


"stepsLocation": "A String", # The GCS location where the steps are stored.

"steps": [ # Exactly one of steps or steps_location should be specified.
    #
    # The top-level steps that constitute the entire job.
{ # Defines a particular step within a Cloud Dataflow job.
    #
    # A job consists of multiple steps, each of which performs some
    # specific operation as part of the overall job. Data is typically
    # passed from one step to another as part of the job.
    #
    # Here's an example of a sequence of steps which together implement a
    # Map-Reduce job:
    #
    # * Read a collection of data from some source, parsing the
    # collection's elements.
    #
    # * Validate the elements.
    #
    # * Apply a user-defined function to map each element to some value
    # and extract an element-specific key value.
    #
    # * Group elements with the same key into a single element with
    # that key, transforming a multiply-keyed collection into a
    # uniquely-keyed collection.
    #
    # * Write the elements out to some data sink.
    #
    # Note that the Cloud Dataflow service may be used to run many different
    # types of jobs, not just Map-Reduce.

2458

"kind": "A String", # The kind of step in the Cloud Dataflow job.

2459

"properties": { # Named properties associated with the step. Each kind of

2460

# predefined step has its own required set of properties.

2461

# Must be provided on Create. Only retrieved with JOB_VIEW_ALL.

2462

"a_key": "", # Properties of the object.

2463

},

2464

"name": "A String", # The name that identifies the step. This must be unique for each

2465

# step with respect to all other steps in the Cloud Dataflow job.

2466

},

2467

],

2468

"stageStates": [ # This field may be mutated by the Cloud Dataflow service;

2469

# callers cannot mutate it.

2470

{ # A message describing the state of a particular execution stage.

"executionStageState": "A String", # Execution stage states allow the same set of values as JobState.

2472

"executionStageName": "A String", # The name of the execution stage.

2473

"currentStateTime": "A String", # The time at which the stage transitioned to this state.

2474

},

2475

],

2476

"replacedByJobId": "A String", # If another job is an update of this job (and thus, this job is in

2477

# `JOB_STATE_UPDATED`), this field contains the ID of that job.

"jobMetadata": { # Metadata available primarily for filtering jobs. Will be included in the
    # ListJob response and Job SUMMARY view.
    # This field is populated by the Dataflow service to support filtering jobs
    # by the metadata values provided here. Populated for ListJobs and all GetJob
    # views SUMMARY and higher.

2482

"sdkVersion": { # The version of the SDK used to run the job. # The SDK version used to run the job.

2483

"sdkSupportStatus": "A String", # The support status for this SDK version.

2484

"versionDisplayName": "A String", # A readable string describing the version of the SDK.

2485

"version": "A String", # The version of the SDK used to run the job.

2486

},

2487

"bigTableDetails": [ # Identification of a BigTable source used in the Dataflow job.

2488

{ # Metadata for a BigTable connector used by the job.

2489

"instanceId": "A String", # InstanceId accessed in the connection.

2490

"tableId": "A String", # TableId accessed in the connection.

2491

"projectId": "A String", # ProjectId accessed in the connection.

2492

},

2493

],

2494

"pubsubDetails": [ # Identification of a PubSub source used in the Dataflow job.

2495

{ # Metadata for a PubSub connector used by the job.

2496

"subscription": "A String", # Subscription used in the connection.

2497

"topic": "A String", # Topic accessed in the connection.

2498

},

2499

],

2500

"bigqueryDetails": [ # Identification of a BigQuery source used in the Dataflow job.

2501

{ # Metadata for a BigQuery connector used by the job.

2502

"dataset": "A String", # Dataset accessed in the connection.

2503

"projectId": "A String", # Project accessed in the connection.

2504

"query": "A String", # Query used to access data in the connection.

2505

"table": "A String", # Table accessed in the connection.

2506

},

2507

],

2508

"fileDetails": [ # Identification of a File source used in the Dataflow job.

2509

{ # Metadata for a File connector used by the job.

2510

"filePattern": "A String", # File Pattern used to access files by the connector.

2511

},

2512

],

2513

"datastoreDetails": [ # Identification of a Datastore source used in the Dataflow job.

2514

{ # Metadata for a Datastore connector used by the job.

2515

"namespace": "A String", # Namespace used in the connection.

2516

"projectId": "A String", # ProjectId accessed in the connection.

2517

},

2518

],

2519

"spannerDetails": [ # Identification of a Spanner source used in the Dataflow job.

2520

{ # Metadata for a Spanner connector used by the job.

2521

"instanceId": "A String", # InstanceId accessed in the connection.

2522

"databaseId": "A String", # DatabaseId accessed in the connection.

2523

"projectId": "A String", # ProjectId accessed in the connection.

},

],

},

"location": "A String", # The [regional endpoint]

2528

# (https://cloud.google.com/dataflow/docs/concepts/regional-endpoints) that

2529

# contains this job.

2530

"transformNameMapping": { # The map of transform name prefixes of the job to be replaced to the

2531

# corresponding name prefixes of the new job.

2532

"a_key": "A String",

2533

},

"startTime": "A String", # The timestamp when the job was started (transitioned to JOB_STATE_PENDING).
    # Flexible resource scheduling jobs are started with some delay after job
    # creation, so start_time is unset before start and is updated when the
    # job is started by the Cloud Dataflow service. For other jobs, start_time
    # always equals create_time and is immutable and set by the Cloud Dataflow
    # service.

2540

"clientRequestId": "A String", # The client's unique identifier of the job, re-used across retried attempts.

2541

# If this field is set, the service will ensure its uniqueness.

2542

# The request to create a job will fail if the service has knowledge of a

2543

# previously submitted job with the same client's ID and job name.

2544

# The caller may use this field to ensure idempotence of job

2545

# creation across retried attempts to create a job.

2546

# By default, the field is empty and, in that case, the service ignores it.

"executionInfo": { # Additional information about how a Cloud Dataflow job will be executed that
    # isn't contained in the submitted job.
    # Deprecated.

2549

"stages": { # A mapping from each stage to the information about that stage.

2550

"a_key": { # Contains information about how a particular

2551

# google.dataflow.v1beta3.Step will be executed.

2552

"stepName": [ # The steps associated with the execution stage.

2553

# Note that stages may have several steps, and that a given step

2554

# might be run by more than one stage.

2555

"A String",

2556

],

2557

},


},

2559

},


"type": "A String", # The type of Cloud Dataflow job.

2561

"createTime": "A String", # The timestamp when the job was initially created. Immutable and set by the

2562

# Cloud Dataflow service.

2563

"tempFiles": [ # A set of files the system should be aware of that are used

2564

# for temporary storage. These temporary files will be

2565

# removed on job completion.

2566

# No duplicates are allowed.

2567

# No file patterns are supported.

2568

#

2569

# The supported files are:

2570

#

2571

# Google Cloud Storage:

2572

#

2573

# storage.googleapis.com/{bucket}/{object}

2574

# bucket.storage.googleapis.com/{object}

2575

"A String",

2576

],

"id": "A String", # The unique ID of this job.
    #
    # This field is set by the Cloud Dataflow service when the Job is
    # created, and is immutable for the life of the job.
"requestedState": "A String", # The job's requested state.
    #
    # `UpdateJob` may be used to switch between the `JOB_STATE_STOPPED` and
    # `JOB_STATE_RUNNING` states, by setting requested_state. `UpdateJob` may
    # also be used to directly set a job's requested state to
    # `JOB_STATE_CANCELLED` or `JOB_STATE_DONE`, irrevocably terminating the
    # job if it has not already reached a terminal state.
"replaceJobId": "A String", # If this job is an update of an existing job, this field is the job ID
    # of the job it replaced.
    #
    # When sending a `CreateJobRequest`, you can update a job by specifying it
    # here. The job named here is stopped, and its intermediate state is
    # transferred to this job.
"createdFromSnapshotId": "A String", # If this is specified, the job's initial state is populated from the given
    # snapshot.
"currentState": "A String", # The current state of the job.
    #
    # Jobs are created in the `JOB_STATE_STOPPED` state unless otherwise
    # specified.
    #
    # A job in the `JOB_STATE_RUNNING` state may asynchronously enter a
    # terminal state. After a job has reached a terminal state, no
    # further state updates may be made.
    #
    # This field may be mutated by the Cloud Dataflow service;
    # callers cannot mutate it.
"name": "A String", # The user-specified Cloud Dataflow job name.
    #
    # Only one Job with a given name may exist in a project at any
    # given time. If a caller attempts to create a Job with the same
    # name as an already-existing Job, the attempt returns the
    # existing Job.
    #
    # The name must match the regular expression
    # `[a-z]([-a-z0-9]{0,38}[a-z0-9])?`
"currentStateTime": "A String", # The timestamp associated with the current state.
},
],
"nextPageToken": "A String", # Set if there may be more results than fit in this response.
"failedLocation": [ # Zero or more messages describing the [regional endpoints]
    # (https://cloud.google.com/dataflow/docs/concepts/regional-endpoints) that
    # failed to respond.
{ # Indicates which [regional endpoint]
    # (https://cloud.google.com/dataflow/docs/concepts/regional-endpoints) failed
    # to respond to a request for data.
"name": "A String", # The name of the [regional endpoint]
    # (https://cloud.google.com/dataflow/docs/concepts/regional-endpoints) that
    # failed to respond.
},
],

}</pre>

</div>
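The Job resource above states a few constraints that can be checked on the client before a request is sent: the job `name` pattern and the size limits on `labels`. The following is a minimal sketch of such a check, not part of the generated client. The helper names `is_valid_job_name` and `label_problems` are hypothetical, and the Unicode-class label patterns (`\p{Ll}`, `\p{Lo}`) are deliberately not checked because Python's standard `re` module does not support them.

```python
import re

# Pattern quoted in the "name" field description of the Job resource.
JOB_NAME_RE = re.compile(r"[a-z]([-a-z0-9]{0,38}[a-z0-9])?")

MAX_LABELS = 64         # "no more than 64 entries"
MAX_LABEL_BYTES = 128   # keys and values "<= 128 bytes in size"


def is_valid_job_name(name):
    """Return True if the whole name matches the documented pattern."""
    return JOB_NAME_RE.fullmatch(name) is not None


def label_problems(labels):
    """Return a list of problems found in a labels dict (empty if none).

    Only the entry-count and UTF-8 byte-size limits are checked here;
    the character-class rules for keys and values are left to the service.
    """
    problems = []
    if len(labels) > MAX_LABELS:
        problems.append(
            "%d labels exceed the limit of %d" % (len(labels), MAX_LABELS)
        )
    for key, value in labels.items():
        for kind, text in (("key", key), ("value", value)):
            if len(text.encode("utf-8")) > MAX_LABEL_BYTES:
                problems.append(
                    "label %s starting %r is over %d bytes"
                    % (kind, text[:16], MAX_LABEL_BYTES)
                )
    return problems
```

A caller could run both checks before building the `body` for `create()` and report the problems locally. The service remains the authority on what it accepts.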

<code class="details" id="list_next">list_next(previous_request, previous_response)</code>
<pre>Retrieves the next page of results.

Args:
  previous_request: The request for the previous page. (required)
  previous_response: The response from the request for the previous page. (required)

Returns:
  A request object that you can call 'execute()' on to request the next
  page. Returns None if there are no more items in the collection.
</pre>

</div>
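The `list_next` contract above (pass in the previous request and response, stop when `None` comes back) is usually wrapped in a loop. The sketch below shows one way to do that, assuming the list response carries its jobs under a `jobs` key. The helper name `iter_jobs` is hypothetical and is not provided by the client library.

```python
def iter_jobs(jobs_resource, **list_kwargs):
    """Yield every job across all pages of a jobs().list() call.

    jobs_resource is the object returned by
    service.projects().locations().jobs(); list_kwargs are passed
    through to list() unchanged (for example projectId and location).
    """
    request = jobs_resource.list(**list_kwargs)
    while request is not None:
        response = request.execute()
        # A page may omit the key entirely when it holds no jobs.
        for job in response.get("jobs", []):
            yield job
        # list_next returns None once there are no more pages.
        request = jobs_resource.list_next(
            previous_request=request, previous_response=response
        )
```

With a real client this might be called as `iter_jobs(service.projects().locations().jobs(), projectId="my-project", location="us-central1")`, where the project and location values are placeholders. The sketch ignores `failedLocation`, so a caller that needs to know about regional endpoints that did not respond should inspect each response directly.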

<code class="details" id="snapshot">snapshot(projectId, location, jobId, body=None, x__xgafv=None)</code>
<pre>Snapshot the state of a streaming job.

Args:
  projectId: string, The project which owns the job to be snapshotted. (required)
  location: string, The location that contains this job. (required)
  jobId: string, The job to be snapshotted. (required)
  body: object, The request body.
    The object takes the form of:

{ # Request to create a snapshot of a job.
"snapshotSources": True or False, # If true, perform snapshots for sources which support this.
"location": "A String", # The location that contains this job.
"description": "A String", # User specified description of the snapshot. May be empty.
"ttl": "A String", # TTL for the snapshot.
}

  x__xgafv: string, V1 error format.
    Allowed values
      1 - v1 error format
      2 - v2 error format

Returns:
  An object of the form:

{ # Represents a snapshot of a job.
"ttl": "A String", # The time after which this snapshot will be automatically deleted.
"state": "A String", # State of the snapshot.
"id": "A String", # The unique ID of this snapshot.
"sourceJobId": "A String", # The job this snapshot was created from.
"creationTime": "A String", # The time this snapshot was created.
"description": "A String", # User specified description of the snapshot. May be empty.
"pubsubMetadata": [ # PubSub snapshot metadata.
{ # Represents a Pubsub snapshot.
"snapshotName": "A String", # The name of the Pubsub snapshot.
"expireTime": "A String", # The expire time of the Pubsub snapshot.
"topicName": "A String", # The name of the Pubsub topic.
},
],
"projectId": "A String", # The project this snapshot belongs to.
"diskSizeBytes": "A String", # The disk byte size of the snapshot. Only available for snapshots in READY
    # state.
}</pre>

</div>
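The snapshot request body and response documented above can be exercised with a small helper. This is a minimal sketch, not the library's own code: `build_snapshot_body` and `snapshot_job` are hypothetical names, `service` is assumed to be a Dataflow `v1b3` client object, the `snapshot(projectId, location, jobId, body)` call shape is assumed from the other methods of this resource, and writing `ttl` as a number of seconds followed by `s` is an assumption about the duration string format.

```python
def build_snapshot_body(location, ttl_seconds, description="", snapshot_sources=False):
    """Build the request body for a job snapshot (hypothetical helper)."""
    if ttl_seconds <= 0:
        raise ValueError("ttl_seconds must be positive")
    return {
        "location": location,                 # the location that contains the job
        "ttl": "%ds" % ttl_seconds,           # assumed duration encoding, e.g. "3600s"
        "description": description,           # may be empty
        "snapshotSources": snapshot_sources,  # snapshot sources that support it
    }


def snapshot_job(service, project_id, location, job_id, body):
    """Send the snapshot request and return the response dictionary.

    `service` is assumed to expose projects().locations().jobs().snapshot(...).
    """
    request = service.projects().locations().jobs().snapshot(
        projectId=project_id,
        location=location,
        jobId=job_id,
        body=body,
    )
    return request.execute()
```

With real credentials, `service` would typically come from `googleapiclient.discovery.build("dataflow", "v1b3")`. The returned dictionary carries the snapshot fields listed above, such as `id`, `state` and `ttl`.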

Dan O'Meara

2020-05-01 07:42:23 -0700

[diff] [blame]

2695

<code class="details" id="update">update(projectId, location, jobId, body=None, x__xgafv=None)</code>

Sai Cheemalapati

2017-03-13 12:12:03 -0400

[diff] [blame]

2696

<pre>Updates the state of an existing Cloud Dataflow job.

Jon Wayne Parrott

2017-01-06 09:58:29 -0800

[diff] [blame]

2697

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

2698

To update the state of an existing job, we recommend using

2699

`projects.locations.jobs.update` with a [regional endpoint]

2700

(https://cloud.google.com/dataflow/docs/concepts/regional-endpoints). Using

2701

`projects.jobs.update` is not recommended, as you can only update the state

2702

of jobs that are running in `us-central1`.
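As a rough illustration of the recommendation above, the call can be wrapped so that a regional `location` is always supplied. This is a sketch under stated assumptions: `request_job_state` is a hypothetical helper, `service` is assumed to be a Dataflow `v1b3` client object, and the `requestedState` field and the `JOB_STATE_CANCELLED` value are taken from the Dataflow Job resource rather than from the portion of the body documented here.

```python
def request_job_state(service, project_id, location, job_id, requested_state):
    """Ask the service to move an existing job to `requested_state`.

    Uses the regional projects.locations.jobs.update method, so `location`
    must name the regional endpoint that contains the job.
    """
    if not location:
        raise ValueError("location (regional endpoint) is required")
    # Only the field being changed is sent; `requestedState` is an assumption
    # about the Job resource, labeled as such in the text above.
    body = {"requestedState": requested_state}
    request = service.projects().locations().jobs().update(
        projectId=project_id,
        location=location,
        jobId=job_id,
        body=body,
    )
    return request.execute()
```

A cancellation request would then look like `request_job_state(service, "my-project", "europe-west1", job_id, "JOB_STATE_CANCELLED")`, where the project and region are placeholders.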

2703

Jon Wayne Parrott

2017-01-06 09:58:29 -0800

[diff] [blame]

2704

Args:

Sai Cheemalapati

2017-03-13 12:12:03 -0400

[diff] [blame]

2705

projectId: string, The ID of the Cloud Platform project that the job belongs to. (required)

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

2706

location: string, The [regional endpoint]

2707

(https://cloud.google.com/dataflow/docs/concepts/regional-endpoints) that

2708

contains this job. (required)

Sai Cheemalapati

2017-03-13 12:12:03 -0400

[diff] [blame]

2709

jobId: string, The job ID. (required)

Dan O'Meara

2020-05-01 07:42:23 -0700

[diff] [blame]

2710

body: object, The request body.

Jon Wayne Parrott

2017-01-06 09:58:29 -0800

[diff] [blame]

2711

The object takes the form of:

2712

Sai Cheemalapati

2017-03-13 12:12:03 -0400

[diff] [blame]

2713

{ # Defines a job to be run by the Cloud Dataflow service.

Bu Sun Kim

2020-07-22 17:02:09 -0700

[diff] [blame]

2714

"pipelineDescription": { # A descriptive representation of submitted pipeline as well as the executed form.

2715

# This data is provided by the Dataflow service for ease of visualizing

2716

# the pipeline and interpreting Dataflow provided metrics.

2717

# Preliminary field: The format of this data may change at any time.

2718

# A description of the user pipeline and stages through which it is executed.

2719

# Created by Cloud Dataflow service. Only retrieved with JOB_VIEW_DESCRIPTION or JOB_VIEW_ALL.

2720

"displayData": [ # Pipeline level display data.

2721

{ # Data provided with a pipeline or transform to provide descriptive info.

2722

"url": "A String", # An optional full URL.

2723

"javaClassValue": "A String", # Contains value if the data is of java class type.

2724

"timestampValue": "A String", # Contains value if the data is of timestamp type.

2725

"durationValue": "A String", # Contains value if the data is of duration type.

2726

"label": "A String", # An optional label to display in a dax UI for the element.

2727

"key": "A String", # The key identifying the display data.

2728

# This is intended to be used as a label for the display data

2729

# when viewed in a dax monitoring system.

2730

"namespace": "A String", # The namespace for the key. This is usually a class name or programming

2731

# language namespace (e.g. a Python module) which defines the display data.

2732

# This allows a dax monitoring system to specially handle the data

2733

# and perform custom rendering.

2734

"floatValue": 3.14, # Contains value if the data is of float type.

2735

"strValue": "A String", # Contains value if the data is of string type.

2736

"int64Value": "A String", # Contains value if the data is of int64 type.

2737

"boolValue": True or False, # Contains value if the data is of a boolean type.

2738

"shortStrValue": "A String", # A possible additional shorter value to display.

2739

# For example a java_class_name_value of com.mypackage.MyDoFn

2740

# will be stored with MyDoFn as the short_str_value and

2741

# com.mypackage.MyDoFn as the java_class_name_value.

2742

# short_str_value can be displayed and java_class_name_value

2743

# will be displayed as a tooltip.

Bu Sun Kim

2020-05-27 12:20:54 -0700

[diff] [blame]

2744

},

Bu Sun Kim

2020-07-22 17:02:09 -0700

[diff] [blame]

2745

],

2746

"originalPipelineTransform": [ # Description of each transform in the pipeline and collections between them.

2747

{ # Description of the type, names/ids, and input/outputs for a transform.

2748

"outputCollectionName": [ # User names for all collection outputs to this transform.

Bu Sun Kim

2020-05-27 12:20:54 -0700

[diff] [blame]

2749

"A String",

2750

],

Bu Sun Kim

2020-07-22 17:02:09 -0700

[diff] [blame]

2751

"displayData": [ # Transform-specific display data.

2752

{ # Data provided with a pipeline or transform to provide descriptive info.

2753

"url": "A String", # An optional full URL.

2754

"javaClassValue": "A String", # Contains value if the data is of java class type.

2755

"timestampValue": "A String", # Contains value if the data is of timestamp type.

2756

"durationValue": "A String", # Contains value if the data is of duration type.

2757

"label": "A String", # An optional label to display in a dax UI for the element.

2758

"key": "A String", # The key identifying the display data.

2759

# This is intended to be used as a label for the display data

2760

# when viewed in a dax monitoring system.

2761

"namespace": "A String", # The namespace for the key. This is usually a class name or programming

2762

# language namespace (e.g. a Python module) which defines the display data.

2763

# This allows a dax monitoring system to specially handle the data

2764

# and perform custom rendering.

2765

"floatValue": 3.14, # Contains value if the data is of float type.

2766

"strValue": "A String", # Contains value if the data is of string type.

2767

"int64Value": "A String", # Contains value if the data is of int64 type.

2768

"boolValue": True or False, # Contains value if the data is of a boolean type.

2769

"shortStrValue": "A String", # A possible additional shorter value to display.

2770

# For example a java_class_name_value of com.mypackage.MyDoFn

2771

# will be stored with MyDoFn as the short_str_value and

2772

# com.mypackage.MyDoFn as the java_class_name_value.

2773

# short_str_value can be displayed and java_class_name_value

2774

# will be displayed as a tooltip.

2775

},

2776

],

2777

"id": "A String", # SDK generated id of this transform instance.

2778

"inputCollectionName": [ # User names for all collection inputs to this transform.

2779

"A String",

2780

],

2781

"name": "A String", # User provided name for this transform instance.

2782

"kind": "A String", # Type of transform.

Bu Sun Kim

2020-05-27 12:20:54 -0700

[diff] [blame]

2783

},

Bu Sun Kim

2020-07-22 17:02:09 -0700

[diff] [blame]

2784

],

2785

"executionPipelineStage": [ # Description of each stage of execution of the pipeline.

2786

{ # Description of the composing transforms, names/ids, and input/outputs of a

2787

# stage of execution. Some composing transforms and sources may have been

2788

# generated by the Dataflow service during execution planning.

2789

"componentSource": [ # Collections produced and consumed by component transforms of this stage.

2790

{ # Description of an interstitial value between transforms in an execution

2791

# stage.

2792

"userName": "A String", # Human-readable name for this transform; may be user or system generated.

2793

"name": "A String", # Dataflow service generated name for this source.

2794

"originalTransformOrCollection": "A String", # User name for the original user transform or collection with which this

2795

# source is most closely associated.

2796

},

2797

],

2798

"inputSource": [ # Input sources for this stage.

2799

{ # Description of an input or output of an execution stage.

2800

"userName": "A String", # Human-readable name for this source; may be user or system generated.

2801

"originalTransformOrCollection": "A String", # User name for the original user transform or collection with which this

2802

# source is most closely associated.

2803

"sizeBytes": "A String", # Size of the source, if measurable.

2804

"name": "A String", # Dataflow service generated name for this source.

2805

},

2806

],

2807

"name": "A String", # Dataflow service generated name for this stage.

2808

"componentTransform": [ # Transforms that comprise this execution stage.

2809

{ # Description of a transform executed as part of an execution stage.

2810

"name": "A String", # Dataflow service generated name for this source.

2811

"userName": "A String", # Human-readable name for this transform; may be user or system generated.

2812

"originalTransform": "A String", # User name for the original user transform with which this transform is

2813

# most closely associated.

2814

},

2815

],

2816

"id": "A String", # Dataflow service generated id for this stage.

2817

"outputSource": [ # Output sources for this stage.

2818

{ # Description of an input or output of an execution stage.

2819

"userName": "A String", # Human-readable name for this source; may be user or system generated.

2820

"originalTransformOrCollection": "A String", # User name for the original user transform or collection with which this

2821

# source is most closely associated.

2822

"sizeBytes": "A String", # Size of the source, if measurable.

2823

"name": "A String", # Dataflow service generated name for this source.

2824

},

2825

],

2826

"kind": "A String", # Type of transform this stage is executing.

Bu Sun Kim

2020-05-27 12:20:54 -0700

[diff] [blame]

2827

},

Bu Sun Kim

2020-07-22 17:02:09 -0700

[diff] [blame]

2828

],

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame]

2829

},

Bu Sun Kim

2020-07-22 17:02:09 -0700

[diff] [blame]

2830

"labels": { # User-defined labels for this job.

2831

#

2832

# The labels map can contain no more than 64 entries. Entries of the labels

2833

# map are UTF8 strings that comply with the following restrictions:

2834

#

2835

# * Keys must conform to regexp: \p{Ll}\p{Lo}{0,62}

2836

# * Values must conform to regexp: [\p{Ll}\p{Lo}\p{N}_-]{0,63}

2837

# * Both keys and values are additionally constrained to be <= 128 bytes in

2838

# size.

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame]

2839

"a_key": "A String",

Jon Wayne Parrott

2017-01-06 09:58:29 -0800

[diff] [blame]

2840

},

Bu Sun Kim

2020-07-22 17:02:09 -0700

[diff] [blame]

2841

"projectId": "A String", # The ID of the Cloud Platform project that the job belongs to.

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame]

2842

"environment": { # Describes the environment in which a Dataflow Job runs. # The environment for the job.

Bu Sun Kim

2020-07-22 17:02:09 -0700

[diff] [blame]

2843

"flexResourceSchedulingGoal": "A String", # Which Flexible Resource Scheduling mode to run in.

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame]

2844

"workerRegion": "A String", # The Compute Engine region

2845

# (https://cloud.google.com/compute/docs/regions-zones/regions-zones) in

2846

# which worker processing should occur, e.g. "us-west1". Mutually exclusive

2847

# with worker_zone. If neither worker_region nor worker_zone is specified,

2848

# default to the control plane's region.

Bu Sun Kim

2020-07-22 17:02:09 -0700

[diff] [blame]

2849

"userAgent": { # A description of the process that generated the request.

2850

"a_key": "", # Properties of the object.

2851

},

2852

"serviceAccountEmail": "A String", # Identity to run virtual machines as. Defaults to the default account.

2853

"version": { # A structure describing which components and their versions of the service

2854

# are required in order to run the job.

2855

"a_key": "", # Properties of the object.

2856

},

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame]

2857

"serviceKmsKeyName": "A String", # If set, contains the Cloud KMS key identifier used to encrypt data

2858

# at rest, AKA a Customer Managed Encryption Key (CMEK).

2859

#

2860

# Format:

2861

# projects/PROJECT_ID/locations/LOCATION/keyRings/KEY_RING/cryptoKeys/KEY

Bu Sun Kim

2020-07-22 17:02:09 -0700

[diff] [blame]

2862

"experiments": [ # The list of experiments to enable.

2863

"A String",

2864

],

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame]

2865

"workerZone": "A String", # The Compute Engine zone

2866

# (https://cloud.google.com/compute/docs/regions-zones/regions-zones) in

2867

# which worker processing should occur, e.g. "us-west1-a". Mutually exclusive

2868

# with worker_region. If neither worker_region nor worker_zone is specified,

2869

# a zone in the control plane's region is chosen based on available capacity.

Bu Sun Kim

2020-05-27 12:20:54 -0700

[diff] [blame]

2870

"workerPools": [ # The worker pools. At least one "harness" worker pool must be

2871

# specified in order for the job to have workers.

2872

{ # Describes one particular pool of Cloud Dataflow workers to be

2873

# instantiated by the Cloud Dataflow service in order to perform the

2874

# computations required by a job. Note that a workflow job may use

2875

# multiple pools, in order to match the various computational

2876

# requirements of the various stages of the job.

Bu Sun Kim

2020-07-22 17:02:09 -0700

[diff] [blame]

2877

"onHostMaintenance": "A String", # The action to take on host maintenance, as defined by the Google

2878

# Compute Engine API.

2879

"sdkHarnessContainerImages": [ # Set of SDK harness containers needed to execute this pipeline. This will

2880

# only be set in the Fn API path. For non-cross-language pipelines this

2881

# should have only one entry. Cross-language pipelines will have two or more

2882

# entries.

2883

{ # Defines a SDK harness container for executing Dataflow pipelines.

2884

"containerImage": "A String", # A docker container image that resides in Google Container Registry.

2885

"useSingleCorePerContainer": True or False, # If true, recommends the Dataflow service to use only one core per SDK

2886

# container instance with this image. If false (or unset) recommends using

2887

# more than one core per SDK container instance with this image for

2888

# efficiency. Note that Dataflow service may choose to override this property

2889

# if needed.

2890

},

2891

],

Bu Sun Kim

2020-05-27 12:20:54 -0700

[diff] [blame]

2892

"zone": "A String", # Zone to run the worker pools in. If empty or unspecified, the service

2893

# will attempt to choose a reasonable default.

Bu Sun Kim

2020-07-22 17:02:09 -0700

[diff] [blame]

2894

"kind": "A String", # The kind of the worker pool; currently only `harness` and `shuffle`

2895

# are supported.

2896

"metadata": { # Metadata to set on the Google Compute Engine VMs.

2897

"a_key": "A String",

2898

},

Bu Sun Kim

2020-05-27 12:20:54 -0700

[diff] [blame]

2899

"diskSourceImage": "A String", # Fully qualified source image for disks.

Bu Sun Kim

2020-07-22 17:02:09 -0700

[diff] [blame]

2900

"dataDisks": [ # Data disks that are used by a VM in this workflow.

2901

{ # Describes the data disk used by a workflow job.

2902

"sizeGb": 42, # Size of disk in GB. If zero or unspecified, the service will

2903

# attempt to choose a reasonable default.

2904

"diskType": "A String", # Disk storage type, as defined by Google Compute Engine. This

2905

# must be a disk type appropriate to the project and zone in which

2906

# the workers will run. If unknown or unspecified, the service

2907

# will attempt to choose a reasonable default.

2908

#

2909

# For example, the standard persistent disk type is a resource name

2910

# typically ending in "pd-standard". If SSD persistent disks are

2911

# available, the resource name typically ends with "pd-ssd". The

2912

# actual valid values are defined by the Google Compute Engine API,

2913

# not by the Cloud Dataflow API; consult the Google Compute Engine

2914

# documentation for more information about determining the set of

2915

# available disk types for a particular project and zone.

2916

#

2917

# Google Compute Engine Disk types are local to a particular

2918

# project in a particular zone, and so the resource name will

2919

# typically look something like this:

2920

#

2921

# compute.googleapis.com/projects/project-id/zones/zone/diskTypes/pd-standard

2922

"mountPoint": "A String", # Directory in a VM where disk is mounted.

2923

},

2924

],

Bu Sun Kim

2020-05-27 12:20:54 -0700

[diff] [blame]

2925

"packages": [ # Packages to be installed on workers.

2926

{ # The packages that must be installed in order for a worker to run the

2927

# steps of the Cloud Dataflow job that will be assigned to its worker

2928

# pool.

2929

#

2930

# This is the mechanism by which the Cloud Dataflow SDK causes code to

2931

# be loaded onto the workers. For example, the Cloud Dataflow Java SDK

2932

# might use this to install jars containing the user's code and all of the

2933

# various dependencies (libraries, data files, etc.) required in order

2934

# for that code to run.

2935

"name": "A String", # The name of the package.

2936

"location": "A String", # The resource to read the package from. The supported resource type is:

2937

#

2938

# Google Cloud Storage:

2939

#

2940

# storage.googleapis.com/{bucket}

2941

# bucket.storage.googleapis.com/

2942

},

2943

],

2944

"teardownPolicy": "A String", # Sets the policy for determining when to tear down the worker pool.

2945

# Allowed values are: `TEARDOWN_ALWAYS`, `TEARDOWN_ON_SUCCESS`, and

2946

# `TEARDOWN_NEVER`.

2947

# `TEARDOWN_ALWAYS` means workers are always torn down regardless of whether

2948

# the job succeeds. `TEARDOWN_ON_SUCCESS` means workers are torn down

2949

# if the job succeeds. `TEARDOWN_NEVER` means the workers are never torn

2950

# down.

2951

#

2952

# If the workers are not torn down by the service, they will

2953

# continue to run and use Google Compute Engine VM resources in the

2954

# user's project until they are explicitly terminated by the user.

2955

# Because of this, Google recommends using the `TEARDOWN_ALWAYS`

2956

# policy except for small, manually supervised test jobs.

2957

#

2958

# If unknown or unspecified, the service will attempt to choose a reasonable

2959

# default.

Bu Sun Kim

2020-07-22 17:02:09 -0700

[diff] [blame]

2960

"network": "A String", # Network to which VMs will be assigned. If empty or unspecified,

2961

# the service will use the network "default".

2962

"ipConfiguration": "A String", # Configuration for VM IPs.

2963

"diskSizeGb": 42, # Size of root disk for VMs, in GB. If zero or unspecified, the service will

2964

# attempt to choose a reasonable default.

2965

"autoscalingSettings": { # Settings for WorkerPool autoscaling. # Settings for autoscaling of this WorkerPool.

2966

"maxNumWorkers": 42, # The maximum number of workers to cap scaling at.

2967

"algorithm": "A String", # The algorithm to use for autoscaling.

2968

},

Bu Sun Kim

2020-05-27 12:20:54 -0700

[diff] [blame]

2969

"poolArgs": { # Extra arguments for this worker pool.

2970

"a_key": "", # Properties of the object. Contains field @type with type URL.

2971

},

Bu Sun Kim

2020-07-22 17:02:09 -0700

[diff] [blame]

2972

"subnetwork": "A String", # Subnetwork to which VMs will be assigned, if desired. Expected to be of

2973

# the form "regions/REGION/subnetworks/SUBNETWORK".

2974

"numWorkers": 42, # Number of Google Compute Engine workers in this pool needed to

2975

# execute the job. If zero or unspecified, the service will

Bu Sun Kim

2020-05-27 12:20:54 -0700

[diff] [blame]

2976

# attempt to choose a reasonable default.

Bu Sun Kim

2020-07-22 17:02:09 -0700

[diff] [blame]

2977

"numThreadsPerWorker": 42, # The number of threads per worker harness. If empty or unspecified, the

2978

# service will choose a number of threads (according to the number of cores

2979

# on the selected machine type for batch, or 1 by convention for streaming).

Bu Sun Kim

2020-05-27 12:20:54 -0700

[diff] [blame]

2980

"workerHarnessContainerImage": "A String", # Required. Docker container image that executes the Cloud Dataflow worker

2981

# harness, residing in Google Container Registry.

2982

#

2983

# Deprecated for the Fn API path. Use sdk_harness_container_images instead.

Bu Sun Kim

2020-05-27 12:20:54 -0700

[diff] [blame]

2984

"taskrunnerSettings": { # Taskrunner configuration settings. # Settings passed through to Google Compute Engine workers when

2985

# using the standard Dataflow task runner. Users should ignore

2986

# this field.

Bu Sun Kim

2020-07-22 17:02:09 -0700

[diff] [blame]

2987

"dataflowApiVersion": "A String", # The API version of endpoint, e.g. "v1b3"

Bu Sun Kim

2020-05-27 12:20:54 -0700

[diff] [blame]

2988

"oauthScopes": [ # The OAuth2 scopes to be requested by the taskrunner in order to

2989

# access the Cloud Dataflow API.

2990

"A String",

2991

],

Bu Sun Kim

2020-05-27 12:20:54 -0700

[diff] [blame]

2992

"baseUrl": "A String", # The base URL for the taskrunner to use when accessing Google Cloud APIs.

2993

#

2994

# When workers access Google Cloud APIs, they logically do so via

2995

# relative URLs. If this field is specified, it supplies the base

2996

# URL to use for resolving these relative URLs. The normative

2997

# algorithm used is defined by RFC 1808, "Relative Uniform Resource

2998

# Locators".

2999

#

3000

# If not specified, the default value is "http://www.googleapis.com/"

Bu Sun Kim

2020-07-22 17:02:09 -0700

[diff] [blame]

3001

"workflowFileName": "A String", # The file to store the workflow in.

Bu Sun Kim

2020-05-27 12:20:54 -0700

[diff] [blame]

3002

"logToSerialconsole": True or False, # Whether to send taskrunner log info to Google Compute Engine VM serial

3003

# console.

Bu Sun Kim

2020-07-22 17:02:09 -0700

[diff] [blame]

3004

"baseTaskDir": "A String", # The location on the worker for task-specific subdirectories.

3005

"taskUser": "A String", # The UNIX user ID on the worker VM to use for tasks launched by

3006

# taskrunner; e.g. "root".

3007

"vmId": "A String", # The ID string of the VM.

3008

"alsologtostderr": True or False, # Whether to also send taskrunner log info to stderr.

Bu Sun Kim

2020-05-27 12:20:54 -0700

[diff] [blame]

3009

"parallelWorkerSettings": { # Provides data to pass through to the worker harness. # The settings to pass to the parallel worker harness.

Bu Sun Kim

2020-07-22 17:02:09 -0700

[diff] [blame]

3010

"shuffleServicePath": "A String", # The Shuffle service path relative to the root URL, for example,

3011

# "shuffle/v1beta1".

Bu Sun Kim

2020-05-27 12:20:54 -0700

[diff] [blame]

3012

"tempStoragePrefix": "A String", # The prefix of the resources the system should use for temporary

3013

# storage.

3014

#

3015

# The supported resource type is:

3016

#

3017

# Google Cloud Storage:

3018

#

3019

# storage.googleapis.com/{bucket}/{object}

3020

# bucket.storage.googleapis.com/{object}

3021

"reportingEnabled": True or False, # Whether to send work progress updates to the service.

Bu Sun Kim

2020-07-22 17:02:09 -0700

[diff] [blame]

3022

"servicePath": "A String", # The Cloud Dataflow service path relative to the root URL, for example,

3023

# "dataflow/v1b3/projects".

Bu Sun Kim

2020-05-27 12:20:54 -0700

[diff] [blame]

3024

"baseUrl": "A String", # The base URL for accessing Google Cloud APIs.

3025

#

3026

# When workers access Google Cloud APIs, they logically do so via

3027

# relative URLs. If this field is specified, it supplies the base

3028

# URL to use for resolving these relative URLs. The normative

3029

# algorithm used is defined by RFC 1808, "Relative Uniform Resource

3030

# Locators".

3031

#

3032

# If not specified, the default value is "http://www.googleapis.com/"

Bu Sun Kim

2020-05-27 12:20:54 -0700

[diff] [blame]

3033

"workerId": "A String", # The ID of the worker running this pipeline.

3034

},

Bu Sun Kim

2020-07-22 17:02:09 -0700

[diff] [blame]

3035

"harnessCommand": "A String", # The command to launch the worker harness.

3036

"logDir": "A String", # The directory on the VM to store logs.

3037

"streamingWorkerMainClass": "A String", # The streaming worker main class name.

3038

"languageHint": "A String", # The suggested backend language.

3039

"taskGroup": "A String", # The UNIX group ID on the worker VM to use for tasks launched by

3040

# taskrunner; e.g. "wheel".

3041

"logUploadLocation": "A String", # Indicates where to put logs. If this is not specified, the logs

3042

# will not be uploaded.

3043

#

3044

# The supported resource type is:

3045

#

3046

# Google Cloud Storage:

3047

# storage.googleapis.com/{bucket}/{object}

3048

# bucket.storage.googleapis.com/{object}

3049

"commandlinesFileName": "A String", # The file to store preprocessing commands in.

3050

"continueOnException": True or False, # Whether to continue taskrunner if an exception is hit.

3051

"tempStoragePrefix": "A String", # The prefix of the resources the taskrunner should use for

3052

# temporary storage.

3053

#

3054

# The supported resource type is:

3055

#

3056

# Google Cloud Storage:

3057

# storage.googleapis.com/{bucket}/{object}

3058

# bucket.storage.googleapis.com/{object}

Bu Sun Kim

2020-05-27 12:20:54 -0700

[diff] [blame]

3059

},

Bu Sun Kim

2020-07-22 17:02:09 -0700

[diff] [blame]

3060

"diskType": "A String", # Type of root disk for VMs. If empty or unspecified, the service will

3061

# attempt to choose a reasonable default.

Bu Sun Kim

2020-05-27 12:20:54 -0700

[diff] [blame]

3062

"defaultPackageSet": "A String", # The default package set to install. This allows the service to

3063

# select a default set of packages which are useful to worker

3064

# harnesses written in a particular language.

Bu Sun Kim

2020-07-22 17:02:09 -0700

[diff] [blame]

3065

"machineType": "A String", # Machine type (e.g. "n1-standard-1"). If empty or unspecified, the

3066

# service will attempt to choose a reasonable default.

Bu Sun Kim

2020-05-27 12:20:54 -0700

[diff] [blame]

3067

},

3068

],

Bu Sun Kim

2020-07-22 17:02:09 -0700

[diff] [blame]

3069

"tempStoragePrefix": "A String", # The prefix of the resources the system should use for temporary

3070

# storage. The system will append the suffix "/temp-{JOBNAME}" to

3071

# this resource prefix, where {JOBNAME} is the value of the

3072

# job_name field. The resulting bucket and object prefix is used

3073

# as the prefix of the resources used to store temporary data

3074

# needed during the job execution. NOTE: This will override the

3075

# value in taskrunner_settings.

3076

# The supported resource type is:

3077

#

3078

# Google Cloud Storage:

3079

#

3080

# storage.googleapis.com/{bucket}/{object}

3081

# bucket.storage.googleapis.com/{object}

3082

"internalExperiments": { # Experimental settings.

3083

"a_key": "", # Properties of the object. Contains field @type with type URL.

3084

},

3085

"sdkPipelineOptions": { # The Cloud Dataflow SDK pipeline options specified by the user. These

3086

# options are passed through the service and are used to recreate the

3087

# SDK pipeline options on the worker in a language agnostic and platform

3088

# independent way.

3089

"a_key": "", # Properties of the object.

3090

},

Bu Sun Kim

2020-05-27 12:20:54 -0700

[diff] [blame]

3091

"dataset": "A String", # The dataset for the current project where various workflow

3092

# related tables are stored.

3093

#

3094

# The supported resource type is:

3095

#

3096

# Google BigQuery:

3097

# bigquery.googleapis.com/{dataset}

Bu Sun Kim

2020-07-22 17:02:09 -0700

[diff] [blame]

3098

"clusterManagerApiService": "A String", # The type of cluster manager API to use. If unknown or

3099

# unspecified, the service will attempt to choose a reasonable

3100

# default. This should be in the form of the API service name,

3101

# e.g. "compute.googleapis.com".

Jon Wayne Parrott

2017-01-06 09:58:29 -0800

[diff] [blame]

3102

},

Bu Sun Kim

2020-07-22 17:02:09 -0700

[diff] [blame]

3103

"stepsLocation": "A String", # The GCS location where the steps are stored.

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame]

3104

"steps": [ # Exactly one of steps or steps_location should be specified.

Bu Sun Kim

2020-07-22 17:02:09 -0700

[diff] [blame]

3105

#

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

3106

# The top-level steps that constitute the entire job.

Sai Cheemalapati

2017-03-13 12:12:03 -0400

[diff] [blame]

3107

{ # Defines a particular step within a Cloud Dataflow job.

3108

#

3109

# A job consists of multiple steps, each of which performs some

3110

# specific operation as part of the overall job. Data is typically

3111

# passed from one step to another as part of the job.

3112

#

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame]

3113

# Here's an example of a sequence of steps which together implement a

Sai Cheemalapati

2017-03-13 12:12:03 -0400

[diff] [blame]

3114

# Map-Reduce job:

3115

#

3116

# * Read a collection of data from some source, parsing the

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame]

3117

# collection's elements.

Sai Cheemalapati

2017-03-13 12:12:03 -0400

[diff] [blame]

3118

#

3119

# * Validate the elements.

3120

#

3121

# * Apply a user-defined function to map each element to some value

3122

# and extract an element-specific key value.

3123

#

3124

# * Group elements with the same key into a single element with

3125

# that key, transforming a multiply-keyed collection into a

3126

# uniquely-keyed collection.

3127

#

3128

# * Write the elements out to some data sink.

3129

#

3130

# Note that the Cloud Dataflow service may be used to run many different

3131

# types of jobs, not just Map-Reduce.

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame]

3132

"kind": "A String", # The kind of step in the Cloud Dataflow job.

3133

"properties": { # Named properties associated with the step. Each kind of

Sai Cheemalapati

2017-03-13 12:12:03 -0400

[diff] [blame]

3134

# predefined step has its own required set of properties.

3135

# Must be provided on Create. Only retrieved with JOB_VIEW_ALL.

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame]

3136

"a_key": "", # Properties of the object.

Jon Wayne Parrott

2017-01-06 09:58:29 -0800

[diff] [blame]

3137

},

Bu Sun Kim

2020-07-22 17:02:09 -0700

[diff] [blame]

3138

"name": "A String", # The name that identifies the step. This must be unique for each

3139

# step with respect to all other steps in the Cloud Dataflow job.

3140

},

3141

],

3142

"stageStates": [ # This field may be mutated by the Cloud Dataflow service;

3143

# callers cannot mutate it.

3144

{ # A message describing the state of a particular execution stage.

3145

"executionStageState": "A String", # Execution stage states allow the same set of values as JobState.

3146

"executionStageName": "A String", # The name of the execution stage.

3147

"currentStateTime": "A String", # The time at which the stage transitioned to this state.

Jon Wayne Parrott

2017-01-06 09:58:29 -0800

[diff] [blame]

3148

},

3149

],

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame]

3150

"replacedByJobId": "A String", # If another job is an update of this job (and thus, this job is in

3151

# `JOB_STATE_UPDATED`), this field contains the ID of that job.

3152

"jobMetadata": { # Metadata available primarily for filtering jobs. Will be included in the

# ListJob response and Job SUMMARY view.

# This field is populated by the Dataflow service to support filtering jobs

# by the metadata values provided here. Populated for ListJobs and all GetJob

# views SUMMARY and higher.

3156

"sdkVersion": { # The version of the SDK used to run the job.

3157

"sdkSupportStatus": "A String", # The support status for this SDK version.

3158

"versionDisplayName": "A String", # A readable string describing the version of the SDK.

3159

"version": "A String", # The version of the SDK used to run the job.

3160

},

3161

"bigTableDetails": [ # Identification of a BigTable source used in the Dataflow job.

3162

{ # Metadata for a BigTable connector used by the job.

3163

"instanceId": "A String", # InstanceId accessed in the connection.

3164

"tableId": "A String", # TableId accessed in the connection.

3165

"projectId": "A String", # ProjectId accessed in the connection.

3166

},

3167

],

3168

"pubsubDetails": [ # Identification of a PubSub source used in the Dataflow job.

3169

{ # Metadata for a PubSub connector used by the job.

3170

"subscription": "A String", # Subscription used in the connection.

3171

"topic": "A String", # Topic accessed in the connection.

3172

},

3173

],

3174

"bigqueryDetails": [ # Identification of a BigQuery source used in the Dataflow job.

3175

{ # Metadata for a BigQuery connector used by the job.

3176

"dataset": "A String", # Dataset accessed in the connection.

3177

"projectId": "A String", # Project accessed in the connection.

3178

"query": "A String", # Query used to access data in the connection.

3179

"table": "A String", # Table accessed in the connection.

3180

},

3181

],

3182

"fileDetails": [ # Identification of a File source used in the Dataflow job.

3183

{ # Metadata for a File connector used by the job.

3184

"filePattern": "A String", # File Pattern used to access files by the connector.

3185

},

3186

],

3187

"datastoreDetails": [ # Identification of a Datastore source used in the Dataflow job.

3188

{ # Metadata for a Datastore connector used by the job.

3189

"namespace": "A String", # Namespace used in the connection.

3190

"projectId": "A String", # ProjectId accessed in the connection.

3191

},

3192

],

3193

"spannerDetails": [ # Identification of a Spanner source used in the Dataflow job.

3194

{ # Metadata for a Spanner connector used by the job.

3195

"instanceId": "A String", # InstanceId accessed in the connection.

3196

"databaseId": "A String", # DatabaseId accessed in the connection.

3197

"projectId": "A String", # ProjectId accessed in the connection.

},

],

},

"location": "A String", # The [regional endpoint]

3202

# (https://cloud.google.com/dataflow/docs/concepts/regional-endpoints) that

3203

# contains this job.

3204

"transformNameMapping": { # The map of transform name prefixes of the job to be replaced to the

3205

# corresponding name prefixes of the new job.

3206

"a_key": "A String",

3207

},

3208

"startTime": "A String", # The timestamp when the job was started (transitioned to JOB_STATE_PENDING).

3209

# Flexible resource scheduling jobs are started with some delay after job

3210

# creation, so start_time is unset before start and is updated when the

3211

# job is started by the Cloud Dataflow service. For other jobs, start_time

3212

# always equals create_time and is immutable and set by the Cloud Dataflow

3213

# service.

3214

"clientRequestId": "A String", # The client's unique identifier of the job, re-used across retried attempts.

3215

# If this field is set, the service will ensure its uniqueness.

3216

# The request to create a job will fail if the service has knowledge of a

3217

# previously submitted job with the same client's ID and job name.

3218

# The caller may use this field to ensure idempotence of job

3219

# creation across retried attempts to create a job.

3220

# By default, the field is empty and, in that case, the service ignores it.

3221

"executionInfo": { # Deprecated. Additional information about how a Cloud Dataflow job will be

# executed that isn't contained in the submitted job.

3223

"stages": { # A mapping from each stage to the information about that stage.

3224

"a_key": { # Contains information about how a particular

3225

# google.dataflow.v1beta3.Step will be executed.

3226

"stepName": [ # The steps associated with the execution stage.

3227

# Note that stages may have several steps, and that a given step

3228

# might be run by more than one stage.

"A String",

],

},

},

},

3234

"type": "A String", # The type of Cloud Dataflow job.

3235

"createTime": "A String", # The timestamp when the job was initially created. Immutable and set by the

3236

# Cloud Dataflow service.

3237

"tempFiles": [ # A set of files the system should be aware of that are used

3238

# for temporary storage. These temporary files will be

3239

# removed on job completion.

3240

# No duplicates are allowed.

3241

# No file patterns are supported.

3242

#

3243

# The supported files are:

3244

#

3245

# Google Cloud Storage:

3246

#

3247

# storage.googleapis.com/{bucket}/{object}

3248

# bucket.storage.googleapis.com/{object}

3249

"A String",

3250

],

3251

"id": "A String", # The unique ID of this job.

3252

#

3253

# This field is set by the Cloud Dataflow service when the Job is

3254

# created, and is immutable for the life of the job.

3255

"requestedState": "A String", # The job's requested state.

3256

#

3257

# `UpdateJob` may be used to switch between the `JOB_STATE_STOPPED` and

3258

# `JOB_STATE_RUNNING` states, by setting requested_state. `UpdateJob` may

3259

# also be used to directly set a job's requested state to

3260

# `JOB_STATE_CANCELLED` or `JOB_STATE_DONE`, irrevocably terminating the

3261

# job if it has not already reached a terminal state.

3262

"replaceJobId": "A String", # If this job is an update of an existing job, this field is the job ID

3263

# of the job it replaced.

3264

#

3265

# When sending a `CreateJobRequest`, you can update a job by specifying it

3266

# here. The job named here is stopped, and its intermediate state is

3267

# transferred to this job.

3268

"createdFromSnapshotId": "A String", # If this is specified, the job's initial state is populated from the given

3269

# snapshot.

3270

"currentState": "A String", # The current state of the job.

3271

#

3272

# Jobs are created in the `JOB_STATE_STOPPED` state unless otherwise

3273

# specified.

3274

#

3275

# A job in the `JOB_STATE_RUNNING` state may asynchronously enter a

3276

# terminal state. After a job has reached a terminal state, no

3277

# further state updates may be made.

3278

#

3279

# This field may be mutated by the Cloud Dataflow service;

3280

# callers cannot mutate it.

3281

"name": "A String", # The user-specified Cloud Dataflow job name.

3282

#

3283

# Only one Job with a given name may exist in a project at any

3284

# given time. If a caller attempts to create a Job with the same

3285

# name as an already-existing Job, the attempt returns the

3286

# existing Job.

3287

#

3288

# The name must match the regular expression

3289

# `[a-z]([-a-z0-9]{0,38}[a-z0-9])?`

3290

"currentStateTime": "A String", # The timestamp associated with the current state.

3291

}
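The job object above constrains `name` to the regular expression `[a-z]([-a-z0-9]{0,38}[a-z0-9])?` and describes `clientRequestId` as a way to keep job creation idempotent across retried attempts. A minimal sketch of assembling those two fields on the client side might look like the following. The helper names `is_valid_job_name` and `make_job_body`, and the choice of a random UUID for `clientRequestId`, are illustrative assumptions and not part of the API.

```python
import re
import uuid

# Pattern quoted from the "name" field description above.
JOB_NAME_RE = re.compile(r"[a-z]([-a-z0-9]{0,38}[a-z0-9])?")


def is_valid_job_name(name):
    """Return True if the whole string matches the documented pattern."""
    return JOB_NAME_RE.fullmatch(name) is not None


def make_job_body(name, client_request_id=None):
    """Build a partial Job body (hypothetical helper).

    Only "name" and "clientRequestId" are filled in here; a real
    request needs further fields such as "environment" and "steps".
    """
    if not is_valid_job_name(name):
        raise ValueError("job name does not match the documented pattern: %r" % name)
    return {
        "name": name,
        # Pass the same ID again when retrying so the service can
        # recognise the retried request as the same job.
        "clientRequestId": client_request_id or uuid.uuid4().hex,
    }
```

When retrying a failed submission, the caller would keep the first `clientRequestId` and pass it back in, rather than letting the helper generate a new one.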

x__xgafv: string, V1 error format.

Allowed values

1 - v1 error format

2 - v2 error format
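The `x__xgafv` parameter above takes `1` or `2` to select the error format. As a rough sketch of how it is passed, the helper below builds a `create` request using the signature listed among the instance methods on this page, `create(projectId, location, body=None, view=None, replaceJobId=None, x__xgafv=None)`. The helper name `build_create_request` is hypothetical, and `service` is assumed to be a Dataflow v1b3 client object, for example one obtained from `googleapiclient.discovery.build("dataflow", "v1b3")`.

```python
def build_create_request(service, project_id, location, job_body,
                         view=None, replace_job_id=None, error_format="2"):
    """Return an un-executed jobs.create request (hypothetical helper).

    `service` is assumed to expose projects().locations().jobs(),
    as the generated Dataflow v1b3 client does.
    """
    if error_format not in ("1", "2"):
        raise ValueError("x__xgafv must be '1' or '2'")

    kwargs = {
        "projectId": project_id,
        "location": location,
        "body": job_body,
        "x__xgafv": error_format,
    }
    # Optional parameters are only sent when the caller supplied them.
    if view is not None:
        kwargs["view"] = view
    if replace_job_id is not None:
        kwargs["replaceJobId"] = replace_job_id

    return service.projects().locations().jobs().create(**kwargs)
```

Nothing is sent until `.execute()` is called on the returned request, which requires valid credentials.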

Returns:

An object of the form:

{ # Defines a job to be run by the Cloud Dataflow service.

3302

"pipelineDescription": { # A descriptive representation of submitted pipeline as well as the executed

# form. This data is provided by the Dataflow service for ease of visualizing

# the pipeline and interpreting Dataflow provided metrics.

# Preliminary field: The format of this data may change at any time.

# A description of the user pipeline and stages through which it is executed.

# Created by Cloud Dataflow service. Only retrieved with

# JOB_VIEW_DESCRIPTION or JOB_VIEW_ALL.

3308

"displayData": [ # Pipeline level display data.

3309

{ # Data provided with a pipeline or transform to provide descriptive info.

3310

"url": "A String", # An optional full URL.

3311

"javaClassValue": "A String", # Contains value if the data is of java class type.

3312

"timestampValue": "A String", # Contains value if the data is of timestamp type.

3313

"durationValue": "A String", # Contains value if the data is of duration type.

3314

"label": "A String", # An optional label to display in a dax UI for the element.

3315

"key": "A String", # The key identifying the display data.

3316

# This is intended to be used as a label for the display data

3317

# when viewed in a dax monitoring system.

3318

"namespace": "A String", # The namespace for the key. This is usually a class name or programming

3319

# language namespace (e.g. a Python module) which defines the display data.

3320

# This allows a dax monitoring system to specially handle the data

3321

# and perform custom rendering.

3322

"floatValue": 3.14, # Contains value if the data is of float type.

3323

"strValue": "A String", # Contains value if the data is of string type.

3324

"int64Value": "A String", # Contains value if the data is of int64 type.

3325

"boolValue": True or False, # Contains value if the data is of a boolean type.

3326

"shortStrValue": "A String", # A possible additional shorter value to display.

3327

# For example a java_class_name_value of com.mypackage.MyDoFn

3328

# will be stored with MyDoFn as the short_str_value and

3329

# com.mypackage.MyDoFn as the java_class_name value.

3330

# short_str_value can be displayed and java_class_name_value

3331

# will be displayed as a tooltip.

3332

},

3333

],

3334

"originalPipelineTransform": [ # Description of each transform in the pipeline and collections between them.

3335

{ # Description of the type, names/ids, and input/outputs for a transform.

3336

"outputCollectionName": [ # User names for all collection outputs to this transform.

3337

"A String",

3338

],

3339

"displayData": [ # Transform-specific display data.

3340

{ # Data provided with a pipeline or transform to provide descriptive info.

3341

"url": "A String", # An optional full URL.

3342

"javaClassValue": "A String", # Contains value if the data is of java class type.

3343

"timestampValue": "A String", # Contains value if the data is of timestamp type.

3344

"durationValue": "A String", # Contains value if the data is of duration type.

3345

"label": "A String", # An optional label to display in a dax UI for the element.

3346

"key": "A String", # The key identifying the display data.

3347

# This is intended to be used as a label for the display data

3348

# when viewed in a dax monitoring system.

3349

"namespace": "A String", # The namespace for the key. This is usually a class name or programming

3350

# language namespace (e.g. a Python module) which defines the display data.

3351

# This allows a dax monitoring system to specially handle the data

3352

# and perform custom rendering.

3353

"floatValue": 3.14, # Contains value if the data is of float type.

3354

"strValue": "A String", # Contains value if the data is of string type.

3355

"int64Value": "A String", # Contains value if the data is of int64 type.

3356

"boolValue": True or False, # Contains value if the data is of a boolean type.

3357

"shortStrValue": "A String", # A possible additional shorter value to display.

3358

# For example a java_class_name_value of com.mypackage.MyDoFn

3359

# will be stored with MyDoFn as the short_str_value and

3360

# com.mypackage.MyDoFn as the java_class_name value.

3361

# short_str_value can be displayed and java_class_name_value

3362

# will be displayed as a tooltip.

3363

},

3364

],

3365

"id": "A String", # SDK generated id of this transform instance.

3366

"inputCollectionName": [ # User names for all collection inputs to this transform.

3367

"A String",

3368

],

3369

"name": "A String", # User provided name for this transform instance.

3370

"kind": "A String", # Type of transform.

3371

},

3372

],

3373

"executionPipelineStage": [ # Description of each stage of execution of the pipeline.

3374

{ # Description of the composing transforms, names/ids, and input/outputs of a

3375

# stage of execution. Some composing transforms and sources may have been

3376

# generated by the Dataflow service during execution planning.

3377

"componentSource": [ # Collections produced and consumed by component transforms of this stage.

3378

{ # Description of an interstitial value between transforms in an execution

3379

# stage.

3380

"userName": "A String", # Human-readable name for this transform; may be user or system generated.

3381

"name": "A String", # Dataflow service generated name for this source.

3382

"originalTransformOrCollection": "A String", # User name for the original user transform or collection with which this

3383

# source is most closely associated.

3384

},

3385

],

3386

"inputSource": [ # Input sources for this stage.

3387

{ # Description of an input or output of an execution stage.

3388

"userName": "A String", # Human-readable name for this source; may be user or system generated.

3389

"originalTransformOrCollection": "A String", # User name for the original user transform or collection with which this

3390

# source is most closely associated.

3391

"sizeBytes": "A String", # Size of the source, if measurable.

3392

"name": "A String", # Dataflow service generated name for this source.

3393

},

3394

],

3395

"name": "A String", # Dataflow service generated name for this stage.

3396

"componentTransform": [ # Transforms that comprise this execution stage.

3397

{ # Description of a transform executed as part of an execution stage.

3398

"name": "A String", # Dataflow service generated name for this source.

3399

"userName": "A String", # Human-readable name for this transform; may be user or system generated.

3400

"originalTransform": "A String", # User name for the original user transform with which this transform is

3401

# most closely associated.

3402

},

3403

],

3404

"id": "A String", # Dataflow service generated id for this stage.

3405

"outputSource": [ # Output sources for this stage.

3406

{ # Description of an input or output of an execution stage.

3407

"userName": "A String", # Human-readable name for this source; may be user or system generated.

3408

"originalTransformOrCollection": "A String", # User name for the original user transform or collection with which this

3409

# source is most closely associated.

3410

"sizeBytes": "A String", # Size of the source, if measurable.

3411

"name": "A String", # Dataflow service generated name for this source.

3412

},

3413

],

3414

"kind": "A String", # Type of transform this stage is executing.

},

],

},

"labels": { # User-defined labels for this job.

3419

#

3420

# The labels map can contain no more than 64 entries. Entries of the labels

3421

# map are UTF8 strings that comply with the following restrictions:

3422

#

3423

# * Keys must conform to regexp: \p{Ll}\p{Lo}{0,62}

3424

# * Values must conform to regexp: [\p{Ll}\p{Lo}\p{N}_-]{0,63}

3425

# * Both keys and values are additionally constrained to be <= 128 bytes in

# size.

"a_key": "A String",

},

"projectId": "A String", # The ID of the Cloud Platform project that the job belongs to.

3430

"environment": { # The environment for the job. Describes the environment in which a Dataflow Job runs.

3431

"flexResourceSchedulingGoal": "A String", # Which Flexible Resource Scheduling mode to run in.

3432

"workerRegion": "A String", # The Compute Engine region

3433

# (https://cloud.google.com/compute/docs/regions-zones/regions-zones) in

3434

# which worker processing should occur, e.g. "us-west1". Mutually exclusive

3435

# with worker_zone. If neither worker_region nor worker_zone is specified,

3436

# default to the control plane's region.

3437

"userAgent": { # A description of the process that generated the request.

3438

"a_key": "", # Properties of the object.

3439

},

3440

"serviceAccountEmail": "A String", # Identity to run virtual machines as. Defaults to the default account.

3441

"version": { # A structure describing which components and their versions of the service

3442

# are required in order to run the job.

3443

"a_key": "", # Properties of the object.

3444

},

3445

"serviceKmsKeyName": "A String", # If set, contains the Cloud KMS key identifier used to encrypt data

3446

# at rest, AKA a Customer Managed Encryption Key (CMEK).

3447

#

3448

# Format:

3449

# projects/PROJECT_ID/locations/LOCATION/keyRings/KEY_RING/cryptoKeys/KEY

3450

"experiments": [ # The list of experiments to enable.

3451

"A String",

3452

],

3453

"workerZone": "A String", # The Compute Engine zone

3454

# (https://cloud.google.com/compute/docs/regions-zones/regions-zones) in

3455

# which worker processing should occur, e.g. "us-west1-a". Mutually exclusive

3456

# with worker_region. If neither worker_region nor worker_zone is specified,

3457

# a zone in the control plane's region is chosen based on available capacity.

3458

"workerPools": [ # The worker pools. At least one "harness" worker pool must be

3459

# specified in order for the job to have workers.

3460

{ # Describes one particular pool of Cloud Dataflow workers to be

3461

# instantiated by the Cloud Dataflow service in order to perform the

3462

# computations required by a job. Note that a workflow job may use

3463

# multiple pools, in order to match the various computational

3464

# requirements of the various stages of the job.

3465

"onHostMaintenance": "A String", # The action to take on host maintenance, as defined by the Google

3466

# Compute Engine API.

3467

"sdkHarnessContainerImages": [ # Set of SDK harness containers needed to execute this pipeline. This will

3468

# only be set in the Fn API path. For non-cross-language pipelines this

3469

# should have only one entry. Cross-language pipelines will have two or more

3470

# entries.

3471

{ # Defines an SDK harness container for executing Dataflow pipelines.

3472

"containerImage": "A String", # A docker container image that resides in Google Container Registry.

3473

"useSingleCorePerContainer": True or False, # If true, recommends the Dataflow service to use only one core per SDK

3474

# container instance with this image. If false (or unset) recommends using

3475

# more than one core per SDK container instance with this image for

3476

# efficiency. Note that Dataflow service may choose to override this property

# if needed.

},

],

"zone": "A String", # Zone to run the worker pools in. If empty or unspecified, the service

3481

# will attempt to choose a reasonable default.

3482

"kind": "A String", # The kind of the worker pool; currently only `harness` and `shuffle`

3483

# are supported.

3484

"metadata": { # Metadata to set on the Google Compute Engine VMs.

3485

"a_key": "A String",

3486

},

3487

"diskSourceImage": "A String", # Fully qualified source image for disks.

3488

"dataDisks": [ # Data disks that are used by a VM in this workflow.

3489

{ # Describes the data disk used by a workflow job.

3490

"sizeGb": 42, # Size of disk in GB. If zero or unspecified, the service will

3491

# attempt to choose a reasonable default.

3492

"diskType": "A String", # Disk storage type, as defined by Google Compute Engine. This

3493

# must be a disk type appropriate to the project and zone in which

3494

# the workers will run. If unknown or unspecified, the service

3495

# will attempt to choose a reasonable default.

3496

#

3497

# For example, the standard persistent disk type is a resource name

3498

# typically ending in "pd-standard". If SSD persistent disks are

3499

# available, the resource name typically ends with "pd-ssd". The

3500

# actual valid values are defined by the Google Compute Engine API,

3501

# not by the Cloud Dataflow API; consult the Google Compute Engine

3502

# documentation for more information about determining the set of

3503

# available disk types for a particular project and zone.

3504

#

3505

# Google Compute Engine Disk types are local to a particular

3506

# project in a particular zone, and so the resource name will

3507

# typically look something like this:

3508

#

3509

# compute.googleapis.com/projects/project-id/zones/zone/diskTypes/pd-standard

3510

"mountPoint": "A String", # Directory in a VM where disk is mounted.

3511

},

3512

],

3513

"packages": [ # Packages to be installed on workers.

3514

{ # The packages that must be installed in order for a worker to run the

3515

# steps of the Cloud Dataflow job that will be assigned to its worker

3516

# pool.

3517

#

3518

# This is the mechanism by which the Cloud Dataflow SDK causes code to

3519

# be loaded onto the workers. For example, the Cloud Dataflow Java SDK

3520

# might use this to install jars containing the user's code and all of the

3521

# various dependencies (libraries, data files, etc.) required in order

3522

# for that code to run.

3523

"name": "A String", # The name of the package.

3524

"location": "A String", # The resource to read the package from. The supported resource type is:

3525

#

3526

# Google Cloud Storage:

3527

#

3528

# storage.googleapis.com/{bucket}

3529

# bucket.storage.googleapis.com/

3530

},

3531

],

3532

"teardownPolicy": "A String", # Sets the policy for determining when to tear down the worker pool.

3533

# Allowed values are: `TEARDOWN_ALWAYS`, `TEARDOWN_ON_SUCCESS`, and

3534

# `TEARDOWN_NEVER`.

3535

# `TEARDOWN_ALWAYS` means workers are always torn down regardless of whether

3536

# the job succeeds. `TEARDOWN_ON_SUCCESS` means workers are torn down

3537

# if the job succeeds. `TEARDOWN_NEVER` means the workers are never torn

3538

# down.

3539

#

3540

# If the workers are not torn down by the service, they will

3541

# continue to run and use Google Compute Engine VM resources in the

3542

# user's project until they are explicitly terminated by the user.

3543

# Because of this, Google recommends using the `TEARDOWN_ALWAYS`

3544

# policy except for small, manually supervised test jobs.

3545

#

3546

# If unknown or unspecified, the service will attempt to choose a reasonable

3547

# default.

3548

"network": "A String", # Network to which VMs will be assigned. If empty or unspecified,

3549

# the service will use the network "default".

3550

"ipConfiguration": "A String", # Configuration for VM IPs.

3551

"diskSizeGb": 42, # Size of root disk for VMs, in GB. If zero or unspecified, the service will

3552

# attempt to choose a reasonable default.

3553

"autoscalingSettings": { # Settings for autoscaling of this WorkerPool.

3554

"maxNumWorkers": 42, # The maximum number of workers to cap scaling at.

3555

"algorithm": "A String", # The algorithm to use for autoscaling.

3556

},

3557

"poolArgs": { # Extra arguments for this worker pool.

3558

"a_key": "", # Properties of the object. Contains field @type with type URL.

3559

},

3560

"subnetwork": "A String", # Subnetwork to which VMs will be assigned, if desired. Expected to be of

3561

# the form "regions/REGION/subnetworks/SUBNETWORK".

3562

"numWorkers": 42, # Number of Google Compute Engine workers in this pool needed to

3563

# execute the job. If zero or unspecified, the service will

3564

# attempt to choose a reasonable default.

3565

"numThreadsPerWorker": 42, # The number of threads per worker harness. If empty or unspecified, the

3566

# service will choose a number of threads (according to the number of cores

3567

# on the selected machine type for batch, or 1 by convention for streaming).

3568

"workerHarnessContainerImage": "A String", # Required. Docker container image that executes the Cloud Dataflow worker

3569

# harness, residing in Google Container Registry.

3570

#

3571

# Deprecated for the Fn API path. Use sdk_harness_container_images instead.

3572

"taskrunnerSettings": { # Taskrunner configuration settings. Settings passed through to Google Compute Engine workers when

3573

# using the standard Dataflow task runner. Users should ignore

3574

# this field.

3575

"dataflowApiVersion": "A String", # The API version of endpoint, e.g. "v1b3"

3576

"oauthScopes": [ # The OAuth2 scopes to be requested by the taskrunner in order to

3577

# access the Cloud Dataflow API.

3578

"A String",

3579

],

3580

"baseUrl": "A String", # The base URL for the taskrunner to use when accessing Google Cloud APIs.

3581

#

3582

# When workers access Google Cloud APIs, they logically do so via

3583

# relative URLs. If this field is specified, it supplies the base

3584

# URL to use for resolving these relative URLs. The normative

3585

# algorithm used is defined by RFC 1808, "Relative Uniform Resource

3586

# Locators".

3587

#

3588

# If not specified, the default value is "http://www.googleapis.com/"

3589

"workflowFileName": "A String", # The file to store the workflow in.

3590

"logToSerialconsole": True or False, # Whether to send taskrunner log info to Google Compute Engine VM serial

3591

# console.

3592

"baseTaskDir": "A String", # The location on the worker for task-specific subdirectories.

3593

"taskUser": "A String", # The UNIX user ID on the worker VM to use for tasks launched by

3594

# taskrunner; e.g. "root".

3595

"vmId": "A String", # The ID string of the VM.

3596

"alsologtostderr": True or False, # Whether to also send taskrunner log info to stderr.

3597

"parallelWorkerSettings": { # The settings to pass to the parallel worker harness. Provides data to pass through to the worker harness.

3598

"shuffleServicePath": "A String", # The Shuffle service path relative to the root URL, for example,

3599

# "shuffle/v1beta1".

3600

"tempStoragePrefix": "A String", # The prefix of the resources the system should use for temporary

3601

# storage.

3602

#

3603

# The supported resource type is:

3604

#

3605

# Google Cloud Storage:

3606

#

3607

# storage.googleapis.com/{bucket}/{object}

3608

# bucket.storage.googleapis.com/{object}

3609

"reportingEnabled": True or False, # Whether to send work progress updates to the service.

3610

"servicePath": "A String", # The Cloud Dataflow service path relative to the root URL, for example,

3611

# "dataflow/v1b3/projects".

3612

"baseUrl": "A String", # The base URL for accessing Google Cloud APIs.

3613

#

3614

# When workers access Google Cloud APIs, they logically do so via

3615

# relative URLs. If this field is specified, it supplies the base

3616

# URL to use for resolving these relative URLs. The normative

3617

# algorithm used is defined by RFC 1808, "Relative Uniform Resource

3618

# Locators".

3619

#

3620

# If not specified, the default value is "http://www.googleapis.com/"

3621

"workerId": "A String", # The ID of the worker running this pipeline.

3622

},

3623

"harnessCommand": "A String", # The command to launch the worker harness.

3624

"logDir": "A String", # The directory on the VM to store logs.

3625

"streamingWorkerMainClass": "A String", # The streaming worker main class name.

3626

"languageHint": "A String", # The suggested backend language.

3627

"taskGroup": "A String", # The UNIX group ID on the worker VM to use for tasks launched by

3628

# taskrunner; e.g. "wheel".

3629

"logUploadLocation": "A String", # Indicates where to put logs. If this is not specified, the logs

3630

# will not be uploaded.

3631

#

3632

# The supported resource type is:

3633

#

3634

# Google Cloud Storage:

3635

# storage.googleapis.com/{bucket}/{object}

3636

# bucket.storage.googleapis.com/{object}

3637

"commandlinesFileName": "A String", # The file to store preprocessing commands in.

3638

"continueOnException": True or False, # Whether to continue taskrunner if an exception is hit.

3639

"tempStoragePrefix": "A String", # The prefix of the resources the taskrunner should use for

3640

# temporary storage.

3641

#

3642

# The supported resource type is:

3643

#

3644

# Google Cloud Storage:

3645

# storage.googleapis.com/{bucket}/{object}

3646

# bucket.storage.googleapis.com/{object}

3647

},

3648

"diskType": "A String", # Type of root disk for VMs. If empty or unspecified, the service will

3649

# attempt to choose a reasonable default.

3650

"defaultPackageSet": "A String", # The default package set to install. This allows the service to

3651

# select a default set of packages which are useful to worker

3652

# harnesses written in a particular language.

3653

"machineType": "A String", # Machine type (e.g. "n1-standard-1"). If empty or unspecified, the

3654

# service will attempt to choose a reasonable default.

3655

},

3656

],

3657

"tempStoragePrefix": "A String", # The prefix of the resources the system should use for temporary

3658

# storage. The system will append the suffix "/temp-{JOBNAME}" to

3659

# this resource prefix, where {JOBNAME} is the value of the

3660

# job_name field. The resulting bucket and object prefix is used

3661

# as the prefix of the resources used to store temporary data

3662

# needed during the job execution. NOTE: This will override the

3663

# value in taskrunner_settings.

3664

# The supported resource type is:

3665

#

3666

# Google Cloud Storage:

3667

#

3668

# storage.googleapis.com/{bucket}/{object}

3669

# bucket.storage.googleapis.com/{object}

3670

"internalExperiments": { # Experimental settings.

3671

"a_key": "", # Properties of the object. Contains field @type with type URL.

3672

},

3673

"sdkPipelineOptions": { # The Cloud Dataflow SDK pipeline options specified by the user. These

3674

# options are passed through the service and are used to recreate the

3675

# SDK pipeline options on the worker in a language agnostic and platform

3676

# independent way.

3677

"a_key": "", # Properties of the object.

3678

},

3679

"dataset": "A String", # The dataset for the current project where various workflow

3680

# related tables are stored.

3681

#

3682

# The supported resource type is:

3683

#

3684

# Google BigQuery:

3685

# bigquery.googleapis.com/{dataset}

3686

"clusterManagerApiService": "A String", # The type of cluster manager API to use. If unknown or

3687

# unspecified, the service will attempt to choose a reasonable

3688

# default. This should be in the form of the API service name,

3689

# e.g. "compute.googleapis.com".

3690

},

3691

"stepsLocation": "A String", # The GCS location where the steps are stored.

3692

"steps": [ # Exactly one of steps or steps_location should be specified.

3693

#

3694

# The top-level steps that constitute the entire job.

3695

{ # Defines a particular step within a Cloud Dataflow job.

3696

#

3697

# A job consists of multiple steps, each of which performs some

3698

# specific operation as part of the overall job. Data is typically

3699

# passed from one step to another as part of the job.

3700

#

3701

# Here's an example of a sequence of steps which together implement a

3702

# Map-Reduce job:

3703

#

3704

# * Read a collection of data from some source, parsing the

3705

# collection's elements.

3706

#

3707

# * Validate the elements.

3708

#

3709

# * Apply a user-defined function to map each element to some value

3710

# and extract an element-specific key value.

3711

#

3712

# * Group elements with the same key into a single element with

3713

# that key, transforming a multiply-keyed collection into a

3714

# uniquely-keyed collection.

3715

#

3716

# * Write the elements out to some data sink.

3717

#

3718

# Note that the Cloud Dataflow service may be used to run many different

3719

# types of jobs, not just Map-Reduce.

3720

"kind": "A String", # The kind of step in the Cloud Dataflow job.

3721

"properties": { # Named properties associated with the step. Each kind of

3722

# predefined step has its own required set of properties.

3723

# Must be provided on Create. Only retrieved with JOB_VIEW_ALL.

3724

"a_key": "", # Properties of the object.

3725

},

3726

"name": "A String", # The name that identifies the step. This must be unique for each

3727

# step with respect to all other steps in the Cloud Dataflow job.

3728

},

3729

],

"stageStates": [ # This field may be mutated by the Cloud Dataflow service;
# callers cannot mutate it.
{ # A message describing the state of a particular execution stage.
"executionStageState": "A String", # Execution stage states allow the same set of values as JobState.
"executionStageName": "A String", # The name of the execution stage.
"currentStateTime": "A String", # The time at which the stage transitioned to this state.
},
],
"replacedByJobId": "A String", # If another job is an update of this job (and thus, this job is in
# `JOB_STATE_UPDATED`), this field contains the ID of that job.
"jobMetadata": { # Metadata available primarily for filtering jobs. Will be included in the
# ListJob response and Job SUMMARY view.
# This field is populated by the Dataflow service to support filtering jobs
# by the metadata values provided here. Populated for ListJobs and all GetJob
# views SUMMARY and higher.
"sdkVersion": { # The version of the SDK used to run the job.
"sdkSupportStatus": "A String", # The support status for this SDK version.
"versionDisplayName": "A String", # A readable string describing the version of the SDK.
"version": "A String", # The version of the SDK used to run the job.
},
"bigTableDetails": [ # Identification of a BigTable source used in the Dataflow job.
{ # Metadata for a BigTable connector used by the job.
"instanceId": "A String", # InstanceId accessed in the connection.
"tableId": "A String", # TableId accessed in the connection.
"projectId": "A String", # ProjectId accessed in the connection.
},
],
"pubsubDetails": [ # Identification of a PubSub source used in the Dataflow job.
{ # Metadata for a PubSub connector used by the job.
"subscription": "A String", # Subscription used in the connection.
"topic": "A String", # Topic accessed in the connection.
},
],
"bigqueryDetails": [ # Identification of a BigQuery source used in the Dataflow job.
{ # Metadata for a BigQuery connector used by the job.
"dataset": "A String", # Dataset accessed in the connection.
"projectId": "A String", # Project accessed in the connection.
"query": "A String", # Query used to access data in the connection.
"table": "A String", # Table accessed in the connection.
},
],
"fileDetails": [ # Identification of a File source used in the Dataflow job.
{ # Metadata for a File connector used by the job.
"filePattern": "A String", # File Pattern used to access files by the connector.
},
],
"datastoreDetails": [ # Identification of a Datastore source used in the Dataflow job.
{ # Metadata for a Datastore connector used by the job.
"namespace": "A String", # Namespace used in the connection.
"projectId": "A String", # ProjectId accessed in the connection.
},
],
"spannerDetails": [ # Identification of a Spanner source used in the Dataflow job.
{ # Metadata for a Spanner connector used by the job.
"instanceId": "A String", # InstanceId accessed in the connection.
"databaseId": "A String", # DatabaseId accessed in the connection.
"projectId": "A String", # ProjectId accessed in the connection.
},
],
},
"location": "A String", # The [regional endpoint]
# (https://cloud.google.com/dataflow/docs/concepts/regional-endpoints) that
# contains this job.
"transformNameMapping": { # The map of transform name prefixes of the job to be replaced to the
# corresponding name prefixes of the new job.
"a_key": "A String",
},
"startTime": "A String", # The timestamp when the job was started (transitioned to JOB_STATE_PENDING).
# Flexible resource scheduling jobs are started with some delay after job
# creation, so start_time is unset before start and is updated when the
# job is started by the Cloud Dataflow service. For other jobs, start_time
# always equals create_time and is immutable and set by the Cloud Dataflow
# service.
"clientRequestId": "A String", # The client's unique identifier of the job, re-used across retried attempts.
# If this field is set, the service will ensure its uniqueness.
# The request to create a job will fail if the service has knowledge of a
# previously submitted job with the same client's ID and job name.
# The caller may use this field to ensure idempotence of job
# creation across retried attempts to create a job.
# By default, the field is empty and, in that case, the service ignores it.
"executionInfo": { # Additional information about how a Cloud Dataflow job will be executed that
# isn't contained in the submitted job. Deprecated.
"stages": { # A mapping from each stage to the information about that stage.
"a_key": { # Contains information about how a particular
# google.dataflow.v1beta3.Step will be executed.
"stepName": [ # The steps associated with the execution stage.
# Note that stages may have several steps, and that a given step
# might be run by more than one stage.
"A String",
],
},
},
},
"type": "A String", # The type of Cloud Dataflow job.
"createTime": "A String", # The timestamp when the job was initially created. Immutable and set by the
# Cloud Dataflow service.
"tempFiles": [ # A set of files the system should be aware of that are used
# for temporary storage. These temporary files will be
# removed on job completion.
# No duplicates are allowed.
# No file patterns are supported.
#
# The supported files are:
#
# Google Cloud Storage:
#
# storage.googleapis.com/{bucket}/{object}
# bucket.storage.googleapis.com/{object}
"A String",
],
"id": "A String", # The unique ID of this job.
#
# This field is set by the Cloud Dataflow service when the Job is
# created, and is immutable for the life of the job.
"requestedState": "A String", # The job's requested state.
#
# `UpdateJob` may be used to switch between the `JOB_STATE_STOPPED` and
# `JOB_STATE_RUNNING` states, by setting requested_state. `UpdateJob` may
# also be used to directly set a job's requested state to
# `JOB_STATE_CANCELLED` or `JOB_STATE_DONE`, irrevocably terminating the
# job if it has not already reached a terminal state.
"replaceJobId": "A String", # If this job is an update of an existing job, this field is the job ID
# of the job it replaced.
#
# When sending a `CreateJobRequest`, you can update a job by specifying it
# here. The job named here is stopped, and its intermediate state is
# transferred to this job.
"createdFromSnapshotId": "A String", # If this is specified, the job's initial state is populated from the given
# snapshot.
"currentState": "A String", # The current state of the job.
#
# Jobs are created in the `JOB_STATE_STOPPED` state unless otherwise
# specified.
#
# A job in the `JOB_STATE_RUNNING` state may asynchronously enter a
# terminal state. After a job has reached a terminal state, no
# further state updates may be made.
#
# This field may be mutated by the Cloud Dataflow service;
# callers cannot mutate it.
"name": "A String", # The user-specified Cloud Dataflow job name.
#
# Only one Job with a given name may exist in a project at any
# given time. If a caller attempts to create a Job with the same
# name as an already-existing Job, the attempt returns the
# existing Job.
#
# The name must match the regular expression
# `[a-z]([-a-z0-9]{0,38}[a-z0-9])?`
"currentStateTime": "A String", # The timestamp associated with the current state.
}</pre>
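The "name" field above must match the regular expression `[a-z]([-a-z0-9]{0,38}[a-z0-9])?`. As a minimal sketch, a caller could check a candidate name locally before sending a create request. The helper name `is_valid_job_name` and the constant `JOB_NAME_PATTERN` are hypothetical and not part of the client library; the service remains the authority on whether a name is accepted.

```python
import re

# Pattern copied from the "name" field description above.
# JOB_NAME_PATTERN and is_valid_job_name are illustrative names only;
# they are not provided by google-api-python-client.
JOB_NAME_PATTERN = re.compile(r"[a-z]([-a-z0-9]{0,38}[a-z0-9])?")


def is_valid_job_name(name: str) -> bool:
    """Return True if the whole string matches the documented pattern."""
    # fullmatch requires the entire string to match, not just a prefix.
    return JOB_NAME_PATTERN.fullmatch(name) is not None


if __name__ == "__main__":
    for candidate in ("wordcount-2020", "Wordcount", "trailing-dash-"):
        print(candidate, is_valid_job_name(candidate))
```

Under this pattern a name starts with a lowercase letter, ends with a lowercase letter or digit, and is at most 40 characters long. A local check like this only catches malformed names early; it does not detect a clash with an existing job of the same name.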

Jon Wayne Parrott