Blame - docs/dyn/dataflow_v1b3.projects.jobs.html - platform/external/python/google-api-python-client

<h2>Instance Methods</h2>
<p class="toc_element">
  <code><a href="dataflow_v1b3.projects.jobs.debug.html">debug()</a></code>
</p>
<p class="firstline">Returns the debug Resource.</p>

<p class="toc_element">
  <code><a href="dataflow_v1b3.projects.jobs.messages.html">messages()</a></code>
</p>
<p class="firstline">Returns the messages Resource.</p>

<p class="toc_element">
  <code><a href="dataflow_v1b3.projects.jobs.workItems.html">workItems()</a></code>
</p>
<p class="firstline">Returns the workItems Resource.</p>

<p class="toc_element">
  <code><a href="#aggregated">aggregated(projectId, filter=None, location=None, pageToken=None, pageSize=None, view=None, x__xgafv=None)</a></code></p>
<p class="firstline">List the jobs of a project across all regions.</p>
<p class="toc_element">
  <code><a href="#aggregated_next">aggregated_next(previous_request, previous_response)</a></code></p>
<p class="firstline">Retrieves the next page of results.</p>
<p class="toc_element">
  <code><a href="#create">create(projectId, body=None, location=None, replaceJobId=None, view=None, x__xgafv=None)</a></code></p>
<p class="firstline">Creates a Cloud Dataflow job.</p>
<p class="toc_element">
  <code><a href="#get">get(projectId, jobId, view=None, location=None, x__xgafv=None)</a></code></p>
<p class="firstline">Gets the state of the specified Cloud Dataflow job.</p>
<p class="toc_element">
  <code><a href="#getMetrics">getMetrics(projectId, jobId, startTime=None, location=None, x__xgafv=None)</a></code></p>
<p class="firstline">Request the job status.</p>
<p class="toc_element">
  <code><a href="#list">list(projectId, filter=None, location=None, pageToken=None, pageSize=None, view=None, x__xgafv=None)</a></code></p>
<p class="firstline">List the jobs of a project.</p>
<p class="toc_element">
  <code><a href="#list_next">list_next(previous_request, previous_response)</a></code></p>
<p class="firstline">Retrieves the next page of results.</p>
<p class="toc_element">
  <code><a href="#snapshot">snapshot(projectId, jobId, body=None, x__xgafv=None)</a></code></p>
<p class="firstline">Snapshot the state of a streaming job.</p>
<p class="toc_element">
  <code><a href="#update">update(projectId, jobId, body=None, location=None, x__xgafv=None)</a></code></p>
<p class="firstline">Updates the state of an existing Cloud Dataflow job.</p>
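<p>The <code>list()</code> / <code>list_next()</code> pair above is meant to be used as a paging loop. The following is a minimal sketch of that loop, not part of the generated reference: the helper name <code>iter_jobs</code> is hypothetical, and it assumes that <code>jobs_resource</code> is the object returned by <code>service.projects().jobs()</code> and that <code>list_next()</code> returns <code>None</code> once no further page exists.</p>

```python
from typing import Any, Dict, Iterator


def iter_jobs(jobs_resource: Any, project_id: str, **list_kwargs: Any) -> Iterator[Dict[str, Any]]:
    """Yield every job dict from list(), following pages with list_next().

    `iter_jobs` is an illustrative helper, not part of the client library.
    Extra keyword arguments (for example pageSize or view) are passed to list().
    """
    request = jobs_resource.list(projectId=project_id, **list_kwargs)
    while request is not None:
        response = request.execute()
        # A project without jobs may yield an empty body, so default to no jobs.
        for job in response.get("jobs", []):
            yield job
        # Assumption: list_next() returns None when there are no more pages.
        request = jobs_resource.list_next(
            previous_request=request, previous_response=response
        )
```

<p>With a real client, <code>jobs_resource</code> would typically come from <code>googleapiclient.discovery.build("dataflow", "v1b3").projects().jobs()</code>; the same loop shape should apply to <code>aggregated()</code> and <code>aggregated_next()</code>.</p>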


<h3>Method Details</h3>

<code class="details" id="aggregated">aggregated(projectId, filter=None, location=None, pageToken=None, pageSize=None, view=None, x__xgafv=None)</code>
<pre>List the jobs of a project across all regions.

Args:
  projectId: string, The project which owns the jobs. (required)
  filter: string, The kind of filter to use.
  location: string, The [regional endpoint](https://cloud.google.com/dataflow/docs/concepts/regional-endpoints) that
    contains this job.
  pageToken: string, Set this to the 'next_page_token' field of a previous response
    to request additional results in a long list.
  pageSize: integer, If there are many jobs, limit response to at most this many.
    The actual number of jobs returned will be the lesser of pageSize
    and an unspecified server-defined limit.
  view: string, Level of information requested in response. Default is `JOB_VIEW_SUMMARY`.
  x__xgafv: string, V1 error format.
    Allowed values
      1 - v1 error format
      2 - v2 error format

Returns:
  An object of the form:

    { # Response to a request to list Cloud Dataflow jobs in a project. This might
        # be a partial response, depending on the page size in the ListJobsRequest.
        # However, if the project does not have any jobs, an instance of
        # ListJobsResponse is not returned and the request's response
        # body is empty {}.
      "jobs": [ # A subset of the requested job information.
        { # Defines a job to be run by the Cloud Dataflow service.
          "clientRequestId": "A String", # The client's unique identifier of the job, re-used across retried attempts.
              # If this field is set, the service will ensure its uniqueness.
              # The request to create a job will fail if the service has knowledge of a
              # previously submitted job with the same client's ID and job name.
              # The caller may use this field to ensure idempotence of job
              # creation across retried attempts to create a job.
              # By default, the field is empty and, in that case, the service ignores it.
          "id": "A String", # The unique ID of this job.
              #
              # This field is set by the Cloud Dataflow service when the Job is
              # created, and is immutable for the life of the job.
          "currentStateTime": "A String", # The timestamp associated with the current state.
          "transformNameMapping": { # The map of transform name prefixes of the job to be replaced to the
              # corresponding name prefixes of the new job.
            "a_key": "A String",
          },
          "environment": { # Describes the environment in which a Dataflow Job runs. # The environment for the job.
            "internalExperiments": { # Experimental settings.
              "a_key": "", # Properties of the object. Contains field @type with type URL.
            },
            "workerRegion": "A String", # The Compute Engine region
                # (https://cloud.google.com/compute/docs/regions-zones/regions-zones) in
                # which worker processing should occur, e.g. "us-west1". Mutually exclusive
                # with worker_zone. If neither worker_region nor worker_zone is specified,
                # default to the control plane's region.
            "serviceKmsKeyName": "A String", # If set, contains the Cloud KMS key identifier used to encrypt data
                # at rest, AKA a Customer Managed Encryption Key (CMEK).
                #
                # Format:
                #   projects/PROJECT_ID/locations/LOCATION/keyRings/KEY_RING/cryptoKeys/KEY
            "userAgent": { # A description of the process that generated the request.
              "a_key": "", # Properties of the object.
            },
            "workerZone": "A String", # The Compute Engine zone
                # (https://cloud.google.com/compute/docs/regions-zones/regions-zones) in
                # which worker processing should occur, e.g. "us-west1-a". Mutually exclusive
                # with worker_region. If neither worker_region nor worker_zone is specified,
                # a zone in the control plane's region is chosen based on available capacity.
            "clusterManagerApiService": "A String", # The type of cluster manager API to use. If unknown or
                # unspecified, the service will attempt to choose a reasonable
                # default. This should be in the form of the API service name,
                # e.g. "compute.googleapis.com".
            "tempStoragePrefix": "A String", # The prefix of the resources the system should use for temporary
                # storage. The system will append the suffix "/temp-{JOBNAME}" to
                # this resource prefix, where {JOBNAME} is the value of the
                # job_name field. The resulting bucket and object prefix is used
                # as the prefix of the resources used to store temporary data
                # needed during the job execution. NOTE: This will override the
                # value in taskrunner_settings.
                # The supported resource type is:
                #
                # Google Cloud Storage:
                #
                #   storage.googleapis.com/{bucket}/{object}
                #   bucket.storage.googleapis.com/{object}
            "experiments": [ # The list of experiments to enable.
              "A String",
            ],
            "version": { # A structure describing which components and their versions of the service
                # are required in order to run the job.
              "a_key": "", # Properties of the object.
            },
            "serviceAccountEmail": "A String", # Identity to run virtual machines as. Defaults to the default account.
            "sdkPipelineOptions": { # The Cloud Dataflow SDK pipeline options specified by the user. These
                # options are passed through the service and are used to recreate the
                # SDK pipeline options on the worker in a language agnostic and platform
                # independent way.
              "a_key": "", # Properties of the object.
            },
            "flexResourceSchedulingGoal": "A String", # Which Flexible Resource Scheduling mode to run in.
            "workerPools": [ # The worker pools. At least one "harness" worker pool must be
                # specified in order for the job to have workers.
              { # Describes one particular pool of Cloud Dataflow workers to be
                  # instantiated by the Cloud Dataflow service in order to perform the
                  # computations required by a job. Note that a workflow job may use
                  # multiple pools, in order to match the various computational
                  # requirements of the various stages of the job.
                "numThreadsPerWorker": 42, # The number of threads per worker harness. If empty or unspecified, the
                    # service will choose a number of threads (according to the number of cores
                    # on the selected machine type for batch, or 1 by convention for streaming).
                "numWorkers": 42, # Number of Google Compute Engine workers in this pool needed to
                    # execute the job. If zero or unspecified, the service will
                    # attempt to choose a reasonable default.
                "zone": "A String", # Zone to run the worker pools in. If empty or unspecified, the service
                    # will attempt to choose a reasonable default.
                "diskSourceImage": "A String", # Fully qualified source image for disks.
                "packages": [ # Packages to be installed on workers.
                  { # The packages that must be installed in order for a worker to run the
                      # steps of the Cloud Dataflow job that will be assigned to its worker
                      # pool.
                      #
                      # This is the mechanism by which the Cloud Dataflow SDK causes code to
                      # be loaded onto the workers. For example, the Cloud Dataflow Java SDK
                      # might use this to install jars containing the user's code and all of the
                      # various dependencies (libraries, data files, etc.) required in order
                      # for that code to run.
                    "name": "A String", # The name of the package.
                    "location": "A String", # The resource to read the package from. The supported resource type is:
                        #
                        # Google Cloud Storage:
                        #
                        #   storage.googleapis.com/{bucket}
                        #   bucket.storage.googleapis.com/
                  },
                ],
                "teardownPolicy": "A String", # Sets the policy for determining when to turn down the worker pool.
                    # Allowed values are: `TEARDOWN_ALWAYS`, `TEARDOWN_ON_SUCCESS`, and
                    # `TEARDOWN_NEVER`.
                    # `TEARDOWN_ALWAYS` means workers are always torn down regardless of whether
                    # the job succeeds. `TEARDOWN_ON_SUCCESS` means workers are torn down
                    # if the job succeeds. `TEARDOWN_NEVER` means the workers are never torn
                    # down.
                    #
                    # If the workers are not torn down by the service, they will
                    # continue to run and use Google Compute Engine VM resources in the
                    # user's project until they are explicitly terminated by the user.
                    # Because of this, Google recommends using the `TEARDOWN_ALWAYS`
                    # policy except for small, manually supervised test jobs.
                    #
                    # If unknown or unspecified, the service will attempt to choose a reasonable
                    # default.
                "onHostMaintenance": "A String", # The action to take on host maintenance, as defined by the Google
                    # Compute Engine API.
                "poolArgs": { # Extra arguments for this worker pool.
                  "a_key": "", # Properties of the object. Contains field @type with type URL.
                },
                "diskSizeGb": 42, # Size of root disk for VMs, in GB. If zero or unspecified, the service will
                    # attempt to choose a reasonable default.
                "workerHarnessContainerImage": "A String", # Required. Docker container image that executes the Cloud Dataflow worker
                    # harness, residing in Google Container Registry.
                    #
                    # Deprecated for the Fn API path. Use sdk_harness_container_images instead.
                "diskType": "A String", # Type of root disk for VMs. If empty or unspecified, the service will
                    # attempt to choose a reasonable default.
                "machineType": "A String", # Machine type (e.g. "n1-standard-1"). If empty or unspecified, the
                    # service will attempt to choose a reasonable default.
                "kind": "A String", # The kind of the worker pool; currently only `harness` and `shuffle`
                    # are supported.
                "sdkHarnessContainerImages": [ # Set of SDK harness containers needed to execute this pipeline. This will
                    # only be set in the Fn API path. For non-cross-language pipelines this
                    # should have only one entry. Cross-language pipelines will have two or more
                    # entries.
                  { # Defines a SDK harness container for executing Dataflow pipelines.
                    "containerImage": "A String", # A docker container image that resides in Google Container Registry.
                    "useSingleCorePerContainer": True or False, # If true, recommends the Dataflow service to use only one core per SDK
                        # container instance with this image. If false (or unset) recommends using
                        # more than one core per SDK container instance with this image for
                        # efficiency. Note that Dataflow service may choose to override this property
                        # if needed.
                  },
                ],
                "dataDisks": [ # Data disks that are used by a VM in this workflow.
                  { # Describes the data disk used by a workflow job.
                    "diskType": "A String", # Disk storage type, as defined by Google Compute Engine. This
                        # must be a disk type appropriate to the project and zone in which
                        # the workers will run. If unknown or unspecified, the service
                        # will attempt to choose a reasonable default.
                        #
                        # For example, the standard persistent disk type is a resource name
                        # typically ending in "pd-standard". If SSD persistent disks are
                        # available, the resource name typically ends with "pd-ssd". The
                        # actual valid values are defined by the Google Compute Engine API,
                        # not by the Cloud Dataflow API; consult the Google Compute Engine
                        # documentation for more information about determining the set of
                        # available disk types for a particular project and zone.
                        #
                        # Google Compute Engine Disk types are local to a particular
                        # project in a particular zone, and so the resource name will
                        # typically look something like this:
                        #
                        # compute.googleapis.com/projects/project-id/zones/zone/diskTypes/pd-standard
                    "sizeGb": 42, # Size of disk in GB. If zero or unspecified, the service will
                        # attempt to choose a reasonable default.
                    "mountPoint": "A String", # Directory in a VM where disk is mounted.
                  },
                ],
                "subnetwork": "A String", # Subnetwork to which VMs will be assigned, if desired. Expected to be of
                    # the form "regions/REGION/subnetworks/SUBNETWORK".
                "ipConfiguration": "A String", # Configuration for VM IPs.
                "taskrunnerSettings": { # Taskrunner configuration settings. # Settings passed through to Google Compute Engine workers when
                    # using the standard Dataflow task runner. Users should ignore
                    # this field.
                  "alsologtostderr": True or False, # Whether to also send taskrunner log info to stderr.
                  "taskGroup": "A String", # The UNIX group ID on the worker VM to use for tasks launched by
                      # taskrunner; e.g. "wheel".
                  "harnessCommand": "A String", # The command to launch the worker harness.
                  "logDir": "A String", # The directory on the VM to store logs.
                  "oauthScopes": [ # The OAuth2 scopes to be requested by the taskrunner in order to
                      # access the Cloud Dataflow API.
                    "A String",
                  ],
                  "dataflowApiVersion": "A String", # The API version of endpoint, e.g. "v1b3"
                  "logUploadLocation": "A String", # Indicates where to put logs. If this is not specified, the logs
                      # will not be uploaded.
                      #
                      # The supported resource type is:
                      #
                      # Google Cloud Storage:
                      #   storage.googleapis.com/{bucket}/{object}
                      #   bucket.storage.googleapis.com/{object}
                  "streamingWorkerMainClass": "A String", # The streaming worker main class name.
                  "workflowFileName": "A String", # The file to store the workflow in.
                  "languageHint": "A String", # The suggested backend language.
                  "commandlinesFileName": "A String", # The file to store preprocessing commands in.
                  "baseTaskDir": "A String", # The location on the worker for task-specific subdirectories.
                  "tempStoragePrefix": "A String", # The prefix of the resources the taskrunner should use for
                      # temporary storage.
                      #
                      # The supported resource type is:
                      #
                      # Google Cloud Storage:
                      #   storage.googleapis.com/{bucket}/{object}
                      #   bucket.storage.googleapis.com/{object}
                  "baseUrl": "A String", # The base URL for the taskrunner to use when accessing Google Cloud APIs.
                      #
                      # When workers access Google Cloud APIs, they logically do so via
                      # relative URLs. If this field is specified, it supplies the base
                      # URL to use for resolving these relative URLs. The normative
                      # algorithm used is defined by RFC 1808, "Relative Uniform Resource
                      # Locators".
                      #
                      # If not specified, the default value is "http://www.googleapis.com/"
                  "logToSerialconsole": True or False, # Whether to send taskrunner log info to Google Compute Engine VM serial
                      # console.
                  "continueOnException": True or False, # Whether to continue taskrunner if an exception is hit.
                  "parallelWorkerSettings": { # Provides data to pass through to the worker harness. # The settings to pass to the parallel worker harness.
                    "tempStoragePrefix": "A String", # The prefix of the resources the system should use for temporary
                        # storage.
                        #
                        # The supported resource type is:
                        #
                        # Google Cloud Storage:
                        #
                        #   storage.googleapis.com/{bucket}/{object}
                        #   bucket.storage.googleapis.com/{object}
                    "reportingEnabled": True or False, # Whether to send work progress updates to the service.
                    "baseUrl": "A String", # The base URL for accessing Google Cloud APIs.
                        #
                        # When workers access Google Cloud APIs, they logically do so via
                        # relative URLs. If this field is specified, it supplies the base
                        # URL to use for resolving these relative URLs. The normative
                        # algorithm used is defined by RFC 1808, "Relative Uniform Resource
                        # Locators".
                        #
                        # If not specified, the default value is "http://www.googleapis.com/"
                    "servicePath": "A String", # The Cloud Dataflow service path relative to the root URL, for example,
                        # "dataflow/v1b3/projects".
                    "shuffleServicePath": "A String", # The Shuffle service path relative to the root URL, for example,
                        # "shuffle/v1beta1".
                    "workerId": "A String", # The ID of the worker running this pipeline.
                  },
                  "taskUser": "A String", # The UNIX user ID on the worker VM to use for tasks launched by
                      # taskrunner; e.g. "root".
                  "vmId": "A String", # The ID string of the VM.
                },
                "autoscalingSettings": { # Settings for WorkerPool autoscaling. # Settings for autoscaling of this WorkerPool.
                  "algorithm": "A String", # The algorithm to use for autoscaling.
                  "maxNumWorkers": 42, # The maximum number of workers to cap scaling at.
                },
                "metadata": { # Metadata to set on the Google Compute Engine VMs.
                  "a_key": "A String",
                },
                "defaultPackageSet": "A String", # The default package set to install. This allows the service to
                    # select a default set of packages which are useful to worker
                    # harnesses written in a particular language.
                "network": "A String", # Network to which VMs will be assigned. If empty or unspecified,
                    # the service will use the network "default".
              },
            ],
            "dataset": "A String", # The dataset for the current project where various workflow
                # related tables are stored.
                #
                # The supported resource type is:
                #
                # Google BigQuery:
                #   bigquery.googleapis.com/{dataset}
          },
          "stageStates": [ # This field may be mutated by the Cloud Dataflow service;
              # callers cannot mutate it.
            { # A message describing the state of a particular execution stage.
              "currentStateTime": "A String", # The time at which the stage transitioned to this state.
              "executionStageState": "A String", # Execution stage states allow the same set of values as JobState.
              "executionStageName": "A String", # The name of the execution stage.
            },
          ],
          "jobMetadata": { # Metadata available primarily for filtering jobs. Will be included in the
              # ListJob response and Job SUMMARY view.
              # This field is populated by the Dataflow service to support filtering jobs
              # by the metadata values provided here. Populated for ListJobs and all GetJob
              # views SUMMARY and higher.
            "datastoreDetails": [ # Identification of a Datastore source used in the Dataflow job.
              { # Metadata for a Datastore connector used by the job.
                "namespace": "A String", # Namespace used in the connection.
                "projectId": "A String", # ProjectId accessed in the connection.
              },
            ],
            "sdkVersion": { # The version of the SDK used to run the job. # The SDK version used to run the job.
              "version": "A String", # The version of the SDK used to run the job.
              "sdkSupportStatus": "A String", # The support status for this SDK version.
              "versionDisplayName": "A String", # A readable string describing the version of the SDK.
            },
            "bigqueryDetails": [ # Identification of a BigQuery source used in the Dataflow job.
              { # Metadata for a BigQuery connector used by the job.
                "table": "A String", # Table accessed in the connection.
                "dataset": "A String", # Dataset accessed in the connection.
                "query": "A String", # Query used to access data in the connection.
                "projectId": "A String", # Project accessed in the connection.
              },
            ],
            "fileDetails": [ # Identification of a File source used in the Dataflow job.
              { # Metadata for a File connector used by the job.
                "filePattern": "A String", # File Pattern used to access files by the connector.
              },
            ],
            "pubsubDetails": [ # Identification of a PubSub source used in the Dataflow job.
              { # Metadata for a PubSub connector used by the job.
                "topic": "A String", # Topic accessed in the connection.
                "subscription": "A String", # Subscription used in the connection.
              },
            ],
            "bigTableDetails": [ # Identification of a BigTable source used in the Dataflow job.
              { # Metadata for a BigTable connector used by the job.
                "projectId": "A String", # ProjectId accessed in the connection.
                "instanceId": "A String", # InstanceId accessed in the connection.
                "tableId": "A String", # TableId accessed in the connection.
              },
            ],
            "spannerDetails": [ # Identification of a Spanner source used in the Dataflow job.
              { # Metadata for a Spanner connector used by the job.
                "instanceId": "A String", # InstanceId accessed in the connection.
                "projectId": "A String", # ProjectId accessed in the connection.
                "databaseId": "A String", # DatabaseId accessed in the connection.
              },
            ],
          },
          "type": "A String", # The type of Cloud Dataflow job.
          "projectId": "A String", # The ID of the Cloud Platform project that the job belongs to.
          "createdFromSnapshotId": "A String", # If this is specified, the job's initial state is populated from the given
              # snapshot.
          "pipelineDescription": { # A descriptive representation of submitted pipeline as well as the executed
              # form. This data is provided by the Dataflow service for ease of visualizing
              # the pipeline and interpreting Dataflow provided metrics.
              # Preliminary field: The format of this data may change at any time.
              # A description of the user pipeline and stages through which it is executed.
              # Created by Cloud Dataflow service. Only retrieved with
              # JOB_VIEW_DESCRIPTION or JOB_VIEW_ALL.
            "executionPipelineStage": [ # Description of each stage of execution of the pipeline.
              { # Description of the composing transforms, names/ids, and input/outputs of a
                  # stage of execution. Some composing transforms and sources may have been
                  # generated by the Dataflow service during execution planning.
                "outputSource": [ # Output sources for this stage.
                  { # Description of an input or output of an execution stage.
                    "sizeBytes": "A String", # Size of the source, if measurable.
                    "name": "A String", # Dataflow service generated name for this source.
                    "userName": "A String", # Human-readable name for this source; may be user or system generated.
                    "originalTransformOrCollection": "A String", # User name for the original user transform or collection with which this
                        # source is most closely associated.
                  },
                ],
                "name": "A String", # Dataflow service generated name for this stage.
                "inputSource": [ # Input sources for this stage.
                  { # Description of an input or output of an execution stage.
                    "sizeBytes": "A String", # Size of the source, if measurable.
                    "name": "A String", # Dataflow service generated name for this source.
                    "userName": "A String", # Human-readable name for this source; may be user or system generated.
                    "originalTransformOrCollection": "A String", # User name for the original user transform or collection with which this
                        # source is most closely associated.
                  },
                ],
                "id": "A String", # Dataflow service generated id for this stage.
                "componentTransform": [ # Transforms that comprise this execution stage.
                  { # Description of a transform executed as part of an execution stage.
                    "originalTransform": "A String", # User name for the original user transform with which this transform is
                        # most closely associated.
                    "name": "A String", # Dataflow service generated name for this source.
                    "userName": "A String", # Human-readable name for this transform; may be user or system generated.
                  },
                ],
                "componentSource": [ # Collections produced and consumed by component transforms of this stage.
                  { # Description of an interstitial value between transforms in an execution
                      # stage.
                    "name": "A String", # Dataflow service generated name for this source.
                    "userName": "A String", # Human-readable name for this transform; may be user or system generated.
                    "originalTransformOrCollection": "A String", # User name for the original user transform or collection with which this
                        # source is most closely associated.
                  },
                ],
                "kind": "A String", # Type of transform this stage is executing.
              },
            ],
539

"originalPipelineTransform": [ # Description of each transform in the pipeline and collections between them.

540

{ # Description of the type, names/ids, and input/outputs for a transform.

541

"kind": "A String", # Type of transform.

542

"inputCollectionName": [ # User names for all collection inputs to this transform.

543

"A String",

544

],

545

"name": "A String", # User provided name for this transform instance.

546

"id": "A String", # SDK generated id of this transform instance.

547

"displayData": [ # Transform-specific display data.

548

{ # Data provided with a pipeline or transform to provide descriptive info.

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame]

549

"durationValue": "A String", # Contains value if the data is of duration type.

Bu Sun Kim

2020-05-27 12:20:54 -0700

[diff] [blame]

550

"int64Value": "A String", # Contains value if the data is of int64 type.

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame]

551

"namespace": "A String", # The namespace for the key. This is usually a class name or programming

552

# language namespace (i.e. python module) which defines the display data.

553

# This allows a dax monitoring system to specially handle the data

554

# and perform custom rendering.

555

"floatValue": 3.14, # Contains value if the data is of float type.

556

"key": "A String", # The key identifying the display data.

557

# This is intended to be used as a label for the display data

558

# when viewed in a dax monitoring system.

559

"shortStrValue": "A String", # A possible additional shorter value to display.

560

# For example a java_class_name_value of com.mypackage.MyDoFn

561

# will be stored with MyDoFn as the short_str_value and

562

# com.mypackage.MyDoFn as the java_class_name_value.

563

# short_str_value can be displayed and java_class_name_value

564

# will be displayed as a tooltip.

565

"url": "A String", # An optional full URL.

566

"label": "A String", # An optional label to display in a dax UI for the element.

Bu Sun Kim

2020-05-27 12:20:54 -0700

[diff] [blame]

567

"timestampValue": "A String", # Contains value if the data is of timestamp type.

568

"boolValue": True or False, # Contains value if the data is of a boolean type.

569

"javaClassValue": "A String", # Contains value if the data is of java class type.

570

"strValue": "A String", # Contains value if the data is of string type.

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame]

571

},

572

],

573

"outputCollectionName": [ # User names for all collection outputs to this transform.

"A String",

],

},

],

"displayData": [ # Pipeline level display data.

579

{ # Data provided with a pipeline or transform to provide descriptive info.

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame]

580

"durationValue": "A String", # Contains value if the data is of duration type.

Bu Sun Kim

2020-05-27 12:20:54 -0700

[diff] [blame]

581

"int64Value": "A String", # Contains value if the data is of int64 type.

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame]

582

"namespace": "A String", # The namespace for the key. This is usually a class name or programming

583

# language namespace (e.g. a Python module) which defines the display data.

584

# This allows a dax monitoring system to specially handle the data

585

# and perform custom rendering.

586

"floatValue": 3.14, # Contains value if the data is of float type.

587

"key": "A String", # The key identifying the display data.

588

# This is intended to be used as a label for the display data

589

# when viewed in a dax monitoring system.

590

"shortStrValue": "A String", # A possible additional shorter value to display.

591

# For example a java_class_name_value of com.mypackage.MyDoFn

592

# will be stored with MyDoFn as the short_str_value and

593

# com.mypackage.MyDoFn as the java_class_name_value.

594

# short_str_value can be displayed and java_class_name_value

595

# will be displayed as a tooltip.

596

"url": "A String", # An optional full URL.

597

"label": "A String", # An optional label to display in a dax UI for the element.

Bu Sun Kim

2020-05-27 12:20:54 -0700

[diff] [blame]

598

"timestampValue": "A String", # Contains value if the data is of timestamp type.

599

"boolValue": True or False, # Contains value if the data is of a boolean type.

600

"javaClassValue": "A String", # Contains value if the data is of java class type.

601

"strValue": "A String", # Contains value if the data is of string type.

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame]

},

],

},

"replaceJobId": "A String", # If this job is an update of an existing job, this field is the job ID

606

# of the job it replaced.

607

#

608

# When sending a `CreateJobRequest`, you can update a job by specifying it

609

# here. The job named here is stopped, and its intermediate state is

610

# transferred to this job.

611

"tempFiles": [ # A set of files the system should be aware of that are used

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

612

# for temporary storage. These temporary files will be

613

# removed on job completion.

614

# No duplicates are allowed.

615

# No file patterns are supported.

616

#

617

# The supported files are:

618

#

619

# Google Cloud Storage:

620

#

621

# storage.googleapis.com/{bucket}/{object}

622

# bucket.storage.googleapis.com/{object}

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame]

623

"A String",

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

624

],

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame]

625

"name": "A String", # The user-specified Cloud Dataflow job name.

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

626

#

627

# Only one Job with a given name may exist in a project at any

628

# given time. If a caller attempts to create a Job with the same

629

# name as an already-existing Job, the attempt returns the

630

# existing Job.

631

#

632

# The name must match the regular expression

633

# `[a-z]([-a-z0-9]{0,38}[a-z0-9])?`

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame]

634

"steps": [ # Exactly one of steps or steps_location should be specified.

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

635

#

636

# The top-level steps that constitute the entire job.

637

{ # Defines a particular step within a Cloud Dataflow job.

638

#

639

# A job consists of multiple steps, each of which performs some

640

# specific operation as part of the overall job. Data is typically

641

# passed from one step to another as part of the job.

642

#

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame]

643

# Here's an example of a sequence of steps which together implement a

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

644

# Map-Reduce job:

645

#

646

# * Read a collection of data from some source, parsing the

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame]

647

# collection's elements.

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

648

#

649

# * Validate the elements.

650

#

651

# * Apply a user-defined function to map each element to some value

652

# and extract an element-specific key value.

653

#

654

# * Group elements with the same key into a single element with

655

# that key, transforming a multiply-keyed collection into a

656

# uniquely-keyed collection.

657

#

658

# * Write the elements out to some data sink.

659

#

660

# Note that the Cloud Dataflow service may be used to run many different

661

# types of jobs, not just Map-Reduce.

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame]

662

"name": "A String", # The name that identifies the step. This must be unique for each

Dan O'Meara

2020-05-01 07:42:23 -0700

[diff] [blame]

663

# step with respect to all other steps in the Cloud Dataflow job.

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame]

664

"kind": "A String", # The kind of step in the Cloud Dataflow job.

665

"properties": { # Named properties associated with the step. Each kind of

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

666

# predefined step has its own required set of properties.

667

# Must be provided on Create. Only retrieved with JOB_VIEW_ALL.

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame]

668

"a_key": "", # Properties of the object.

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

669

},

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

670

},

671

],

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame]

672

"replacedByJobId": "A String", # If another job is an update of this job (and thus, this job is in

673

# `JOB_STATE_UPDATED`), this field contains the ID of that job.

674

"executionInfo": { # Deprecated. Additional information about how a Cloud Dataflow job will be executed that

675

# isn't contained in the submitted job.

676

"stages": { # A mapping from each stage to the information about that stage.

677

"a_key": { # Contains information about how a particular

678

# google.dataflow.v1beta3.Step will be executed.

679

"stepName": [ # The steps associated with the execution stage.

680

# Note that stages may have several steps, and that a given step

681

# might be run by more than one stage.

"A String",

],

},

},

},

"currentState": "A String", # The current state of the job.

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

688

#

689

# Jobs are created in the `JOB_STATE_STOPPED` state unless otherwise

690

# specified.

691

#

692

# A job in the `JOB_STATE_RUNNING` state may asynchronously enter a

693

# terminal state. After a job has reached a terminal state, no

694

# further state updates may be made.

695

#

696

# This field may be mutated by the Cloud Dataflow service;

697

# callers cannot mutate it.

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame]

698

"location": "A String", # The [regional endpoint]

699

# (https://cloud.google.com/dataflow/docs/concepts/regional-endpoints) that

700

# contains this job.

701

"startTime": "A String", # The timestamp when the job was started (transitioned to JOB_STATE_PENDING).

702

# Flexible resource scheduling jobs are started with some delay after job

703

# creation, so start_time is unset before start and is updated when the

704

# job is started by the Cloud Dataflow service. For other jobs, start_time

705

# always equals create_time and is immutable and set by the Cloud Dataflow

706

# service.

707

"stepsLocation": "A String", # The GCS location where the steps are stored.

708

"labels": { # User-defined labels for this job.

709

#

710

# The labels map can contain no more than 64 entries. Entries of the labels

711

# map are UTF8 strings that comply with the following restrictions:

712

#

713

# * Keys must conform to regexp: \p{Ll}\p{Lo}{0,62}

714

# * Values must conform to regexp: [\p{Ll}\p{Lo}\p{N}_-]{0,63}

715

# * Both keys and values are additionally constrained to be <= 128 bytes in

716

# size.

717

"a_key": "A String",

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

718

},

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame]

719

"createTime": "A String", # The timestamp when the job was initially created. Immutable and set by the

720

# Cloud Dataflow service.

721

"requestedState": "A String", # The job's requested state.

722

#

723

# `UpdateJob` may be used to switch between the `JOB_STATE_STOPPED` and

724

# `JOB_STATE_RUNNING` states, by setting requested_state. `UpdateJob` may

725

# also be used to directly set a job's requested state to

726

# `JOB_STATE_CANCELLED` or `JOB_STATE_DONE`, irrevocably terminating the

727

# job if it has not already reached a terminal state.

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

728

},

729

],

Bu Sun Kim

2020-05-27 12:20:54 -0700

[diff] [blame]

730

"failedLocation": [ # Zero or more messages describing the [regional endpoints]

731

# (https://cloud.google.com/dataflow/docs/concepts/regional-endpoints) that

732

# failed to respond.

733

{ # Indicates which [regional endpoint]

734

# (https://cloud.google.com/dataflow/docs/concepts/regional-endpoints) failed

735

# to respond to a request for data.

736

"name": "A String", # The name of the [regional endpoint]

737

# (https://cloud.google.com/dataflow/docs/concepts/regional-endpoints) that

# failed to respond.

},

],

"nextPageToken": "A String", # Set if there may be more results than fit in this response.

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

}</pre>

</div>
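The `name` field documented above must match the regular expression `[a-z]([-a-z0-9]{0,38}[a-z0-9])?`. A minimal sketch of a client-side check before submitting a job might look like the following; the helper name `is_valid_job_name` is hypothetical and not part of the client library, and the service remains the authority on what it accepts.

```python
import re

# Pattern copied from the `name` field description; everything else here
# is an illustrative assumption, not part of google-api-python-client.
JOB_NAME_RE = re.compile(r"[a-z]([-a-z0-9]{0,38}[a-z0-9])?")


def is_valid_job_name(name):
    """Return True if `name` matches the documented job-name pattern in full."""
    return JOB_NAME_RE.fullmatch(name) is not None
```

The pattern allows at most 40 characters, requires a lowercase letter first, and forbids a trailing hyphen, so `is_valid_job_name("wordcount-2020")` passes while `is_valid_job_name("Wordcount")` does not.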

<code class="details" id="aggregated_next">aggregated_next(previous_request, previous_response)</code>

747

<pre>Retrieves the next page of results.

748

749

Args:

750

previous_request: The request for the previous page. (required)

751

previous_response: The response from the request for the previous page. (required)

752

753

Returns:

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame]

754

A request object that you can call 'execute()' on to request the next

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

755

page. Returns None if there are no more items in the collection.

</pre>

</div>
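As a rough illustration of how `aggregated()` and `aggregated_next()` fit together, the sketch below walks every page until `aggregated_next()` returns None. The helper name `list_all_jobs` is hypothetical, and the `jobs` response key is assumed from the Dataflow `ListJobsResponse` schema; `failedLocation` and its `name` field are as documented above.

```python
def list_all_jobs(jobs_resource, project_id, **params):
    """Collect jobs and failed regional endpoints across all result pages.

    `jobs_resource` is assumed to be the object returned by
    `service.projects().jobs()`; any object with the same two methods works.
    """
    jobs, failed_locations = [], []
    request = jobs_resource.aggregated(projectId=project_id, **params)
    while request is not None:
        response = request.execute()
        # "jobs" may be absent on an empty page, so default to an empty list.
        jobs.extend(response.get("jobs", []))
        # Each failedLocation entry names a regional endpoint that did not respond.
        failed_locations.extend(
            loc.get("name") for loc in response.get("failedLocation", [])
        )
        # Returns None once there are no more pages.
        request = jobs_resource.aggregated_next(
            previous_request=request, previous_response=response
        )
    return jobs, failed_locations
```

With a real client, `jobs_resource` would typically come from `googleapiclient.discovery.build("dataflow", "v1b3").projects().jobs()`. A non-empty `failed_locations` list means the job list may be incomplete for those regions.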

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame]

760

<code class="details" id="create">create(projectId, body=None, location=None, replaceJobId=None, view=None, x__xgafv=None)</code>

Sai Cheemalapati

2017-03-13 12:12:03 -0400

[diff] [blame]

761

<pre>Creates a Cloud Dataflow job.

Nathaniel Manista

2015-06-15 16:44:50 +0000

[diff] [blame]

762

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

763

To create a job, we recommend using `projects.locations.jobs.create` with a

764

[regional endpoint]

765

(https://cloud.google.com/dataflow/docs/concepts/regional-endpoints). Using

766

`projects.jobs.create` is not recommended, as your job will always start

767

in `us-central1`.

768

Nathaniel Manista

2015-06-15 16:44:50 +0000

[diff] [blame]

769

Args:

Sai Cheemalapati

2017-03-13 12:12:03 -0400

[diff] [blame]

770

projectId: string, The ID of the Cloud Platform project that the job belongs to. (required)

Dan O'Meara

2020-05-01 07:42:23 -0700

[diff] [blame]

771

body: object, The request body.

Nathaniel Manista

2015-06-15 16:44:50 +0000

[diff] [blame]

772

The object takes the form of:

773

Sai Cheemalapati

2017-03-13 12:12:03 -0400

[diff] [blame]

774

{ # Defines a job to be run by the Cloud Dataflow service.

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame]

775

"clientRequestId": "A String", # The client's unique identifier of the job, re-used across retried attempts.

776

# If this field is set, the service will ensure its uniqueness.

777

# The request to create a job will fail if the service has knowledge of a

778

# previously submitted job with the same client's ID and job name.

779

# The caller may use this field to ensure idempotence of job

780

# creation across retried attempts to create a job.

781

# By default, the field is empty and, in that case, the service ignores it.

782

"id": "A String", # The unique ID of this job.

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

783

#

784

# This field is set by the Cloud Dataflow service when the Job is

785

# created, and is immutable for the life of the job.

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame]

786

"currentStateTime": "A String", # The timestamp associated with the current state.

787

"transformNameMapping": { # The map of transform name prefixes of the job to be replaced to the

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

788

# corresponding name prefixes of the new job.

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame]

789

"a_key": "A String",

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

790

},

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame]

791

"environment": { # Describes the environment in which a Dataflow Job runs. # The environment for the job.

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame]

792

"internalExperiments": { # Experimental settings.

793

"a_key": "", # Properties of the object. Contains field @type with type URL.

794

},

795

"workerRegion": "A String", # The Compute Engine region

796

# (https://cloud.google.com/compute/docs/regions-zones/regions-zones) in

797

# which worker processing should occur, e.g. "us-west1". Mutually exclusive

798

# with worker_zone. If neither worker_region nor worker_zone is specified,

799

# default to the control plane's region.

800

"serviceKmsKeyName": "A String", # If set, contains the Cloud KMS key identifier used to encrypt data

801

# at rest, AKA a Customer Managed Encryption Key (CMEK).

802

#

803

# Format:

804

# projects/PROJECT_ID/locations/LOCATION/keyRings/KEY_RING/cryptoKeys/KEY

805

"userAgent": { # A description of the process that generated the request.

806

"a_key": "", # Properties of the object.

807

},

808

"workerZone": "A String", # The Compute Engine zone

809

# (https://cloud.google.com/compute/docs/regions-zones/regions-zones) in

810

# which worker processing should occur, e.g. "us-west1-a". Mutually exclusive

811

# with worker_region. If neither worker_region nor worker_zone is specified,

812

# a zone in the control plane's region is chosen based on available capacity.

813

"clusterManagerApiService": "A String", # The type of cluster manager API to use. If unknown or

Dan O'Meara

2020-05-01 07:42:23 -0700

[diff] [blame]

814

# unspecified, the service will attempt to choose a reasonable

815

# default. This should be in the form of the API service name,

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame]

816

# e.g. "compute.googleapis.com".

817

"tempStoragePrefix": "A String", # The prefix of the resources the system should use for temporary

818

# storage. The system will append the suffix "/temp-{JOBNAME}" to

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

819

# this resource prefix, where {JOBNAME} is the value of the

820

# job_name field. The resulting bucket and object prefix is used

821

# as the prefix of the resources used to store temporary data

822

# needed during the job execution. NOTE: This will override the

823

# value in taskrunner_settings.

824

# The supported resource type is:

825

#

826

# Google Cloud Storage:

827

#

828

# storage.googleapis.com/{bucket}/{object}

829

# bucket.storage.googleapis.com/{object}

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame]

830

"experiments": [ # The list of experiments to enable.

831

"A String",

832

],

833

"version": { # A structure describing which components and their versions of the service

834

# are required in order to run the job.

835

"a_key": "", # Properties of the object.

836

},

837

"serviceAccountEmail": "A String", # Identity to run virtual machines as. Defaults to the default account.

Bu Sun Kim

2020-05-27 12:20:54 -0700

[diff] [blame]

838

"sdkPipelineOptions": { # The Cloud Dataflow SDK pipeline options specified by the user. These

839

# options are passed through the service and are used to recreate the

840

# SDK pipeline options on the worker in a language agnostic and platform

841

# independent way.

842

"a_key": "", # Properties of the object.

843

},

844

"flexResourceSchedulingGoal": "A String", # Which Flexible Resource Scheduling mode to run in.

845

"workerPools": [ # The worker pools. At least one "harness" worker pool must be

846

# specified in order for the job to have workers.

847

{ # Describes one particular pool of Cloud Dataflow workers to be

848

# instantiated by the Cloud Dataflow service in order to perform the

849

# computations required by a job. Note that a workflow job may use

850

# multiple pools, in order to match the various computational

851

# requirements of the various stages of the job.

852

"numThreadsPerWorker": 42, # The number of threads per worker harness. If empty or unspecified, the

853

# service will choose a number of threads (according to the number of cores

854

# on the selected machine type for batch, or 1 by convention for streaming).

855

"numWorkers": 42, # Number of Google Compute Engine workers in this pool needed to

856

# execute the job. If zero or unspecified, the service will

857

# attempt to choose a reasonable default.

858

"zone": "A String", # Zone to run the worker pools in. If empty or unspecified, the service

859

# will attempt to choose a reasonable default.

860

"diskSourceImage": "A String", # Fully qualified source image for disks.

861

"packages": [ # Packages to be installed on workers.

862

{ # The packages that must be installed in order for a worker to run the

863

# steps of the Cloud Dataflow job that will be assigned to its worker

864

# pool.

865

#

866

# This is the mechanism by which the Cloud Dataflow SDK causes code to

867

# be loaded onto the workers. For example, the Cloud Dataflow Java SDK

868

# might use this to install jars containing the user's code and all of the

869

# various dependencies (libraries, data files, etc.) required in order

870

# for that code to run.

871

"name": "A String", # The name of the package.

872

"location": "A String", # The resource to read the package from. The supported resource type is:

873

#

874

# Google Cloud Storage:

875

#

876

# storage.googleapis.com/{bucket}

877

# bucket.storage.googleapis.com/

878

},

879

],

880

"teardownPolicy": "A String", # Sets the policy for determining when to turn down the worker pool.

881

# Allowed values are: `TEARDOWN_ALWAYS`, `TEARDOWN_ON_SUCCESS`, and

882

# `TEARDOWN_NEVER`.

883

# `TEARDOWN_ALWAYS` means workers are always torn down regardless of whether

884

# the job succeeds. `TEARDOWN_ON_SUCCESS` means workers are torn down

885

# if the job succeeds. `TEARDOWN_NEVER` means the workers are never torn

886

# down.

887

#

888

# If the workers are not torn down by the service, they will

889

# continue to run and use Google Compute Engine VM resources in the

890

# user's project until they are explicitly terminated by the user.

891

# Because of this, Google recommends using the `TEARDOWN_ALWAYS`

892

# policy except for small, manually supervised test jobs.

893

#

894

# If unknown or unspecified, the service will attempt to choose a reasonable

895

# default.

896

"onHostMaintenance": "A String", # The action to take on host maintenance, as defined by the Google

897

# Compute Engine API.

898

"poolArgs": { # Extra arguments for this worker pool.

899

"a_key": "", # Properties of the object. Contains field @type with type URL.

900

},

901

"diskSizeGb": 42, # Size of root disk for VMs, in GB. If zero or unspecified, the service will

902

# attempt to choose a reasonable default.

903

"workerHarnessContainerImage": "A String", # Required. Docker container image that executes the Cloud Dataflow worker

904

# harness, residing in Google Container Registry.

905

#

906

# Deprecated for the Fn API path. Use sdk_harness_container_images instead.

907

"diskType": "A String", # Type of root disk for VMs. If empty or unspecified, the service will

908

# attempt to choose a reasonable default.

909

"machineType": "A String", # Machine type (e.g. "n1-standard-1"). If empty or unspecified, the

910

# service will attempt to choose a reasonable default.

911

"kind": "A String", # The kind of the worker pool; currently only `harness` and `shuffle`

912

# are supported.

913

"sdkHarnessContainerImages": [ # Set of SDK harness containers needed to execute this pipeline. This will

914

# only be set in the Fn API path. For non-cross-language pipelines this

915

# should have only one entry. Cross-language pipelines will have two or more

916

# entries.

917

{ # Defines an SDK harness container for executing Dataflow pipelines.

918

"containerImage": "A String", # A docker container image that resides in Google Container Registry.

919

"useSingleCorePerContainer": True or False, # If true, recommends that the Dataflow service use only one core per SDK

920

# container instance with this image. If false (or unset) recommends using

921

# more than one core per SDK container instance with this image for

922

# efficiency. Note that the Dataflow service may choose to override this property

# if needed.

},

],

"dataDisks": [ # Data disks that are used by a VM in this workflow.

927

{ # Describes the data disk used by a workflow job.

928

"diskType": "A String", # Disk storage type, as defined by Google Compute Engine. This

929

# must be a disk type appropriate to the project and zone in which

930

# the workers will run. If unknown or unspecified, the service

931

# will attempt to choose a reasonable default.

932

#

933

# For example, the standard persistent disk type is a resource name

934

# typically ending in "pd-standard". If SSD persistent disks are

935

# available, the resource name typically ends with "pd-ssd". The

936

# actual valid values are defined by the Google Compute Engine API,

937

# not by the Cloud Dataflow API; consult the Google Compute Engine

938

# documentation for more information about determining the set of

939

# available disk types for a particular project and zone.

940

#

941

# Google Compute Engine Disk types are local to a particular

942

# project in a particular zone, and so the resource name will

943

# typically look something like this:

944

#

945

# compute.googleapis.com/projects/project-id/zones/zone/diskTypes/pd-standard

946

"sizeGb": 42, # Size of disk in GB. If zero or unspecified, the service will

947

# attempt to choose a reasonable default.

948

"mountPoint": "A String", # Directory in a VM where disk is mounted.

949

},

950

],

951

"subnetwork": "A String", # Subnetwork to which VMs will be assigned, if desired. Expected to be of

952

# the form "regions/REGION/subnetworks/SUBNETWORK".

953

"ipConfiguration": "A String", # Configuration for VM IPs.

954

"taskrunnerSettings": { # Taskrunner configuration settings. # Settings passed through to Google Compute Engine workers when

955

# using the standard Dataflow task runner. Users should ignore

956

# this field.

957

"alsologtostderr": True or False, # Whether to also send taskrunner log info to stderr.

958

"taskGroup": "A String", # The UNIX group ID on the worker VM to use for tasks launched by

959

# taskrunner; e.g. "wheel".

960

"harnessCommand": "A String", # The command to launch the worker harness.

961

"logDir": "A String", # The directory on the VM to store logs.

962

"oauthScopes": [ # The OAuth2 scopes to be requested by the taskrunner in order to

963

# access the Cloud Dataflow API.

964

"A String",

965

],

966

"dataflowApiVersion": "A String", # The API version of the endpoint, e.g. "v1b3"

967

"logUploadLocation": "A String", # Indicates where to put logs. If this is not specified, the logs

968

# will not be uploaded.

969

#

970

# The supported resource type is:

971

#

972

# Google Cloud Storage:

973

# storage.googleapis.com/{bucket}/{object}

974

# bucket.storage.googleapis.com/{object}

975

"streamingWorkerMainClass": "A String", # The streaming worker main class name.

976

"workflowFileName": "A String", # The file to store the workflow in.

977

"languageHint": "A String", # The suggested backend language.

978

"commandlinesFileName": "A String", # The file to store preprocessing commands in.

979

"baseTaskDir": "A String", # The location on the worker for task-specific subdirectories.

980

"tempStoragePrefix": "A String", # The prefix of the resources the taskrunner should use for

981

# temporary storage.

982

#

983

# The supported resource type is:

984

#

985

# Google Cloud Storage:

986

# storage.googleapis.com/{bucket}/{object}

987

# bucket.storage.googleapis.com/{object}

988

"baseUrl": "A String", # The base URL for the taskrunner to use when accessing Google Cloud APIs.

989

#

990

# When workers access Google Cloud APIs, they logically do so via

991

# relative URLs. If this field is specified, it supplies the base

992

# URL to use for resolving these relative URLs. The normative

993

# algorithm used is defined by RFC 1808, "Relative Uniform Resource

994

# Locators".

995

#

996

# If not specified, the default value is "http://www.googleapis.com/"

997

"logToSerialconsole": True or False, # Whether to send taskrunner log info to Google Compute Engine VM serial

998

# console.

999

"continueOnException": True or False, # Whether to continue taskrunner if an exception is hit.

1000

"parallelWorkerSettings": { # Provides data to pass through to the worker harness. # The settings to pass to the parallel worker harness.

1001

"tempStoragePrefix": "A String", # The prefix of the resources the system should use for temporary

1002

# storage.

1003

#

1004

# The supported resource type is:

1005

#

1006

# Google Cloud Storage:

1007

#

1008

# storage.googleapis.com/{bucket}/{object}

1009

# bucket.storage.googleapis.com/{object}

1010

"reportingEnabled": True or False, # Whether to send work progress updates to the service.

1011

"baseUrl": "A String", # The base URL for accessing Google Cloud APIs.

1012

#

1013

# When workers access Google Cloud APIs, they logically do so via

1014

# relative URLs. If this field is specified, it supplies the base

1015

# URL to use for resolving these relative URLs. The normative

1016

# algorithm used is defined by RFC 1808, "Relative Uniform Resource

1017

# Locators".

1018

#

1019

# If not specified, the default value is "http://www.googleapis.com/"

1020

"servicePath": "A String", # The Cloud Dataflow service path relative to the root URL, for example,

1021

# "dataflow/v1b3/projects".

1022

"shuffleServicePath": "A String", # The Shuffle service path relative to the root URL, for example,

1023

# "shuffle/v1beta1".

1024

"workerId": "A String", # The ID of the worker running this pipeline.

1025

},

1026

"taskUser": "A String", # The UNIX user ID on the worker VM to use for tasks launched by

1027

# taskrunner; e.g. "root".

1028

"vmId": "A String", # The ID string of the VM.

1029

},

1030

"autoscalingSettings": { # Settings for WorkerPool autoscaling. # Settings for autoscaling of this WorkerPool.

1031

"algorithm": "A String", # The algorithm to use for autoscaling.

1032

"maxNumWorkers": 42, # The maximum number of workers to cap scaling at.

1033

},

1034

"metadata": { # Metadata to set on the Google Compute Engine VMs.

1035

"a_key": "A String",

1036

},

1037

"defaultPackageSet": "A String", # The default package set to install. This allows the service to

1038

# select a default set of packages which are useful to worker

1039

# harnesses written in a particular language.

1040

"network": "A String", # Network to which VMs will be assigned. If empty or unspecified,

1041

# the service will use the network "default".

1042

},

1043

],

1044

"dataset": "A String", # The dataset for the current project where various workflow

1045

# related tables are stored.

1046

#

1047

# The supported resource type is:

1048

#

1049

# Google BigQuery:

1050

# bigquery.googleapis.com/{dataset}

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

1051

},

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame]

1052

"stageStates": [ # This field may be mutated by the Cloud Dataflow service;

1053

# callers cannot mutate it.

1054

{ # A message describing the state of a particular execution stage.

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame]

1055

"currentStateTime": "A String", # The time at which the stage transitioned to this state.

1056

"executionStageState": "A String", # Execution stage states allow the same set of values as JobState.

Bu Sun Kim

2020-05-27 12:20:54 -0700

[diff] [blame]

1057

"executionStageName": "A String", # The name of the execution stage.

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame]

1058

},

1059

],

1060

"jobMetadata": { # Metadata available primarily for filtering jobs. Will be included in the

1061

# ListJob response and Job SUMMARY view.

1062

# This field is populated by the Dataflow service to support filtering jobs

1063

# by the metadata values provided here. Populated for ListJobs and all GetJob views SUMMARY and higher.

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame]

1064

"datastoreDetails": [ # Identification of a Datastore source used in the Dataflow job.

1065

{ # Metadata for a Datastore connector used by the job.

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame]

1066

"namespace": "A String", # Namespace used in the connection.

Bu Sun Kim

2020-05-27 12:20:54 -0700

[diff] [blame]

1067

"projectId": "A String", # ProjectId accessed in the connection.

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame]

1068

},

1069

],

1070

"sdkVersion": { # The version of the SDK used to run the job. # The SDK version used to run the job.

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame]

1071

"version": "A String", # The version of the SDK used to run the job.

Bu Sun Kim

2020-05-27 12:20:54 -0700

[diff] [blame]

1072

"sdkSupportStatus": "A String", # The support status for this SDK version.

1073

"versionDisplayName": "A String", # A readable string describing the version of the SDK.

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame]

1074

},

1075

"bigqueryDetails": [ # Identification of a BigQuery source used in the Dataflow job.

1076

{ # Metadata for a BigQuery connector used by the job.

1077

"table": "A String", # Table accessed in the connection.

1078

"dataset": "A String", # Dataset accessed in the connection.

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame]

1079

"query": "A String", # Query used to access data in the connection.

Bu Sun Kim

2020-05-27 12:20:54 -0700

[diff] [blame]

1080

"projectId": "A String", # Project accessed in the connection.

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame]

1081

},

1082

],

1083

"fileDetails": [ # Identification of a File source used in the Dataflow job.

1084

{ # Metadata for a File connector used by the job.

1085

"filePattern": "A String", # File Pattern used to access files by the connector.

1086

},

1087

],

1088

"pubsubDetails": [ # Identification of a PubSub source used in the Dataflow job.

1089

{ # Metadata for a PubSub connector used by the job.

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame]

1090

"topic": "A String", # Topic accessed in the connection.

Bu Sun Kim

2020-05-27 12:20:54 -0700

[diff] [blame]

1091

"subscription": "A String", # Subscription used in the connection.

1092

},

1093

],

1094

"bigTableDetails": [ # Identification of a BigTable source used in the Dataflow job.

1095

{ # Metadata for a BigTable connector used by the job.

1096

"projectId": "A String", # ProjectId accessed in the connection.

1097

"instanceId": "A String", # InstanceId accessed in the connection.

1098

"tableId": "A String", # TableId accessed in the connection.

1099

},

1100

],

1101

"spannerDetails": [ # Identification of a Spanner source used in the Dataflow job.

1102

{ # Metadata for a Spanner connector used by the job.

1103

"instanceId": "A String", # InstanceId accessed in the connection.

1104

"projectId": "A String", # ProjectId accessed in the connection.

1105

"databaseId": "A String", # DatabaseId accessed in the connection.

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame]

1106

},

1107

],

1108

},

Bu Sun Kim

2020-05-27 12:20:54 -0700

[diff] [blame]

1109

"type": "A String", # The type of Cloud Dataflow job.

1110

"projectId": "A String", # The ID of the Cloud Platform project that the job belongs to.

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame]

1111

"createdFromSnapshotId": "A String", # If this is specified, the job's initial state is populated from the given

1112

# snapshot.

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame]

1113

"pipelineDescription": { # A descriptive representation of the submitted pipeline as well as its executed

1114

# form. This data is provided by the Dataflow service for ease of visualizing

1115

# the pipeline and interpreting Dataflow-provided metrics.

1116

# Preliminary field: The format of this data may change at any time.

1117

# A description of the user pipeline and stages through which it is executed.

1118

# Created by Cloud Dataflow service. Only retrieved with JOB_VIEW_DESCRIPTION or JOB_VIEW_ALL.

1119

"executionPipelineStage": [ # Description of each stage of execution of the pipeline.

1120

{ # Description of the composing transforms, names/ids, and input/outputs of a

1121

# stage of execution. Some composing transforms and sources may have been

1122

# generated by the Dataflow service during execution planning.

Bu Sun Kim

2020-05-27 12:20:54 -0700

[diff] [blame]

1123

"outputSource": [ # Output sources for this stage.

1124

{ # Description of an input or output of an execution stage.

1125

"sizeBytes": "A String", # Size of the source, if measurable.

1126

"name": "A String", # Dataflow service generated name for this source.

1127

"userName": "A String", # Human-readable name for this source; may be user or system generated.

1128

"originalTransformOrCollection": "A String", # User name for the original user transform or collection with which this

1129

# source is most closely associated.

1130

},

1131

],

1132

"name": "A String", # Dataflow service generated name for this stage.

1133

"inputSource": [ # Input sources for this stage.

1134

{ # Description of an input or output of an execution stage.

1135

"sizeBytes": "A String", # Size of the source, if measurable.

1136

"name": "A String", # Dataflow service generated name for this source.

1137

"userName": "A String", # Human-readable name for this source; may be user or system generated.

1138

"originalTransformOrCollection": "A String", # User name for the original user transform or collection with which this

1139

# source is most closely associated.

1140

},

1141

],

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame]

1142

"id": "A String", # Dataflow service generated id for this stage.

1143

"componentTransform": [ # Transforms that comprise this execution stage.

1144

{ # Description of a transform executed as part of an execution stage.

1145

"originalTransform": "A String", # User name for the original user transform with which this transform is

1146

# most closely associated.

1147

"name": "A String", # Dataflow service generated name for this transform.

1148

"userName": "A String", # Human-readable name for this transform; may be user or system generated.

1149

},

1150

],

1151

"componentSource": [ # Collections produced and consumed by component transforms of this stage.

1152

{ # Description of an interstitial value between transforms in an execution

1153

# stage.

1154

"name": "A String", # Dataflow service generated name for this source.

1155

"userName": "A String", # Human-readable name for this source; may be user or system generated.

1156

"originalTransformOrCollection": "A String", # User name for the original user transform or collection with which this

1157

# source is most closely associated.

1158

},

1159

],

1160

"kind": "A String", # Type of transform this stage is executing.

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame]

1161

},

1162

],

1163

"originalPipelineTransform": [ # Description of each transform in the pipeline and collections between them.

1164

{ # Description of the type, names/ids, and input/outputs for a transform.

1165

"kind": "A String", # Type of transform.

1166

"inputCollectionName": [ # User names for all collection inputs to this transform.

1167

"A String",

1168

],

1169

"name": "A String", # User provided name for this transform instance.

1170

"id": "A String", # SDK generated id of this transform instance.

1171

"displayData": [ # Transform-specific display data.

1172

{ # Data provided with a pipeline or transform to provide descriptive info.

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame]

1173

"durationValue": "A String", # Contains value if the data is of duration type.

Bu Sun Kim

2020-05-27 12:20:54 -0700

[diff] [blame]

1174

"int64Value": "A String", # Contains value if the data is of int64 type.

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame]

1175

"namespace": "A String", # The namespace for the key. This is usually a class name or programming

1176

# language namespace (e.g. a Python module) that defines the display data.

1177

# This allows a dax monitoring system to specially handle the data

1178

# and perform custom rendering.

1179

"floatValue": 3.14, # Contains value if the data is of float type.

1180

"key": "A String", # The key identifying the display data.

1181

# This is intended to be used as a label for the display data

1182

# when viewed in a dax monitoring system.

1183

"shortStrValue": "A String", # A possible additional shorter value to display.

1184

# For example a java_class_name_value of com.mypackage.MyDoFn

1185

# will be stored with MyDoFn as the short_str_value and

1186

# com.mypackage.MyDoFn as the java_class_name_value.

1187

# short_str_value can be displayed and java_class_name_value

1188

# will be displayed as a tooltip.
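The relationship between the full class name and its shorter display form can be sketched as follows. This is only an illustration of the example in the comment above; `short_str_value` is a hypothetical helper name, not part of the Dataflow API, and it assumes the short form is simply the text after the last dot.

```python
def short_str_value(java_class_name_value: str) -> str:
    """Return the text after the last dot (the simple class name).

    Hypothetical helper: assumes the short value is derived by dropping
    the package prefix, as in the com.mypackage.MyDoFn example.
    """
    return java_class_name_value.rsplit(".", 1)[-1]


full_name = "com.mypackage.MyDoFn"
# The short value would be shown inline, the full value as a tooltip.
print(short_str_value(full_name))  # MyDoFn
```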

1189

"url": "A String", # An optional full URL.

1190

"label": "A String", # An optional label to display in a dax UI for the element.

Bu Sun Kim

2020-05-27 12:20:54 -0700

[diff] [blame]

1191

"timestampValue": "A String", # Contains value if the data is of timestamp type.

1192

"boolValue": True or False, # Contains value if the data is of a boolean type.

1193

"javaClassValue": "A String", # Contains value if the data is of java class type.

1194

"strValue": "A String", # Contains value if the data is of string type.

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame]

1195

},

1196

],

1197

"outputCollectionName": [ # User names for all collection outputs to this transform.

"A String",

],

},

],

"displayData": [ # Pipeline level display data.

1203

{ # Data provided with a pipeline or transform to provide descriptive info.

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame]

1204

"durationValue": "A String", # Contains value if the data is of duration type.

Bu Sun Kim

2020-05-27 12:20:54 -0700

[diff] [blame]

1205

"int64Value": "A String", # Contains value if the data is of int64 type.

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame]

1206

"namespace": "A String", # The namespace for the key. This is usually a class name or programming

1207

# language namespace (e.g. a Python module) that defines the display data.

1208

# This allows a dax monitoring system to specially handle the data

1209

# and perform custom rendering.

1210

"floatValue": 3.14, # Contains value if the data is of float type.

1211

"key": "A String", # The key identifying the display data.

1212

# This is intended to be used as a label for the display data

1213

# when viewed in a dax monitoring system.

1214

"shortStrValue": "A String", # A possible additional shorter value to display.

1215

# For example a java_class_name_value of com.mypackage.MyDoFn

1216

# will be stored with MyDoFn as the short_str_value and

1217

# com.mypackage.MyDoFn as the java_class_name_value.

1218

# short_str_value can be displayed and java_class_name_value

1219

# will be displayed as a tooltip.

1220

"url": "A String", # An optional full URL.

1221

"label": "A String", # An optional label to display in a dax UI for the element.

Bu Sun Kim

2020-05-27 12:20:54 -0700

[diff] [blame]

1222

"timestampValue": "A String", # Contains value if the data is of timestamp type.

1223

"boolValue": True or False, # Contains value if the data is of a boolean type.

1224

"javaClassValue": "A String", # Contains value if the data is of java class type.

1225

"strValue": "A String", # Contains value if the data is of string type.

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame]

},

],

},

"replaceJobId": "A String", # If this job is an update of an existing job, this field is the job ID

1230

# of the job it replaced.

1231

#

1232

# When sending a `CreateJobRequest`, you can update a job by specifying it

1233

# here. The job named here is stopped, and its intermediate state is

1234

# transferred to this job.

1235

"tempFiles": [ # A set of files the system should be aware of that are used

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

1236

# for temporary storage. These temporary files will be

1237

# removed on job completion.

1238

# No duplicates are allowed.

1239

# No file patterns are supported.

1240

#

1241

# The supported files are:

1242

#

1243

# Google Cloud Storage:

1244

#

1245

# storage.googleapis.com/{bucket}/{object}

1246

# bucket.storage.googleapis.com/{object}

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame]

1247

"A String",

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

1248

],

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame]

1249

"name": "A String", # The user-specified Cloud Dataflow job name.

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

1250

#

1251

# Only one Job with a given name may exist in a project at any

1252

# given time. If a caller attempts to create a Job with the same

1253

# name as an already-existing Job, the attempt returns the

1254

# existing Job.

1255

#

1256

# The name must match the regular expression

1257

# `[a-z]([-a-z0-9]{0,38}[a-z0-9])?`
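A job name can be checked against the regular expression above before a request is sent. The following is a minimal sketch using Python's standard `re` module; `is_valid_job_name` is a hypothetical helper, and the service remains the authority on which names it accepts.

```python
import re

# Pattern copied verbatim from the field description above.
JOB_NAME_RE = re.compile(r"[a-z]([-a-z0-9]{0,38}[a-z0-9])?")


def is_valid_job_name(name: str) -> bool:
    """Return True if the whole string matches the documented pattern."""
    return JOB_NAME_RE.fullmatch(name) is not None


print(is_valid_job_name("wordcount-2020"))  # True
print(is_valid_job_name("Wordcount"))       # False: uppercase letter
print(is_valid_job_name("job-"))            # False: ends with a hyphen
```

Because the pattern allows one leading letter, up to 38 middle characters, and one trailing character, the longest name it admits is 40 characters.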

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame]

1258

"steps": [ # Exactly one of steps or steps_location should be specified.

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

1259

#

1260

# The top-level steps that constitute the entire job.

1261

{ # Defines a particular step within a Cloud Dataflow job.

1262

#

1263

# A job consists of multiple steps, each of which performs some

1264

# specific operation as part of the overall job. Data is typically

1265

# passed from one step to another as part of the job.

1266

#

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame]

1267

# Here's an example of a sequence of steps which together implement a

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

1268

# Map-Reduce job:

1269

#

1270

# * Read a collection of data from some source, parsing the

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame]

1271

# collection's elements.

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

1272

#

1273

# * Validate the elements.

1274

#

1275

# * Apply a user-defined function to map each element to some value

1276

# and extract an element-specific key value.

1277

#

1278

# * Group elements with the same key into a single element with

1279

# that key, transforming a multiply-keyed collection into a

1280

# uniquely-keyed collection.

1281

#

1282

# * Write the elements out to some data sink.

1283

#

1284

# Note that the Cloud Dataflow service may be used to run many different

1285

# types of jobs, not just Map-Reduce.
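The step sequence described above can be mimicked in plain Python to make the data flow concrete. This is a toy, in-memory sketch and does not use the Dataflow service or any SDK; the function name, the word-based input, and the list used as a "sink" are all assumptions made for illustration.

```python
from collections import defaultdict


def run_map_reduce(lines):
    """Toy illustration of the five steps, run entirely in memory."""
    # 1. Read a collection of data and parse its elements.
    parsed = [line.strip() for line in lines]
    # 2. Validate the elements (here: drop empty strings).
    valid = [word for word in parsed if word]
    # 3. Map each element to a value and extract an element-specific key.
    keyed = [(word.lower(), 1) for word in valid]
    # 4. Group elements with the same key into a single element per key.
    grouped = defaultdict(list)
    for key, value in keyed:
        grouped[key].append(value)
    # 5. Write the elements out to a sink (here: a sorted list).
    return sorted(grouped.items())


print(run_map_reduce(["a", "B", "", "a "]))  # [('a', [1, 1]), ('b', [1])]
```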

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame]

1286

"name": "A String", # The name that identifies the step. This must be unique for each

Dan O'Meara

2020-05-01 07:42:23 -0700

[diff] [blame]

1287

# step with respect to all other steps in the Cloud Dataflow job.

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame]

1288

"kind": "A String", # The kind of step in the Cloud Dataflow job.

1289

"properties": { # Named properties associated with the step. Each kind of

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

1290

# predefined step has its own required set of properties.

1291

# Must be provided on Create. Only retrieved with JOB_VIEW_ALL.

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame]

1292

"a_key": "", # Properties of the object.

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

1293

},

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

1294

},

1295

],

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame]

1296

"replacedByJobId": "A String", # If another job is an update of this job (and thus, this job is in

1297

# `JOB_STATE_UPDATED`), this field contains the ID of that job.

1298

"executionInfo": { # Deprecated. Additional information about how a Cloud Dataflow job will be executed that

1299

# isn't contained in the submitted job.

1300

"stages": { # A mapping from each stage to the information about that stage.

1301

"a_key": { # Contains information about how a particular

1302

# google.dataflow.v1beta3.Step will be executed.

1303

"stepName": [ # The steps associated with the execution stage.

1304

# Note that stages may have several steps, and that a given step

1305

# might be run by more than one stage.

"A String",

],

},

},

},

"currentState": "A String", # The current state of the job.

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

1312

#

1313

# Jobs are created in the `JOB_STATE_STOPPED` state unless otherwise

1314

# specified.

1315

#

1316

# A job in the `JOB_STATE_RUNNING` state may asynchronously enter a

1317

# terminal state. After a job has reached a terminal state, no

1318

# further state updates may be made.

1319

#

1320

# This field may be mutated by the Cloud Dataflow service;

1321

# callers cannot mutate it.

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame]

1322

"location": "A String", # The [regional endpoint]

1323

# (https://cloud.google.com/dataflow/docs/concepts/regional-endpoints) that

1324

# contains this job.

1325

"startTime": "A String", # The timestamp when the job was started (transitioned to JOB_STATE_PENDING).

1326

# Flexible resource scheduling jobs are started with some delay after job

1327

# creation, so start_time is unset before start and is updated when the

1328

# job is started by the Cloud Dataflow service. For other jobs, start_time

1329

# always equals create_time, is immutable, and is set by the Cloud Dataflow

1330

# service.

1331

"stepsLocation": "A String", # The GCS location where the steps are stored.

1332

"labels": { # User-defined labels for this job.

1333

#

1334

# The labels map can contain no more than 64 entries. Entries of the labels

1335

# map are UTF8 strings that comply with the following restrictions:

1336

#

1337

# * Keys must conform to regexp: \p{Ll}\p{Lo}{0,62}

1338

# * Values must conform to regexp: [\p{Ll}\p{Lo}\p{N}_-]{0,63}

1339

# * Both keys and values are additionally constrained to be <= 128 bytes in

1340

# size.

1341

"a_key": "A String",

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

1342

},
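Some of the label restrictions listed above can be checked client-side. The sketch below covers only the entry limit, the value character set and length, and the 128-byte size limit; it deliberately leaves the key pattern to the service. `label_problems` is a hypothetical helper, and the Unicode classes `\p{Ll}`, `\p{Lo}`, and `\p{N}` are approximated with `unicodedata.category`, since the standard `re` module does not support `\p{...}`.

```python
import unicodedata


def _value_char_ok(ch: str) -> bool:
    """Approximate the class [\\p{Ll}\\p{Lo}\\p{N}_-] for one character."""
    category = unicodedata.category(ch)
    return category in ("Ll", "Lo") or category.startswith("N") or ch in "_-"


def label_problems(labels: dict) -> list:
    """Return human-readable problems found in a labels map."""
    problems = []
    if len(labels) > 64:
        problems.append("more than 64 entries")
    for key, value in labels.items():
        if len(value) > 63 or not all(_value_char_ok(ch) for ch in value):
            problems.append(f"value for {key!r} does not match the value pattern")
        for text in (key, value):
            if len(text.encode("utf-8")) > 128:
                problems.append(f"{text[:16]!r}... exceeds 128 bytes")
    return problems


print(label_problems({"env": "prod-1"}))  # []
print(label_problems({"env": "Prod"}))    # one problem: uppercase letter
```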

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame]

1343

"createTime": "A String", # The timestamp when the job was initially created. Immutable and set by the

1344

# Cloud Dataflow service.

1345

"requestedState": "A String", # The job's requested state.

1346

#

1347

# `UpdateJob` may be used to switch between the `JOB_STATE_STOPPED` and

1348

# `JOB_STATE_RUNNING` states, by setting requested_state. `UpdateJob` may

1349

# also be used to directly set a job's requested state to

1350

# `JOB_STATE_CANCELLED` or `JOB_STATE_DONE`, irrevocably terminating the

1351

# job if it has not already reached a terminal state.
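A request body that sets `requestedState` might be assembled as shown below. This is a sketch under stated assumptions: `update_job_body` is a hypothetical helper, and the set of accepted states contains only the four states named in the description above, not the full `JobState` enumeration.

```python
# States that the description above says UpdateJob can request.
SWITCHABLE_STATES = {"JOB_STATE_STOPPED", "JOB_STATE_RUNNING"}
TERMINATING_STATES = {"JOB_STATE_CANCELLED", "JOB_STATE_DONE"}


def update_job_body(requested_state: str) -> dict:
    """Build a minimal Job body carrying only the requested state."""
    if requested_state not in SWITCHABLE_STATES | TERMINATING_STATES:
        raise ValueError(f"unsupported requested state: {requested_state}")
    return {"requestedState": requested_state}


def is_irrevocable(requested_state: str) -> bool:
    """True if the request would terminate the job for good."""
    return requested_state in TERMINATING_STATES


body = update_job_body("JOB_STATE_CANCELLED")
print(body, is_irrevocable("JOB_STATE_CANCELLED"))
```

The resulting dictionary would presumably be passed as the `body` argument of this resource's update method; that call is not shown here because it requires credentials and a live project.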

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

1352

}

1353

1354

location: string, The [regional endpoint]

1355

(https://cloud.google.com/dataflow/docs/concepts/regional-endpoints) that

1356

contains this job.

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame]

1357

replaceJobId: string, Deprecated. This field is now in the Job message.

1358

view: string, The level of information requested in response.

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

1359

x__xgafv: string, V1 error format.

1360

Allowed values

1361

1 - v1 error format

1362

2 - v2 error format

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

1363

1364

Returns:

1365

An object of the form:

1366

1367

{ # Defines a job to be run by the Cloud Dataflow service.

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame]

1368

"clientRequestId": "A String", # The client's unique identifier of the job, re-used across retried attempts.

1369

# If this field is set, the service will ensure its uniqueness.

1370

# The request to create a job will fail if the service has knowledge of a

1371

# previously submitted job with the same client's ID and job name.

1372

# The caller may use this field to ensure idempotence of job

1373

# creation across retried attempts to create a job.

1374

# By default, the field is empty and, in that case, the service ignores it.
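The idempotence described above depends on the caller generating the identifier once and reusing it on every retry. A minimal sketch, assuming a UUID is an acceptable identifier format (the description does not prescribe one); `new_job_body` is a hypothetical helper.

```python
import uuid


def new_job_body(job_name: str) -> dict:
    """Create a job body with a fresh client request ID.

    Call this once per logical job, then resend the same body on retries
    so that every attempt carries the same clientRequestId.
    """
    return {"name": job_name, "clientRequestId": uuid.uuid4().hex}


body = new_job_body("wordcount")
first_attempt = dict(body)
retry_attempt = dict(body)  # same ID: the retry is recognisable as a duplicate
print(first_attempt["clientRequestId"] == retry_attempt["clientRequestId"])  # True
```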

1375

"id": "A String", # The unique ID of this job.

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

1376

#

1377

# This field is set by the Cloud Dataflow service when the Job is

1378

# created, and is immutable for the life of the job.

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame]

1379

"currentStateTime": "A String", # The timestamp associated with the current state.

1380

"transformNameMapping": { # The map of transform name prefixes of the job to be replaced to the

Sai Cheemalapati

2017-03-13 12:12:03 -0400

[diff] [blame]

1381

# corresponding name prefixes of the new job.

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame]

1382

"a_key": "A String",

Nathaniel Manista

2015-06-15 16:44:50 +0000

[diff] [blame]

1383

},

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame]

1384

"environment": { # Describes the environment in which a Dataflow Job runs. # The environment for the job.

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame]

1385

"internalExperiments": { # Experimental settings.

1386

"a_key": "", # Properties of the object. Contains field @type with type URL.

1387

},

1388

"workerRegion": "A String", # The Compute Engine region

1389

# (https://cloud.google.com/compute/docs/regions-zones/regions-zones) in

1390

# which worker processing should occur, e.g. "us-west1". Mutually exclusive

1391

# with worker_zone. If neither worker_region nor worker_zone is specified,

1392

# default to the control plane's region.

1393

"serviceKmsKeyName": "A String", # If set, contains the Cloud KMS key identifier used to encrypt data

1394

# at rest, AKA a Customer Managed Encryption Key (CMEK).

1395

#

1396

# Format:

1397

# projects/PROJECT_ID/locations/LOCATION/keyRings/KEY_RING/cryptoKeys/KEY
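The key name format above can be built and parsed with a small helper. This is a sketch only: the function names are hypothetical, and the parser assumes that no path component contains a slash.

```python
import re

KMS_KEY_RE = re.compile(
    r"projects/(?P<project>[^/]+)/locations/(?P<location>[^/]+)"
    r"/keyRings/(?P<key_ring>[^/]+)/cryptoKeys/(?P<key>[^/]+)"
)


def kms_key_name(project_id: str, location: str, key_ring: str, key: str) -> str:
    """Assemble a key name in the documented format."""
    return (
        f"projects/{project_id}/locations/{location}"
        f"/keyRings/{key_ring}/cryptoKeys/{key}"
    )


def parse_kms_key_name(name: str) -> dict:
    """Split a key name into its components, or raise ValueError."""
    match = KMS_KEY_RE.fullmatch(name)
    if match is None:
        raise ValueError(f"not a Cloud KMS key name: {name}")
    return match.groupdict()


name = kms_key_name("my-project", "us-central1", "my-ring", "my-key")
print(name)
print(parse_kms_key_name(name)["key_ring"])  # my-ring
```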

1398

"userAgent": { # A description of the process that generated the request.

1399

"a_key": "", # Properties of the object.

1400

},

1401

"workerZone": "A String", # The Compute Engine zone

1402

# (https://cloud.google.com/compute/docs/regions-zones/regions-zones) in

1403

# which worker processing should occur, e.g. "us-west1-a". Mutually exclusive

1404

# with worker_region. If neither worker_region nor worker_zone is specified,

1405

# a zone in the control plane's region is chosen based on available capacity.
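Since `workerRegion` and `workerZone` are mutually exclusive, a client can reject the combination before sending a request. A minimal sketch; `worker_placement` is a hypothetical helper, and the fallback behaviour when neither is given is left to the service as described above.

```python
def worker_placement(worker_region=None, worker_zone=None) -> dict:
    """Return the environment fields for worker placement.

    Raises ValueError if both a region and a zone are supplied.
    An empty result means the service picks the location itself.
    """
    if worker_region and worker_zone:
        raise ValueError("workerRegion and workerZone are mutually exclusive")
    placement = {}
    if worker_region:
        placement["workerRegion"] = worker_region
    if worker_zone:
        placement["workerZone"] = worker_zone
    return placement


print(worker_placement(worker_region="us-west1"))  # {'workerRegion': 'us-west1'}
print(worker_placement(worker_zone="us-west1-a"))  # {'workerZone': 'us-west1-a'}
```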

1406

"clusterManagerApiService": "A String", # The type of cluster manager API to use. If unknown or

Dan O'Meara

2020-05-01 07:42:23 -0700

[diff] [blame]

1407

# unspecified, the service will attempt to choose a reasonable

1408

# default. This should be in the form of the API service name,

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame]

1409

# e.g. "compute.googleapis.com".

1410

"tempStoragePrefix": "A String", # The prefix of the resources the system should use for temporary

1411

# storage. The system will append the suffix "/temp-{JOBNAME}" to

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

1412

# this resource prefix, where {JOBNAME} is the value of the

1413

# job_name field. The resulting bucket and object prefix is used

1414

# as the prefix of the resources used to store temporary data

1415

# needed during the job execution. NOTE: This will override the

1416

# value in taskrunner_settings.

1417

# The supported resource type is:

1418

#

1419

# Google Cloud Storage:

1420

#

1421

# storage.googleapis.com/{bucket}/{object}

1422

# bucket.storage.googleapis.com/{object}
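The way the temporary storage location is derived from the prefix and the job name can be sketched as follows. This mirrors only the suffix rule stated above; `temp_location` is a hypothetical helper, the bucket and object names are made up, and the service performs the actual expansion.

```python
def temp_location(temp_storage_prefix: str, job_name: str) -> str:
    """Append the documented "/temp-{JOBNAME}" suffix to the prefix."""
    return f"{temp_storage_prefix}/temp-{job_name}"


# Hypothetical bucket and object prefix, in one of the supported forms.
prefix = "storage.googleapis.com/my-bucket/staging"
print(temp_location(prefix, "wordcount"))
# storage.googleapis.com/my-bucket/staging/temp-wordcount
```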

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame]

1423

"experiments": [ # The list of experiments to enable.

1424

"A String",

1425

],

1426

"version": { # A structure describing which components and their versions of the service

1427

# are required in order to run the job.

1428

"a_key": "", # Properties of the object.

1429

},

1430

"serviceAccountEmail": "A String", # Identity to run virtual machines as. Defaults to the default account.

Bu Sun Kim

2020-05-27 12:20:54 -0700

[diff] [blame]

1431

"sdkPipelineOptions": { # The Cloud Dataflow SDK pipeline options specified by the user. These

1432

# options are passed through the service and are used to recreate the

1433

# SDK pipeline options on the worker in a language agnostic and platform

1434

# independent way.

1435

"a_key": "", # Properties of the object.

1436

},

1437

"flexResourceSchedulingGoal": "A String", # Which Flexible Resource Scheduling mode to run in.

1438

"workerPools": [ # The worker pools. At least one "harness" worker pool must be

1439

# specified in order for the job to have workers.

1440

{ # Describes one particular pool of Cloud Dataflow workers to be

1441

# instantiated by the Cloud Dataflow service in order to perform the

1442

# computations required by a job. Note that a workflow job may use

1443

# multiple pools, in order to match the various computational

1444

# requirements of the various stages of the job.

1445

"numThreadsPerWorker": 42, # The number of threads per worker harness. If empty or unspecified, the

1446

# service will choose a number of threads (according to the number of cores

1447

# on the selected machine type for batch, or 1 by convention for streaming).

1448

"numWorkers": 42, # Number of Google Compute Engine workers in this pool needed to

1449

# execute the job. If zero or unspecified, the service will

1450

# attempt to choose a reasonable default.

1451

"zone": "A String", # Zone to run the worker pools in. If empty or unspecified, the service

1452

# will attempt to choose a reasonable default.

1453

"diskSourceImage": "A String", # Fully qualified source image for disks.

1454

"packages": [ # Packages to be installed on workers.

1455

{ # The packages that must be installed in order for a worker to run the

1456

# steps of the Cloud Dataflow job that will be assigned to its worker

1457

# pool.

1458

#

1459

# This is the mechanism by which the Cloud Dataflow SDK causes code to

1460

# be loaded onto the workers. For example, the Cloud Dataflow Java SDK

1461

# might use this to install jars containing the user's code and all of the

1462

# various dependencies (libraries, data files, etc.) required in order

1463

# for that code to run.

1464

"name": "A String", # The name of the package.

1465

"location": "A String", # The resource to read the package from. The supported resource type is:

1466

#

1467

# Google Cloud Storage:

1468

#

1469

# storage.googleapis.com/{bucket}

1470

# bucket.storage.googleapis.com/

1471

},

1472

],

1473

"teardownPolicy": "A String", # Sets the policy for determining when to tear down the worker pool.

1474

# Allowed values are: `TEARDOWN_ALWAYS`, `TEARDOWN_ON_SUCCESS`, and

1475

# `TEARDOWN_NEVER`.

1476

# `TEARDOWN_ALWAYS` means workers are always torn down regardless of whether

1477

# the job succeeds. `TEARDOWN_ON_SUCCESS` means workers are torn down

1478

# if the job succeeds. `TEARDOWN_NEVER` means the workers are never torn

1479

# down.

1480

#

1481

# If the workers are not torn down by the service, they will

1482

# continue to run and use Google Compute Engine VM resources in the

1483

# user's project until they are explicitly terminated by the user.

1484

# Because of this, Google recommends using the `TEARDOWN_ALWAYS`

1485

# policy except for small, manually supervised test jobs.

1486

#

1487

# If unknown or unspecified, the service will attempt to choose a reasonable

1488

# default.

1489

"onHostMaintenance": "A String", # The action to take on host maintenance, as defined by the Google

1490

# Compute Engine API.

1491

"poolArgs": { # Extra arguments for this worker pool.

1492

"a_key": "", # Properties of the object. Contains field @type with type URL.

1493

},

1494

"diskSizeGb": 42, # Size of root disk for VMs, in GB. If zero or unspecified, the service will

1495

# attempt to choose a reasonable default.

1496

"workerHarnessContainerImage": "A String", # Required. Docker container image that executes the Cloud Dataflow worker

1497

# harness, residing in Google Container Registry.

1498

#

1499

# Deprecated for the Fn API path. Use sdk_harness_container_images instead.

1500

"diskType": "A String", # Type of root disk for VMs. If empty or unspecified, the service will

1501

# attempt to choose a reasonable default.

1502

"machineType": "A String", # Machine type (e.g. "n1-standard-1"). If empty or unspecified, the

1503

# service will attempt to choose a reasonable default.

1504

"kind": "A String", # The kind of the worker pool; currently only `harness` and `shuffle`

1505

# are supported.

1506

"sdkHarnessContainerImages": [ # Set of SDK harness containers needed to execute this pipeline. This will

1507

# only be set in the Fn API path. For non-cross-language pipelines this

1508

# should have only one entry. Cross-language pipelines will have two or more

1509

# entries.

1510

{ # Defines an SDK harness container for executing Dataflow pipelines.

1511

"containerImage": "A String", # A docker container image that resides in Google Container Registry.

1512

"useSingleCorePerContainer": True or False, # If true, recommends that the Dataflow service use only one core per SDK

1513

# container instance with this image. If false (or unset) recommends using

1514

# more than one core per SDK container instance with this image for

1515

# efficiency. Note that the Dataflow service may choose to override this property

# if needed.

},

],

"dataDisks": [ # Data disks that are used by a VM in this workflow.

1520

{ # Describes the data disk used by a workflow job.

1521

"diskType": "A String", # Disk storage type, as defined by Google Compute Engine. This

1522

# must be a disk type appropriate to the project and zone in which

1523

# the workers will run. If unknown or unspecified, the service

1524

# will attempt to choose a reasonable default.

1525

#

1526

# For example, the standard persistent disk type is a resource name

1527

# typically ending in "pd-standard". If SSD persistent disks are

1528

# available, the resource name typically ends with "pd-ssd". The

1529

# actual valid values are defined by the Google Compute Engine API,

1530

# not by the Cloud Dataflow API; consult the Google Compute Engine

1531

# documentation for more information about determining the set of

1532

# available disk types for a particular project and zone.

1533

#

1534

# Google Compute Engine Disk types are local to a particular

1535

# project in a particular zone, and so the resource name will

1536

# typically look something like this:

1537

#

1538

# compute.googleapis.com/projects/project-id/zones/zone/diskTypes/pd-standard

1539

"sizeGb": 42, # Size of disk in GB. If zero or unspecified, the service will

1540

# attempt to choose a reasonable default.

1541

"mountPoint": "A String", # Directory in a VM where disk is mounted.

1542

},

1543

],

1544

"subnetwork": "A String", # Subnetwork to which VMs will be assigned, if desired. Expected to be of

1545

# the form "regions/REGION/subnetworks/SUBNETWORK".

1546

"ipConfiguration": "A String", # Configuration for VM IPs.

1547

"taskrunnerSettings": { # Taskrunner configuration settings. # Settings passed through to Google Compute Engine workers when

1548

# using the standard Dataflow task runner. Users should ignore

1549

# this field.

1550

"alsologtostderr": True or False, # Whether to also send taskrunner log info to stderr.

1551

"taskGroup": "A String", # The UNIX group ID on the worker VM to use for tasks launched by

1552

# taskrunner; e.g. "wheel".

1553

"harnessCommand": "A String", # The command to launch the worker harness.

1554

"logDir": "A String", # The directory on the VM to store logs.

1555

"oauthScopes": [ # The OAuth2 scopes to be requested by the taskrunner in order to

1556

# access the Cloud Dataflow API.

1557

"A String",

1558

],

1559

"dataflowApiVersion": "A String", # The API version of the endpoint, e.g. "v1b3".

1560

"logUploadLocation": "A String", # Indicates where to put logs. If this is not specified, the logs

1561

# will not be uploaded.

1562

#

1563

# The supported resource type is:

1564

#

1565

# Google Cloud Storage:

1566

# storage.googleapis.com/{bucket}/{object}

1567

# bucket.storage.googleapis.com/{object}

1568

"streamingWorkerMainClass": "A String", # The streaming worker main class name.

1569

"workflowFileName": "A String", # The file to store the workflow in.

1570

"languageHint": "A String", # The suggested backend language.

1571

"commandlinesFileName": "A String", # The file to store preprocessing commands in.

1572

"baseTaskDir": "A String", # The location on the worker for task-specific subdirectories.

1573

"tempStoragePrefix": "A String", # The prefix of the resources the taskrunner should use for

1574

# temporary storage.

1575

#

1576

# The supported resource type is:

1577

#

1578

# Google Cloud Storage:

1579

# storage.googleapis.com/{bucket}/{object}

1580

# bucket.storage.googleapis.com/{object}

1581

"baseUrl": "A String", # The base URL for the taskrunner to use when accessing Google Cloud APIs.

1582

#

1583

# When workers access Google Cloud APIs, they logically do so via

1584

# relative URLs. If this field is specified, it supplies the base

1585

# URL to use for resolving these relative URLs. The normative

1586

# algorithm used is defined by RFC 1808, "Relative Uniform Resource

1587

# Locators".

1588

#

1589

# If not specified, the default value is "http://www.googleapis.com/"

1590

"logToSerialconsole": True or False, # Whether to send taskrunner log info to Google Compute Engine VM serial

1591

# console.

1592

"continueOnException": True or False, # Whether to continue taskrunner if an exception is hit.

1593

"parallelWorkerSettings": { # Provides data to pass through to the worker harness. # The settings to pass to the parallel worker harness.

1594

"tempStoragePrefix": "A String", # The prefix of the resources the system should use for temporary

1595

# storage.

1596

#

1597

# The supported resource type is:

1598

#

1599

# Google Cloud Storage:

1600

#

1601

# storage.googleapis.com/{bucket}/{object}

1602

# bucket.storage.googleapis.com/{object}

1603

"reportingEnabled": True or False, # Whether to send work progress updates to the service.

1604

"baseUrl": "A String", # The base URL for accessing Google Cloud APIs.

1605

#

1606

# When workers access Google Cloud APIs, they logically do so via

1607

# relative URLs. If this field is specified, it supplies the base

1608

# URL to use for resolving these relative URLs. The normative

1609

# algorithm used is defined by RFC 1808, "Relative Uniform Resource

1610

# Locators".

1611

#

1612

# If not specified, the default value is "http://www.googleapis.com/"

1613

"servicePath": "A String", # The Cloud Dataflow service path relative to the root URL, for example,

1614

# "dataflow/v1b3/projects".

1615

"shuffleServicePath": "A String", # The Shuffle service path relative to the root URL, for example,

1616

# "shuffle/v1beta1".

1617

"workerId": "A String", # The ID of the worker running this pipeline.

1618

},

1619

"taskUser": "A String", # The UNIX user ID on the worker VM to use for tasks launched by

1620

# taskrunner; e.g. "root".

1621

"vmId": "A String", # The ID string of the VM.

1622

},

1623

"autoscalingSettings": { # Settings for WorkerPool autoscaling. # Settings for autoscaling of this WorkerPool.

1624

"algorithm": "A String", # The algorithm to use for autoscaling.

1625

"maxNumWorkers": 42, # The maximum number of workers to cap scaling at.

1626

},

1627

"metadata": { # Metadata to set on the Google Compute Engine VMs.

1628

"a_key": "A String",

1629

},

1630

"defaultPackageSet": "A String", # The default package set to install. This allows the service to

1631

# select a default set of packages which are useful to worker

1632

# harnesses written in a particular language.

1633

"network": "A String", # Network to which VMs will be assigned. If empty or unspecified,

1634

# the service will use the network "default".

1635

},

1636

],

1637

"dataset": "A String", # The dataset for the current project where various workflow

1638

# related tables are stored.

1639

#

1640

# The supported resource type is:

1641

#

1642

# Google BigQuery:

1643

# bigquery.googleapis.com/{dataset}

1644

},

1645

"stageStates": [ # This field may be mutated by the Cloud Dataflow service;

1646

# callers cannot mutate it.

1647

{ # A message describing the state of a particular execution stage.

1648

"currentStateTime": "A String", # The time at which the stage transitioned to this state.

1649

"executionStageState": "A String", # Executions stage states allow the same set of values as JobState.

1650

"executionStageName": "A String", # The name of the execution stage.

1651

},

1652

],

1653

"jobMetadata": { # Metadata available primarily for filtering jobs. Will be included in the # This field is populated by the Dataflow service to support filtering jobs

1654

# by the metadata values provided here. Populated for ListJobs and all GetJob

1655

# views SUMMARY and higher.

1656

# ListJob response and Job SUMMARY view.

1657

"datastoreDetails": [ # Identification of a Datastore source used in the Dataflow job.

1658

{ # Metadata for a Datastore connector used by the job.

1659

"namespace": "A String", # Namespace used in the connection.

1660

"projectId": "A String", # ProjectId accessed in the connection.

1661

},

1662

],

1663

"sdkVersion": { # The version of the SDK used to run the job. # The SDK version used to run the job.

1664

"version": "A String", # The version of the SDK used to run the job.

1665

"sdkSupportStatus": "A String", # The support status for this SDK version.

1666

"versionDisplayName": "A String", # A readable string describing the version of the SDK.

1667

},

1668

"bigqueryDetails": [ # Identification of a BigQuery source used in the Dataflow job.

1669

{ # Metadata for a BigQuery connector used by the job.

1670

"table": "A String", # Table accessed in the connection.

1671

"dataset": "A String", # Dataset accessed in the connection.

1672

"query": "A String", # Query used to access data in the connection.

1673

"projectId": "A String", # Project accessed in the connection.

1674

},

1675

],

1676

"fileDetails": [ # Identification of a File source used in the Dataflow job.

1677

{ # Metadata for a File connector used by the job.

1678

"filePattern": "A String", # File Pattern used to access files by the connector.

1679

},

1680

],

1681

"pubsubDetails": [ # Identification of a PubSub source used in the Dataflow job.

1682

{ # Metadata for a PubSub connector used by the job.

1683

"topic": "A String", # Topic accessed in the connection.

1684

"subscription": "A String", # Subscription used in the connection.

1685

},

1686

],

1687

"bigTableDetails": [ # Identification of a BigTable source used in the Dataflow job.

1688

{ # Metadata for a BigTable connector used by the job.

1689

"projectId": "A String", # ProjectId accessed in the connection.

1690

"instanceId": "A String", # InstanceId accessed in the connection.

1691

"tableId": "A String", # TableId accessed in the connection.

1692

},

1693

],

1694

"spannerDetails": [ # Identification of a Spanner source used in the Dataflow job.

1695

{ # Metadata for a Spanner connector used by the job.

1696

"instanceId": "A String", # InstanceId accessed in the connection.

1697

"projectId": "A String", # ProjectId accessed in the connection.

1698

"databaseId": "A String", # DatabaseId accessed in the connection.

1699

},

1700

],

1701

},

1702

"type": "A String", # The type of Cloud Dataflow job.

1703

"projectId": "A String", # The ID of the Cloud Platform project that the job belongs to.

1704

"createdFromSnapshotId": "A String", # If this is specified, the job's initial state is populated from the given

1705

# snapshot.

1706

"pipelineDescription": { # A descriptive representation of submitted pipeline as well as the executed # Preliminary field: The format of this data may change at any time.

1707

# A description of the user pipeline and stages through which it is executed.

1708

# Created by Cloud Dataflow service. Only retrieved with

1709

# JOB_VIEW_DESCRIPTION or JOB_VIEW_ALL.

1710

# form. This data is provided by the Dataflow service for ease of visualizing

1711

# the pipeline and interpreting Dataflow provided metrics.

1712

"executionPipelineStage": [ # Description of each stage of execution of the pipeline.

1713

{ # Description of the composing transforms, names/ids, and input/outputs of a

1714

# stage of execution. Some composing transforms and sources may have been

1715

# generated by the Dataflow service during execution planning.

1716

"outputSource": [ # Output sources for this stage.

1717

{ # Description of an input or output of an execution stage.

1718

"sizeBytes": "A String", # Size of the source, if measurable.

1719

"name": "A String", # Dataflow service generated name for this source.

1720

"userName": "A String", # Human-readable name for this source; may be user or system generated.

1721

"originalTransformOrCollection": "A String", # User name for the original user transform or collection with which this

1722

# source is most closely associated.

1723

},

1724

],

1725

"name": "A String", # Dataflow service generated name for this stage.

1726

"inputSource": [ # Input sources for this stage.

1727

{ # Description of an input or output of an execution stage.

1728

"sizeBytes": "A String", # Size of the source, if measurable.

1729

"name": "A String", # Dataflow service generated name for this source.

1730

"userName": "A String", # Human-readable name for this source; may be user or system generated.

1731

"originalTransformOrCollection": "A String", # User name for the original user transform or collection with which this

1732

# source is most closely associated.

1733

},

1734

],

1735

"id": "A String", # Dataflow service generated id for this stage.

1736

"componentTransform": [ # Transforms that comprise this execution stage.

1737

{ # Description of a transform executed as part of an execution stage.

1738

"originalTransform": "A String", # User name for the original user transform with which this transform is

1739

# most closely associated.

1740

"name": "A String", # Dataflow service generated name for this source.

1741

"userName": "A String", # Human-readable name for this transform; may be user or system generated.

1742

},

1743

],

1744

"componentSource": [ # Collections produced and consumed by component transforms of this stage.

1745

{ # Description of an interstitial value between transforms in an execution

1746

# stage.

1747

"name": "A String", # Dataflow service generated name for this source.

1748

"userName": "A String", # Human-readable name for this transform; may be user or system generated.

1749

"originalTransformOrCollection": "A String", # User name for the original user transform or collection with which this

1750

# source is most closely associated.

1751

},

1752

],

1753

"kind": "A String", # Type of tranform this stage is executing.

1754

},

1755

],

1756

"originalPipelineTransform": [ # Description of each transform in the pipeline and collections between them.

1757

{ # Description of the type, names/ids, and input/outputs for a transform.

1758

"kind": "A String", # Type of transform.

1759

"inputCollectionName": [ # User names for all collection inputs to this transform.

1760

"A String",

1761

],

1762

"name": "A String", # User provided name for this transform instance.

1763

"id": "A String", # SDK generated id of this transform instance.

1764

"displayData": [ # Transform-specific display data.

1765

{ # Data provided with a pipeline or transform to provide descriptive info.

1766

"durationValue": "A String", # Contains value if the data is of duration type.

1767

"int64Value": "A String", # Contains value if the data is of int64 type.

1768

"namespace": "A String", # The namespace for the key. This is usually a class name or programming

1769

# language namespace (e.g. a Python module) which defines the display data.

1770

# This allows a dax monitoring system to specially handle the data

1771

# and perform custom rendering.

1772

"floatValue": 3.14, # Contains value if the data is of float type.

1773

"key": "A String", # The key identifying the display data.

1774

# This is intended to be used as a label for the display data

1775

# when viewed in a dax monitoring system.

1776

"shortStrValue": "A String", # A possible additional shorter value to display.

1777

# For example a java_class_name_value of com.mypackage.MyDoFn

1778

# will be stored with MyDoFn as the short_str_value and

1779

# com.mypackage.MyDoFn as the java_class_name_value.

1780

# short_str_value can be displayed and java_class_name_value

1781

# will be displayed as a tooltip.

1782

"url": "A String", # An optional full URL.

1783

"label": "A String", # An optional label to display in a dax UI for the element.

1784

"timestampValue": "A String", # Contains value if the data is of timestamp type.

1785

"boolValue": True or False, # Contains value if the data is of a boolean type.

1786

"javaClassValue": "A String", # Contains value if the data is of java class type.

1787

"strValue": "A String", # Contains value if the data is of string type.

1788

},

1789

],

1790

"outputCollectionName": [ # User names for all collection outputs to this transform.

"A String",

],

},

],

"displayData": [ # Pipeline level display data.

1796

{ # Data provided with a pipeline or transform to provide descriptive info.

1797

"durationValue": "A String", # Contains value if the data is of duration type.

1798

"int64Value": "A String", # Contains value if the data is of int64 type.

1799

"namespace": "A String", # The namespace for the key. This is usually a class name or programming

1800

# language namespace (e.g. a Python module) which defines the display data.

1801

# This allows a dax monitoring system to specially handle the data

1802

# and perform custom rendering.

1803

"floatValue": 3.14, # Contains value if the data is of float type.

1804

"key": "A String", # The key identifying the display data.

1805

# This is intended to be used as a label for the display data

1806

# when viewed in a dax monitoring system.

1807

"shortStrValue": "A String", # A possible additional shorter value to display.

1808

# For example a java_class_name_value of com.mypackage.MyDoFn

1809

# will be stored with MyDoFn as the short_str_value and

1810

# com.mypackage.MyDoFn as the java_class_name_value.

1811

# short_str_value can be displayed and java_class_name_value

1812

# will be displayed as a tooltip.

1813

"url": "A String", # An optional full URL.

1814

"label": "A String", # An optional label to display in a dax UI for the element.

1815

"timestampValue": "A String", # Contains value if the data is of timestamp type.

1816

"boolValue": True or False, # Contains value if the data is of a boolean type.

1817

"javaClassValue": "A String", # Contains value if the data is of java class type.

1818

"strValue": "A String", # Contains value if the data is of string type.

},

],

},

"replaceJobId": "A String", # If this job is an update of an existing job, this field is the job ID

1823

# of the job it replaced.

1824

#

1825

# When sending a `CreateJobRequest`, you can update a job by specifying it

1826

# here. The job named here is stopped, and its intermediate state is

1827

# transferred to this job.

1828

"tempFiles": [ # A set of files the system should be aware of that are used

1829

# for temporary storage. These temporary files will be

1830

# removed on job completion.

1831

# No duplicates are allowed.

1832

# No file patterns are supported.

1833

#

1834

# The supported files are:

1835

#

1836

# Google Cloud Storage:

1837

#

1838

# storage.googleapis.com/{bucket}/{object}

1839

# bucket.storage.googleapis.com/{object}

1840

"A String",

1841

],

1842

"name": "A String", # The user-specified Cloud Dataflow job name.

1843

#

1844

# Only one Job with a given name may exist in a project at any

1845

# given time. If a caller attempts to create a Job with the same

1846

# name as an already-existing Job, the attempt returns the

1847

# existing Job.

1848

#

1849

# The name must match the regular expression

1850

# `[a-z]([-a-z0-9]{0,38}[a-z0-9])?`

1851

"steps": [ # Exactly one of step or steps_location should be specified.

1852

#

1853

# The top-level steps that constitute the entire job.

1854

{ # Defines a particular step within a Cloud Dataflow job.

1855

#

1856

# A job consists of multiple steps, each of which performs some

1857

# specific operation as part of the overall job. Data is typically

1858

# passed from one step to another as part of the job.

1859

#

1860

# Here's an example of a sequence of steps which together implement a

1861

# Map-Reduce job:

1862

#

1863

# * Read a collection of data from some source, parsing the

1864

# collection's elements.

1865

#

1866

# * Validate the elements.

1867

#

1868

# * Apply a user-defined function to map each element to some value

1869

# and extract an element-specific key value.

1870

#

1871

# * Group elements with the same key into a single element with

1872

# that key, transforming a multiply-keyed collection into a

1873

# uniquely-keyed collection.

1874

#

1875

# * Write the elements out to some data sink.

1876

#

1877

# Note that the Cloud Dataflow service may be used to run many different

1878

# types of jobs, not just Map-Reduce.

1879

"name": "A String", # The name that identifies the step. This must be unique for each

1880

# step with respect to all other steps in the Cloud Dataflow job.

1881

"kind": "A String", # The kind of step in the Cloud Dataflow job.

1882

"properties": { # Named properties associated with the step. Each kind of

1883

# predefined step has its own required set of properties.

1884

# Must be provided on Create. Only retrieved with JOB_VIEW_ALL.

1885

"a_key": "", # Properties of the object.

1886

},

1887

},

1888

],

1889

"replacedByJobId": "A String", # If another job is an update of this job (and thus, this job is in

1890

# `JOB_STATE_UPDATED`), this field contains the ID of that job.

1891

"executionInfo": { # Additional information about how a Cloud Dataflow job will be executed that # Deprecated.

1892

# isn't contained in the submitted job.

1893

"stages": { # A mapping from each stage to the information about that stage.

1894

"a_key": { # Contains information about how a particular

1895

# google.dataflow.v1beta3.Step will be executed.

1896

"stepName": [ # The steps associated with the execution stage.

1897

# Note that stages may have several steps, and that a given step

1898

# might be run by more than one stage.

"A String",

],

},

},

},

"currentState": "A String", # The current state of the job.

1905

#

1906

# Jobs are created in the `JOB_STATE_STOPPED` state unless otherwise

1907

# specified.

1908

#

1909

# A job in the `JOB_STATE_RUNNING` state may asynchronously enter a

1910

# terminal state. After a job has reached a terminal state, no

1911

# further state updates may be made.

1912

#

1913

# This field may be mutated by the Cloud Dataflow service;

1914

# callers cannot mutate it.

1915

"location": "A String", # The [regional endpoint]

1916

# (https://cloud.google.com/dataflow/docs/concepts/regional-endpoints) that

1917

# contains this job.

1918

"startTime": "A String", # The timestamp when the job was started (transitioned to JOB_STATE_PENDING).

1919

# Flexible resource scheduling jobs are started with some delay after job

1920

# creation, so start_time is unset before start and is updated when the

1921

# job is started by the Cloud Dataflow service. For other jobs, start_time

1922

# always equals create_time and is immutable and set by the Cloud Dataflow

1923

# service.

1924

"stepsLocation": "A String", # The GCS location where the steps are stored.

1925

"labels": { # User-defined labels for this job.

1926

#

1927

# The labels map can contain no more than 64 entries. Entries of the labels

1928

# map are UTF8 strings that comply with the following restrictions:

1929

#

1930

# * Keys must conform to regexp: \p{Ll}\p{Lo}{0,62}

1931

# * Values must conform to regexp: [\p{Ll}\p{Lo}\p{N}_-]{0,63}

1932

# * Both keys and values are additionally constrained to be <= 128 bytes in

1933

# size.

1934

"a_key": "A String",

1935

},

1936

"createTime": "A String", # The timestamp when the job was initially created. Immutable and set by the

1937

# Cloud Dataflow service.

1938

"requestedState": "A String", # The job's requested state.

1939

#

1940

# `UpdateJob` may be used to switch between the `JOB_STATE_STOPPED` and

1941

# `JOB_STATE_RUNNING` states, by setting requested_state. `UpdateJob` may

1942

# also be used to directly set a job's requested state to

1943

# `JOB_STATE_CANCELLED` or `JOB_STATE_DONE`, irrevocably terminating the

1944

# job if it has not already reached a terminal state.

}</pre>
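The `name` field above states that a job name must match the regular expression `[a-z]([-a-z0-9]{0,38}[a-z0-9])?`. A minimal sketch of a client-side check follows, assuming you want to reject bad names before calling `create`. The helper name `is_valid_job_name` is hypothetical and not part of the client library; the service remains the authority on what it accepts.

```python
import re

# Pattern copied verbatim from the "name" field description above.
JOB_NAME_PATTERN = re.compile(r"[a-z]([-a-z0-9]{0,38}[a-z0-9])?")


def is_valid_job_name(name: str) -> bool:
    """Return True if `name` fully matches the documented job-name pattern.

    Hypothetical helper for illustration only; it is not provided by
    google-api-python-client.
    """
    # fullmatch() anchors the pattern at both ends, so partial matches fail.
    return JOB_NAME_PATTERN.fullmatch(name) is not None


if __name__ == "__main__":
    for candidate in ("wordcount-1", "WordCount", "job-", "a" * 41):
        print(repr(candidate), is_valid_job_name(candidate))
```

The pattern allows a lowercase letter followed by up to 39 more characters, so names are at most 40 characters long and cannot end in a hyphen.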

</div>
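The `currentState` notes above say that a running job may asynchronously enter a terminal state. One way to observe this is to poll `get(projectId, jobId, location=...)` until the state is terminal. The sketch below assumes `service` is a client built elsewhere (typically with `googleapiclient.discovery.build("dataflow", "v1b3")` and valid credentials). The function name, the polling interval, and the exact set of terminal states are assumptions made for illustration: this page only names `JOB_STATE_DONE` and `JOB_STATE_CANCELLED` as terminating.

```python
import time

# Assumption: JOB_STATE_DONE and JOB_STATE_CANCELLED are described on this page
# as terminating a job. Treating FAILED, UPDATED, and DRAINED as terminal is
# drawn from the wider Dataflow API and is not stated here.
TERMINAL_STATES = frozenset({
    "JOB_STATE_DONE",
    "JOB_STATE_CANCELLED",
    "JOB_STATE_FAILED",
    "JOB_STATE_UPDATED",
    "JOB_STATE_DRAINED",
})


def wait_for_terminal_state(service, project_id, job_id, location=None,
                            poll_seconds=30.0, max_polls=120, sleep=time.sleep):
    """Poll projects.jobs.get until the job reaches a terminal state.

    Hypothetical helper: `service` is any object exposing
    projects().jobs().get(...).execute(), as the generated client does.
    Returns the final Job dict, or raises TimeoutError after `max_polls` polls.
    """
    params = {"projectId": project_id, "jobId": job_id}
    if location is not None:
        params["location"] = location

    for attempt in range(max_polls):
        job = service.projects().jobs().get(**params).execute()
        if job.get("currentState") in TERMINAL_STATES:
            return job
        # Do not sleep after the final unsuccessful poll.
        if attempt < max_polls - 1:
            sleep(poll_seconds)

    raise TimeoutError(
        f"Job {job_id} did not reach a terminal state after {max_polls} polls"
    )
```

Passing `sleep` as a parameter keeps the loop easy to exercise against a stub service without real delays. As the `get` description notes, the `projects.locations.jobs.get` method with a regional endpoint is the recommended call for jobs outside `us-central1`.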

1949

<code class="details" id="get">get(projectId, jobId, view=None, location=None, x__xgafv=None)</code>

1950

<pre>Gets the state of the specified Cloud Dataflow job.

1951

1952

To get the state of a job, we recommend using `projects.locations.jobs.get`

1953

with a [regional endpoint]

1954

(https://cloud.google.com/dataflow/docs/concepts/regional-endpoints). Using

1955

`projects.jobs.get` is not recommended, as you can only get the state of

1956

jobs that are running in `us-central1`.

1957

1958

Args:

1959

projectId: string, The ID of the Cloud Platform project that the job belongs to. (required)

1960

jobId: string, The job ID. (required)

1961

view: string, The level of information requested in response.

1962

location: string, The [regional endpoint]

1963

(https://cloud.google.com/dataflow/docs/concepts/regional-endpoints) that

1964

contains this job.

1965

x__xgafv: string, V1 error format.

1966

Allowed values

1967

1 - v1 error format

1968

2 - v2 error format

1969

1970

Returns:

1971

An object of the form:

1972

1973

{ # Defines a job to be run by the Cloud Dataflow service.

1974

"clientRequestId": "A String", # The client's unique identifier of the job, re-used across retried attempts.

1975

# If this field is set, the service will ensure its uniqueness.

1976

# The request to create a job will fail if the service has knowledge of a

1977

# previously submitted job with the same client's ID and job name.

1978

# The caller may use this field to ensure idempotence of job

1979

# creation across retried attempts to create a job.

1980

# By default, the field is empty and, in that case, the service ignores it.

1981

"id": "A String", # The unique ID of this job.

1982

#

1983

# This field is set by the Cloud Dataflow service when the Job is

1984

# created, and is immutable for the life of the job.

1985

"currentStateTime": "A String", # The timestamp associated with the current state.

1986

"transformNameMapping": { # The map of transform name prefixes of the job to be replaced to the

1987

# corresponding name prefixes of the new job.

1988

"a_key": "A String",

1989

},

1990

"environment": { # Describes the environment in which a Dataflow Job runs. # The environment for the job.

1991

"internalExperiments": { # Experimental settings.

1992

"a_key": "", # Properties of the object. Contains field @type with type URL.

1993

},

1994

"workerRegion": "A String", # The Compute Engine region

1995

# (https://cloud.google.com/compute/docs/regions-zones/regions-zones) in

1996

# which worker processing should occur, e.g. "us-west1". Mutually exclusive

1997

# with worker_zone. If neither worker_region nor worker_zone is specified,

1998

# default to the control plane's region.

1999

"serviceKmsKeyName": "A String", # If set, contains the Cloud KMS key identifier used to encrypt data

2000

# at rest, AKA a Customer Managed Encryption Key (CMEK).

2001

#

2002

# Format:

2003

# projects/PROJECT_ID/locations/LOCATION/keyRings/KEY_RING/cryptoKeys/KEY

2004

"userAgent": { # A description of the process that generated the request.

2005

"a_key": "", # Properties of the object.

2006

},

2007

"workerZone": "A String", # The Compute Engine zone

2008

# (https://cloud.google.com/compute/docs/regions-zones/regions-zones) in

2009

# which worker processing should occur, e.g. "us-west1-a". Mutually exclusive

2010

# with worker_region. If neither worker_region nor worker_zone is specified,

2011

# a zone in the control plane's region is chosen based on available capacity.

2012

"clusterManagerApiService": "A String", # The type of cluster manager API to use. If unknown or

2013

# unspecified, the service will attempt to choose a reasonable

2014

# default. This should be in the form of the API service name,

2015

# e.g. "compute.googleapis.com".

2016

"tempStoragePrefix": "A String", # The prefix of the resources the system should use for temporary

2017

# storage. The system will append the suffix "/temp-{JOBNAME}" to

2018

# this resource prefix, where {JOBNAME} is the value of the

2019

# job_name field. The resulting bucket and object prefix is used

2020

# as the prefix of the resources used to store temporary data

2021

# needed during the job execution. NOTE: This will override the

2022

# value in taskrunner_settings.

2023

# The supported resource type is:

2024

#

2025

# Google Cloud Storage:

2026

#

2027

# storage.googleapis.com/{bucket}/{object}

2028

# bucket.storage.googleapis.com/{object}

2029

"experiments": [ # The list of experiments to enable.

2030

"A String",

2031

],

2032

"version": { # A structure describing which components and their versions of the service

2033

# are required in order to run the job.

2034

"a_key": "", # Properties of the object.

2035

},

2036

"serviceAccountEmail": "A String", # Identity to run virtual machines as. Defaults to the default account.

2037

"sdkPipelineOptions": { # The Cloud Dataflow SDK pipeline options specified by the user. These

2038

# options are passed through the service and are used to recreate the

2039

# SDK pipeline options on the worker in a language agnostic and platform

2040

# independent way.

2041

"a_key": "", # Properties of the object.

2042

},

2043

"flexResourceSchedulingGoal": "A String", # Which Flexible Resource Scheduling mode to run in.

2044

"workerPools": [ # The worker pools. At least one "harness" worker pool must be

2045

# specified in order for the job to have workers.

2046

{ # Describes one particular pool of Cloud Dataflow workers to be

2047

# instantiated by the Cloud Dataflow service in order to perform the

2048

# computations required by a job. Note that a workflow job may use

2049

# multiple pools, in order to match the various computational

2050

# requirements of the various stages of the job.

2051

"numThreadsPerWorker": 42, # The number of threads per worker harness. If empty or unspecified, the

2052

# service will choose a number of threads (according to the number of cores

2053

# on the selected machine type for batch, or 1 by convention for streaming).

2054

"numWorkers": 42, # Number of Google Compute Engine workers in this pool needed to

2055

# execute the job. If zero or unspecified, the service will

2056

# attempt to choose a reasonable default.

2057

"zone": "A String", # Zone to run the worker pools in. If empty or unspecified, the service

2058

# will attempt to choose a reasonable default.

2059

"diskSourceImage": "A String", # Fully qualified source image for disks.

2060

"packages": [ # Packages to be installed on workers.

2061

{ # The packages that must be installed in order for a worker to run the

2062

# steps of the Cloud Dataflow job that will be assigned to its worker

2063

# pool.

2064

#

2065

# This is the mechanism by which the Cloud Dataflow SDK causes code to

2066

# be loaded onto the workers. For example, the Cloud Dataflow Java SDK

2067

# might use this to install jars containing the user's code and all of the

2068

# various dependencies (libraries, data files, etc.) required in order

2069

# for that code to run.

2070

"name": "A String", # The name of the package.

2071

"location": "A String", # The resource to read the package from. The supported resource type is:

2072

#

2073

# Google Cloud Storage:

2074

#

2075

# storage.googleapis.com/{bucket}

2076

# bucket.storage.googleapis.com/

2077

},

2078

],

"teardownPolicy": "A String", # Sets the policy for determining when to tear down the worker pool.

2080

# Allowed values are: `TEARDOWN_ALWAYS`, `TEARDOWN_ON_SUCCESS`, and

2081

# `TEARDOWN_NEVER`.

2082

# `TEARDOWN_ALWAYS` means workers are always torn down regardless of whether

2083

# the job succeeds. `TEARDOWN_ON_SUCCESS` means workers are torn down

2084

# if the job succeeds. `TEARDOWN_NEVER` means the workers are never torn

2085

# down.

2086

#

2087

# If the workers are not torn down by the service, they will

2088

# continue to run and use Google Compute Engine VM resources in the

2089

# user's project until they are explicitly terminated by the user.

2090

# Because of this, Google recommends using the `TEARDOWN_ALWAYS`

2091

# policy except for small, manually supervised test jobs.

2092

#

2093

# If unknown or unspecified, the service will attempt to choose a reasonable

2094

# default.

2095

"onHostMaintenance": "A String", # The action to take on host maintenance, as defined by the Google

2096

# Compute Engine API.

2097

"poolArgs": { # Extra arguments for this worker pool.

2098

"a_key": "", # Properties of the object. Contains field @type with type URL.

2099

},

2100

"diskSizeGb": 42, # Size of root disk for VMs, in GB. If zero or unspecified, the service will

2101

# attempt to choose a reasonable default.

2102

"workerHarnessContainerImage": "A String", # Required. Docker container image that executes the Cloud Dataflow worker

2103

# harness, residing in Google Container Registry.

2104

#

2105

# Deprecated for the Fn API path. Use sdk_harness_container_images instead.

2106

"diskType": "A String", # Type of root disk for VMs. If empty or unspecified, the service will

2107

# attempt to choose a reasonable default.

2108

"machineType": "A String", # Machine type (e.g. "n1-standard-1"). If empty or unspecified, the

2109

# service will attempt to choose a reasonable default.

2110

"kind": "A String", # The kind of the worker pool; currently only `harness` and `shuffle`

2111

# are supported.

2112

"sdkHarnessContainerImages": [ # Set of SDK harness containers needed to execute this pipeline. This will

2113

# only be set in the Fn API path. For non-cross-language pipelines this

2114

# should have only one entry. Cross-language pipelines will have two or more

2115

# entries.

2116

{ # Defines an SDK harness container for executing Dataflow pipelines.

2117

"containerImage": "A String", # A docker container image that resides in Google Container Registry.

"useSingleCorePerContainer": True or False, # If true, recommends that the Dataflow service use only one core per SDK

2119

# container instance with this image. If false (or unset) recommends using

2120

# more than one core per SDK container instance with this image for

2121

# efficiency. Note that Dataflow service may choose to override this property

# if needed.

},

],

"dataDisks": [ # Data disks that are used by a VM in this workflow.

2126

{ # Describes the data disk used by a workflow job.

2127

"diskType": "A String", # Disk storage type, as defined by Google Compute Engine. This

2128

# must be a disk type appropriate to the project and zone in which

2129

# the workers will run. If unknown or unspecified, the service

2130

# will attempt to choose a reasonable default.

2131

#

2132

# For example, the standard persistent disk type is a resource name

2133

# typically ending in "pd-standard". If SSD persistent disks are

2134

# available, the resource name typically ends with "pd-ssd". The

2135

# actual valid values are defined by the Google Compute Engine API,

2136

# not by the Cloud Dataflow API; consult the Google Compute Engine

2137

# documentation for more information about determining the set of

2138

# available disk types for a particular project and zone.

2139

#

2140

# Google Compute Engine Disk types are local to a particular

2141

# project in a particular zone, and so the resource name will

2142

# typically look something like this:

2143

#

2144

# compute.googleapis.com/projects/project-id/zones/zone/diskTypes/pd-standard

2145

"sizeGb": 42, # Size of disk in GB. If zero or unspecified, the service will

2146

# attempt to choose a reasonable default.

2147

"mountPoint": "A String", # Directory in a VM where disk is mounted.

2148

},

2149

],

2150

"subnetwork": "A String", # Subnetwork to which VMs will be assigned, if desired. Expected to be of

2151

# the form "regions/REGION/subnetworks/SUBNETWORK".

2152

"ipConfiguration": "A String", # Configuration for VM IPs.

2153

"taskrunnerSettings": { # Taskrunner configuration settings. # Settings passed through to Google Compute Engine workers when

2154

# using the standard Dataflow task runner. Users should ignore

2155

# this field.

2156

"alsologtostderr": True or False, # Whether to also send taskrunner log info to stderr.

2157

"taskGroup": "A String", # The UNIX group ID on the worker VM to use for tasks launched by

2158

# taskrunner; e.g. "wheel".

2159

"harnessCommand": "A String", # The command to launch the worker harness.

2160

"logDir": "A String", # The directory on the VM to store logs.

2161

"oauthScopes": [ # The OAuth2 scopes to be requested by the taskrunner in order to

2162

# access the Cloud Dataflow API.

2163

"A String",

2164

],

"dataflowApiVersion": "A String", # The API version of the endpoint, e.g. "v1b3"

2166

"logUploadLocation": "A String", # Indicates where to put logs. If this is not specified, the logs

2167

# will not be uploaded.

2168

#

2169

# The supported resource type is:

2170

#

2171

# Google Cloud Storage:

2172

# storage.googleapis.com/{bucket}/{object}

2173

# bucket.storage.googleapis.com/{object}

2174

"streamingWorkerMainClass": "A String", # The streaming worker main class name.

2175

"workflowFileName": "A String", # The file to store the workflow in.

2176

"languageHint": "A String", # The suggested backend language.

2177

"commandlinesFileName": "A String", # The file to store preprocessing commands in.

2178

"baseTaskDir": "A String", # The location on the worker for task-specific subdirectories.

2179

"tempStoragePrefix": "A String", # The prefix of the resources the taskrunner should use for

2180

# temporary storage.

2181

#

2182

# The supported resource type is:

2183

#

2184

# Google Cloud Storage:

2185

# storage.googleapis.com/{bucket}/{object}

2186

# bucket.storage.googleapis.com/{object}

2187

"baseUrl": "A String", # The base URL for the taskrunner to use when accessing Google Cloud APIs.

2188

#

2189

# When workers access Google Cloud APIs, they logically do so via

2190

# relative URLs. If this field is specified, it supplies the base

2191

# URL to use for resolving these relative URLs. The normative

2192

# algorithm used is defined by RFC 1808, "Relative Uniform Resource

2193

# Locators".

2194

#

2195

# If not specified, the default value is "http://www.googleapis.com/"

2196

"logToSerialconsole": True or False, # Whether to send taskrunner log info to Google Compute Engine VM serial

2197

# console.

2198

"continueOnException": True or False, # Whether to continue taskrunner if an exception is hit.

2199

"parallelWorkerSettings": { # Provides data to pass through to the worker harness. # The settings to pass to the parallel worker harness.

2200

"tempStoragePrefix": "A String", # The prefix of the resources the system should use for temporary

2201

# storage.

2202

#

2203

# The supported resource type is:

2204

#

2205

# Google Cloud Storage:

2206

#

2207

# storage.googleapis.com/{bucket}/{object}

2208

# bucket.storage.googleapis.com/{object}

2209

"reportingEnabled": True or False, # Whether to send work progress updates to the service.

2210

"baseUrl": "A String", # The base URL for accessing Google Cloud APIs.

2211

#

2212

# When workers access Google Cloud APIs, they logically do so via

2213

# relative URLs. If this field is specified, it supplies the base

2214

# URL to use for resolving these relative URLs. The normative

2215

# algorithm used is defined by RFC 1808, "Relative Uniform Resource

2216

# Locators".

2217

#

2218

# If not specified, the default value is "http://www.googleapis.com/"

2219

"servicePath": "A String", # The Cloud Dataflow service path relative to the root URL, for example,

2220

# "dataflow/v1b3/projects".

2221

"shuffleServicePath": "A String", # The Shuffle service path relative to the root URL, for example,

2222

# "shuffle/v1beta1".

2223

"workerId": "A String", # The ID of the worker running this pipeline.

2224

},

2225

"taskUser": "A String", # The UNIX user ID on the worker VM to use for tasks launched by

2226

# taskrunner; e.g. "root".

2227

"vmId": "A String", # The ID string of the VM.

2228

},

2229

"autoscalingSettings": { # Settings for WorkerPool autoscaling. # Settings for autoscaling of this WorkerPool.

2230

"algorithm": "A String", # The algorithm to use for autoscaling.

2231

"maxNumWorkers": 42, # The maximum number of workers to cap scaling at.

2232

},

2233

"metadata": { # Metadata to set on the Google Compute Engine VMs.

2234

"a_key": "A String",

2235

},

2236

"defaultPackageSet": "A String", # The default package set to install. This allows the service to

2237

# select a default set of packages which are useful to worker

2238

# harnesses written in a particular language.

2239

"network": "A String", # Network to which VMs will be assigned. If empty or unspecified,

2240

# the service will use the network "default".

2241

},

2242

],

2243

"dataset": "A String", # The dataset for the current project where various workflow

2244

# related tables are stored.

2245

#

2246

# The supported resource type is:

2247

#

2248

# Google BigQuery:

2249

# bigquery.googleapis.com/{dataset}

},

"stageStates": [ # This field may be mutated by the Cloud Dataflow service;

2252

# callers cannot mutate it.

2253

{ # A message describing the state of a particular execution stage.

"currentStateTime": "A String", # The time at which the stage transitioned to this state.

"executionStageState": "A String", # Execution stage states allow the same set of values as JobState.

"executionStageName": "A String", # The name of the execution stage.

},

2258

],

"jobMetadata": { # Metadata available primarily for filtering jobs. Will be included in the

# ListJob response and Job SUMMARY view.

# This field is populated by the Dataflow service to support filtering jobs

# by the metadata values provided here. Populated for ListJobs and all GetJob

# views SUMMARY and higher.

"datastoreDetails": [ # Identification of a Datastore source used in the Dataflow job.

2264

{ # Metadata for a Datastore connector used by the job.

"namespace": "A String", # Namespace used in the connection.

"projectId": "A String", # ProjectId accessed in the connection.

},

2268

],

2269

"sdkVersion": { # The version of the SDK used to run the job. # The SDK version used to run the job.

"version": "A String", # The version of the SDK used to run the job.

"sdkSupportStatus": "A String", # The support status for this SDK version.

2272

"versionDisplayName": "A String", # A readable string describing the version of the SDK.

},

2274

"bigqueryDetails": [ # Identification of a BigQuery source used in the Dataflow job.

2275

{ # Metadata for a BigQuery connector used by the job.

2276

"table": "A String", # Table accessed in the connection.

2277

"dataset": "A String", # Dataset accessed in the connection.

"query": "A String", # Query used to access data in the connection.

"projectId": "A String", # Project accessed in the connection.

},

2281

],

2282

"fileDetails": [ # Identification of a File source used in the Dataflow job.

2283

{ # Metadata for a File connector used by the job.

2284

"filePattern": "A String", # File Pattern used to access files by the connector.

2285

},

2286

],

2287

"pubsubDetails": [ # Identification of a PubSub source used in the Dataflow job.

2288

{ # Metadata for a PubSub connector used by the job.

"topic": "A String", # Topic accessed in the connection.

"subscription": "A String", # Subscription used in the connection.

2291

},

2292

],

2293

"bigTableDetails": [ # Identification of a BigTable source used in the Dataflow job.

2294

{ # Metadata for a BigTable connector used by the job.

2295

"projectId": "A String", # ProjectId accessed in the connection.

2296

"instanceId": "A String", # InstanceId accessed in the connection.

2297

"tableId": "A String", # TableId accessed in the connection.

2298

},

2299

],

2300

"spannerDetails": [ # Identification of a Spanner source used in the Dataflow job.

2301

{ # Metadata for a Spanner connector used by the job.

2302

"instanceId": "A String", # InstanceId accessed in the connection.

2303

"projectId": "A String", # ProjectId accessed in the connection.

2304

"databaseId": "A String", # DatabaseId accessed in the connection.

},

2306

],

2307

},

"type": "A String", # The type of Cloud Dataflow job.

2309

"projectId": "A String", # The ID of the Cloud Platform project that the job belongs to.

"createdFromSnapshotId": "A String", # If this is specified, the job's initial state is populated from the given

2311

# snapshot.

"pipelineDescription": { # A descriptive representation of the submitted pipeline as well as the executed

# form. This data is provided by the Dataflow service for ease of visualizing

# the pipeline and interpreting Dataflow-provided metrics.

# Preliminary field: The format of this data may change at any time.

# A description of the user pipeline and stages through which it is executed.

# Created by Cloud Dataflow service. Only retrieved with

# JOB_VIEW_DESCRIPTION or JOB_VIEW_ALL.

2318

"executionPipelineStage": [ # Description of each stage of execution of the pipeline.

2319

{ # Description of the composing transforms, names/ids, and input/outputs of a

2320

# stage of execution. Some composing transforms and sources may have been

2321

# generated by the Dataflow service during execution planning.

"outputSource": [ # Output sources for this stage.

2323

{ # Description of an input or output of an execution stage.

2324

"sizeBytes": "A String", # Size of the source, if measurable.

2325

"name": "A String", # Dataflow service generated name for this source.

2326

"userName": "A String", # Human-readable name for this source; may be user or system generated.

2327

"originalTransformOrCollection": "A String", # User name for the original user transform or collection with which this

2328

# source is most closely associated.

2329

},

2330

],

2331

"name": "A String", # Dataflow service generated name for this stage.

2332

"inputSource": [ # Input sources for this stage.

2333

{ # Description of an input or output of an execution stage.

2334

"sizeBytes": "A String", # Size of the source, if measurable.

2335

"name": "A String", # Dataflow service generated name for this source.

2336

"userName": "A String", # Human-readable name for this source; may be user or system generated.

2337

"originalTransformOrCollection": "A String", # User name for the original user transform or collection with which this

2338

# source is most closely associated.

2339

},

2340

],

"id": "A String", # Dataflow service generated id for this stage.

2342

"componentTransform": [ # Transforms that comprise this execution stage.

2343

{ # Description of a transform executed as part of an execution stage.

2344

"originalTransform": "A String", # User name for the original user transform with which this transform is

2345

# most closely associated.

2346

"name": "A String", # Dataflow service generated name for this source.

2347

"userName": "A String", # Human-readable name for this transform; may be user or system generated.

2348

},

2349

],

2350

"componentSource": [ # Collections produced and consumed by component transforms of this stage.

2351

{ # Description of an interstitial value between transforms in an execution

2352

# stage.

2353

"name": "A String", # Dataflow service generated name for this source.

2354

"userName": "A String", # Human-readable name for this transform; may be user or system generated.

2355

"originalTransformOrCollection": "A String", # User name for the original user transform or collection with which this

2356

# source is most closely associated.

2357

},

2358

],

"kind": "A String", # Type of transform this stage is executing.

},

2361

],

2362

"originalPipelineTransform": [ # Description of each transform in the pipeline and collections between them.

2363

{ # Description of the type, names/ids, and input/outputs for a transform.

2364

"kind": "A String", # Type of transform.

2365

"inputCollectionName": [ # User names for all collection inputs to this transform.

2366

"A String",

2367

],

2368

"name": "A String", # User provided name for this transform instance.

2369

"id": "A String", # SDK generated id of this transform instance.

2370

"displayData": [ # Transform-specific display data.

2371

{ # Data provided with a pipeline or transform to provide descriptive info.

"durationValue": "A String", # Contains value if the data is of duration type.

"int64Value": "A String", # Contains value if the data is of int64 type.

"namespace": "A String", # The namespace for the key. This is usually a class name or programming

2375

# language namespace (e.g. a Python module) which defines the display data.

2376

# This allows a dax monitoring system to specially handle the data

2377

# and perform custom rendering.

2378

"floatValue": 3.14, # Contains value if the data is of float type.

2379

"key": "A String", # The key identifying the display data.

2380

# This is intended to be used as a label for the display data

2381

# when viewed in a dax monitoring system.

2382

"shortStrValue": "A String", # A possible additional shorter value to display.

2383

# For example a java_class_name_value of com.mypackage.MyDoFn

2384

# will be stored with MyDoFn as the short_str_value and

2385

# com.mypackage.MyDoFn as the java_class_name_value.

2386

# short_str_value can be displayed and java_class_name_value

2387

# will be displayed as a tooltip.

2388

"url": "A String", # An optional full URL.

2389

"label": "A String", # An optional label to display in a dax UI for the element.

"timestampValue": "A String", # Contains value if the data is of timestamp type.

2391

"boolValue": True or False, # Contains value if the data is of a boolean type.

2392

"javaClassValue": "A String", # Contains value if the data is of java class type.

2393

"strValue": "A String", # Contains value if the data is of string type.

},

2395

],

2396

"outputCollectionName": [ # User names for all collection outputs to this transform.

"A String",

],

},

],

"displayData": [ # Pipeline level display data.

2402

{ # Data provided with a pipeline or transform to provide descriptive info.

"durationValue": "A String", # Contains value if the data is of duration type.

"int64Value": "A String", # Contains value if the data is of int64 type.

"namespace": "A String", # The namespace for the key. This is usually a class name or programming

2406

# language namespace (e.g. a Python module) which defines the display data.

2407

# This allows a dax monitoring system to specially handle the data

2408

# and perform custom rendering.

2409

"floatValue": 3.14, # Contains value if the data is of float type.

2410

"key": "A String", # The key identifying the display data.

2411

# This is intended to be used as a label for the display data

2412

# when viewed in a dax monitoring system.

2413

"shortStrValue": "A String", # A possible additional shorter value to display.

2414

# For example a java_class_name_value of com.mypackage.MyDoFn

2415

# will be stored with MyDoFn as the short_str_value and

2416

# com.mypackage.MyDoFn as the java_class_name_value.

2417

# short_str_value can be displayed and java_class_name_value

2418

# will be displayed as a tooltip.

2419

"url": "A String", # An optional full URL.

2420

"label": "A String", # An optional label to display in a dax UI for the element.

"timestampValue": "A String", # Contains value if the data is of timestamp type.

2422

"boolValue": True or False, # Contains value if the data is of a boolean type.

2423

"javaClassValue": "A String", # Contains value if the data is of java class type.

2424

"strValue": "A String", # Contains value if the data is of string type.

},

],

},

"replaceJobId": "A String", # If this job is an update of an existing job, this field is the job ID

2429

# of the job it replaced.

2430

#

2431

# When sending a `CreateJobRequest`, you can update a job by specifying it

2432

# here. The job named here is stopped, and its intermediate state is

2433

# transferred to this job.

2434

"tempFiles": [ # A set of files the system should be aware of that are used

# for temporary storage. These temporary files will be

2436

# removed on job completion.

2437

# No duplicates are allowed.

2438

# No file patterns are supported.

2439

#

2440

# The supported files are:

2441

#

2442

# Google Cloud Storage:

2443

#

2444

# storage.googleapis.com/{bucket}/{object}

2445

# bucket.storage.googleapis.com/{object}

"A String",

],

"name": "A String", # The user-specified Cloud Dataflow job name.

#

2450

# Only one Job with a given name may exist in a project at any

2451

# given time. If a caller attempts to create a Job with the same

2452

# name as an already-existing Job, the attempt returns the

2453

# existing Job.

2454

#

2455

# The name must match the regular expression

2456

# `[a-z]([-a-z0-9]{0,38}[a-z0-9])?`

"steps": [ # Exactly one of step or steps_location should be specified.

#

2459

# The top-level steps that constitute the entire job.

{ # Defines a particular step within a Cloud Dataflow job.

2461

#

2462

# A job consists of multiple steps, each of which performs some

2463

# specific operation as part of the overall job. Data is typically

2464

# passed from one step to another as part of the job.

2465

#

# Here's an example of a sequence of steps which together implement a

# Map-Reduce job:

2468

#

2469

# * Read a collection of data from some source, parsing the

# collection's elements.

#

2472

# * Validate the elements.

2473

#

2474

# * Apply a user-defined function to map each element to some value

2475

# and extract an element-specific key value.

2476

#

2477

# * Group elements with the same key into a single element with

2478

# that key, transforming a multiply-keyed collection into a

2479

# uniquely-keyed collection.

2480

#

2481

# * Write the elements out to some data sink.

2482

#

2483

# Note that the Cloud Dataflow service may be used to run many different

2484

# types of jobs, not just Map-Reduce.

"name": "A String", # The name that identifies the step. This must be unique for each

# step with respect to all other steps in the Cloud Dataflow job.

"kind": "A String", # The kind of step in the Cloud Dataflow job.

2488

"properties": { # Named properties associated with the step. Each kind of

# predefined step has its own required set of properties.

2490

# Must be provided on Create. Only retrieved with JOB_VIEW_ALL.

"a_key": "", # Properties of the object.

},

2493

},

2494

],

"replacedByJobId": "A String", # If another job is an update of this job (and thus, this job is in

2496

# `JOB_STATE_UPDATED`), this field contains the ID of that job.

"executionInfo": { # Additional information about how a Cloud Dataflow job will be executed that

# isn't contained in the submitted job.

# Deprecated.

2499

"stages": { # A mapping from each stage to the information about that stage.

2500

"a_key": { # Contains information about how a particular

2501

# google.dataflow.v1beta3.Step will be executed.

2502

"stepName": [ # The steps associated with the execution stage.

2503

# Note that stages may have several steps, and that a given step

2504

# might be run by more than one stage.

"A String",

],

},

},

},

"currentState": "A String", # The current state of the job.

#

2512

# Jobs are created in the `JOB_STATE_STOPPED` state unless otherwise

2513

# specified.

2514

#

2515

# A job in the `JOB_STATE_RUNNING` state may asynchronously enter a

2516

# terminal state. After a job has reached a terminal state, no

2517

# further state updates may be made.

2518

#

2519

# This field may be mutated by the Cloud Dataflow service;

2520

# callers cannot mutate it.

"location": "A String", # The [regional endpoint]

2522

# (https://cloud.google.com/dataflow/docs/concepts/regional-endpoints) that

2523

# contains this job.

2524

"startTime": "A String", # The timestamp when the job was started (transitioned to JOB_STATE_PENDING).

2525

# Flexible resource scheduling jobs are started with some delay after job

2526

# creation, so start_time is unset before start and is updated when the

2527

# job is started by the Cloud Dataflow service. For other jobs, start_time

2528

# always equals create_time and is immutable and set by the Cloud Dataflow

2529

# service.

2530

"stepsLocation": "A String", # The GCS location where the steps are stored.

2531

"labels": { # User-defined labels for this job.

2532

#

2533

# The labels map can contain no more than 64 entries. Entries of the labels

2534

# map are UTF8 strings that comply with the following restrictions:

2535

#

2536

# * Keys must conform to regexp: \p{Ll}\p{Lo}{0,62}

2537

# * Values must conform to regexp: [\p{Ll}\p{Lo}\p{N}_-]{0,63}

2538

# * Both keys and values are additionally constrained to be <= 128 bytes in

2539

# size.

2540

"a_key": "A String",

},

"createTime": "A String", # The timestamp when the job was initially created. Immutable and set by the

2543

# Cloud Dataflow service.

2544

"requestedState": "A String", # The job's requested state.

2545

#

2546

# `UpdateJob` may be used to switch between the `JOB_STATE_STOPPED` and

2547

# `JOB_STATE_RUNNING` states, by setting requested_state. `UpdateJob` may

2548

# also be used to directly set a job's requested state to

2549

# `JOB_STATE_CANCELLED` or `JOB_STATE_DONE`, irrevocably terminating the

2550

# job if it has not already reached a terminal state.

}</pre>
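The `name` field of the job resource above states that a job name must match the regular expression `[a-z]([-a-z0-9]{0,38}[a-z0-9])?`. A minimal sketch of a client-side check follows. The helper name `is_valid_job_name` is hypothetical, and the sketch assumes the pattern is meant to match the whole name; the service remains the authority on what it accepts.

```python
import re

# Pattern copied from the `name` field description of the job resource.
JOB_NAME_PATTERN = re.compile(r"[a-z]([-a-z0-9]{0,38}[a-z0-9])?")


def is_valid_job_name(name: str) -> bool:
    """Return True if `name` matches the documented pattern in full.

    Hypothetical helper: full-string matching is an assumption here,
    not something the reference text states explicitly.
    """
    return JOB_NAME_PATTERN.fullmatch(name) is not None
```

Under that assumption, a name starts with a lowercase letter, does not end with a hyphen, and is at most 40 characters long, so a check like this can reject obviously malformed names before a `create` call is made.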

</div>
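The `workerPools` entries of the job environment documented above accept only certain values for `kind` (`harness` or `shuffle`) and `teardownPolicy` (`TEARDOWN_ALWAYS`, `TEARDOWN_ON_SUCCESS`, `TEARDOWN_NEVER`). The following is a rough sketch of building one such entry as a plain dictionary. The function `make_worker_pool` and its defaults are hypothetical; only the field names and allowed values come from the reference text.

```python
# Allowed values as listed in the worker pool field descriptions.
ALLOWED_POOL_KINDS = {"harness", "shuffle"}
ALLOWED_TEARDOWN_POLICIES = {
    "TEARDOWN_ALWAYS",
    "TEARDOWN_ON_SUCCESS",
    "TEARDOWN_NEVER",
}


def make_worker_pool(kind="harness", teardown_policy="TEARDOWN_ALWAYS",
                     num_workers=0, machine_type=""):
    """Build one `workerPools` entry (hypothetical helper).

    Zero or empty values are omitted so that the service can choose
    its own defaults, as the field descriptions say it will.
    """
    if kind not in ALLOWED_POOL_KINDS:
        raise ValueError(f"unsupported worker pool kind: {kind!r}")
    if teardown_policy not in ALLOWED_TEARDOWN_POLICIES:
        raise ValueError(f"unsupported teardown policy: {teardown_policy!r}")

    pool = {"kind": kind, "teardownPolicy": teardown_policy}
    if num_workers:
        pool["numWorkers"] = num_workers
    if machine_type:
        pool["machineType"] = machine_type
    return pool
```

The default of `TEARDOWN_ALWAYS` follows the recommendation in the `teardownPolicy` description, which advises that policy for everything except small, manually supervised test jobs.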

<code class="details" id="getMetrics">getMetrics(projectId, jobId, startTime=None, location=None, x__xgafv=None)</code>

<pre>Request the job status.

To request the status of a job, we recommend using

2559

`projects.locations.jobs.getMetrics` with a [regional endpoint]

2560

(https://cloud.google.com/dataflow/docs/concepts/regional-endpoints). Using

2561

`projects.jobs.getMetrics` is not recommended, as you can only request the

2562

status of jobs that are running in `us-central1`.

Args:

projectId: string, A project id. (required)

2566

jobId: string, The job to get metrics for. (required)

startTime: string, Return only metric data that has changed since this time.

2568

Default is to return all information about all metrics for the job.

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

2569

location: string, The [regional endpoint]

2570

(https://cloud.google.com/dataflow/docs/concepts/regional-endpoints) that

2571

contains the job specified by job_id.

Takashi Matsuo

2015-09-11 13:55:40 -0700

[diff] [blame]

2572

x__xgafv: string, V1 error format.

Sai Cheemalapati

2017-03-13 12:12:03 -0400

[diff] [blame]

2573

Allowed values

2574

1 - v1 error format

2575

2 - v2 error format

Nathaniel Manista

2015-06-15 16:44:50 +0000

[diff] [blame]

2576

2577

Returns:

2578

An object of the form:

2579

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

2580

{ # JobMetrics contains a collection of metrics describing the detailed progress

Sai Cheemalapati

2017-03-13 12:12:03 -0400

[diff] [blame]

2581

# of a Dataflow job. Metrics correspond to user-defined and system-defined

2582

# metrics in the job.

2583

#

2584

# This resource captures only the most recent values of each metric;

2585

# time-series data can be queried for them (under the same metric names)

2586

# from Cloud Monitoring.

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame]

2587

"metrics": [ # All metrics for this job.

Takashi Matsuo

2015-09-11 13:55:40 -0700

[diff] [blame]

2588

{ # Describes the state of a metric.

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame]

2589

"set": "", # Worker-computed aggregate value for the "Set" aggregation kind. The only

2590

# possible value type is a list of Values whose type can be Long, Double,

2591

# or String, according to the metric's type. All Values in the list must

2592

# be of the same type.

2593

"gauge": "", # A struct value describing properties of a Gauge.

2594

# Metrics of gauge type show the value of a metric across time, and are

2595

# aggregated based on the newest value.

2596

"cumulative": True or False, # True if this metric is reported as the total cumulative aggregate

2597

# value accumulated since the worker started working on this WorkItem.

2598

# By default this is false, indicating that this metric is reported

2599

# as a delta that is not associated with any WorkItem.

2600

"internal": "", # Worker-computed aggregate value for internal use by the Dataflow

2601

# service.

2602

"kind": "A String", # Metric aggregation kind. The possible metric aggregation kinds are

2603

# "Sum", "Max", "Min", "Mean", "Set", "And", "Or", and "Distribution".

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

2604

# The specified aggregation kind is case-insensitive.

2605

#

2606

# If omitted, this is not an aggregated value but instead

2607

# a single metric sample value.

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame]

2608

"scalar": "", # Worker-computed aggregate value for aggregation kinds "Sum", "Max", "Min",

2609

# "And", and "Or". The possible value types are Long, Double, and Boolean.

2610

"meanCount": "", # Worker-computed aggregate value for the "Mean" aggregation kind.

2611

# This holds the count of the aggregated values and is used in combination

2612

# with mean_sum to obtain the actual mean aggregate value.

2613

# The only possible value type is Long.

2614

"meanSum": "", # Worker-computed aggregate value for the "Mean" aggregation kind.

Sai Cheemalapati

4ba8c23

2017-06-06 18:46:08 -0400

[diff] [blame]

2615

# This holds the sum of the aggregated values and is used in combination

2616

# with mean_count to obtain the actual mean aggregate value.

2617

# The only possible value types are Long and Double.

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame]

2618

"updateTime": "A String", # Timestamp associated with the metric value. Optional when workers are

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

2619

# reporting work progress; it will be filled in responses from the

2620

# metrics API.

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame]

2621

"name": { # Name of the metric. Identifies a metric by describing the source which generated the

2622

# metric.

Bu Sun Kim

2020-05-27 12:20:54 -0700

[diff] [blame]

2623

"name": "A String", # Worker-defined metric name.

2624

"origin": "A String", # Origin (namespace) of metric name. May be blank for user-defined metrics;

2625

# will be "dataflow" for metrics defined by the Dataflow service or SDK.

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame]

2626

"context": { # Zero or more labeled fields which identify the part of the job this

2627

# metric is associated with, such as the name of a step or collection.

2628

#

2629

# For example, built-in counters associated with steps will have

2630

# context['step'] = <step-name>. Counters associated with PCollections

2631

# in the SDK will have context['pcollection'] = <pcollection-name>.

2632

"a_key": "A String",

2633

},

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame]

2634

},

2635

"distribution": "", # A struct value describing properties of a distribution of numeric values.

Nathaniel Manista

2015-06-15 16:44:50 +0000

[diff] [blame]

2636

},

2637

],

Bu Sun Kim

2020-05-27 12:20:54 -0700

[diff] [blame]

2638

"metricTime": "A String", # Timestamp as of which metric values are current.

Nathaniel Manista

2015-06-15 16:44:50 +0000

[diff] [blame]

}</pre>
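The JobMetrics structure above stores a "Mean" aggregate as two fields, `meanSum` and `meanCount`, and tags each metric with a `context` such as `context['step']`. The sketch below shows one way to fetch the response and derive per-step values from it. The helper names are hypothetical. It assumes `service` is a Dataflow v1b3 client, and that `meanSum`, `meanCount`, and `scalar` arrive either as numbers or as numeric strings (the reference types them loosely). It calls the `projects.jobs.getMetrics` method documented here; the description above recommends `projects.locations.jobs.getMetrics` with a regional endpoint for jobs outside `us-central1`.

```python
def fetch_job_metrics(service, project_id, job_id, location=None, start_time=None):
    """Call getMetrics and return the JobMetrics response as a dict."""
    kwargs = {"projectId": project_id, "jobId": job_id}
    if location is not None:
        kwargs["location"] = location
    if start_time is not None:
        kwargs["startTime"] = start_time
    return service.projects().jobs().getMetrics(**kwargs).execute()


def metric_value(metric):
    """Return the mean for "Mean" metrics, otherwise the raw scalar value.

    The kind comparison is case-insensitive, as the reference states.
    Returns None for a mean with a zero or missing count.
    """
    kind = (metric.get("kind") or "").lower()
    if kind == "mean":
        count = float(metric.get("meanCount") or 0)
        if count == 0:
            return None
        return float(metric.get("meanSum") or 0) / count
    return metric.get("scalar")


def metrics_for_step(response, step_name):
    """Map metric name to value for metrics whose context['step'] matches."""
    values = {}
    for metric in response.get("metrics", []):
        name = metric.get("name", {})
        if name.get("context", {}).get("step") == step_name:
            values[name.get("name")] = metric_value(metric)
    return values
```

Since the resource holds only the most recent value of each metric, a caller that needs history would have to poll with `startTime` or query Cloud Monitoring, as the description above notes.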

</div>

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame]

2643

<code class="details" id="list">list(projectId, filter=None, location=None, pageToken=None, pageSize=None, view=None, x__xgafv=None)</code>

Sai Cheemalapati

2017-03-13 12:12:03 -0400

[diff] [blame]

2644

<pre>List the jobs of a project.

Nathaniel Manista

2015-06-15 16:44:50 +0000

[diff] [blame]

2645

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

2646

To list the jobs of a project in a region, we recommend using

2647

`projects.locations.jobs.list` with a [regional endpoint]

2648

(https://cloud.google.com/dataflow/docs/concepts/regional-endpoints). To

2649

list all jobs across all regions, use `projects.jobs.aggregated`. Using

2650

`projects.jobs.list` is not recommended, as you can only get the list of

2651

jobs that are running in `us-central1`.

2652

Nathaniel Manista

2015-06-15 16:44:50 +0000

[diff] [blame]

2653

Args:

Takashi Matsuo

2015-09-11 13:55:40 -0700

[diff] [blame]

2654

projectId: string, The project which owns the jobs. (required)

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame]

2655

filter: string, The kind of filter to use.

2656

location: string, The [regional endpoint]

2657

(https://cloud.google.com/dataflow/docs/concepts/regional-endpoints) that

2658

contains this job.

2659

pageToken: string, Set this to the 'next_page_token' field of a previous response

2660

to request additional results in a long list.

Sai Cheemalapati

2017-03-13 12:12:03 -0400

[diff] [blame]

2661

pageSize: integer, If there are many jobs, limit response to at most this many.

2662

The actual number of jobs returned will be the lesser of pageSize

2663

and an unspecified server-defined limit.

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame]

2664

view: string, Level of information requested in response. Default is `JOB_VIEW_SUMMARY`.

Takashi Matsuo

2015-09-11 13:55:40 -0700

[diff] [blame]

2665

x__xgafv: string, V1 error format.

Sai Cheemalapati

2017-03-13 12:12:03 -0400

[diff] [blame]

2666

Allowed values

2667

1 - v1 error format

2668

2 - v2 error format

Nathaniel Manista

2015-06-15 16:44:50 +0000

[diff] [blame]

2669

2670

Returns:

2671

An object of the form:

2672

Dan O'Meara

2020-05-01 07:42:23 -0700

[diff] [blame]

2673

{ # Response to a request to list Cloud Dataflow jobs in a project. This might

2674

# be a partial response, depending on the page size in the ListJobsRequest.

2675

# However, if the project does not have any jobs, an instance of

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame]

2676

# ListJobsResponse is not returned and the request's response

Dan O'Meara

2020-05-01 07:42:23 -0700

[diff] [blame]

2677

# body is empty {}.

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame]

2678

"jobs": [ # A subset of the requested job information.

Sai Cheemalapati

2017-03-13 12:12:03 -0400

[diff] [blame]

2679

{ # Defines a job to be run by the Cloud Dataflow service.

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame]

2680

"clientRequestId": "A String", # The client's unique identifier of the job, re-used across retried attempts.

2681

# If this field is set, the service will ensure its uniqueness.

2682

# The request to create a job will fail if the service has knowledge of a

2683

# previously submitted job with the same client's ID and job name.

2684

# The caller may use this field to ensure idempotence of job

2685

# creation across retried attempts to create a job.

2686

# By default, the field is empty and, in that case, the service ignores it.

2687

"id": "A String", # The unique ID of this job.

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

2688

#

2689

# This field is set by the Cloud Dataflow service when the Job is

2690

# created, and is immutable for the life of the job.

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame]

2691

"currentStateTime": "A String", # The timestamp associated with the current state.

2692

"transformNameMapping": { # The map of transform name prefixes of the job to be replaced to the

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

2693

# corresponding name prefixes of the new job.

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame]

2694

"a_key": "A String",

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

2695

},

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame]

2696

"environment": { # Describes the environment in which a Dataflow Job runs. # The environment for the job.

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame]

2697

"internalExperiments": { # Experimental settings.

2698

"a_key": "", # Properties of the object. Contains field @type with type URL.

2699

},

2700

"workerRegion": "A String", # The Compute Engine region

2701

# (https://cloud.google.com/compute/docs/regions-zones/regions-zones) in

2702

# which worker processing should occur, e.g. "us-west1". Mutually exclusive

2703

# with worker_zone. If neither worker_region nor worker_zone is specified,

2704

# default to the control plane's region.

2705

"serviceKmsKeyName": "A String", # If set, contains the Cloud KMS key identifier used to encrypt data

2706

# at rest, AKA a Customer Managed Encryption Key (CMEK).

2707

#

2708

# Format:

2709

# projects/PROJECT_ID/locations/LOCATION/keyRings/KEY_RING/cryptoKeys/KEY

2710

"userAgent": { # A description of the process that generated the request.

2711

"a_key": "", # Properties of the object.

2712

},

2713

"workerZone": "A String", # The Compute Engine zone

2714

# (https://cloud.google.com/compute/docs/regions-zones/regions-zones) in

2715

# which worker processing should occur, e.g. "us-west1-a". Mutually exclusive

2716

# with worker_region. If neither worker_region nor worker_zone is specified,

2717

# a zone in the control plane's region is chosen based on available capacity.

2718

"clusterManagerApiService": "A String", # The type of cluster manager API to use. If unknown or

Dan O'Meara

2020-05-01 07:42:23 -0700

[diff] [blame]

2719

# unspecified, the service will attempt to choose a reasonable

2720

# default. This should be in the form of the API service name,

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame]

2721

# e.g. "compute.googleapis.com".

2722

"tempStoragePrefix": "A String", # The prefix of the resources the system should use for temporary

2723

# storage. The system will append the suffix "/temp-{JOBNAME}" to

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

2724

# this resource prefix, where {JOBNAME} is the value of the

2725

# job_name field. The resulting bucket and object prefix is used

2726

# as the prefix of the resources used to store temporary data

2727

# needed during the job execution. NOTE: This will override the

2728

# value in taskrunner_settings.

2729

# The supported resource type is:

Sai Cheemalapati

2017-03-13 12:12:03 -0400

[diff] [blame]

2730

#

2731

# Google Cloud Storage:

2732

#

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

2733

# storage.googleapis.com/{bucket}/{object}

2734

# bucket.storage.googleapis.com/{object}

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame]

2735

"experiments": [ # The list of experiments to enable.

2736

"A String",

2737

],

2738

"version": { # A structure describing which components and their versions of the service

2739

# are required in order to run the job.

2740

"a_key": "", # Properties of the object.

2741

},

2742

"serviceAccountEmail": "A String", # Identity to run virtual machines as. Defaults to the default account.

Bu Sun Kim

2020-05-27 12:20:54 -0700

[diff] [blame]

2743

"sdkPipelineOptions": { # The Cloud Dataflow SDK pipeline options specified by the user. These

2744

# options are passed through the service and are used to recreate the

2745

# SDK pipeline options on the worker in a language agnostic and platform

2746

# independent way.

2747

"a_key": "", # Properties of the object.

2748

},

2749

"flexResourceSchedulingGoal": "A String", # Which Flexible Resource Scheduling mode to run in.

2750

"workerPools": [ # The worker pools. At least one "harness" worker pool must be

2751

# specified in order for the job to have workers.

2752

{ # Describes one particular pool of Cloud Dataflow workers to be

2753

# instantiated by the Cloud Dataflow service in order to perform the

2754

# computations required by a job. Note that a workflow job may use

2755

# multiple pools, in order to match the various computational

2756

# requirements of the various stages of the job.

2757

"numThreadsPerWorker": 42, # The number of threads per worker harness. If empty or unspecified, the

2758

# service will choose a number of threads (according to the number of cores

2759

# on the selected machine type for batch, or 1 by convention for streaming).

2760

"numWorkers": 42, # Number of Google Compute Engine workers in this pool needed to

2761

# execute the job. If zero or unspecified, the service will

2762

# attempt to choose a reasonable default.

2763

"zone": "A String", # Zone to run the worker pools in. If empty or unspecified, the service

2764

# will attempt to choose a reasonable default.

2765

"diskSourceImage": "A String", # Fully qualified source image for disks.

2766

"packages": [ # Packages to be installed on workers.

2767

{ # The packages that must be installed in order for a worker to run the

2768

# steps of the Cloud Dataflow job that will be assigned to its worker

2769

# pool.

2770

#

2771

# This is the mechanism by which the Cloud Dataflow SDK causes code to

2772

# be loaded onto the workers. For example, the Cloud Dataflow Java SDK

2773

# might use this to install jars containing the user's code and all of the

2774

# various dependencies (libraries, data files, etc.) required in order

2775

# for that code to run.

2776

"name": "A String", # The name of the package.

2777

"location": "A String", # The resource to read the package from. The supported resource type is:

2778

#

2779

# Google Cloud Storage:

2780

#

2781

# storage.googleapis.com/{bucket}

2782

# bucket.storage.googleapis.com/

2783

},

2784

],

2785

"teardownPolicy": "A String", # Sets the policy for determining when to tear down the worker pool.

2786

# Allowed values are: `TEARDOWN_ALWAYS`, `TEARDOWN_ON_SUCCESS`, and

2787

# `TEARDOWN_NEVER`.

2788

# `TEARDOWN_ALWAYS` means workers are always torn down regardless of whether

2789

# the job succeeds. `TEARDOWN_ON_SUCCESS` means workers are torn down

2790

# if the job succeeds. `TEARDOWN_NEVER` means the workers are never torn

2791

# down.

2792

#

2793

# If the workers are not torn down by the service, they will

2794

# continue to run and use Google Compute Engine VM resources in the

2795

# user's project until they are explicitly terminated by the user.

2796

# Because of this, Google recommends using the `TEARDOWN_ALWAYS`

2797

# policy except for small, manually supervised test jobs.

2798

#

2799

# If unknown or unspecified, the service will attempt to choose a reasonable

2800

# default.

2801

"onHostMaintenance": "A String", # The action to take on host maintenance, as defined by the Google

2802

# Compute Engine API.

2803

"poolArgs": { # Extra arguments for this worker pool.

2804

"a_key": "", # Properties of the object. Contains field @type with type URL.

2805

},

2806

"diskSizeGb": 42, # Size of root disk for VMs, in GB. If zero or unspecified, the service will

2807

# attempt to choose a reasonable default.

2808

"workerHarnessContainerImage": "A String", # Required. Docker container image that executes the Cloud Dataflow worker

2809

# harness, residing in Google Container Registry.

2810

#

2811

# Deprecated for the Fn API path. Use sdk_harness_container_images instead.

2812

"diskType": "A String", # Type of root disk for VMs. If empty or unspecified, the service will

2813

# attempt to choose a reasonable default.

2814

"machineType": "A String", # Machine type (e.g. "n1-standard-1"). If empty or unspecified, the

2815

# service will attempt to choose a reasonable default.

2816

"kind": "A String", # The kind of the worker pool; currently only `harness` and `shuffle`

2817

# are supported.

2818

"sdkHarnessContainerImages": [ # Set of SDK harness containers needed to execute this pipeline. This will

2819

# only be set in the Fn API path. For non-cross-language pipelines this

2820

# should have only one entry. Cross-language pipelines will have two or more

2821

# entries.

2822

{ # Defines a SDK harness container for executing Dataflow pipelines.

2823

"containerImage": "A String", # A docker container image that resides in Google Container Registry.

2824

"useSingleCorePerContainer": True or False, # If true, recommends the Dataflow service to use only one core per SDK

2825

# container instance with this image. If false (or unset) recommends using

2826

# more than one core per SDK container instance with this image for

2827

# efficiency. Note that Dataflow service may choose to override this property

# if needed.

},

],

"dataDisks": [ # Data disks that are used by a VM in this workflow.

2832

{ # Describes the data disk used by a workflow job.

2833

"diskType": "A String", # Disk storage type, as defined by Google Compute Engine. This

2834

# must be a disk type appropriate to the project and zone in which

2835

# the workers will run. If unknown or unspecified, the service

2836

# will attempt to choose a reasonable default.

2837

#

2838

# For example, the standard persistent disk type is a resource name

2839

# typically ending in "pd-standard". If SSD persistent disks are

2840

# available, the resource name typically ends with "pd-ssd". The

2841

# actual valid values are defined by the Google Compute Engine API,

2842

# not by the Cloud Dataflow API; consult the Google Compute Engine

2843

# documentation for more information about determining the set of

2844

# available disk types for a particular project and zone.

2845

#

2846

# Google Compute Engine Disk types are local to a particular

2847

# project in a particular zone, and so the resource name will

2848

# typically look something like this:

2849

#

2850

# compute.googleapis.com/projects/project-id/zones/zone/diskTypes/pd-standard

2851

"sizeGb": 42, # Size of disk in GB. If zero or unspecified, the service will

2852

# attempt to choose a reasonable default.

2853

"mountPoint": "A String", # Directory in a VM where disk is mounted.

2854

},

2855

],

2856

"subnetwork": "A String", # Subnetwork to which VMs will be assigned, if desired. Expected to be of

2857

# the form "regions/REGION/subnetworks/SUBNETWORK".

2858

"ipConfiguration": "A String", # Configuration for VM IPs.

2859

"taskrunnerSettings": { # Taskrunner configuration settings. # Settings passed through to Google Compute Engine workers when

2860

# using the standard Dataflow task runner. Users should ignore

2861

# this field.

2862

"alsologtostderr": True or False, # Whether to also send taskrunner log info to stderr.

2863

"taskGroup": "A String", # The UNIX group ID on the worker VM to use for tasks launched by

2864

# taskrunner; e.g. "wheel".

2865

"harnessCommand": "A String", # The command to launch the worker harness.

2866

"logDir": "A String", # The directory on the VM to store logs.

2867

"oauthScopes": [ # The OAuth2 scopes to be requested by the taskrunner in order to

2868

# access the Cloud Dataflow API.

2869

"A String",

2870

],

2871

"dataflowApiVersion": "A String", # The API version of endpoint, e.g. "v1b3"

2872

"logUploadLocation": "A String", # Indicates where to put logs. If this is not specified, the logs

2873

# will not be uploaded.

2874

#

2875

# The supported resource type is:

2876

#

2877

# Google Cloud Storage:

2878

# storage.googleapis.com/{bucket}/{object}

2879

# bucket.storage.googleapis.com/{object}

2880

"streamingWorkerMainClass": "A String", # The streaming worker main class name.

2881

"workflowFileName": "A String", # The file to store the workflow in.

2882

"languageHint": "A String", # The suggested backend language.

2883

"commandlinesFileName": "A String", # The file to store preprocessing commands in.

2884

"baseTaskDir": "A String", # The location on the worker for task-specific subdirectories.

2885

"tempStoragePrefix": "A String", # The prefix of the resources the taskrunner should use for

2886

# temporary storage.

2887

#

2888

# The supported resource type is:

2889

#

2890

# Google Cloud Storage:

2891

# storage.googleapis.com/{bucket}/{object}

2892

# bucket.storage.googleapis.com/{object}

2893

"baseUrl": "A String", # The base URL for the taskrunner to use when accessing Google Cloud APIs.

2894

#

2895

# When workers access Google Cloud APIs, they logically do so via

2896

# relative URLs. If this field is specified, it supplies the base

2897

# URL to use for resolving these relative URLs. The normative

2898

# algorithm used is defined by RFC 1808, "Relative Uniform Resource

2899

# Locators".

2900

#

2901

# If not specified, the default value is "http://www.googleapis.com/"

2902

"logToSerialconsole": True or False, # Whether to send taskrunner log info to Google Compute Engine VM serial

2903

# console.

2904

"continueOnException": True or False, # Whether to continue taskrunner if an exception is hit.

2905

"parallelWorkerSettings": { # Provides data to pass through to the worker harness. # The settings to pass to the parallel worker harness.

2906

"tempStoragePrefix": "A String", # The prefix of the resources the system should use for temporary

2907

# storage.

2908

#

2909

# The supported resource type is:

2910

#

2911

# Google Cloud Storage:

2912

#

2913

# storage.googleapis.com/{bucket}/{object}

2914

# bucket.storage.googleapis.com/{object}

2915

"reportingEnabled": True or False, # Whether to send work progress updates to the service.

2916

"baseUrl": "A String", # The base URL for accessing Google Cloud APIs.

2917

#

2918

# When workers access Google Cloud APIs, they logically do so via

2919

# relative URLs. If this field is specified, it supplies the base

2920

# URL to use for resolving these relative URLs. The normative

2921

# algorithm used is defined by RFC 1808, "Relative Uniform Resource

2922

# Locators".

2923

#

2924

# If not specified, the default value is "http://www.googleapis.com/"

2925

"servicePath": "A String", # The Cloud Dataflow service path relative to the root URL, for example,

2926

# "dataflow/v1b3/projects".

2927

"shuffleServicePath": "A String", # The Shuffle service path relative to the root URL, for example,

2928

# "shuffle/v1beta1".

2929

"workerId": "A String", # The ID of the worker running this pipeline.

2930

},

2931

"taskUser": "A String", # The UNIX user ID on the worker VM to use for tasks launched by

2932

# taskrunner; e.g. "root".

2933

"vmId": "A String", # The ID string of the VM.

2934

},

2935

"autoscalingSettings": { # Settings for WorkerPool autoscaling. # Settings for autoscaling of this WorkerPool.

2936

"algorithm": "A String", # The algorithm to use for autoscaling.

2937

"maxNumWorkers": 42, # The maximum number of workers to cap scaling at.

2938

},

2939

"metadata": { # Metadata to set on the Google Compute Engine VMs.

2940

"a_key": "A String",

2941

},

2942

"defaultPackageSet": "A String", # The default package set to install. This allows the service to

2943

# select a default set of packages which are useful to worker

2944

# harnesses written in a particular language.

2945

"network": "A String", # Network to which VMs will be assigned. If empty or unspecified,

2946

# the service will use the network "default".

2947

},

2948

],

2949

"dataset": "A String", # The dataset for the current project where various workflow

2950

# related tables are stored.

2951

#

2952

# The supported resource type is:

2953

#

2954

# Google BigQuery:

2955

# bigquery.googleapis.com/{dataset}

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

2956

},

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame]

2957

"stageStates": [ # This field may be mutated by the Cloud Dataflow service;

2958

# callers cannot mutate it.

2959

{ # A message describing the state of a particular execution stage.

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame]

2960

"currentStateTime": "A String", # The time at which the stage transitioned to this state.

2961

"executionStageState": "A String", # Execution stage states allow the same set of values as JobState.

Bu Sun Kim

2020-05-27 12:20:54 -0700

[diff] [blame]

2962

"executionStageName": "A String", # The name of the execution stage.

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame]

2963

},

2964

],

2965

"jobMetadata": { # This field is populated by the Dataflow service to support filtering jobs

2966

# by the metadata values provided here. Populated for ListJobs and all GetJob

2967

# views SUMMARY and higher.

2968

# Metadata available primarily for filtering jobs. Will be included in the ListJob response and Job SUMMARY view.

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame]

2969

"datastoreDetails": [ # Identification of a Datastore source used in the Dataflow job.

2970

{ # Metadata for a Datastore connector used by the job.

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame]

2971

"namespace": "A String", # Namespace used in the connection.

Bu Sun Kim

2020-05-27 12:20:54 -0700

[diff] [blame]

2972

"projectId": "A String", # ProjectId accessed in the connection.

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame]

2973

},

2974

],

2975

"sdkVersion": { # The version of the SDK used to run the job. # The SDK version used to run the job.

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame]

2976

"version": "A String", # The version of the SDK used to run the job.

Bu Sun Kim

2020-05-27 12:20:54 -0700

[diff] [blame]

2977

"sdkSupportStatus": "A String", # The support status for this SDK version.

2978

"versionDisplayName": "A String", # A readable string describing the version of the SDK.

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame]

2979

},

2980

"bigqueryDetails": [ # Identification of a BigQuery source used in the Dataflow job.

2981

{ # Metadata for a BigQuery connector used by the job.

2982

"table": "A String", # Table accessed in the connection.

2983

"dataset": "A String", # Dataset accessed in the connection.

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame]

2984

"query": "A String", # Query used to access data in the connection.

Bu Sun Kim

2020-05-27 12:20:54 -0700

[diff] [blame]

2985

"projectId": "A String", # Project accessed in the connection.

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame]

2986

},

2987

],

2988

"fileDetails": [ # Identification of a File source used in the Dataflow job.

2989

{ # Metadata for a File connector used by the job.

2990

"filePattern": "A String", # File Pattern used to access files by the connector.

2991

},

2992

],

2993

"pubsubDetails": [ # Identification of a PubSub source used in the Dataflow job.

2994

{ # Metadata for a PubSub connector used by the job.

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame]

2995

"topic": "A String", # Topic accessed in the connection.

Bu Sun Kim

2020-05-27 12:20:54 -0700

[diff] [blame]

2996

"subscription": "A String", # Subscription used in the connection.

2997

},

2998

],

2999

"bigTableDetails": [ # Identification of a BigTable source used in the Dataflow job.

3000

{ # Metadata for a BigTable connector used by the job.

3001

"projectId": "A String", # ProjectId accessed in the connection.

3002

"instanceId": "A String", # InstanceId accessed in the connection.

3003

"tableId": "A String", # TableId accessed in the connection.

3004

},

3005

],

3006

"spannerDetails": [ # Identification of a Spanner source used in the Dataflow job.

3007

{ # Metadata for a Spanner connector used by the job.

3008

"instanceId": "A String", # InstanceId accessed in the connection.

3009

"projectId": "A String", # ProjectId accessed in the connection.

3010

"databaseId": "A String", # DatabaseId accessed in the connection.

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame]

3011

},

3012

],

3013

},

Bu Sun Kim

2020-05-27 12:20:54 -0700

[diff] [blame]

3014

"type": "A String", # The type of Cloud Dataflow job.

3015

"projectId": "A String", # The ID of the Cloud Platform project that the job belongs to.

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame]

3016

"createdFromSnapshotId": "A String", # If this is specified, the job's initial state is populated from the given

3017

# snapshot.

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame]

3018

"pipelineDescription": { # Preliminary field: The format of this data may change at any time.

3019

# A description of the user pipeline and stages through which it is executed.

3020

# Created by Cloud Dataflow service. Only retrieved with

3021

# JOB_VIEW_DESCRIPTION or JOB_VIEW_ALL.

3022

# A descriptive representation of the submitted pipeline as well as the executed form. This data is provided by the Dataflow service for ease of visualizing

3023

# the pipeline and interpreting Dataflow provided metrics.

3024

"executionPipelineStage": [ # Description of each stage of execution of the pipeline.

3025

{ # Description of the composing transforms, names/ids, and input/outputs of a

3026

# stage of execution. Some composing transforms and sources may have been

3027

# generated by the Dataflow service during execution planning.

Bu Sun Kim

2020-05-27 12:20:54 -0700

[diff] [blame]

3028

"outputSource": [ # Output sources for this stage.

3029

{ # Description of an input or output of an execution stage.

3030

"sizeBytes": "A String", # Size of the source, if measurable.

3031

"name": "A String", # Dataflow service generated name for this source.

3032

"userName": "A String", # Human-readable name for this source; may be user or system generated.

3033

"originalTransformOrCollection": "A String", # User name for the original user transform or collection with which this

3034

# source is most closely associated.

3035

},

3036

],

3037

"name": "A String", # Dataflow service generated name for this stage.

3038

"inputSource": [ # Input sources for this stage.

3039

{ # Description of an input or output of an execution stage.

3040

"sizeBytes": "A String", # Size of the source, if measurable.

3041

"name": "A String", # Dataflow service generated name for this source.

3042

"userName": "A String", # Human-readable name for this source; may be user or system generated.

3043

"originalTransformOrCollection": "A String", # User name for the original user transform or collection with which this

3044

# source is most closely associated.

3045

},

3046

],

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame]

3047

"id": "A String", # Dataflow service generated id for this stage.

3048

"componentTransform": [ # Transforms that comprise this execution stage.

3049

{ # Description of a transform executed as part of an execution stage.

3050

"originalTransform": "A String", # User name for the original user transform with which this transform is

3051

# most closely associated.

3052

"name": "A String", # Dataflow service generated name for this transform.

3053

"userName": "A String", # Human-readable name for this transform; may be user or system generated.

3054

},

3055

],

3056

"componentSource": [ # Collections produced and consumed by component transforms of this stage.

3057

{ # Description of an interstitial value between transforms in an execution

3058

# stage.

3059

"name": "A String", # Dataflow service generated name for this source.

3060

"userName": "A String", # Human-readable name for this source; may be user or system generated.

3061

"originalTransformOrCollection": "A String", # User name for the original user transform or collection with which this

3062

# source is most closely associated.

3063

},

3064

],

3065

"kind": "A String", # Type of transform this stage is executing.

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame]

3066

},

3067

],

3068

"originalPipelineTransform": [ # Description of each transform in the pipeline and collections between them.

3069

{ # Description of the type, names/ids, and input/outputs for a transform.

3070

"kind": "A String", # Type of transform.

3071

"inputCollectionName": [ # User names for all collection inputs to this transform.

3072

"A String",

3073

],

3074

"name": "A String", # User provided name for this transform instance.

3075

"id": "A String", # SDK generated id of this transform instance.

3076

"displayData": [ # Transform-specific display data.

3077

{ # Data provided with a pipeline or transform to provide descriptive info.

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame]

3078

"durationValue": "A String", # Contains value if the data is of duration type.

Bu Sun Kim

2020-05-27 12:20:54 -0700

[diff] [blame]

3079

"int64Value": "A String", # Contains value if the data is of int64 type.

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame]

3080

"namespace": "A String", # The namespace for the key. This is usually a class name or programming

3081

# language namespace (e.g. a Python module) which defines the display data.

3082

# This allows a dax monitoring system to specially handle the data

3083

# and perform custom rendering.

3084

"floatValue": 3.14, # Contains value if the data is of float type.

3085

"key": "A String", # The key identifying the display data.

3086

# This is intended to be used as a label for the display data

3087

# when viewed in a dax monitoring system.

3088

"shortStrValue": "A String", # A possible additional shorter value to display.

3089

# For example a java_class_name_value of com.mypackage.MyDoFn

3090

# will be stored with MyDoFn as the short_str_value and

3091

# com.mypackage.MyDoFn as the java_class_name_value.

3092

# short_str_value can be displayed and java_class_name_value

3093

# will be displayed as a tooltip.

3094

"url": "A String", # An optional full URL.

3095

"label": "A String", # An optional label to display in a dax UI for the element.

Bu Sun Kim

2020-05-27 12:20:54 -0700

[diff] [blame]

3096

"timestampValue": "A String", # Contains value if the data is of timestamp type.

3097

"boolValue": True or False, # Contains value if the data is of a boolean type.

3098

"javaClassValue": "A String", # Contains value if the data is of java class type.

3099

"strValue": "A String", # Contains value if the data is of string type.

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame]

3100

},

3101

],

3102

"outputCollectionName": [ # User names for all collection outputs to this transform.

"A String",

],

},

],

"displayData": [ # Pipeline level display data.

3108

{ # Data provided with a pipeline or transform to provide descriptive info.

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame]

3109

"durationValue": "A String", # Contains value if the data is of duration type.

Bu Sun Kim

2020-05-27 12:20:54 -0700

[diff] [blame]

3110

"int64Value": "A String", # Contains value if the data is of int64 type.

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame]

3111

"namespace": "A String", # The namespace for the key. This is usually a class name or programming

3112

# language namespace (e.g. a Python module) which defines the display data.

3113

# This allows a dax monitoring system to specially handle the data

3114

# and perform custom rendering.

3115

"floatValue": 3.14, # Contains value if the data is of float type.

3116

"key": "A String", # The key identifying the display data.

3117

# This is intended to be used as a label for the display data

3118

# when viewed in a dax monitoring system.

3119

"shortStrValue": "A String", # A possible additional shorter value to display.

3120

# For example a java_class_name_value of com.mypackage.MyDoFn

3121

# will be stored with MyDoFn as the short_str_value and

3122

# com.mypackage.MyDoFn as the java_class_name_value.

3123

# short_str_value can be displayed and java_class_name_value

3124

# will be displayed as a tooltip.

3125

"url": "A String", # An optional full URL.

3126

"label": "A String", # An optional label to display in a dax UI for the element.

Bu Sun Kim

2020-05-27 12:20:54 -0700

[diff] [blame]

3127

"timestampValue": "A String", # Contains value if the data is of timestamp type.

3128

"boolValue": True or False, # Contains value if the data is of a boolean type.

3129

"javaClassValue": "A String", # Contains value if the data is of java class type.

3130

"strValue": "A String", # Contains value if the data is of string type.

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame]

},

],

},

"replaceJobId": "A String", # If this job is an update of an existing job, this field is the job ID

3135

# of the job it replaced.

3136

#

3137

# When sending a `CreateJobRequest`, you can update a job by specifying it

3138

# here. The job named here is stopped, and its intermediate state is

3139

# transferred to this job.

3140

"tempFiles": [ # A set of files the system should be aware of that are used

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

3141

# for temporary storage. These temporary files will be

3142

# removed on job completion.

3143

# No duplicates are allowed.

3144

# No file patterns are supported.

3145

#

3146

# The supported files are:

3147

#

3148

# Google Cloud Storage:

3149

#

3150

# storage.googleapis.com/{bucket}/{object}

3151

# bucket.storage.googleapis.com/{object}

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame]

3152

"A String",

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

3153

],

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame]

3154

"name": "A String", # The user-specified Cloud Dataflow job name.

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

3155

#

3156

# Only one Job with a given name may exist in a project at any

3157

# given time. If a caller attempts to create a Job with the same

3158

# name as an already-existing Job, the attempt returns the

3159

# existing Job.

3160

#

3161

# The name must match the regular expression

3162

# `[a-z]([-a-z0-9]{0,38}[a-z0-9])?`

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame]

3163

"steps": [ # Exactly one of steps or steps_location should be specified.

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

3164

#

3165

# The top-level steps that constitute the entire job.

3166

{ # Defines a particular step within a Cloud Dataflow job.

Sai Cheemalapati

2017-03-13 12:12:03 -0400

[diff] [blame]

3167

#

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

3168

# A job consists of multiple steps, each of which performs some

3169

# specific operation as part of the overall job. Data is typically

3170

# passed from one step to another as part of the job.

3171

#

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame]

3172

# Here's an example of a sequence of steps which together implement a

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

3173

# Map-Reduce job:

3174

#

3175

# * Read a collection of data from some source, parsing the

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame]

3176

# collection's elements.

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

3177

#

3178

# * Validate the elements.

3179

#

3180

# * Apply a user-defined function to map each element to some value

3181

# and extract an element-specific key value.

3182

#

3183

# * Group elements with the same key into a single element with

3184

# that key, transforming a multiply-keyed collection into a

3185

# uniquely-keyed collection.

3186

#

3187

# * Write the elements out to some data sink.

3188

#

3189

# Note that the Cloud Dataflow service may be used to run many different

3190

# types of jobs, not just Map-Reduce.

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame]

3191

"name": "A String", # The name that identifies the step. This must be unique for each

Dan O'Meara

2020-05-01 07:42:23 -0700

[diff] [blame]

3192

# step with respect to all other steps in the Cloud Dataflow job.

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame]

3193

"kind": "A String", # The kind of step in the Cloud Dataflow job.

3194

"properties": { # Named properties associated with the step. Each kind of

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

3195

# predefined step has its own required set of properties.

3196

# Must be provided on Create. Only retrieved with JOB_VIEW_ALL.

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame]

3197

"a_key": "", # Properties of the object.

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

3198

},

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

3199

},

3200

],

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame]

3201

"replacedByJobId": "A String", # If another job is an update of this job (and thus, this job is in

3202

# `JOB_STATE_UPDATED`), this field contains the ID of that job.

3203

"executionInfo": { # Deprecated. Additional information about how a Cloud Dataflow job will be executed that

3204

# isn't contained in the submitted job.

3205

"stages": { # A mapping from each stage to the information about that stage.

3206

"a_key": { # Contains information about how a particular

3207

# google.dataflow.v1beta3.Step will be executed.

3208

"stepName": [ # The steps associated with the execution stage.

3209

# Note that stages may have several steps, and that a given step

3210

# might be run by more than one stage.

"A String",

],

},

},

},

"currentState": "A String", # The current state of the job.

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

3217

#

3218

# Jobs are created in the `JOB_STATE_STOPPED` state unless otherwise

3219

# specified.

3220

#

3221

# A job in the `JOB_STATE_RUNNING` state may asynchronously enter a

3222

# terminal state. After a job has reached a terminal state, no

3223

# further state updates may be made.

3224

#

3225

# This field may be mutated by the Cloud Dataflow service;

3226

# callers cannot mutate it.

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame]

3227

"location": "A String", # The [regional endpoint]

3228

# (https://cloud.google.com/dataflow/docs/concepts/regional-endpoints) that

3229

# contains this job.

3230

"startTime": "A String", # The timestamp when the job was started (transitioned to JOB_STATE_PENDING).

3231

# Flexible resource scheduling jobs are started with some delay after job

3232

# creation, so start_time is unset before start and is updated when the

3233

# job is started by the Cloud Dataflow service. For other jobs, start_time

3234

# always equals create_time and is immutable and set by the Cloud Dataflow

3235

# service.

3236

"stepsLocation": "A String", # The GCS location where the steps are stored.

3237

"labels": { # User-defined labels for this job.

3238

#

3239

# The labels map can contain no more than 64 entries. Entries of the labels

3240

# map are UTF8 strings that comply with the following restrictions:

3241

#

3242

# * Keys must conform to regexp: \p{Ll}\p{Lo}{0,62}

3243

# * Values must conform to regexp: [\p{Ll}\p{Lo}\p{N}_-]{0,63}

3244

# * Both keys and values are additionally constrained to be <= 128 bytes in

3245

# size.

3246

"a_key": "A String",

Nathaniel Manista

2015-06-15 16:44:50 +0000

[diff] [blame]

3247

},

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame]

3248

"createTime": "A String", # The timestamp when the job was initially created. Immutable and set by the

3249

# Cloud Dataflow service.

3250

"requestedState": "A String", # The job's requested state.

3251

#

3252

# `UpdateJob` may be used to switch between the `JOB_STATE_STOPPED` and

3253

# `JOB_STATE_RUNNING` states, by setting requested_state. `UpdateJob` may

3254

# also be used to directly set a job's requested state to

3255

# `JOB_STATE_CANCELLED` or `JOB_STATE_DONE`, irrevocably terminating the

3256

# job if it has not already reached a terminal state.

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

3257

},

Nathaniel Manista

2015-06-15 16:44:50 +0000

[diff] [blame]

3258

],

Bu Sun Kim

2020-05-27 12:20:54 -0700

[diff] [blame]

3259

"failedLocation": [ # Zero or more messages describing the [regional endpoints]

3260

# (https://cloud.google.com/dataflow/docs/concepts/regional-endpoints) that

3261

# failed to respond.

3262

{ # Indicates which [regional endpoint]

3263

# (https://cloud.google.com/dataflow/docs/concepts/regional-endpoints) failed

3264

# to respond to a request for data.

3265

"name": "A String", # The name of the [regional endpoint]

3266

# (https://cloud.google.com/dataflow/docs/concepts/regional-endpoints) that

# failed to respond.

},

],

"nextPageToken": "A String", # Set if there may be more results than fit in this response.

Nathaniel Manista

2015-06-15 16:44:50 +0000

[diff] [blame]

}</pre>

</div>
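The `name` and `labels` constraints described in the job schema above can be checked on the client before a job is submitted. The following is a minimal sketch, not part of the generated client: the helper names `is_valid_job_name` and `labels_within_size_limits` are hypothetical, and the label check covers only the entry-count and byte-size limits, not the Unicode-class regular expressions.

```python
import re

# Pattern quoted from the "name" field description. fullmatch() is used so
# that the whole string must conform, not just a prefix.
JOB_NAME_PATTERN = re.compile(r"[a-z]([-a-z0-9]{0,38}[a-z0-9])?")


def is_valid_job_name(name):
    """Return True if name matches the documented job-name pattern."""
    return JOB_NAME_PATTERN.fullmatch(name) is not None


def labels_within_size_limits(labels):
    """Check the documented size limits on a labels map.

    Covers only the limits that need no Unicode property support:
    at most 64 entries, and each key and value at most 128 bytes
    when encoded as UTF-8.
    """
    if len(labels) > 64:
        return False
    for key, value in labels.items():
        if len(key.encode("utf-8")) > 128 or len(value.encode("utf-8")) > 128:
            return False
    return True
```

Checking locally only avoids an avoidable round trip; the service remains the authority on whether a name or label is accepted.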

<code class="details" id="list_next">list_next(previous_request, previous_response)</code>

3276

<pre>Retrieves the next page of results.

3277

3278

Args:

3279

previous_request: The request for the previous page. (required)

3280

previous_response: The response from the request for the previous page. (required)

3281

3282

Returns:

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame]

3283

A request object that you can call 'execute()' on to request the next

Nathaniel Manista

2015-06-15 16:44:50 +0000

[diff] [blame]

3284

page. Returns None if there are no more items in the collection.

</pre>

</div>
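The `list_next` contract above (it returns a new request, or None when the collection is exhausted) lends itself to a simple loop. The sketch below assumes that `jobs_resource` is the object returned by `service.projects().jobs()` and that each response page carries its jobs under a `jobs` key; the helper name `iter_jobs` is hypothetical.

```python
def iter_jobs(jobs_resource, project_id, **list_kwargs):
    """Yield every job of a project, following pages via list_next().

    jobs_resource is assumed to expose list() and list_next() as
    documented for projects.jobs. The "jobs" response key is an
    assumption about the response shape.
    """
    request = jobs_resource.list(projectId=project_id, **list_kwargs)
    while request is not None:
        response = request.execute()
        for job in response.get("jobs", []):
            yield job
        # list_next() returns None once there are no more pages,
        # which ends the loop.
        request = jobs_resource.list_next(
            previous_request=request, previous_response=response
        )
```

With a client built through `googleapiclient.discovery.build("dataflow", "v1b3")`, a call might look like `for job in iter_jobs(service.projects().jobs(), "my-project"): ...`, where `my-project` is a placeholder project ID.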

Dan O'Meara

2020-05-01 07:42:23 -0700

[diff] [blame]

3289

<code class="details" id="snapshot">snapshot(projectId, jobId, body=None, x__xgafv=None)</code>

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

3290

<pre>Snapshot the state of a streaming job.

3291

3292

Args:

3293

projectId: string, The project which owns the job to be snapshotted. (required)

3294

jobId: string, The job to be snapshotted. (required)

Dan O'Meara

2020-05-01 07:42:23 -0700

[diff] [blame]

3295

body: object, The request body.

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

3296

The object takes the form of:

3297

3298

{ # Request to create a snapshot of a job.

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame]

3299

"description": "A String", # User specified description of the snapshot. May be empty.

3300

"snapshotSources": True or False, # If true, perform snapshots for sources which support this.

3301

"ttl": "A String", # TTL for the snapshot.

3302

"location": "A String", # The location that contains this job.

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

3303

}

3304

3305

x__xgafv: string, V1 error format.

Allowed values

1 - v1 error format

2 - v2 error format

Returns:

An object of the form:

3312

3313

{ # Represents a snapshot of a job.

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame]

3314

"pubsubMetadata": [ # PubSub snapshot metadata.

Dan O'Meara

2020-05-01 07:42:23 -0700

[diff] [blame]

3315

{ # Represents a Pubsub snapshot.

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame]

3316

"snapshotName": "A String", # The name of the Pubsub snapshot.

3317

"topicName": "A String", # The name of the Pubsub topic.

Bu Sun Kim

2020-05-27 12:20:54 -0700

[diff] [blame]

3318

"expireTime": "A String", # The expire time of the Pubsub snapshot.

Dan O'Meara

2020-05-01 07:42:23 -0700

[diff] [blame]

3319

},

3320

],

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame]

3321

"creationTime": "A String", # The time this snapshot was created.

Bu Sun Kim

2020-05-27 12:20:54 -0700

[diff] [blame]

3322

"sourceJobId": "A String", # The job this snapshot was created from.

3323

"state": "A String", # State of the snapshot.

3324

"projectId": "A String", # The project this snapshot belongs to.

3325

"ttl": "A String", # The time after which this snapshot will be automatically deleted.

3326

"id": "A String", # The unique ID of this snapshot.

3327

"description": "A String", # User specified description of the snapshot. May be empty.

3328

"diskSizeBytes": "A String", # The disk byte size of the snapshot. Only available for snapshots in READY

3329

# state.

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

}</pre>

</div>
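A request body for `snapshot` can be assembled from the four fields listed above. This is a minimal sketch under one stated assumption: that `ttl` is sent as a count of seconds followed by `s` (the usual JSON form of a protobuf duration), which the text above does not spell out. The helper name `build_snapshot_body` is hypothetical.

```python
from datetime import timedelta


def build_snapshot_body(location, ttl, description="", snapshot_sources=False):
    """Build the request body for projects.jobs.snapshot.

    ttl is a datetime.timedelta. Encoding it as "<seconds>s" is an
    assumption about the wire format, not something stated in the
    field description.
    """
    return {
        "description": description,
        "snapshotSources": snapshot_sources,
        "ttl": "%ds" % int(ttl.total_seconds()),
        "location": location,
    }
```

The result would be passed as `body` in `snapshot(projectId=..., jobId=..., body=body)`, followed by `execute()` on the returned request.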

Dan O'Meara

2020-05-01 07:42:23 -0700

[diff] [blame]

3334

<code class="details" id="update">update(projectId, jobId, body=None, location=None, x__xgafv=None)</code>

Sai Cheemalapati

2017-03-13 12:12:03 -0400

[diff] [blame]

3335

<pre>Updates the state of an existing Cloud Dataflow job.

Nathaniel Manista

2015-06-15 16:44:50 +0000

[diff] [blame]

3336

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

3337

To update the state of an existing job, we recommend using

3338

`projects.locations.jobs.update` with a [regional endpoint]

3339

(https://cloud.google.com/dataflow/docs/concepts/regional-endpoints). Using

3340

`projects.jobs.update` is not recommended, as you can only update the state

3341

of jobs that are running in `us-central1`.

3342

Nathaniel Manista

2015-06-15 16:44:50 +0000

[diff] [blame]

3343

Args:

Sai Cheemalapati

2017-03-13 12:12:03 -0400

[diff] [blame]

3344

projectId: string, The ID of the Cloud Platform project that the job belongs to. (required)

3345

jobId: string, The job ID. (required)

Dan O'Meara

2020-05-01 07:42:23 -0700

[diff] [blame]

3346

body: object, The request body.

Nathaniel Manista

2015-06-15 16:44:50 +0000

[diff] [blame]

3347

The object takes the form of:

3348

Sai Cheemalapati

2017-03-13 12:12:03 -0400

[diff] [blame]

3349

{ # Defines a job to be run by the Cloud Dataflow service.

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame]

3350

"clientRequestId": "A String", # The client's unique identifier of the job, re-used across retried attempts.

3351

# If this field is set, the service will ensure its uniqueness.

3352

# The request to create a job will fail if the service has knowledge of a

3353

# previously submitted job with the same client's ID and job name.

3354

# The caller may use this field to ensure idempotence of job

3355

# creation across retried attempts to create a job.

3356

# By default, the field is empty and, in that case, the service ignores it.

3357

"id": "A String", # The unique ID of this job.

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

3358

#

3359

# This field is set by the Cloud Dataflow service when the Job is

3360

# created, and is immutable for the life of the job.

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame]

3361

"currentStateTime": "A String", # The timestamp associated with the current state.

3362

"transformNameMapping": { # The map of transform name prefixes of the job to be replaced to the

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

3363

# corresponding name prefixes of the new job.

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame]

3364

"a_key": "A String",

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

3365

},

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame]

3366

"environment": { # Describes the environment in which a Dataflow Job runs. # The environment for the job.

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame]

3367

"internalExperiments": { # Experimental settings.

3368

"a_key": "", # Properties of the object. Contains field @type with type URL.

3369

},

3370

"workerRegion": "A String", # The Compute Engine region

3371

# (https://cloud.google.com/compute/docs/regions-zones/regions-zones) in

3372

# which worker processing should occur, e.g. "us-west1". Mutually exclusive

3373

# with worker_zone. If neither worker_region nor worker_zone is specified,

3374

# default to the control plane's region.

3375

"serviceKmsKeyName": "A String", # If set, contains the Cloud KMS key identifier used to encrypt data

3376

# at rest, AKA a Customer Managed Encryption Key (CMEK).

3377

#

3378

# Format:

3379

# projects/PROJECT_ID/locations/LOCATION/keyRings/KEY_RING/cryptoKeys/KEY

3380

"userAgent": { # A description of the process that generated the request.

3381

"a_key": "", # Properties of the object.

3382

},

3383

"workerZone": "A String", # The Compute Engine zone

3384

# (https://cloud.google.com/compute/docs/regions-zones/regions-zones) in

3385

# which worker processing should occur, e.g. "us-west1-a". Mutually exclusive

3386

# with worker_region. If neither worker_region nor worker_zone is specified,

3387

# a zone in the control plane's region is chosen based on available capacity.

3388

"clusterManagerApiService": "A String", # The type of cluster manager API to use. If unknown or

Dan O'Meara

2020-05-01 07:42:23 -0700

[diff] [blame]

3389

# unspecified, the service will attempt to choose a reasonable

3390

# default. This should be in the form of the API service name,

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame]

3391

# e.g. "compute.googleapis.com".

3392

"tempStoragePrefix": "A String", # The prefix of the resources the system should use for temporary

3393

# storage. The system will append the suffix "/temp-{JOBNAME}" to

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

3394

# this resource prefix, where {JOBNAME} is the value of the

3395

# job_name field. The resulting bucket and object prefix is used

3396

# as the prefix of the resources used to store temporary data

3397

# needed during the job execution. NOTE: This will override the

3398

# value in taskrunner_settings.

3399

# The supported resource type is:

3400

#

3401

# Google Cloud Storage:

3402

#

3403

# storage.googleapis.com/{bucket}/{object}

3404

# bucket.storage.googleapis.com/{object}

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame]

3405

"experiments": [ # The list of experiments to enable.

3406

"A String",

3407

],

3408

"version": { # A structure describing which components and their versions of the service

3409

# are required in order to run the job.

3410

"a_key": "", # Properties of the object.

3411

},

3412

"serviceAccountEmail": "A String", # Identity to run virtual machines as. Defaults to the default account.

Bu Sun Kim

2020-05-27 12:20:54 -0700

[diff] [blame]

3413

"sdkPipelineOptions": { # The Cloud Dataflow SDK pipeline options specified by the user. These

3414

# options are passed through the service and are used to recreate the

3415

# SDK pipeline options on the worker in a language agnostic and platform

3416

# independent way.

3417

"a_key": "", # Properties of the object.

3418

},

3419

"flexResourceSchedulingGoal": "A String", # Which Flexible Resource Scheduling mode to run in.

3420

"workerPools": [ # The worker pools. At least one "harness" worker pool must be

3421

# specified in order for the job to have workers.

3422

{ # Describes one particular pool of Cloud Dataflow workers to be

3423

# instantiated by the Cloud Dataflow service in order to perform the

3424

# computations required by a job. Note that a workflow job may use

3425

# multiple pools, in order to match the various computational

3426

# requirements of the various stages of the job.

3427

"numThreadsPerWorker": 42, # The number of threads per worker harness. If empty or unspecified, the

3428

# service will choose a number of threads (according to the number of cores

3429

# on the selected machine type for batch, or 1 by convention for streaming).

3430

"numWorkers": 42, # Number of Google Compute Engine workers in this pool needed to

3431

# execute the job. If zero or unspecified, the service will

3432

# attempt to choose a reasonable default.

3433

"zone": "A String", # Zone to run the worker pools in. If empty or unspecified, the service

3434

# will attempt to choose a reasonable default.

3435

"diskSourceImage": "A String", # Fully qualified source image for disks.

3436

"packages": [ # Packages to be installed on workers.

3437

{ # The packages that must be installed in order for a worker to run the

3438

# steps of the Cloud Dataflow job that will be assigned to its worker

3439

# pool.

3440

#

3441

# This is the mechanism by which the Cloud Dataflow SDK causes code to

3442

# be loaded onto the workers. For example, the Cloud Dataflow Java SDK

3443

# might use this to install jars containing the user's code and all of the

3444

# various dependencies (libraries, data files, etc.) required in order

3445

# for that code to run.

3446

"name": "A String", # The name of the package.

3447

"location": "A String", # The resource to read the package from. The supported resource type is:

3448

#

3449

# Google Cloud Storage:

3450

#

3451

# storage.googleapis.com/{bucket}

3452

# bucket.storage.googleapis.com/

3453

},

3454

],

3455

"teardownPolicy": "A String", # Sets the policy for determining when to tear down the worker pool.

3456

# Allowed values are: `TEARDOWN_ALWAYS`, `TEARDOWN_ON_SUCCESS`, and

3457

# `TEARDOWN_NEVER`.

3458

# `TEARDOWN_ALWAYS` means workers are always torn down regardless of whether

3459

# the job succeeds. `TEARDOWN_ON_SUCCESS` means workers are torn down

3460

# if the job succeeds. `TEARDOWN_NEVER` means the workers are never torn

3461

# down.

3462

#

3463

# If the workers are not torn down by the service, they will

3464

# continue to run and use Google Compute Engine VM resources in the

3465

# user's project until they are explicitly terminated by the user.

3466

# Because of this, Google recommends using the `TEARDOWN_ALWAYS`

3467

# policy except for small, manually supervised test jobs.

3468

#

3469

# If unknown or unspecified, the service will attempt to choose a reasonable

3470

# default.

3471

"onHostMaintenance": "A String", # The action to take on host maintenance, as defined by the Google

3472

# Compute Engine API.

3473

"poolArgs": { # Extra arguments for this worker pool.

3474

"a_key": "", # Properties of the object. Contains field @type with type URL.

3475

},

3476

"diskSizeGb": 42, # Size of root disk for VMs, in GB. If zero or unspecified, the service will

3477

# attempt to choose a reasonable default.

3478

"workerHarnessContainerImage": "A String", # Required. Docker container image that executes the Cloud Dataflow worker

3479

# harness, residing in Google Container Registry.

3480

#

3481

# Deprecated for the Fn API path. Use sdk_harness_container_images instead.

3482

"diskType": "A String", # Type of root disk for VMs. If empty or unspecified, the service will

3483

# attempt to choose a reasonable default.

3484

"machineType": "A String", # Machine type (e.g. "n1-standard-1"). If empty or unspecified, the

3485

# service will attempt to choose a reasonable default.

3486

"kind": "A String", # The kind of the worker pool; currently only `harness` and `shuffle`

3487

# are supported.

3488

"sdkHarnessContainerImages": [ # Set of SDK harness containers needed to execute this pipeline. This will

3489

# only be set in the Fn API path. For non-cross-language pipelines this

3490

# should have only one entry. Cross-language pipelines will have two or more

3491

# entries.

3492

{ # Defines an SDK harness container for executing Dataflow pipelines.

3493

"containerImage": "A String", # A docker container image that resides in Google Container Registry.

3494

"useSingleCorePerContainer": True or False, # If true, recommends that the Dataflow service use only one core per SDK

3495

# container instance with this image. If false (or unset) recommends using

3496

# more than one core per SDK container instance with this image for

3497

# efficiency. Note that the Dataflow service may choose to override this property

# if needed.

},

],

"dataDisks": [ # Data disks that are used by a VM in this workflow.

3502

{ # Describes the data disk used by a workflow job.

3503

"diskType": "A String", # Disk storage type, as defined by Google Compute Engine. This

3504

# must be a disk type appropriate to the project and zone in which

3505

# the workers will run. If unknown or unspecified, the service

3506

# will attempt to choose a reasonable default.

3507

#

3508

# For example, the standard persistent disk type is a resource name

3509

# typically ending in "pd-standard". If SSD persistent disks are

3510

# available, the resource name typically ends with "pd-ssd". The

3511

# actual valid values are defined by the Google Compute Engine API,

3512

# not by the Cloud Dataflow API; consult the Google Compute Engine

3513

# documentation for more information about determining the set of

3514

# available disk types for a particular project and zone.

3515

#

3516

# Google Compute Engine Disk types are local to a particular

3517

# project in a particular zone, and so the resource name will

3518

# typically look something like this:

3519

#

3520

# compute.googleapis.com/projects/project-id/zones/zone/diskTypes/pd-standard

3521

"sizeGb": 42, # Size of disk in GB. If zero or unspecified, the service will

3522

# attempt to choose a reasonable default.

3523

"mountPoint": "A String", # Directory in a VM where disk is mounted.

3524

},

3525

],

3526

"subnetwork": "A String", # Subnetwork to which VMs will be assigned, if desired. Expected to be of

3527

# the form "regions/REGION/subnetworks/SUBNETWORK".

3528

"ipConfiguration": "A String", # Configuration for VM IPs.

3529

"taskrunnerSettings": { # Taskrunner configuration settings. # Settings passed through to Google Compute Engine workers when

3530

# using the standard Dataflow task runner. Users should ignore

3531

# this field.

3532

"alsologtostderr": True or False, # Whether to also send taskrunner log info to stderr.

3533

"taskGroup": "A String", # The UNIX group ID on the worker VM to use for tasks launched by

3534

# taskrunner; e.g. "wheel".

3535

"harnessCommand": "A String", # The command to launch the worker harness.

3536

"logDir": "A String", # The directory on the VM to store logs.

3537

"oauthScopes": [ # The OAuth2 scopes to be requested by the taskrunner in order to

3538

# access the Cloud Dataflow API.

3539

"A String",

3540

],

3541

"dataflowApiVersion": "A String", # The API version of endpoint, e.g. "v1b3"

3542

"logUploadLocation": "A String", # Indicates where to put logs. If this is not specified, the logs

3543

# will not be uploaded.

3544

#

3545

# The supported resource type is:

3546

#

3547

# Google Cloud Storage:

3548

# storage.googleapis.com/{bucket}/{object}

3549

# bucket.storage.googleapis.com/{object}

3550

"streamingWorkerMainClass": "A String", # The streaming worker main class name.

3551

"workflowFileName": "A String", # The file to store the workflow in.

3552

"languageHint": "A String", # The suggested backend language.

3553

"commandlinesFileName": "A String", # The file to store preprocessing commands in.

3554

"baseTaskDir": "A String", # The location on the worker for task-specific subdirectories.

3555

"tempStoragePrefix": "A String", # The prefix of the resources the taskrunner should use for

3556

# temporary storage.

3557

#

3558

# The supported resource type is:

3559

#

3560

# Google Cloud Storage:

3561

# storage.googleapis.com/{bucket}/{object}

3562

# bucket.storage.googleapis.com/{object}

3563

"baseUrl": "A String", # The base URL for the taskrunner to use when accessing Google Cloud APIs.

3564

#

3565

# When workers access Google Cloud APIs, they logically do so via

3566

# relative URLs. If this field is specified, it supplies the base

3567

# URL to use for resolving these relative URLs. The normative

3568

# algorithm used is defined by RFC 1808, "Relative Uniform Resource

3569

# Locators".

3570

#

3571

# If not specified, the default value is "http://www.googleapis.com/"

3572

"logToSerialconsole": True or False, # Whether to send taskrunner log info to Google Compute Engine VM serial

3573

# console.

3574

"continueOnException": True or False, # Whether to continue taskrunner if an exception is hit.

3575

"parallelWorkerSettings": { # Provides data to pass through to the worker harness. # The settings to pass to the parallel worker harness.

3576

"tempStoragePrefix": "A String", # The prefix of the resources the system should use for temporary

3577

# storage.

3578

#

3579

# The supported resource type is:

3580

#

3581

# Google Cloud Storage:

3582

#

3583

# storage.googleapis.com/{bucket}/{object}

3584

# bucket.storage.googleapis.com/{object}

3585

"reportingEnabled": True or False, # Whether to send work progress updates to the service.

3586

"baseUrl": "A String", # The base URL for accessing Google Cloud APIs.

3587

#

3588

# When workers access Google Cloud APIs, they logically do so via

3589

# relative URLs. If this field is specified, it supplies the base

3590

# URL to use for resolving these relative URLs. The normative

3591

# algorithm used is defined by RFC 1808, "Relative Uniform Resource

3592

# Locators".

3593

#

3594

# If not specified, the default value is "http://www.googleapis.com/"

3595

"servicePath": "A String", # The Cloud Dataflow service path relative to the root URL, for example,

3596

# "dataflow/v1b3/projects".

3597

"shuffleServicePath": "A String", # The Shuffle service path relative to the root URL, for example,

3598

# "shuffle/v1beta1".

3599

"workerId": "A String", # The ID of the worker running this pipeline.

3600

},

3601

"taskUser": "A String", # The UNIX user ID on the worker VM to use for tasks launched by

3602

# taskrunner; e.g. "root".

3603

"vmId": "A String", # The ID string of the VM.

3604

},

3605

"autoscalingSettings": { # Settings for WorkerPool autoscaling. # Settings for autoscaling of this WorkerPool.

3606

"algorithm": "A String", # The algorithm to use for autoscaling.

3607

"maxNumWorkers": 42, # The maximum number of workers to cap scaling at.

3608

},

3609

"metadata": { # Metadata to set on the Google Compute Engine VMs.

3610

"a_key": "A String",

3611

},

3612

"defaultPackageSet": "A String", # The default package set to install. This allows the service to

3613

# select a default set of packages which are useful to worker

3614

# harnesses written in a particular language.

3615

"network": "A String", # Network to which VMs will be assigned. If empty or unspecified,

3616

# the service will use the network "default".

3617

},

3618

],

3619

"dataset": "A String", # The dataset for the current project where various workflow

3620

# related tables are stored.

3621

#

3622

# The supported resource type is:

3623

#

3624

# Google BigQuery:

3625

# bigquery.googleapis.com/{dataset}

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

3626

},

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame]

3627

"stageStates": [ # This field may be mutated by the Cloud Dataflow service;

3628

# callers cannot mutate it.

3629

{ # A message describing the state of a particular execution stage.

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame]

3630

"currentStateTime": "A String", # The time at which the stage transitioned to this state.

3631

"executionStageState": "A String", # Execution stage states allow the same set of values as JobState.

Bu Sun Kim

2020-05-27 12:20:54 -0700

[diff] [blame]

3632

"executionStageName": "A String", # The name of the execution stage.

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame]

3633

},

3634

],

3635

"jobMetadata": { # Metadata available primarily for filtering jobs. Will be included in the ListJob response and Job SUMMARY view.

3636

# This field is populated by the Dataflow service to support filtering jobs

3637

# by the metadata values provided here. Populated for ListJobs and all GetJob

3638

# views SUMMARY and higher.

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame]

3639

"datastoreDetails": [ # Identification of a Datastore source used in the Dataflow job.

3640

{ # Metadata for a Datastore connector used by the job.

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame]

3641

"namespace": "A String", # Namespace used in the connection.

Bu Sun Kim

2020-05-27 12:20:54 -0700

[diff] [blame]

3642

"projectId": "A String", # ProjectId accessed in the connection.

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame]

3643

},

3644

],

3645

"sdkVersion": { # The version of the SDK used to run the job. # The SDK version used to run the job.

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame]

3646

"version": "A String", # The version of the SDK used to run the job.

Bu Sun Kim

2020-05-27 12:20:54 -0700

[diff] [blame]

3647

"sdkSupportStatus": "A String", # The support status for this SDK version.

3648

"versionDisplayName": "A String", # A readable string describing the version of the SDK.

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame]

3649

},

3650

"bigqueryDetails": [ # Identification of a BigQuery source used in the Dataflow job.

3651

{ # Metadata for a BigQuery connector used by the job.

3652

"table": "A String", # Table accessed in the connection.

3653

"dataset": "A String", # Dataset accessed in the connection.

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame]

3654

"query": "A String", # Query used to access data in the connection.

Bu Sun Kim

2020-05-27 12:20:54 -0700

[diff] [blame]

3655

"projectId": "A String", # Project accessed in the connection.

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame]

3656

},

3657

],

3658

"fileDetails": [ # Identification of a File source used in the Dataflow job.

3659

{ # Metadata for a File connector used by the job.

3660

"filePattern": "A String", # File Pattern used to access files by the connector.

3661

},

3662

],

3663

"pubsubDetails": [ # Identification of a PubSub source used in the Dataflow job.

3664

{ # Metadata for a PubSub connector used by the job.

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame]

3665

"topic": "A String", # Topic accessed in the connection.

Bu Sun Kim

2020-05-27 12:20:54 -0700

[diff] [blame]

3666

"subscription": "A String", # Subscription used in the connection.

3667

},

3668

],

3669

"bigTableDetails": [ # Identification of a BigTable source used in the Dataflow job.

3670

{ # Metadata for a BigTable connector used by the job.

3671

"projectId": "A String", # ProjectId accessed in the connection.

3672

"instanceId": "A String", # InstanceId accessed in the connection.

3673

"tableId": "A String", # TableId accessed in the connection.

3674

},

3675

],

3676

"spannerDetails": [ # Identification of a Spanner source used in the Dataflow job.

3677

{ # Metadata for a Spanner connector used by the job.

3678

"instanceId": "A String", # InstanceId accessed in the connection.

3679

"projectId": "A String", # ProjectId accessed in the connection.

3680

"databaseId": "A String", # DatabaseId accessed in the connection.

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame]

3681

},

3682

],

3683

},

Bu Sun Kim

2020-05-27 12:20:54 -0700

[diff] [blame]

3684

"type": "A String", # The type of Cloud Dataflow job.

3685

"projectId": "A String", # The ID of the Cloud Platform project that the job belongs to.

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame]

3686

"createdFromSnapshotId": "A String", # If this is specified, the job's initial state is populated from the given

3687

# snapshot.

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame]

3688

"pipelineDescription": { # A descriptive representation of submitted pipeline as well as the executed

3689

# form. This data is provided by the Dataflow service for ease of visualizing

3690

# the pipeline and interpreting Dataflow provided metrics.

3691

# Preliminary field: The format of this data may change at any time.

3692

# A description of the user pipeline and stages through which it is executed.

3693

# Created by Cloud Dataflow service. Only retrieved with JOB_VIEW_DESCRIPTION or JOB_VIEW_ALL.

3694

"executionPipelineStage": [ # Description of each stage of execution of the pipeline.

3695

{ # Description of the composing transforms, names/ids, and input/outputs of a

3696

# stage of execution. Some composing transforms and sources may have been

3697

# generated by the Dataflow service during execution planning.

Bu Sun Kim

2020-05-27 12:20:54 -0700

[diff] [blame]

3698

"outputSource": [ # Output sources for this stage.

3699

{ # Description of an input or output of an execution stage.

3700

"sizeBytes": "A String", # Size of the source, if measurable.

3701

"name": "A String", # Dataflow service generated name for this source.

3702

"userName": "A String", # Human-readable name for this source; may be user or system generated.

3703

"originalTransformOrCollection": "A String", # User name for the original user transform or collection with which this

3704

# source is most closely associated.

3705

},

3706

],

3707

"name": "A String", # Dataflow service generated name for this stage.

3708

"inputSource": [ # Input sources for this stage.

3709

{ # Description of an input or output of an execution stage.

3710

"sizeBytes": "A String", # Size of the source, if measurable.

3711

"name": "A String", # Dataflow service generated name for this source.

3712

"userName": "A String", # Human-readable name for this source; may be user or system generated.

3713

"originalTransformOrCollection": "A String", # User name for the original user transform or collection with which this

3714

# source is most closely associated.

3715

},

3716

],

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame]

3717

"id": "A String", # Dataflow service generated id for this stage.

3718

"componentTransform": [ # Transforms that comprise this execution stage.

3719

{ # Description of a transform executed as part of an execution stage.

3720

"originalTransform": "A String", # User name for the original user transform with which this transform is

3721

# most closely associated.

3722

"name": "A String", # Dataflow service generated name for this transform.

3723

"userName": "A String", # Human-readable name for this transform; may be user or system generated.

3724

},

3725

],

3726

"componentSource": [ # Collections produced and consumed by component transforms of this stage.

3727

{ # Description of an interstitial value between transforms in an execution

3728

# stage.

3729

"name": "A String", # Dataflow service generated name for this source.

3730

"userName": "A String", # Human-readable name for this source; may be user or system generated.

3731

"originalTransformOrCollection": "A String", # User name for the original user transform or collection with which this

3732

# source is most closely associated.

3733

},

3734

],

3735

"kind": "A String", # Type of transform this stage is executing.

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame]

3736

},

3737

],

3738

"originalPipelineTransform": [ # Description of each transform in the pipeline and collections between them.

3739

{ # Description of the type, names/ids, and input/outputs for a transform.

3740

"kind": "A String", # Type of transform.

3741

"inputCollectionName": [ # User names for all collection inputs to this transform.

3742

"A String",

3743

],

3744

"name": "A String", # User provided name for this transform instance.

3745

"id": "A String", # SDK generated id of this transform instance.

3746

"displayData": [ # Transform-specific display data.

3747

{ # Data provided with a pipeline or transform to provide descriptive info.

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame]

3748

"durationValue": "A String", # Contains value if the data is of duration type.

Bu Sun Kim

2020-05-27 12:20:54 -0700

[diff] [blame]

3749

"int64Value": "A String", # Contains value if the data is of int64 type.

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame]

3750

"namespace": "A String", # The namespace for the key. This is usually a class name or programming

3751

# language namespace (e.g. a Python module) which defines the display data.

3752

# This allows a dax monitoring system to specially handle the data

3753

# and perform custom rendering.

3754

"floatValue": 3.14, # Contains value if the data is of float type.

3755

"key": "A String", # The key identifying the display data.

3756

# This is intended to be used as a label for the display data

3757

# when viewed in a dax monitoring system.

3758

"shortStrValue": "A String", # A possible additional shorter value to display.

3759

# For example a java_class_name_value of com.mypackage.MyDoFn

3760

# will be stored with MyDoFn as the short_str_value and

3761

# com.mypackage.MyDoFn as the java_class_name_value.

3762

# short_str_value can be displayed and java_class_name_value

3763

# will be displayed as a tooltip.

3764

"url": "A String", # An optional full URL.

3765

"label": "A String", # An optional label to display in a dax UI for the element.

Bu Sun Kim

2020-05-27 12:20:54 -0700

[diff] [blame]

3766

"timestampValue": "A String", # Contains value if the data is of timestamp type.

3767

"boolValue": True or False, # Contains value if the data is of a boolean type.

3768

"javaClassValue": "A String", # Contains value if the data is of java class type.

3769

"strValue": "A String", # Contains value if the data is of string type.

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame]

3770

},

3771

],

3772

"outputCollectionName": [ # User names for all collection outputs to this transform.

"A String",

],

},

],

"displayData": [ # Pipeline level display data.

3778

{ # Data provided with a pipeline or transform to provide descriptive info.

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame]

3779

"durationValue": "A String", # Contains value if the data is of duration type.

Bu Sun Kim

2020-05-27 12:20:54 -0700

[diff] [blame]

3780

"int64Value": "A String", # Contains value if the data is of int64 type.

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame]

3781

"namespace": "A String", # The namespace for the key. This is usually a class name or programming

3782

# language namespace (e.g. a Python module) which defines the display data.

3783

# This allows a dax monitoring system to specially handle the data

3784

# and perform custom rendering.

3785

"floatValue": 3.14, # Contains value if the data is of float type.

3786

"key": "A String", # The key identifying the display data.

3787

# This is intended to be used as a label for the display data

3788

# when viewed in a dax monitoring system.

3789

"shortStrValue": "A String", # A possible additional shorter value to display.

3790

# For example a java_class_name_value of com.mypackage.MyDoFn

3791

# will be stored with MyDoFn as the short_str_value and

3792

# com.mypackage.MyDoFn as the java_class_name_value.

3793

# short_str_value can be displayed and java_class_name_value

3794

# will be displayed as a tooltip.

3795

"url": "A String", # An optional full URL.

3796

"label": "A String", # An optional label to display in a dax UI for the element.

Bu Sun Kim

2020-05-27 12:20:54 -0700

[diff] [blame]

3797

"timestampValue": "A String", # Contains value if the data is of timestamp type.

3798

"boolValue": True or False, # Contains value if the data is of a boolean type.

3799

"javaClassValue": "A String", # Contains value if the data is of java class type.

3800

"strValue": "A String", # Contains value if the data is of string type.

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame]

},

],

},

"replaceJobId": "A String", # If this job is an update of an existing job, this field is the job ID

3805

# of the job it replaced.

3806

#

3807

# When sending a `CreateJobRequest`, you can update a job by specifying it

3808

# here. The job named here is stopped, and its intermediate state is

3809

# transferred to this job.

3810

"tempFiles": [ # A set of files the system should be aware of that are used

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

3811

# for temporary storage. These temporary files will be

3812

# removed on job completion.

3813

# No duplicates are allowed.

3814

# No file patterns are supported.

3815

#

3816

# The supported files are:

3817

#

3818

# Google Cloud Storage:

3819

#

3820

# storage.googleapis.com/{bucket}/{object}

3821

# bucket.storage.googleapis.com/{object}

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame]

3822

"A String",

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

3823

],

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame]

3824

"name": "A String", # The user-specified Cloud Dataflow job name.

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

3825

#

3826

# Only one Job with a given name may exist in a project at any

3827

# given time. If a caller attempts to create a Job with the same

3828

# name as an already-existing Job, the attempt returns the

3829

# existing Job.

3830

#

3831

# The name must match the regular expression

3832

# `[a-z]([-a-z0-9]{0,38}[a-z0-9])?`

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame]

3833

"steps": [ # Exactly one of steps or steps_location should be specified.

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

3834

#

3835

# The top-level steps that constitute the entire job.

3836

{ # Defines a particular step within a Cloud Dataflow job.

3837

#

3838

# A job consists of multiple steps, each of which performs some

3839

# specific operation as part of the overall job. Data is typically

3840

# passed from one step to another as part of the job.

3841

#

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame]

3842

# Here's an example of a sequence of steps which together implement a

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

3843

# Map-Reduce job:

3844

#

3845

# * Read a collection of data from some source, parsing the

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame]

3846

# collection's elements.

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

3847

#

3848

# * Validate the elements.

3849

#

3850

# * Apply a user-defined function to map each element to some value

3851

# and extract an element-specific key value.

3852

#

3853

# * Group elements with the same key into a single element with

3854

# that key, transforming a multiply-keyed collection into a

3855

# uniquely-keyed collection.

3856

#

3857

# * Write the elements out to some data sink.

3858

#

3859

# Note that the Cloud Dataflow service may be used to run many different

3860

# types of jobs, not just Map-Reduce.

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame]

3861

"name": "A String", # The name that identifies the step. This must be unique for each

Dan O'Meara

2020-05-01 07:42:23 -0700

[diff] [blame]

3862

# step with respect to all other steps in the Cloud Dataflow job.

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame]

3863

"kind": "A String", # The kind of step in the Cloud Dataflow job.

3864

"properties": { # Named properties associated with the step. Each kind of

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

3865

# predefined step has its own required set of properties.

3866

# Must be provided on Create. Only retrieved with JOB_VIEW_ALL.

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame]

3867

"a_key": "", # Properties of the object.

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

3868

},

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

3869

},

3870

],

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame]

3871

"replacedByJobId": "A String", # If another job is an update of this job (and thus, this job is in

3872

# `JOB_STATE_UPDATED`), this field contains the ID of that job.

3873

"executionInfo": { # Additional information about how a Cloud Dataflow job will be executed that

3874

# isn't contained in the submitted job. Deprecated.

3875

"stages": { # A mapping from each stage to the information about that stage.

3876

"a_key": { # Contains information about how a particular

3877

# google.dataflow.v1beta3.Step will be executed.

3878

"stepName": [ # The steps associated with the execution stage.

3879

# Note that stages may have several steps, and that a given step

3880

# might be run by more than one stage.

"A String",

],

},

},

},

"currentState": "A String", # The current state of the job.

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

3887

#

3888

# Jobs are created in the `JOB_STATE_STOPPED` state unless otherwise

3889

# specified.

3890

#

3891

# A job in the `JOB_STATE_RUNNING` state may asynchronously enter a

3892

# terminal state. After a job has reached a terminal state, no

3893

# further state updates may be made.

3894

#

3895

# This field may be mutated by the Cloud Dataflow service;

3896

# callers cannot mutate it.

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame]

3897

"location": "A String", # The [regional endpoint]

3898

# (https://cloud.google.com/dataflow/docs/concepts/regional-endpoints) that

3899

# contains this job.

3900

"startTime": "A String", # The timestamp when the job was started (transitioned to JOB_STATE_PENDING).

3901

# Flexible resource scheduling jobs are started with some delay after job

3902

# creation, so start_time is unset before start and is updated when the

3903

# job is started by the Cloud Dataflow service. For other jobs, start_time

3904

# always equals create_time and is immutable and set by the Cloud Dataflow

3905

# service.

3906

"stepsLocation": "A String", # The GCS location where the steps are stored.

3907

"labels": { # User-defined labels for this job.

3908

#

3909

# The labels map can contain no more than 64 entries. Entries of the labels

3910

# map are UTF8 strings that comply with the following restrictions:

3911

#

3912

# * Keys must conform to regexp: \p{Ll}\p{Lo}{0,62}

3913

# * Values must conform to regexp: [\p{Ll}\p{Lo}\p{N}_-]{0,63}

3914

# * Both keys and values are additionally constrained to be <= 128 bytes in

3915

# size.

3916

"a_key": "A String",

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

3917

},

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame]

3918

"createTime": "A String", # The timestamp when the job was initially created. Immutable and set by the

3919

# Cloud Dataflow service.

3920

"requestedState": "A String", # The job's requested state.

3921

#

3922

# `UpdateJob` may be used to switch between the `JOB_STATE_STOPPED` and

3923

# `JOB_STATE_RUNNING` states, by setting requested_state. `UpdateJob` may

3924

# also be used to directly set a job's requested state to

3925

# `JOB_STATE_CANCELLED` or `JOB_STATE_DONE`, irrevocably terminating the

3926

# job if it has not already reached a terminal state.

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

3927

}
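
The `name` and `labels` constraints described in the body above can be checked on the client before a request is sent. The following is a minimal sketch, not part of the generated client: `check_job_fields` is a hypothetical helper, and it only covers the name pattern, the 64-entry limit, and the 128-byte limit. The Unicode-category label patterns (`\p{Ll}` and so on) are not checked, because Python's standard `re` module does not support that syntax.

```python
import re

# Pattern quoted from the "name" field description above.
JOB_NAME_RE = re.compile(r"[a-z]([-a-z0-9]{0,38}[a-z0-9])?")

MAX_LABELS = 64         # "no more than 64 entries"
MAX_LABEL_BYTES = 128   # keys and values "<= 128 bytes in size"


def check_job_fields(name, labels):
    """Return a list of problems found in a job name and labels map.

    Hypothetical helper for illustration; the service remains the
    authority on what it accepts.
    """
    problems = []
    # fullmatch requires the whole name to fit the documented pattern.
    if JOB_NAME_RE.fullmatch(name) is None:
        problems.append("name does not match the documented pattern")
    if len(labels) > MAX_LABELS:
        problems.append("labels map has more than 64 entries")
    for key, value in labels.items():
        for text in (key, value):
            # The limit is stated in bytes, so measure the UTF-8 encoding.
            if len(text.encode("utf-8")) > MAX_LABEL_BYTES:
                problems.append("label entry exceeds 128 bytes")
    return problems
```

An empty list means none of the checked constraints were violated; it does not guarantee that the service will accept the job.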

3928

3929

location: string, The [regional endpoint]

3930

(https://cloud.google.com/dataflow/docs/concepts/regional-endpoints) that

3931

contains this job.

3932

x__xgafv: string, V1 error format.

Allowed values

1 - v1 error format

2 - v2 error format
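
As a rough illustration of how the `body`, `location`, and `x__xgafv` arguments fit together, the sketch below assembles keyword arguments for a request that changes a job's requested state. `build_state_change` is a hypothetical helper, and the state names are only those mentioned in the `requestedState` description above. Which method receives these arguments, and the identifiers it needs in addition, is an assumption left to the caller.

```python
# States named in the requestedState description above.
SWITCHABLE_STATES = {"JOB_STATE_STOPPED", "JOB_STATE_RUNNING"}
TERMINATING_STATES = {"JOB_STATE_CANCELLED", "JOB_STATE_DONE"}
ERROR_FORMATS = {"1", "2"}  # allowed x__xgafv values listed above


def build_state_change(state, location=None, xgafv=None):
    """Build keyword arguments for a state-change request (hypothetical helper)."""
    if state not in SWITCHABLE_STATES | TERMINATING_STATES:
        raise ValueError("unsupported requested state: %s" % state)
    # Only the field being changed is placed in the body.
    kwargs = {"body": {"requestedState": state}}
    if location is not None:
        kwargs["location"] = location
    if xgafv is not None:
        if xgafv not in ERROR_FORMATS:
            raise ValueError("x__xgafv must be '1' or '2'")
        kwargs["x__xgafv"] = xgafv
    return kwargs
```

Optional arguments are omitted rather than passed as `None`, which keeps the resulting dictionary easy to inspect before it is unpacked into a client call.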

Returns:

An object of the form:

3939

3940

{ # Defines a job to be run by the Cloud Dataflow service.

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame]

3941

"clientRequestId": "A String", # The client's unique identifier of the job, re-used across retried attempts.

3942

# If this field is set, the service will ensure its uniqueness.

3943

# The request to create a job will fail if the service has knowledge of a

3944

# previously submitted job with the same client's ID and job name.

3945

# The caller may use this field to ensure idempotence of job

3946

# creation across retried attempts to create a job.

3947

# By default, the field is empty and, in that case, the service ignores it.

3948

"id": "A String", # The unique ID of this job.

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

3949

#

3950

# This field is set by the Cloud Dataflow service when the Job is

3951

# created, and is immutable for the life of the job.

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame]

3952

"currentStateTime": "A String", # The timestamp associated with the current state.

3953

"transformNameMapping": { # The map of transform name prefixes of the job to be replaced to the

Sai Cheemalapati

2017-03-13 12:12:03 -0400

[diff] [blame]

3954

# corresponding name prefixes of the new job.

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame]

3955

"a_key": "A String",

Nathaniel Manista

2015-06-15 16:44:50 +0000

[diff] [blame]

3956

},

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame]

3957

"environment": { # Describes the environment in which a Dataflow Job runs. # The environment for the job.

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame]

3958

"internalExperiments": { # Experimental settings.

3959

"a_key": "", # Properties of the object. Contains field @type with type URL.

3960

},

3961

"workerRegion": "A String", # The Compute Engine region

3962

# (https://cloud.google.com/compute/docs/regions-zones/regions-zones) in

3963

# which worker processing should occur, e.g. "us-west1". Mutually exclusive

3964

# with worker_zone. If neither worker_region nor worker_zone is specified,

3965

# default to the control plane's region.

3966

"serviceKmsKeyName": "A String", # If set, contains the Cloud KMS key identifier used to encrypt data

3967

# at rest, AKA a Customer Managed Encryption Key (CMEK).

3968

#

3969

# Format:

3970

# projects/PROJECT_ID/locations/LOCATION/keyRings/KEY_RING/cryptoKeys/KEY

3971

"userAgent": { # A description of the process that generated the request.

3972

"a_key": "", # Properties of the object.

3973

},

3974

"workerZone": "A String", # The Compute Engine zone

3975

# (https://cloud.google.com/compute/docs/regions-zones/regions-zones) in

3976

# which worker processing should occur, e.g. "us-west1-a". Mutually exclusive

3977

# with worker_region. If neither worker_region nor worker_zone is specified,

3978

# a zone in the control plane's region is chosen based on available capacity.

3979

"clusterManagerApiService": "A String", # The type of cluster manager API to use. If unknown or

Dan O'Meara

2020-05-01 07:42:23 -0700

[diff] [blame]

3980

# unspecified, the service will attempt to choose a reasonable

3981

# default. This should be in the form of the API service name,

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame]

3982

# e.g. "compute.googleapis.com".

3983

"tempStoragePrefix": "A String", # The prefix of the resources the system should use for temporary

3984

# storage. The system will append the suffix "/temp-{JOBNAME}" to

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

3985

# this resource prefix, where {JOBNAME} is the value of the

3986

# job_name field. The resulting bucket and object prefix is used

3987

# as the prefix of the resources used to store temporary data

3988

# needed during the job execution. NOTE: This will override the

3989

# value in taskrunner_settings.

3990

# The supported resource type is:

3991

#

3992

# Google Cloud Storage:

3993

#

3994

# storage.googleapis.com/{bucket}/{object}

3995

# bucket.storage.googleapis.com/{object}

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame]

3996

"experiments": [ # The list of experiments to enable.

3997

"A String",

3998

],

3999

"version": { # A structure describing which components and their versions of the service

4000

# are required in order to run the job.

4001

"a_key": "", # Properties of the object.

4002

},

4003

"serviceAccountEmail": "A String", # Identity to run virtual machines as. Defaults to the default account.

Bu Sun Kim

2020-05-27 12:20:54 -0700

[diff] [blame]

4004

"sdkPipelineOptions": { # The Cloud Dataflow SDK pipeline options specified by the user. These

4005

# options are passed through the service and are used to recreate the

4006

# SDK pipeline options on the worker in a language agnostic and platform

4007

# independent way.

4008

"a_key": "", # Properties of the object.

4009

},

4010

"flexResourceSchedulingGoal": "A String", # Which Flexible Resource Scheduling mode to run in.

4011

"workerPools": [ # The worker pools. At least one "harness" worker pool must be

4012

# specified in order for the job to have workers.

4013

{ # Describes one particular pool of Cloud Dataflow workers to be

4014

# instantiated by the Cloud Dataflow service in order to perform the

4015

# computations required by a job. Note that a workflow job may use

4016

# multiple pools, in order to match the various computational

4017

# requirements of the various stages of the job.

4018

"numThreadsPerWorker": 42, # The number of threads per worker harness. If empty or unspecified, the

4019

# service will choose a number of threads (according to the number of cores

4020

# on the selected machine type for batch, or 1 by convention for streaming).

4021

"numWorkers": 42, # Number of Google Compute Engine workers in this pool needed to

4022

# execute the job. If zero or unspecified, the service will

4023

# attempt to choose a reasonable default.

4024

"zone": "A String", # Zone to run the worker pools in. If empty or unspecified, the service

4025

# will attempt to choose a reasonable default.

4026

"diskSourceImage": "A String", # Fully qualified source image for disks.

4027

"packages": [ # Packages to be installed on workers.

4028

{ # The packages that must be installed in order for a worker to run the

4029

# steps of the Cloud Dataflow job that will be assigned to its worker

4030

# pool.

4031

#

4032

# This is the mechanism by which the Cloud Dataflow SDK causes code to

4033

# be loaded onto the workers. For example, the Cloud Dataflow Java SDK

4034

# might use this to install jars containing the user's code and all of the

4035

# various dependencies (libraries, data files, etc.) required in order

4036

# for that code to run.

4037

"name": "A String", # The name of the package.

4038

"location": "A String", # The resource to read the package from. The supported resource type is:

4039

#

4040

# Google Cloud Storage:

4041

#

4042

# storage.googleapis.com/{bucket}

4043

# bucket.storage.googleapis.com/

4044

},

4045

],

"teardownPolicy": "A String", # Sets the policy for determining when to turn down the worker pool.

4047

# Allowed values are: `TEARDOWN_ALWAYS`, `TEARDOWN_ON_SUCCESS`, and

4048

# `TEARDOWN_NEVER`.

4049

# `TEARDOWN_ALWAYS` means workers are always torn down regardless of whether

4050

# the job succeeds. `TEARDOWN_ON_SUCCESS` means workers are torn down

4051

# if the job succeeds. `TEARDOWN_NEVER` means the workers are never torn

4052

# down.

4053

#

4054

# If the workers are not torn down by the service, they will

4055

# continue to run and use Google Compute Engine VM resources in the

4056

# user's project until they are explicitly terminated by the user.

4057

# Because of this, Google recommends using the `TEARDOWN_ALWAYS`

4058

# policy except for small, manually supervised test jobs.

4059

#

4060

# If unknown or unspecified, the service will attempt to choose a reasonable

4061

# default.
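The teardown rules above can be read as a small decision function. The sketch below is only an illustration of the documented semantics; the helper name `workers_torn_down` is hypothetical and not part of the Dataflow API, and the unknown/unspecified case is not modeled because the service chooses the default.

```python
def workers_torn_down(policy: str, job_succeeded: bool) -> bool:
    """Return True if workers are torn down under the given teardown policy."""
    if policy == "TEARDOWN_ALWAYS":
        # Workers are torn down whether or not the job succeeds.
        return True
    if policy == "TEARDOWN_ON_SUCCESS":
        # Workers are torn down only when the job succeeds.
        return job_succeeded
    if policy == "TEARDOWN_NEVER":
        # Workers keep running until the user terminates them.
        return False
    # Unknown or unspecified: the service picks a default, so nothing to predict here.
    raise ValueError(f"policy {policy!r} is left to the service default")
```

A failed job under `TEARDOWN_ON_SUCCESS` therefore leaves VMs running in the user's project, which is why the text recommends `TEARDOWN_ALWAYS` outside small supervised tests.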

4062

"onHostMaintenance": "A String", # The action to take on host maintenance, as defined by the Google

4063

# Compute Engine API.

4064

"poolArgs": { # Extra arguments for this worker pool.

4065

"a_key": "", # Properties of the object. Contains field @type with type URL.

4066

},

4067

"diskSizeGb": 42, # Size of root disk for VMs, in GB. If zero or unspecified, the service will

4068

# attempt to choose a reasonable default.

4069

"workerHarnessContainerImage": "A String", # Required. Docker container image that executes the Cloud Dataflow worker

4070

# harness, residing in Google Container Registry.

4071

#

4072

# Deprecated for the Fn API path. Use sdk_harness_container_images instead.

4073

"diskType": "A String", # Type of root disk for VMs. If empty or unspecified, the service will

4074

# attempt to choose a reasonable default.

4075

"machineType": "A String", # Machine type (e.g. "n1-standard-1"). If empty or unspecified, the

4076

# service will attempt to choose a reasonable default.

4077

"kind": "A String", # The kind of the worker pool; currently only `harness` and `shuffle`

4078

# are supported.

4079

"sdkHarnessContainerImages": [ # Set of SDK harness containers needed to execute this pipeline. This will

4080

# only be set in the Fn API path. For non-cross-language pipelines this

4081

# should have only one entry. Cross-language pipelines will have two or more

4082

# entries.

4083

{ # Defines an SDK harness container for executing Dataflow pipelines.

4084

"containerImage": "A String", # A docker container image that resides in Google Container Registry.

"useSingleCorePerContainer": True or False, # If true, recommends that the Dataflow service use only one core per SDK

4086

# container instance with this image. If false (or unset) recommends using

4087

# more than one core per SDK container instance with this image for

4088

# efficiency. Note that the Dataflow service may choose to override this property

# if needed.

},

],

"dataDisks": [ # Data disks that are used by a VM in this workflow.

4093

{ # Describes the data disk used by a workflow job.

4094

"diskType": "A String", # Disk storage type, as defined by Google Compute Engine. This

4095

# must be a disk type appropriate to the project and zone in which

4096

# the workers will run. If unknown or unspecified, the service

4097

# will attempt to choose a reasonable default.

4098

#

4099

# For example, the standard persistent disk type is a resource name

4100

# typically ending in "pd-standard". If SSD persistent disks are

4101

# available, the resource name typically ends with "pd-ssd". The

4102

# actual valid values are defined by the Google Compute Engine API,

4103

# not by the Cloud Dataflow API; consult the Google Compute Engine

4104

# documentation for more information about determining the set of

4105

# available disk types for a particular project and zone.

4106

#

4107

# Google Compute Engine Disk types are local to a particular

4108

# project in a particular zone, and so the resource name will

4109

# typically look something like this:

4110

#

4111

# compute.googleapis.com/projects/project-id/zones/zone/diskTypes/pd-standard

4112

"sizeGb": 42, # Size of disk in GB. If zero or unspecified, the service will

4113

# attempt to choose a reasonable default.

4114

"mountPoint": "A String", # Directory in a VM where disk is mounted.

4115

},

4116

],
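As an illustration of the `diskType` resource-name shape described above, the following sketch assembles a name from its parts. The helper name and the sample project and zone are hypothetical; the valid disk types themselves come from the Google Compute Engine API, not from this code.

```python
def disk_type_resource_name(project_id: str, zone: str, disk_type: str = "pd-standard") -> str:
    """Build a disk type resource name of the form shown in the diskType description."""
    return (
        "compute.googleapis.com/projects/"
        f"{project_id}/zones/{zone}/diskTypes/{disk_type}"
    )


# A dataDisks entry using the helper; sizeGb=0 lets the service choose a default.
data_disk = {
    "diskType": disk_type_resource_name("my-project", "us-central1-a", "pd-ssd"),
    "sizeGb": 0,
}
```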

4117

"subnetwork": "A String", # Subnetwork to which VMs will be assigned, if desired. Expected to be of

4118

# the form "regions/REGION/subnetworks/SUBNETWORK".

4119

"ipConfiguration": "A String", # Configuration for VM IPs.

4120

"taskrunnerSettings": { # Taskrunner configuration settings. # Settings passed through to Google Compute Engine workers when

4121

# using the standard Dataflow task runner. Users should ignore

4122

# this field.

4123

"alsologtostderr": True or False, # Whether to also send taskrunner log info to stderr.

4124

"taskGroup": "A String", # The UNIX group ID on the worker VM to use for tasks launched by

4125

# taskrunner; e.g. "wheel".

4126

"harnessCommand": "A String", # The command to launch the worker harness.

4127

"logDir": "A String", # The directory on the VM to store logs.

4128

"oauthScopes": [ # The OAuth2 scopes to be requested by the taskrunner in order to

4129

# access the Cloud Dataflow API.

4130

"A String",

4131

],

4132

"dataflowApiVersion": "A String", # The API version of endpoint, e.g. "v1b3"

4133

"logUploadLocation": "A String", # Indicates where to put logs. If this is not specified, the logs

4134

# will not be uploaded.

4135

#

4136

# The supported resource type is:

4137

#

4138

# Google Cloud Storage:

4139

# storage.googleapis.com/{bucket}/{object}

4140

# bucket.storage.googleapis.com/{object}

4141

"streamingWorkerMainClass": "A String", # The streaming worker main class name.

4142

"workflowFileName": "A String", # The file to store the workflow in.

4143

"languageHint": "A String", # The suggested backend language.

4144

"commandlinesFileName": "A String", # The file to store preprocessing commands in.

4145

"baseTaskDir": "A String", # The location on the worker for task-specific subdirectories.

4146

"tempStoragePrefix": "A String", # The prefix of the resources the taskrunner should use for

4147

# temporary storage.

4148

#

4149

# The supported resource type is:

4150

#

4151

# Google Cloud Storage:

4152

# storage.googleapis.com/{bucket}/{object}

4153

# bucket.storage.googleapis.com/{object}

4154

"baseUrl": "A String", # The base URL for the taskrunner to use when accessing Google Cloud APIs.

4155

#

4156

# When workers access Google Cloud APIs, they logically do so via

4157

# relative URLs. If this field is specified, it supplies the base

4158

# URL to use for resolving these relative URLs. The normative

4159

# algorithm used is defined by RFC 1808, "Relative Uniform Resource

4160

# Locators".

4161

#

4162

# If not specified, the default value is "http://www.googleapis.com/"

4163

"logToSerialconsole": True or False, # Whether to send taskrunner log info to Google Compute Engine VM serial

4164

# console.

4165

"continueOnException": True or False, # Whether to continue taskrunner if an exception is hit.

4166

"parallelWorkerSettings": { # Provides data to pass through to the worker harness. # The settings to pass to the parallel worker harness.

4167

"tempStoragePrefix": "A String", # The prefix of the resources the system should use for temporary

4168

# storage.

4169

#

4170

# The supported resource type is:

4171

#

4172

# Google Cloud Storage:

4173

#

4174

# storage.googleapis.com/{bucket}/{object}

4175

# bucket.storage.googleapis.com/{object}

4176

"reportingEnabled": True or False, # Whether to send work progress updates to the service.

4177

"baseUrl": "A String", # The base URL for accessing Google Cloud APIs.

4178

#

4179

# When workers access Google Cloud APIs, they logically do so via

4180

# relative URLs. If this field is specified, it supplies the base

4181

# URL to use for resolving these relative URLs. The normative

4182

# algorithm used is defined by RFC 1808, "Relative Uniform Resource

4183

# Locators".

4184

#

4185

# If not specified, the default value is "http://www.googleapis.com/"

4186

"servicePath": "A String", # The Cloud Dataflow service path relative to the root URL, for example,

4187

# "dataflow/v1b3/projects".

4188

"shuffleServicePath": "A String", # The Shuffle service path relative to the root URL, for example,

4189

# "shuffle/v1beta1".

4190

"workerId": "A String", # The ID of the worker running this pipeline.

4191

},
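The `baseUrl` and `servicePath` fields above combine by ordinary relative-URL resolution. A minimal sketch using Python's standard `urllib.parse.urljoin` follows; note that `urljoin` implements the current generic URI rules rather than RFC 1808 specifically, which is assumed to make no difference for simple relative paths like the ones quoted in the field descriptions. The helper name is hypothetical.

```python
from urllib.parse import urljoin

# Default quoted in the baseUrl field description.
DEFAULT_BASE_URL = "http://www.googleapis.com/"


def resolve_service_url(service_path: str, base_url: str = DEFAULT_BASE_URL) -> str:
    """Resolve a relative service path against the base URL."""
    return urljoin(base_url, service_path)


dataflow_url = resolve_service_url("dataflow/v1b3/projects")
shuffle_url = resolve_service_url("shuffle/v1beta1")
```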

4192

"taskUser": "A String", # The UNIX user ID on the worker VM to use for tasks launched by

4193

# taskrunner; e.g. "root".

4194

"vmId": "A String", # The ID string of the VM.

4195

},

4196

"autoscalingSettings": { # Settings for WorkerPool autoscaling. # Settings for autoscaling of this WorkerPool.

4197

"algorithm": "A String", # The algorithm to use for autoscaling.

4198

"maxNumWorkers": 42, # The maximum number of workers to cap scaling at.

4199

},

4200

"metadata": { # Metadata to set on the Google Compute Engine VMs.

4201

"a_key": "A String",

4202

},

4203

"defaultPackageSet": "A String", # The default package set to install. This allows the service to

4204

# select a default set of packages which are useful to worker

4205

# harnesses written in a particular language.

4206

"network": "A String", # Network to which VMs will be assigned. If empty or unspecified,

4207

# the service will use the network "default".

4208

},

4209

],
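The requirement that at least one "harness" worker pool be present can be checked on the client side before a job is submitted. This is a sketch over a plain dictionary shaped like the environment above; `has_harness_pool` is a hypothetical helper, not a library function.

```python
def has_harness_pool(environment: dict) -> bool:
    """Return True if the environment lists at least one worker pool of kind 'harness'."""
    pools = environment.get("workerPools", [])
    return any(pool.get("kind") == "harness" for pool in pools)


environment = {
    "workerPools": [
        {"kind": "harness", "numWorkers": 0},  # 0 lets the service choose a default
        {"kind": "shuffle"},
    ]
}
```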

4210

"dataset": "A String", # The dataset for the current project where various workflow

4211

# related tables are stored.

4212

#

4213

# The supported resource type is:

4214

#

4215

# Google BigQuery:

4216

# bigquery.googleapis.com/{dataset}


4217

},


4218

"stageStates": [ # This field may be mutated by the Cloud Dataflow service;

4219

# callers cannot mutate it.

4220

{ # A message describing the state of a particular execution stage.


4221

"currentStateTime": "A String", # The time at which the stage transitioned to this state.

"executionStageState": "A String", # Execution stage states allow the same set of values as JobState.


4223

"executionStageName": "A String", # The name of the execution stage.


4224

},

4225

],

"jobMetadata": { # Metadata available primarily for filtering jobs. Will be included in the

# ListJob response and Job SUMMARY view.

#

# This field is populated by the Dataflow service to support filtering jobs

# by the metadata values provided here. Populated for ListJobs and all GetJob

# views SUMMARY and higher.


4230

"datastoreDetails": [ # Identification of a Datastore source used in the Dataflow job.

4231

{ # Metadata for a Datastore connector used by the job.


4232

"namespace": "A String", # Namespace used in the connection.


4233

"projectId": "A String", # ProjectId accessed in the connection.


4234

},

4235

],

4236

"sdkVersion": { # The version of the SDK used to run the job. # The SDK version used to run the job.


4237

"version": "A String", # The version of the SDK used to run the job.


4238

"sdkSupportStatus": "A String", # The support status for this SDK version.

4239

"versionDisplayName": "A String", # A readable string describing the version of the SDK.


4240

},

4241

"bigqueryDetails": [ # Identification of a BigQuery source used in the Dataflow job.

4242

{ # Metadata for a BigQuery connector used by the job.

4243

"table": "A String", # Table accessed in the connection.

4244

"dataset": "A String", # Dataset accessed in the connection.


4245

"query": "A String", # Query used to access data in the connection.


4246

"projectId": "A String", # Project accessed in the connection.


4247

},

4248

],

4249

"fileDetails": [ # Identification of a File source used in the Dataflow job.

4250

{ # Metadata for a File connector used by the job.

4251

"filePattern": "A String", # File Pattern used to access files by the connector.

4252

},

4253

],

4254

"pubsubDetails": [ # Identification of a PubSub source used in the Dataflow job.

4255

{ # Metadata for a PubSub connector used by the job.


4256

"topic": "A String", # Topic accessed in the connection.


4257

"subscription": "A String", # Subscription used in the connection.

4258

},

4259

],

4260

"bigTableDetails": [ # Identification of a BigTable source used in the Dataflow job.

4261

{ # Metadata for a BigTable connector used by the job.

4262

"projectId": "A String", # ProjectId accessed in the connection.

4263

"instanceId": "A String", # InstanceId accessed in the connection.

4264

"tableId": "A String", # TableId accessed in the connection.

4265

},

4266

],

4267

"spannerDetails": [ # Identification of a Spanner source used in the Dataflow job.

4268

{ # Metadata for a Spanner connector used by the job.

4269

"instanceId": "A String", # InstanceId accessed in the connection.

4270

"projectId": "A String", # ProjectId accessed in the connection.

4271

"databaseId": "A String", # DatabaseId accessed in the connection.


4272

},

4273

],

4274

},
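Because `jobMetadata` exists mainly to support filtering, a client can select jobs by the connectors they report. The sketch below filters a list of job dictionaries (for example, ones already fetched from a list call) by BigQuery project and dataset; the helper name and the sample data are hypothetical.

```python
def jobs_using_bigquery_dataset(jobs: list, project_id: str, dataset: str) -> list:
    """Return the jobs whose jobMetadata reports a BigQuery connector on the given dataset."""
    matches = []
    for job in jobs:
        details = job.get("jobMetadata", {}).get("bigqueryDetails", [])
        if any(
            d.get("projectId") == project_id and d.get("dataset") == dataset
            for d in details
        ):
            matches.append(job)
    return matches


sample_jobs = [
    {"id": "1", "jobMetadata": {"bigqueryDetails": [
        {"projectId": "my-project", "dataset": "sales", "table": "orders"}]}},
    {"id": "2", "jobMetadata": {"pubsubDetails": [{"topic": "events"}]}},
    {"id": "3"},  # metadata not populated for this view
]
selected = jobs_using_bigquery_dataset(sample_jobs, "my-project", "sales")
```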


4275

"type": "A String", # The type of Cloud Dataflow job.

4276

"projectId": "A String", # The ID of the Cloud Platform project that the job belongs to.


4277

"createdFromSnapshotId": "A String", # If this is specified, the job's initial state is populated from the given

4278

# snapshot.


"pipelineDescription": { # A descriptive representation of the submitted pipeline as well as the executed

# form. This data is provided by the Dataflow service for ease of visualizing

# the pipeline and interpreting Dataflow provided metrics.

#

# Preliminary field: The format of this data may change at any time.

# A description of the user pipeline and stages through which it is executed.

# Created by Cloud Dataflow service. Only retrieved with

# JOB_VIEW_DESCRIPTION or JOB_VIEW_ALL.

4285

"executionPipelineStage": [ # Description of each stage of execution of the pipeline.

4286

{ # Description of the composing transforms, names/ids, and input/outputs of a

4287

# stage of execution. Some composing transforms and sources may have been

4288

# generated by the Dataflow service during execution planning.


4289

"outputSource": [ # Output sources for this stage.

4290

{ # Description of an input or output of an execution stage.

4291

"sizeBytes": "A String", # Size of the source, if measurable.

4292

"name": "A String", # Dataflow service generated name for this source.

4293

"userName": "A String", # Human-readable name for this source; may be user or system generated.

4294

"originalTransformOrCollection": "A String", # User name for the original user transform or collection with which this

4295

# source is most closely associated.

4296

},

4297

],

4298

"name": "A String", # Dataflow service generated name for this stage.

4299

"inputSource": [ # Input sources for this stage.

4300

{ # Description of an input or output of an execution stage.

4301

"sizeBytes": "A String", # Size of the source, if measurable.

4302

"name": "A String", # Dataflow service generated name for this source.

4303

"userName": "A String", # Human-readable name for this source; may be user or system generated.

4304

"originalTransformOrCollection": "A String", # User name for the original user transform or collection with which this

4305

# source is most closely associated.

4306

},

4307

],


4308

"id": "A String", # Dataflow service generated id for this stage.

4309

"componentTransform": [ # Transforms that comprise this execution stage.

4310

{ # Description of a transform executed as part of an execution stage.

4311

"originalTransform": "A String", # User name for the original user transform with which this transform is

4312

# most closely associated.

"name": "A String", # Dataflow service generated name for this transform.

4314

"userName": "A String", # Human-readable name for this transform; may be user or system generated.

4315

},

4316

],

4317

"componentSource": [ # Collections produced and consumed by component transforms of this stage.

4318

{ # Description of an interstitial value between transforms in an execution

4319

# stage.

4320

"name": "A String", # Dataflow service generated name for this source.

"userName": "A String", # Human-readable name for this source; may be user or system generated.

4322

"originalTransformOrCollection": "A String", # User name for the original user transform or collection with which this

4323

# source is most closely associated.

4324

},

4325

],

"kind": "A String", # Type of transform this stage is executing.


4327

},

4328

],

4329

"originalPipelineTransform": [ # Description of each transform in the pipeline and collections between them.

4330

{ # Description of the type, names/ids, and input/outputs for a transform.

4331

"kind": "A String", # Type of transform.

4332

"inputCollectionName": [ # User names for all collection inputs to this transform.

4333

"A String",

4334

],

4335

"name": "A String", # User provided name for this transform instance.

4336

"id": "A String", # SDK generated id of this transform instance.

4337

"displayData": [ # Transform-specific display data.

4338

{ # Data provided with a pipeline or transform to provide descriptive info.


4339

"durationValue": "A String", # Contains value if the data is of duration type.


4340

"int64Value": "A String", # Contains value if the data is of int64 type.


4341

"namespace": "A String", # The namespace for the key. This is usually a class name or programming

4342

# language namespace (e.g. Python module) which defines the display data.

4343

# This allows a dax monitoring system to specially handle the data

4344

# and perform custom rendering.

4345

"floatValue": 3.14, # Contains value if the data is of float type.

4346

"key": "A String", # The key identifying the display data.

4347

# This is intended to be used as a label for the display data

4348

# when viewed in a dax monitoring system.

4349

"shortStrValue": "A String", # A possible additional shorter value to display.

4350

# For example a java_class_name_value of com.mypackage.MyDoFn

4351

# will be stored with MyDoFn as the short_str_value and

4352

# com.mypackage.MyDoFn as the java_class_name_value.

4353

# short_str_value can be displayed and java_class_name_value

4354

# will be displayed as a tooltip.

4355

"url": "A String", # An optional full URL.

4356

"label": "A String", # An optional label to display in a dax UI for the element.


4357

"timestampValue": "A String", # Contains value if the data is of timestamp type.

4358

"boolValue": True or False, # Contains value if the data is of a boolean type.

4359

"javaClassValue": "A String", # Contains value if the data is of java class type.

4360

"strValue": "A String", # Contains value if the data is of string type.


4361

},

4362

],

4363

"outputCollectionName": [ # User names for all collection outputs to this transform.

"A String",

],

},

],

"displayData": [ # Pipeline level display data.

4369

{ # Data provided with a pipeline or transform to provide descriptive info.


4370

"durationValue": "A String", # Contains value if the data is of duration type.


4371

"int64Value": "A String", # Contains value if the data is of int64 type.


4372

"namespace": "A String", # The namespace for the key. This is usually a class name or programming

4373

# language namespace (e.g. Python module) which defines the display data.

4374

# This allows a dax monitoring system to specially handle the data

4375

# and perform custom rendering.

4376

"floatValue": 3.14, # Contains value if the data is of float type.

4377

"key": "A String", # The key identifying the display data.

4378

# This is intended to be used as a label for the display data

4379

# when viewed in a dax monitoring system.

4380

"shortStrValue": "A String", # A possible additional shorter value to display.

4381

# For example a java_class_name_value of com.mypackage.MyDoFn

4382

# will be stored with MyDoFn as the short_str_value and

4383

# com.mypackage.MyDoFn as the java_class_name_value.

4384

# short_str_value can be displayed and java_class_name_value

4385

# will be displayed as a tooltip.

4386

"url": "A String", # An optional full URL.

4387

"label": "A String", # An optional label to display in a dax UI for the element.


4388

"timestampValue": "A String", # Contains value if the data is of timestamp type.

4389

"boolValue": True or False, # Contains value if the data is of a boolean type.

4390

"javaClassValue": "A String", # Contains value if the data is of java class type.

4391

"strValue": "A String", # Contains value if the data is of string type.


},

],

},
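The relationship between `shortStrValue` and `javaClassValue` in the display data entries above can be sketched as follows. Deriving the short form by taking the text after the last dot is an assumption made for illustration; the source only gives the `com.mypackage.MyDoFn` to `MyDoFn` example, and `java_class_display_data` is a hypothetical helper.

```python
def java_class_display_data(key: str, java_class_name: str) -> dict:
    """Build a display data entry with a full class name and its short form."""
    # Assumed rule: the short value is the simple class name after the last dot.
    short_name = java_class_name.rsplit(".", 1)[-1]
    return {
        "key": key,
        "javaClassValue": java_class_name,  # full name, e.g. shown as a tooltip
        "shortStrValue": short_name,        # shorter value shown inline
    }


entry = java_class_display_data("fn", "com.mypackage.MyDoFn")
```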

"replaceJobId": "A String", # If this job is an update of an existing job, this field is the job ID

4396

# of the job it replaced.

4397

#

4398

# When sending a `CreateJobRequest`, you can update a job by specifying it

4399

# here. The job named here is stopped, and its intermediate state is

4400

# transferred to this job.

4401

"tempFiles": [ # A set of files the system should be aware of that are used


4402

# for temporary storage. These temporary files will be

4403

# removed on job completion.

4404

# No duplicates are allowed.

4405

# No file patterns are supported.

4406

#

4407

# The supported files are:

4408

#

4409

# Google Cloud Storage:

4410

#

4411

# storage.googleapis.com/{bucket}/{object}

4412

# bucket.storage.googleapis.com/{object}


4413

"A String",


4414

],
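The `tempFiles` constraints (no duplicates, no file patterns, Cloud Storage resources only) lend themselves to a client-side check. This is a sketch under stated assumptions: which characters count as a "file pattern" is not defined in the description, so `*`, `?` and `[` are assumed here, and `temp_file_problems` is a hypothetical helper.

```python
PATTERN_CHARS = set("*?[")  # assumption: glob-style metacharacters


def temp_file_problems(temp_files: list) -> list:
    """Return a list of human-readable problems found in a tempFiles list."""
    problems = []
    seen = set()
    for path in temp_files:
        if path in seen:
            problems.append(f"duplicate entry: {path}")
        seen.add(path)
        if PATTERN_CHARS & set(path):
            problems.append(f"file pattern not supported: {path}")
        # Accept storage.googleapis.com/{bucket}/{object}
        # and bucket.storage.googleapis.com/{object}.
        host = path.split("/", 1)[0]
        if host != "storage.googleapis.com" and not host.endswith(".storage.googleapis.com"):
            problems.append(f"not a Cloud Storage resource: {path}")
    return problems
```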


4415

"name": "A String", # The user-specified Cloud Dataflow job name.


4416

#

4417

# Only one Job with a given name may exist in a project at any

4418

# given time. If a caller attempts to create a Job with the same

4419

# name as an already-existing Job, the attempt returns the

4420

# existing Job.

4421

#

4422

# The name must match the regular expression

4423

# `[a-z]([-a-z0-9]{0,38}[a-z0-9])?`
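The job name pattern above can be checked locally before calling `create`. The sketch uses Python's standard `re` module with a full match, so the whole name must satisfy the expression; `is_valid_job_name` is a hypothetical helper. The pattern allows at most 40 characters: one leading letter, up to 38 middle characters, and one trailing letter or digit.

```python
import re

# Pattern copied from the job name description.
JOB_NAME_RE = re.compile(r"[a-z]([-a-z0-9]{0,38}[a-z0-9])?")


def is_valid_job_name(name: str) -> bool:
    """Return True if the entire name matches the documented pattern."""
    return JOB_NAME_RE.fullmatch(name) is not None
```

For example, `wordcount-2020` passes, while `WordCount` (uppercase) and `wordcount-` (trailing hyphen) do not.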


"steps": [ # Exactly one of steps or steps_location should be specified.


4425

#

4426

# The top-level steps that constitute the entire job.


4427

{ # Defines a particular step within a Cloud Dataflow job.

4428

#

4429

# A job consists of multiple steps, each of which performs some

4430

# specific operation as part of the overall job. Data is typically

4431

# passed from one step to another as part of the job.

4432

#


4433

# Here's an example of a sequence of steps which together implement a


4434

# Map-Reduce job:

4435

#

4436

# * Read a collection of data from some source, parsing the


4437

# collection's elements.


4438

#

4439

# * Validate the elements.

4440

#

4441

# * Apply a user-defined function to map each element to some value

4442

# and extract an element-specific key value.

4443

#

4444

# * Group elements with the same key into a single element with

4445

# that key, transforming a multiply-keyed collection into a

4446

# uniquely-keyed collection.

4447

#

4448

# * Write the elements out to some data sink.

4449

#

4450

# Note that the Cloud Dataflow service may be used to run many different

4451

# types of jobs, not just Map-Reduce.


4452

"name": "A String", # The name that identifies the step. This must be unique for each


4453

# step with respect to all other steps in the Cloud Dataflow job.


4454

"kind": "A String", # The kind of step in the Cloud Dataflow job.

4455

"properties": { # Named properties associated with the step. Each kind of


4456

# predefined step has its own required set of properties.

4457

# Must be provided on Create. Only retrieved with JOB_VIEW_ALL.


4458

"a_key": "", # Properties of the object.


4459

},

4460

},

4461

],
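The Map-Reduce sequence of steps described above (read, validate, map to a key, group by key, write) can be illustrated in plain Python. This is a toy, in-memory sketch and does not use the Dataflow service or its step format; the input layout (`name,count` lines) and the function name are assumptions chosen for the example.

```python
from collections import defaultdict


def run_map_reduce(lines: list) -> dict:
    """Run the five example steps over an in-memory list of 'name,count' lines."""
    # Read: parse each raw line into a record.
    records = [line.strip().split(",") for line in lines]
    # Validate: keep only well-formed records.
    valid = [r for r in records if len(r) == 2 and r[1].isdigit()]
    # Map: extract an element-specific key and a value.
    pairs = [(name, int(count)) for name, count in valid]
    # Group: collect values sharing a key, giving a uniquely-keyed collection.
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    # Write: return the result; a real job would write to a data sink.
    return dict(grouped)


result = run_map_reduce(["a,1", "b,2", "a,3", "malformed"])
```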


4462

"replacedByJobId": "A String", # If another job is an update of this job (and thus, this job is in

4463

# `JOB_STATE_UPDATED`), this field contains the ID of that job.

"executionInfo": { # Deprecated. Additional information about how a Cloud Dataflow job will be executed that

# isn't contained in the submitted job.

4466

"stages": { # A mapping from each stage to the information about that stage.

4467

"a_key": { # Contains information about how a particular

4468

# google.dataflow.v1beta3.Step will be executed.

4469

"stepName": [ # The steps associated with the execution stage.

4470

# Note that stages may have several steps, and that a given step

4471

# might be run by more than one stage.

"A String",

],

},

},

},

"currentState": "A String", # The current state of the job.


4478

#

4479

# Jobs are created in the `JOB_STATE_STOPPED` state unless otherwise

4480

# specified.

4481

#

4482

# A job in the `JOB_STATE_RUNNING` state may asynchronously enter a

4483

# terminal state. After a job has reached a terminal state, no

4484

# further state updates may be made.

4485

#

4486

# This field may be mutated by the Cloud Dataflow service;

4487

# callers cannot mutate it.


4488

"location": "A String", # The [regional endpoint]

4489

# (https://cloud.google.com/dataflow/docs/concepts/regional-endpoints) that

4490

# contains this job.

4491

"startTime": "A String", # The timestamp when the job was started (transitioned to JOB_STATE_PENDING).

4492

# Flexible resource scheduling jobs are started with some delay after job

4493

# creation, so start_time is unset before start and is updated when the

4494

# job is started by the Cloud Dataflow service. For other jobs, start_time

4495

# always equals create_time and is immutable and set by the Cloud Dataflow

4496

# service.

4497

"stepsLocation": "A String", # The GCS location where the steps are stored.

4498

"labels": { # User-defined labels for this job.

4499

#

4500

# The labels map can contain no more than 64 entries. Entries of the labels

4501

# map are UTF8 strings that comply with the following restrictions:

4502

#

4503

# * Keys must conform to regexp: \p{Ll}\p{Lo}{0,62}

4504

# * Values must conform to regexp: [\p{Ll}\p{Lo}\p{N}_-]{0,63}

4505

# * Both keys and values are additionally constrained to be <= 128 bytes in

4506

# size.

4507

"a_key": "A String",


4508

},
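Some of the label restrictions above can be checked with the standard library alone. Python's `re` module does not support `\p{...}` classes, so this sketch uses `unicodedata` categories instead. It covers the entry count, the 128-byte limit, and the value pattern; the key pattern is deliberately not checked here. `label_problems` is a hypothetical helper.

```python
import unicodedata

MAX_ENTRIES = 64      # the labels map can contain no more than 64 entries
MAX_BYTES = 128       # keys and values are each limited to 128 bytes
MAX_VALUE_CHARS = 63  # upper bound from the value pattern


def _value_char_ok(ch: str) -> bool:
    # Lowercase letter (Ll), other letter (Lo), any number (N*), "_" or "-".
    category = unicodedata.category(ch)
    return category in ("Ll", "Lo") or category.startswith("N") or ch in "_-"


def label_problems(labels: dict) -> list:
    """Return a list of problems found in a labels map."""
    problems = []
    if len(labels) > MAX_ENTRIES:
        problems.append(f"too many labels: {len(labels)} > {MAX_ENTRIES}")
    for key, value in labels.items():
        for text in (key, value):
            if len(text.encode("utf-8")) > MAX_BYTES:
                problems.append(f"{text[:16]!r}... exceeds {MAX_BYTES} bytes")
        if len(value) > MAX_VALUE_CHARS or not all(map(_value_char_ok, value)):
            problems.append(f"invalid value for {key!r}")
    return problems
```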


4509

"createTime": "A String", # The timestamp when the job was initially created. Immutable and set by the

4510

# Cloud Dataflow service.

4511

"requestedState": "A String", # The job's requested state.

4512

#

4513

# `UpdateJob` may be used to switch between the `JOB_STATE_STOPPED` and

4514

# `JOB_STATE_RUNNING` states, by setting requested_state. `UpdateJob` may

4515

# also be used to directly set a job's requested state to

4516

# `JOB_STATE_CANCELLED` or `JOB_STATE_DONE`, irrevocably terminating the

4517

# job if it has not already reached a terminal state.


4518

}</pre>
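The `requestedState` rules described above can be sketched as a helper that builds the request body for an update. The state names come from the field description; the helper names are hypothetical, and the code only builds a dictionary rather than calling the service.

```python
# States UpdateJob may switch between.
SWITCHABLE_STATES = {"JOB_STATE_STOPPED", "JOB_STATE_RUNNING"}
# States that irrevocably terminate the job when requested.
TERMINATING_STATES = {"JOB_STATE_CANCELLED", "JOB_STATE_DONE"}


def requested_state_body(state: str) -> dict:
    """Build a job body that sets requestedState to one of the documented values."""
    if state not in SWITCHABLE_STATES | TERMINATING_STATES:
        raise ValueError(f"unsupported requested state: {state}")
    return {"requestedState": state}


def is_irrevocable(state: str) -> bool:
    """Return True if requesting this state terminates the job for good."""
    return state in TERMINATING_STATES


cancel_body = requested_state_body("JOB_STATE_CANCELLED")
```

The resulting dictionary would be passed as the `body` of an update request for the job; once a job has reached a terminal state, no further state updates are accepted.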

Nathaniel Manista