<html><body>
<style>

body, h1, h2, h3, div, span, p, pre, a {
  margin: 0;
  padding: 0;
  border: 0;
  font-weight: inherit;
  font-style: inherit;
  font-size: 100%;
  font-family: inherit;
  vertical-align: baseline;
}

body {
  font-size: 13px;
  padding: 1em;
}

h1 {
  font-size: 26px;
  margin-bottom: 1em;
}

h2 {
  font-size: 24px;
  margin-bottom: 1em;
}

h3 {
  font-size: 20px;
  margin-bottom: 1em;
  margin-top: 1em;
}

pre, code {
  line-height: 1.5;
  font-family: Monaco, 'DejaVu Sans Mono', 'Bitstream Vera Sans Mono', 'Lucida Console', monospace;
}

pre {
  margin-top: 0.5em;
}

h1, h2, h3, p {
  font-family: Arial, sans-serif;
}

h1, h2, h3 {
  border-bottom: solid #CCC 1px;
}

.toc_element {
  margin-top: 0.5em;
}

.firstline {
  margin-left: 2em;
}

.method {
  margin-top: 1em;
  border: solid 1px #CCC;
  padding: 1em;
  background: #EEE;
}

.details {
  font-weight: bold;
  font-size: 14px;
}

</style>

<h1><a href="dataflow_v1b3.html">Dataflow API</a> . <a href="dataflow_v1b3.projects.html">projects</a> . <a href="dataflow_v1b3.projects.locations.html">locations</a> . <a href="dataflow_v1b3.projects.locations.jobs.html">jobs</a></h1>
<h2>Instance Methods</h2>
<p class="toc_element">
  <code><a href="dataflow_v1b3.projects.locations.jobs.debug.html">debug()</a></code>
</p>
<p class="firstline">Returns the debug Resource.</p>

<p class="toc_element">
  <code><a href="dataflow_v1b3.projects.locations.jobs.messages.html">messages()</a></code>
</p>
<p class="firstline">Returns the messages Resource.</p>

<p class="toc_element">
  <code><a href="dataflow_v1b3.projects.locations.jobs.snapshots.html">snapshots()</a></code>
</p>
<p class="firstline">Returns the snapshots Resource.</p>

<p class="toc_element">
  <code><a href="dataflow_v1b3.projects.locations.jobs.workItems.html">workItems()</a></code>
</p>
<p class="firstline">Returns the workItems Resource.</p>

<p class="toc_element">
  <code><a href="#create">create(projectId, location, body=None, replaceJobId=None, view=None, x__xgafv=None)</a></code></p>
<p class="firstline">Creates a Cloud Dataflow job.</p>
<p class="toc_element">
  <code><a href="#get">get(projectId, location, jobId, view=None, x__xgafv=None)</a></code></p>
<p class="firstline">Gets the state of the specified Cloud Dataflow job.</p>
<p class="toc_element">
  <code><a href="#getMetrics">getMetrics(projectId, location, jobId, startTime=None, x__xgafv=None)</a></code></p>
<p class="firstline">Requests the job status.</p>
<p class="toc_element">
  <code><a href="#list">list(projectId, location, filter=None, pageToken=None, pageSize=None, view=None, x__xgafv=None)</a></code></p>
<p class="firstline">Lists the jobs of a project.</p>
<p class="toc_element">
  <code><a href="#list_next">list_next(previous_request, previous_response)</a></code></p>
<p class="firstline">Retrieves the next page of results.</p>
<p class="toc_element">
  <code><a href="#snapshot">snapshot(projectId, location, jobId, body=None, x__xgafv=None)</a></code></p>
<p class="firstline">Snapshots the state of a streaming job.</p>
<p class="toc_element">
  <code><a href="#update">update(projectId, location, jobId, body=None, x__xgafv=None)</a></code></p>
<p class="firstline">Updates the state of an existing Cloud Dataflow job.</p>
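As a usage sketch, the `create` method above can be invoked through the google-api-python-client library. The helper below only builds a minimal request body; the project ID, region, bucket, and job name are placeholders, and the exact set of Job fields you need depends on how the pipeline is launched (most users submit jobs via an SDK rather than calling `jobs.create` directly).

```python
# Sketch: building a minimal Job body for projects.locations.jobs.create.
# "my-project", "us-central1", "my-bucket", and the job name are placeholders.

def make_job_body(job_name, temp_bucket):
    """Build a minimal Job resource dict for jobs.create."""
    return {
        "name": job_name,
        "type": "JOB_TYPE_BATCH",
        "environment": {
            # Temp storage prefix; the service appends "/temp-{JOBNAME}".
            "tempStoragePrefix": f"storage.googleapis.com/{temp_bucket}",
            "workerPools": [
                {
                    "kind": "harness",  # at least one "harness" pool is required
                    "numWorkers": 2,
                }
            ],
        },
    }

def create_job(project_id, location, body):
    # Requires google-api-python-client and credentials; not executed here.
    from googleapiclient.discovery import build
    dataflow = build("dataflow", "v1b3")
    return (
        dataflow.projects()
        .locations()
        .jobs()
        .create(projectId=project_id, location=location, body=body)
        .execute()
    )

body = make_job_body("wordcount-example", "my-bucket")
print(body["environment"]["workerPools"][0]["kind"])  # harness
```

Passing `location` explicitly (rather than relying on the default) keeps the job in the regional endpoint you intend, per the recommendation in the `create` method details below.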
<h3>Method Details</h3>
<div class="method">
    <code class="details" id="create">create(projectId, location, body=None, replaceJobId=None, view=None, x__xgafv=None)</code>
  <pre>Creates a Cloud Dataflow job.

To create a job, we recommend using `projects.locations.jobs.create` with a
[regional endpoint]
(https://cloud.google.com/dataflow/docs/concepts/regional-endpoints). Using
`projects.jobs.create` is not recommended, as your job will always start
in `us-central1`.

Args:
  projectId: string, The ID of the Cloud Platform project that the job belongs to. (required)
  location: string, The [regional endpoint]
(https://cloud.google.com/dataflow/docs/concepts/regional-endpoints) that
contains this job. (required)
  body: object, The request body.
    The object takes the form of:

{ # Defines a job to be run by the Cloud Dataflow service.
  &quot;clientRequestId&quot;: &quot;A String&quot;, # The client&#x27;s unique identifier of the job, re-used across retried attempts.
      # If this field is set, the service will ensure its uniqueness.
      # The request to create a job will fail if the service has knowledge of a
      # previously submitted job with the same client&#x27;s ID and job name.
      # The caller may use this field to ensure idempotence of job
      # creation across retried attempts to create a job.
      # By default, the field is empty and, in that case, the service ignores it.
  &quot;id&quot;: &quot;A String&quot;, # The unique ID of this job.
      #
      # This field is set by the Cloud Dataflow service when the Job is
      # created, and is immutable for the life of the job.
  &quot;currentStateTime&quot;: &quot;A String&quot;, # The timestamp associated with the current state.
  &quot;transformNameMapping&quot;: { # The map of transform name prefixes of the job to be replaced to the
      # corresponding name prefixes of the new job.
    &quot;a_key&quot;: &quot;A String&quot;,
  },
  &quot;environment&quot;: { # Describes the environment in which a Dataflow Job runs. # The environment for the job.
    &quot;internalExperiments&quot;: { # Experimental settings.
      &quot;a_key&quot;: &quot;&quot;, # Properties of the object. Contains field @type with type URL.
    },
    &quot;workerRegion&quot;: &quot;A String&quot;, # The Compute Engine region
        # (https://cloud.google.com/compute/docs/regions-zones/regions-zones) in
        # which worker processing should occur, e.g. &quot;us-west1&quot;. Mutually exclusive
        # with worker_zone. If neither worker_region nor worker_zone is specified,
        # default to the control plane&#x27;s region.
    &quot;serviceKmsKeyName&quot;: &quot;A String&quot;, # If set, contains the Cloud KMS key identifier used to encrypt data
        # at rest, AKA a Customer Managed Encryption Key (CMEK).
        #
        # Format:
        #   projects/PROJECT_ID/locations/LOCATION/keyRings/KEY_RING/cryptoKeys/KEY
    &quot;userAgent&quot;: { # A description of the process that generated the request.
      &quot;a_key&quot;: &quot;&quot;, # Properties of the object.
    },
    &quot;workerZone&quot;: &quot;A String&quot;, # The Compute Engine zone
        # (https://cloud.google.com/compute/docs/regions-zones/regions-zones) in
        # which worker processing should occur, e.g. &quot;us-west1-a&quot;. Mutually exclusive
        # with worker_region. If neither worker_region nor worker_zone is specified,
        # a zone in the control plane&#x27;s region is chosen based on available capacity.
    &quot;clusterManagerApiService&quot;: &quot;A String&quot;, # The type of cluster manager API to use. If unknown or
        # unspecified, the service will attempt to choose a reasonable
        # default. This should be in the form of the API service name,
        # e.g. &quot;compute.googleapis.com&quot;.
    &quot;tempStoragePrefix&quot;: &quot;A String&quot;, # The prefix of the resources the system should use for temporary
        # storage. The system will append the suffix &quot;/temp-{JOBNAME}&quot; to
        # this resource prefix, where {JOBNAME} is the value of the
        # job_name field. The resulting bucket and object prefix is used
        # as the prefix of the resources used to store temporary data
        # needed during the job execution. NOTE: This will override the
        # value in taskrunner_settings.
        # The supported resource type is:
        #
        # Google Cloud Storage:
        #
        #   storage.googleapis.com/{bucket}/{object}
        #   bucket.storage.googleapis.com/{object}
    &quot;experiments&quot;: [ # The list of experiments to enable.
      &quot;A String&quot;,
    ],
    &quot;version&quot;: { # A structure describing which components and their versions of the service
        # are required in order to run the job.
      &quot;a_key&quot;: &quot;&quot;, # Properties of the object.
    },
    &quot;serviceAccountEmail&quot;: &quot;A String&quot;, # Identity to run virtual machines as. Defaults to the default account.
    &quot;sdkPipelineOptions&quot;: { # The Cloud Dataflow SDK pipeline options specified by the user. These
        # options are passed through the service and are used to recreate the
        # SDK pipeline options on the worker in a language agnostic and platform
        # independent way.
      &quot;a_key&quot;: &quot;&quot;, # Properties of the object.
    },
    &quot;flexResourceSchedulingGoal&quot;: &quot;A String&quot;, # Which Flexible Resource Scheduling mode to run in.
    &quot;workerPools&quot;: [ # The worker pools. At least one &quot;harness&quot; worker pool must be
        # specified in order for the job to have workers.
      { # Describes one particular pool of Cloud Dataflow workers to be
          # instantiated by the Cloud Dataflow service in order to perform the
          # computations required by a job. Note that a workflow job may use
          # multiple pools, in order to match the various computational
          # requirements of the various stages of the job.
        &quot;numThreadsPerWorker&quot;: 42, # The number of threads per worker harness. If empty or unspecified, the
            # service will choose a number of threads (according to the number of cores
            # on the selected machine type for batch, or 1 by convention for streaming).
        &quot;numWorkers&quot;: 42, # Number of Google Compute Engine workers in this pool needed to
            # execute the job. If zero or unspecified, the service will
            # attempt to choose a reasonable default.
        &quot;zone&quot;: &quot;A String&quot;, # Zone to run the worker pools in. If empty or unspecified, the service
            # will attempt to choose a reasonable default.
        &quot;diskSourceImage&quot;: &quot;A String&quot;, # Fully qualified source image for disks.
        &quot;packages&quot;: [ # Packages to be installed on workers.
          { # The packages that must be installed in order for a worker to run the
              # steps of the Cloud Dataflow job that will be assigned to its worker
              # pool.
              #
              # This is the mechanism by which the Cloud Dataflow SDK causes code to
              # be loaded onto the workers. For example, the Cloud Dataflow Java SDK
              # might use this to install jars containing the user&#x27;s code and all of the
              # various dependencies (libraries, data files, etc.) required in order
              # for that code to run.
            &quot;name&quot;: &quot;A String&quot;, # The name of the package.
            &quot;location&quot;: &quot;A String&quot;, # The resource to read the package from. The supported resource type is:
                #
                # Google Cloud Storage:
                #
                #   storage.googleapis.com/{bucket}
                #   bucket.storage.googleapis.com/
          },
        ],
        &quot;teardownPolicy&quot;: &quot;A String&quot;, # Sets the policy for determining when to tear down the worker pool.
            # Allowed values are: `TEARDOWN_ALWAYS`, `TEARDOWN_ON_SUCCESS`, and
            # `TEARDOWN_NEVER`.
            # `TEARDOWN_ALWAYS` means workers are always torn down regardless of whether
            # the job succeeds. `TEARDOWN_ON_SUCCESS` means workers are torn down
            # if the job succeeds. `TEARDOWN_NEVER` means the workers are never torn
            # down.
            #
            # If the workers are not torn down by the service, they will
            # continue to run and use Google Compute Engine VM resources in the
            # user&#x27;s project until they are explicitly terminated by the user.
            # Because of this, Google recommends using the `TEARDOWN_ALWAYS`
            # policy except for small, manually supervised test jobs.
            #
            # If unknown or unspecified, the service will attempt to choose a reasonable
            # default.
        &quot;onHostMaintenance&quot;: &quot;A String&quot;, # The action to take on host maintenance, as defined by the Google
            # Compute Engine API.
        &quot;poolArgs&quot;: { # Extra arguments for this worker pool.
          &quot;a_key&quot;: &quot;&quot;, # Properties of the object. Contains field @type with type URL.
        },
        &quot;diskSizeGb&quot;: 42, # Size of root disk for VMs, in GB. If zero or unspecified, the service will
            # attempt to choose a reasonable default.
        &quot;workerHarnessContainerImage&quot;: &quot;A String&quot;, # Required. Docker container image that executes the Cloud Dataflow worker
            # harness, residing in Google Container Registry.
            #
            # Deprecated for the Fn API path. Use sdk_harness_container_images instead.
        &quot;diskType&quot;: &quot;A String&quot;, # Type of root disk for VMs. If empty or unspecified, the service will
            # attempt to choose a reasonable default.
        &quot;machineType&quot;: &quot;A String&quot;, # Machine type (e.g. &quot;n1-standard-1&quot;). If empty or unspecified, the
            # service will attempt to choose a reasonable default.
        &quot;kind&quot;: &quot;A String&quot;, # The kind of the worker pool; currently only `harness` and `shuffle`
            # are supported.
        &quot;sdkHarnessContainerImages&quot;: [ # Set of SDK harness containers needed to execute this pipeline. This will
            # only be set in the Fn API path. For non-cross-language pipelines this
            # should have only one entry. Cross-language pipelines will have two or more
            # entries.
          { # Defines an SDK harness container for executing Dataflow pipelines.
            &quot;containerImage&quot;: &quot;A String&quot;, # A docker container image that resides in Google Container Registry.
            &quot;useSingleCorePerContainer&quot;: True or False, # If true, recommends the Dataflow service to use only one core per SDK
                # container instance with this image. If false (or unset) recommends using
                # more than one core per SDK container instance with this image for
                # efficiency. Note that Dataflow service may choose to override this property
                # if needed.
          },
        ],
        &quot;dataDisks&quot;: [ # Data disks that are used by a VM in this workflow.
          { # Describes the data disk used by a workflow job.
            &quot;diskType&quot;: &quot;A String&quot;, # Disk storage type, as defined by Google Compute Engine. This
                # must be a disk type appropriate to the project and zone in which
                # the workers will run. If unknown or unspecified, the service
                # will attempt to choose a reasonable default.
                #
                # For example, the standard persistent disk type is a resource name
                # typically ending in &quot;pd-standard&quot;. If SSD persistent disks are
                # available, the resource name typically ends with &quot;pd-ssd&quot;. The
                # actual valid values are defined by the Google Compute Engine API,
                # not by the Cloud Dataflow API; consult the Google Compute Engine
                # documentation for more information about determining the set of
                # available disk types for a particular project and zone.
                #
                # Google Compute Engine Disk types are local to a particular
                # project in a particular zone, and so the resource name will
                # typically look something like this:
                #
                #   compute.googleapis.com/projects/project-id/zones/zone/diskTypes/pd-standard
            &quot;sizeGb&quot;: 42, # Size of disk in GB. If zero or unspecified, the service will
                # attempt to choose a reasonable default.
            &quot;mountPoint&quot;: &quot;A String&quot;, # Directory in a VM where disk is mounted.
          },
        ],
        &quot;subnetwork&quot;: &quot;A String&quot;, # Subnetwork to which VMs will be assigned, if desired. Expected to be of
            # the form &quot;regions/REGION/subnetworks/SUBNETWORK&quot;.
        &quot;ipConfiguration&quot;: &quot;A String&quot;, # Configuration for VM IPs.
        &quot;taskrunnerSettings&quot;: { # Taskrunner configuration settings. # Settings passed through to Google Compute Engine workers when
            # using the standard Dataflow task runner. Users should ignore
            # this field.
          &quot;alsologtostderr&quot;: True or False, # Whether to also send taskrunner log info to stderr.
          &quot;taskGroup&quot;: &quot;A String&quot;, # The UNIX group ID on the worker VM to use for tasks launched by
              # taskrunner; e.g. &quot;wheel&quot;.
          &quot;harnessCommand&quot;: &quot;A String&quot;, # The command to launch the worker harness.
          &quot;logDir&quot;: &quot;A String&quot;, # The directory on the VM to store logs.
          &quot;oauthScopes&quot;: [ # The OAuth2 scopes to be requested by the taskrunner in order to
              # access the Cloud Dataflow API.
            &quot;A String&quot;,
          ],
          &quot;dataflowApiVersion&quot;: &quot;A String&quot;, # The API version of the endpoint, e.g. &quot;v1b3&quot;.
          &quot;logUploadLocation&quot;: &quot;A String&quot;, # Indicates where to put logs. If this is not specified, the logs
              # will not be uploaded.
              #
              # The supported resource type is:
              #
              # Google Cloud Storage:
              #   storage.googleapis.com/{bucket}/{object}
              #   bucket.storage.googleapis.com/{object}
          &quot;streamingWorkerMainClass&quot;: &quot;A String&quot;, # The streaming worker main class name.
          &quot;workflowFileName&quot;: &quot;A String&quot;, # The file to store the workflow in.
          &quot;languageHint&quot;: &quot;A String&quot;, # The suggested backend language.
          &quot;commandlinesFileName&quot;: &quot;A String&quot;, # The file to store preprocessing commands in.
          &quot;baseTaskDir&quot;: &quot;A String&quot;, # The location on the worker for task-specific subdirectories.
          &quot;tempStoragePrefix&quot;: &quot;A String&quot;, # The prefix of the resources the taskrunner should use for
              # temporary storage.
              #
              # The supported resource type is:
              #
              # Google Cloud Storage:
              #   storage.googleapis.com/{bucket}/{object}
              #   bucket.storage.googleapis.com/{object}
          &quot;baseUrl&quot;: &quot;A String&quot;, # The base URL for the taskrunner to use when accessing Google Cloud APIs.
              #
              # When workers access Google Cloud APIs, they logically do so via
              # relative URLs. If this field is specified, it supplies the base
              # URL to use for resolving these relative URLs. The normative
              # algorithm used is defined by RFC 1808, &quot;Relative Uniform Resource
              # Locators&quot;.
              #
              # If not specified, the default value is &quot;http://www.googleapis.com/&quot;.
          &quot;logToSerialconsole&quot;: True or False, # Whether to send taskrunner log info to the Google Compute Engine VM serial
              # console.
          &quot;continueOnException&quot;: True or False, # Whether to continue taskrunner if an exception is hit.
          &quot;parallelWorkerSettings&quot;: { # Provides data to pass through to the worker harness. # The settings to pass to the parallel worker harness.
            &quot;tempStoragePrefix&quot;: &quot;A String&quot;, # The prefix of the resources the system should use for temporary
                # storage.
                #
                # The supported resource type is:
                #
                # Google Cloud Storage:
                #
                #   storage.googleapis.com/{bucket}/{object}
                #   bucket.storage.googleapis.com/{object}
            &quot;reportingEnabled&quot;: True or False, # Whether to send work progress updates to the service.
            &quot;baseUrl&quot;: &quot;A String&quot;, # The base URL for accessing Google Cloud APIs.
                #
                # When workers access Google Cloud APIs, they logically do so via
                # relative URLs. If this field is specified, it supplies the base
                # URL to use for resolving these relative URLs. The normative
                # algorithm used is defined by RFC 1808, &quot;Relative Uniform Resource
                # Locators&quot;.
                #
                # If not specified, the default value is &quot;http://www.googleapis.com/&quot;.
            &quot;servicePath&quot;: &quot;A String&quot;, # The Cloud Dataflow service path relative to the root URL, for example,
                # &quot;dataflow/v1b3/projects&quot;.
            &quot;shuffleServicePath&quot;: &quot;A String&quot;, # The Shuffle service path relative to the root URL, for example,
                # &quot;shuffle/v1beta1&quot;.
            &quot;workerId&quot;: &quot;A String&quot;, # The ID of the worker running this pipeline.
          },
          &quot;taskUser&quot;: &quot;A String&quot;, # The UNIX user ID on the worker VM to use for tasks launched by
              # taskrunner; e.g. &quot;root&quot;.
          &quot;vmId&quot;: &quot;A String&quot;, # The ID string of the VM.
        },
        &quot;autoscalingSettings&quot;: { # Settings for WorkerPool autoscaling. # Settings for autoscaling of this WorkerPool.
          &quot;algorithm&quot;: &quot;A String&quot;, # The algorithm to use for autoscaling.
          &quot;maxNumWorkers&quot;: 42, # The maximum number of workers to cap scaling at.
        },
        &quot;metadata&quot;: { # Metadata to set on the Google Compute Engine VMs.
          &quot;a_key&quot;: &quot;A String&quot;,
        },
        &quot;defaultPackageSet&quot;: &quot;A String&quot;, # The default package set to install. This allows the service to
            # select a default set of packages which are useful to worker
            # harnesses written in a particular language.
        &quot;network&quot;: &quot;A String&quot;, # Network to which VMs will be assigned. If empty or unspecified,
            # the service will use the network &quot;default&quot;.
      },
    ],
    &quot;dataset&quot;: &quot;A String&quot;, # The dataset for the current project where various workflow
        # related tables are stored.
        #
        # The supported resource type is:
        #
        # Google BigQuery:
        #   bigquery.googleapis.com/{dataset}
  },
  &quot;stageStates&quot;: [ # This field may be mutated by the Cloud Dataflow service;
      # callers cannot mutate it.
    { # A message describing the state of a particular execution stage.
      &quot;currentStateTime&quot;: &quot;A String&quot;, # The time at which the stage transitioned to this state.
      &quot;executionStageState&quot;: &quot;A String&quot;, # Execution stage states allow the same set of values as JobState.
      &quot;executionStageName&quot;: &quot;A String&quot;, # The name of the execution stage.
    },
  ],
  &quot;jobMetadata&quot;: { # Metadata available primarily for filtering jobs. Will be included in the # This field is populated by the Dataflow service to support filtering jobs
      # by the metadata values provided here. Populated for ListJobs and all GetJob
      # views SUMMARY and higher.
      # ListJob response and Job SUMMARY view.
    &quot;datastoreDetails&quot;: [ # Identification of a Datastore source used in the Dataflow job.
      { # Metadata for a Datastore connector used by the job.
        &quot;namespace&quot;: &quot;A String&quot;, # Namespace used in the connection.
        &quot;projectId&quot;: &quot;A String&quot;, # ProjectId accessed in the connection.
      },
    ],
    &quot;sdkVersion&quot;: { # The version of the SDK used to run the job. # The SDK version used to run the job.
      &quot;version&quot;: &quot;A String&quot;, # The version of the SDK used to run the job.
      &quot;sdkSupportStatus&quot;: &quot;A String&quot;, # The support status for this SDK version.
      &quot;versionDisplayName&quot;: &quot;A String&quot;, # A readable string describing the version of the SDK.
    },
    &quot;bigqueryDetails&quot;: [ # Identification of a BigQuery source used in the Dataflow job.
      { # Metadata for a BigQuery connector used by the job.
        &quot;table&quot;: &quot;A String&quot;, # Table accessed in the connection.
        &quot;dataset&quot;: &quot;A String&quot;, # Dataset accessed in the connection.
        &quot;query&quot;: &quot;A String&quot;, # Query used to access data in the connection.
        &quot;projectId&quot;: &quot;A String&quot;, # Project accessed in the connection.
      },
    ],
    &quot;fileDetails&quot;: [ # Identification of a File source used in the Dataflow job.
      { # Metadata for a File connector used by the job.
        &quot;filePattern&quot;: &quot;A String&quot;, # File Pattern used to access files by the connector.
      },
    ],
    &quot;pubsubDetails&quot;: [ # Identification of a PubSub source used in the Dataflow job.
      { # Metadata for a PubSub connector used by the job.
        &quot;topic&quot;: &quot;A String&quot;, # Topic accessed in the connection.
        &quot;subscription&quot;: &quot;A String&quot;, # Subscription used in the connection.
      },
    ],
    &quot;bigTableDetails&quot;: [ # Identification of a BigTable source used in the Dataflow job.
      { # Metadata for a BigTable connector used by the job.
        &quot;projectId&quot;: &quot;A String&quot;, # ProjectId accessed in the connection.
        &quot;instanceId&quot;: &quot;A String&quot;, # InstanceId accessed in the connection.
        &quot;tableId&quot;: &quot;A String&quot;, # TableId accessed in the connection.
      },
    ],
    &quot;spannerDetails&quot;: [ # Identification of a Spanner source used in the Dataflow job.
      { # Metadata for a Spanner connector used by the job.
        &quot;instanceId&quot;: &quot;A String&quot;, # InstanceId accessed in the connection.
        &quot;projectId&quot;: &quot;A String&quot;, # ProjectId accessed in the connection.
        &quot;databaseId&quot;: &quot;A String&quot;, # DatabaseId accessed in the connection.
      },
    ],
  },
Bu Sun Kim4ed7d3f2020-05-27 12:20:54 -0700472 &quot;type&quot;: &quot;A String&quot;, # The type of Cloud Dataflow job.
473 &quot;projectId&quot;: &quot;A String&quot;, # The ID of the Cloud Platform project that the job belongs to.
Bu Sun Kim65020912020-05-20 12:08:20 -0700474 &quot;createdFromSnapshotId&quot;: &quot;A String&quot;, # If this is specified, the job&#x27;s initial state is populated from the given
475 # snapshot.
Bu Sun Kim65020912020-05-20 12:08:20 -0700476 &quot;pipelineDescription&quot;: { # A descriptive representation of submitted pipeline as well as the executed # Preliminary field: The format of this data may change at any time.
477 # A description of the user pipeline and stages through which it is executed.
478 # Created by Cloud Dataflow service. Only retrieved with
479 # JOB_VIEW_DESCRIPTION or JOB_VIEW_ALL.
480 # form. This data is provided by the Dataflow service for ease of visualizing
481 # the pipeline and interpreting Dataflow provided metrics.
482 &quot;executionPipelineStage&quot;: [ # Description of each stage of execution of the pipeline.
483 { # Description of the composing transforms, names/ids, and input/outputs of a
484 # stage of execution. Some composing transforms and sources may have been
485 # generated by the Dataflow service during execution planning.
Bu Sun Kim4ed7d3f2020-05-27 12:20:54 -0700486 &quot;outputSource&quot;: [ # Output sources for this stage.
487 { # Description of an input or output of an execution stage.
488 &quot;sizeBytes&quot;: &quot;A String&quot;, # Size of the source, if measurable.
489 &quot;name&quot;: &quot;A String&quot;, # Dataflow service generated name for this source.
490 &quot;userName&quot;: &quot;A String&quot;, # Human-readable name for this source; may be user or system generated.
491 &quot;originalTransformOrCollection&quot;: &quot;A String&quot;, # User name for the original user transform or collection with which this
492 # source is most closely associated.
493 },
494 ],
495 &quot;name&quot;: &quot;A String&quot;, # Dataflow service generated name for this stage.
496 &quot;inputSource&quot;: [ # Input sources for this stage.
497 { # Description of an input or output of an execution stage.
498 &quot;sizeBytes&quot;: &quot;A String&quot;, # Size of the source, if measurable.
499 &quot;name&quot;: &quot;A String&quot;, # Dataflow service generated name for this source.
500 &quot;userName&quot;: &quot;A String&quot;, # Human-readable name for this source; may be user or system generated.
501 &quot;originalTransformOrCollection&quot;: &quot;A String&quot;, # User name for the original user transform or collection with which this
502 # source is most closely associated.
503 },
504 ],
Bu Sun Kim65020912020-05-20 12:08:20 -0700505 &quot;id&quot;: &quot;A String&quot;, # Dataflow service generated id for this stage.
506 &quot;componentTransform&quot;: [ # Transforms that comprise this execution stage.
507 { # Description of a transform executed as part of an execution stage.
508 &quot;originalTransform&quot;: &quot;A String&quot;, # User name for the original user transform with which this transform is
509 # most closely associated.
510 &quot;name&quot;: &quot;A String&quot;, # Dataflow service generated name for this source.
511 &quot;userName&quot;: &quot;A String&quot;, # Human-readable name for this transform; may be user or system generated.
512 },
513 ],
514 &quot;componentSource&quot;: [ # Collections produced and consumed by component transforms of this stage.
515 { # Description of an interstitial value between transforms in an execution
516 # stage.
517 &quot;name&quot;: &quot;A String&quot;, # Dataflow service generated name for this source.
518 &quot;userName&quot;: &quot;A String&quot;, # Human-readable name for this transform; may be user or system generated.
519 &quot;originalTransformOrCollection&quot;: &quot;A String&quot;, # User name for the original user transform or collection with which this
520 # source is most closely associated.
521 },
522 ],
523 &quot;kind&quot;: &quot;A String&quot;, # Type of tranform this stage is executing.
Bu Sun Kim65020912020-05-20 12:08:20 -0700524 },
525 ],
526 &quot;originalPipelineTransform&quot;: [ # Description of each transform in the pipeline and collections between them.
527 { # Description of the type, names/ids, and input/outputs for a transform.
528 &quot;kind&quot;: &quot;A String&quot;, # Type of transform.
529 &quot;inputCollectionName&quot;: [ # User names for all collection inputs to this transform.
530 &quot;A String&quot;,
531 ],
532 &quot;name&quot;: &quot;A String&quot;, # User provided name for this transform instance.
533 &quot;id&quot;: &quot;A String&quot;, # SDK generated id of this transform instance.
534 &quot;displayData&quot;: [ # Transform-specific display data.
535 { # Data provided with a pipeline or transform to provide descriptive info.
Bu Sun Kim65020912020-05-20 12:08:20 -0700536 &quot;durationValue&quot;: &quot;A String&quot;, # Contains value if the data is of duration type.
Bu Sun Kim4ed7d3f2020-05-27 12:20:54 -0700537 &quot;int64Value&quot;: &quot;A String&quot;, # Contains value if the data is of int64 type.
Bu Sun Kim65020912020-05-20 12:08:20 -0700538 &quot;namespace&quot;: &quot;A String&quot;, # The namespace for the key. This is usually a class name or programming
539 # language namespace (e.g. a Python module) which defines the display data.
540 # This allows a dax monitoring system to specially handle the data
541 # and perform custom rendering.
542 &quot;floatValue&quot;: 3.14, # Contains value if the data is of float type.
543 &quot;key&quot;: &quot;A String&quot;, # The key identifying the display data.
544 # This is intended to be used as a label for the display data
545 # when viewed in a dax monitoring system.
546 &quot;shortStrValue&quot;: &quot;A String&quot;, # A possible additional shorter value to display.
547 # For example a java_class_name_value of com.mypackage.MyDoFn
548 # will be stored with MyDoFn as the short_str_value and
549 # com.mypackage.MyDoFn as the java_class_name value.
550 # short_str_value can be displayed and java_class_name_value
551 # will be displayed as a tooltip.
552 &quot;url&quot;: &quot;A String&quot;, # An optional full URL.
553 &quot;label&quot;: &quot;A String&quot;, # An optional label to display in a dax UI for the element.
Bu Sun Kim4ed7d3f2020-05-27 12:20:54 -0700554 &quot;timestampValue&quot;: &quot;A String&quot;, # Contains value if the data is of timestamp type.
555 &quot;boolValue&quot;: True or False, # Contains value if the data is of a boolean type.
556 &quot;javaClassValue&quot;: &quot;A String&quot;, # Contains value if the data is of java class type.
557 &quot;strValue&quot;: &quot;A String&quot;, # Contains value if the data is of string type.
Bu Sun Kim65020912020-05-20 12:08:20 -0700558 },
559 ],
560 &quot;outputCollectionName&quot;: [ # User names for all collection outputs to this transform.
561 &quot;A String&quot;,
562 ],
563 },
564 ],
565 &quot;displayData&quot;: [ # Pipeline level display data.
566 { # Data provided with a pipeline or transform to provide descriptive info.
Bu Sun Kim65020912020-05-20 12:08:20 -0700567 &quot;durationValue&quot;: &quot;A String&quot;, # Contains value if the data is of duration type.
Bu Sun Kim4ed7d3f2020-05-27 12:20:54 -0700568 &quot;int64Value&quot;: &quot;A String&quot;, # Contains value if the data is of int64 type.
Bu Sun Kim65020912020-05-20 12:08:20 -0700569 &quot;namespace&quot;: &quot;A String&quot;, # The namespace for the key. This is usually a class name or programming
570 # language namespace (e.g. a Python module) which defines the display data.
571 # This allows a dax monitoring system to specially handle the data
572 # and perform custom rendering.
573 &quot;floatValue&quot;: 3.14, # Contains value if the data is of float type.
574 &quot;key&quot;: &quot;A String&quot;, # The key identifying the display data.
575 # This is intended to be used as a label for the display data
576 # when viewed in a dax monitoring system.
577 &quot;shortStrValue&quot;: &quot;A String&quot;, # A possible additional shorter value to display.
578 # For example a java_class_name_value of com.mypackage.MyDoFn
579 # will be stored with MyDoFn as the short_str_value and
580 # com.mypackage.MyDoFn as the java_class_name value.
581 # short_str_value can be displayed and java_class_name_value
582 # will be displayed as a tooltip.
583 &quot;url&quot;: &quot;A String&quot;, # An optional full URL.
584 &quot;label&quot;: &quot;A String&quot;, # An optional label to display in a dax UI for the element.
Bu Sun Kim4ed7d3f2020-05-27 12:20:54 -0700585 &quot;timestampValue&quot;: &quot;A String&quot;, # Contains value if the data is of timestamp type.
586 &quot;boolValue&quot;: True or False, # Contains value if the data is of a boolean type.
587 &quot;javaClassValue&quot;: &quot;A String&quot;, # Contains value if the data is of java class type.
588 &quot;strValue&quot;: &quot;A String&quot;, # Contains value if the data is of string type.
Bu Sun Kim65020912020-05-20 12:08:20 -0700589 },
590 ],
591 },
592 &quot;replaceJobId&quot;: &quot;A String&quot;, # If this job is an update of an existing job, this field is the job ID
593 # of the job it replaced.
594 #
595 # When sending a `CreateJobRequest`, you can update a job by specifying it
596 # here. The job named here is stopped, and its intermediate state is
597 # transferred to this job.
598 &quot;tempFiles&quot;: [ # A set of files the system should be aware of that are used
Bu Sun Kim715bd7f2019-06-14 16:50:42 -0700599 # for temporary storage. These temporary files will be
600 # removed on job completion.
601 # No duplicates are allowed.
602 # No file patterns are supported.
603 #
604 # The supported files are:
605 #
606 # Google Cloud Storage:
607 #
608 # storage.googleapis.com/{bucket}/{object}
609 # bucket.storage.googleapis.com/{object}
Bu Sun Kim65020912020-05-20 12:08:20 -0700610 &quot;A String&quot;,
Bu Sun Kim715bd7f2019-06-14 16:50:42 -0700611 ],
Bu Sun Kim65020912020-05-20 12:08:20 -0700612 &quot;name&quot;: &quot;A String&quot;, # The user-specified Cloud Dataflow job name.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -0700613 #
614 # Only one Job with a given name may exist in a project at any
615 # given time. If a caller attempts to create a Job with the same
616 # name as an already-existing Job, the attempt returns the
617 # existing Job.
618 #
619 # The name must match the regular expression
620 # `[a-z]([-a-z0-9]{0,38}[a-z0-9])?`
Bu Sun Kim65020912020-05-20 12:08:20 -0700621 &quot;steps&quot;: [ # Exactly one of step or steps_location should be specified.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -0700622 #
623 # The top-level steps that constitute the entire job.
624 { # Defines a particular step within a Cloud Dataflow job.
625 #
626 # A job consists of multiple steps, each of which performs some
627 # specific operation as part of the overall job. Data is typically
628 # passed from one step to another as part of the job.
629 #
Bu Sun Kim65020912020-05-20 12:08:20 -0700630 # Here&#x27;s an example of a sequence of steps which together implement a
Bu Sun Kim715bd7f2019-06-14 16:50:42 -0700631 # Map-Reduce job:
632 #
633 # * Read a collection of data from some source, parsing the
Bu Sun Kim65020912020-05-20 12:08:20 -0700634 # collection&#x27;s elements.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -0700635 #
636 # * Validate the elements.
637 #
638 # * Apply a user-defined function to map each element to some value
639 # and extract an element-specific key value.
640 #
641 # * Group elements with the same key into a single element with
642 # that key, transforming a multiply-keyed collection into a
643 # uniquely-keyed collection.
644 #
645 # * Write the elements out to some data sink.
646 #
647 # Note that the Cloud Dataflow service may be used to run many different
648 # types of jobs, not just Map-Reduce.
Bu Sun Kim65020912020-05-20 12:08:20 -0700649 &quot;name&quot;: &quot;A String&quot;, # The name that identifies the step. This must be unique for each
Dan O'Mearadd494642020-05-01 07:42:23 -0700650 # step with respect to all other steps in the Cloud Dataflow job.
Bu Sun Kim65020912020-05-20 12:08:20 -0700651 &quot;kind&quot;: &quot;A String&quot;, # The kind of step in the Cloud Dataflow job.
652 &quot;properties&quot;: { # Named properties associated with the step. Each kind of
Bu Sun Kim715bd7f2019-06-14 16:50:42 -0700653 # predefined step has its own required set of properties.
654 # Must be provided on Create. Only retrieved with JOB_VIEW_ALL.
Bu Sun Kim65020912020-05-20 12:08:20 -0700655 &quot;a_key&quot;: &quot;&quot;, # Properties of the object.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -0700656 },
Bu Sun Kim715bd7f2019-06-14 16:50:42 -0700657 },
658 ],
Bu Sun Kim65020912020-05-20 12:08:20 -0700659 &quot;replacedByJobId&quot;: &quot;A String&quot;, # If another job is an update of this job (and thus, this job is in
660 # `JOB_STATE_UPDATED`), this field contains the ID of that job.
661 &quot;executionInfo&quot;: { # Additional information about how a Cloud Dataflow job will be executed that # Deprecated.
662 # isn&#x27;t contained in the submitted job.
663 &quot;stages&quot;: { # A mapping from each stage to the information about that stage.
664 &quot;a_key&quot;: { # Contains information about how a particular
665 # google.dataflow.v1beta3.Step will be executed.
666 &quot;stepName&quot;: [ # The steps associated with the execution stage.
667 # Note that stages may have several steps, and that a given step
668 # might be run by more than one stage.
669 &quot;A String&quot;,
670 ],
671 },
672 },
673 },
674 &quot;currentState&quot;: &quot;A String&quot;, # The current state of the job.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -0700675 #
676 # Jobs are created in the `JOB_STATE_STOPPED` state unless otherwise
677 # specified.
678 #
679 # A job in the `JOB_STATE_RUNNING` state may asynchronously enter a
680 # terminal state. After a job has reached a terminal state, no
681 # further state updates may be made.
682 #
683 # This field may be mutated by the Cloud Dataflow service;
684 # callers cannot mutate it.
Bu Sun Kim65020912020-05-20 12:08:20 -0700685 &quot;location&quot;: &quot;A String&quot;, # The [regional endpoint]
686 # (https://cloud.google.com/dataflow/docs/concepts/regional-endpoints) that
687 # contains this job.
688 &quot;startTime&quot;: &quot;A String&quot;, # The timestamp when the job was started (transitioned to JOB_STATE_PENDING).
689 # Flexible resource scheduling jobs are started with some delay after job
690 # creation, so start_time is unset before start and is updated when the
691 # job is started by the Cloud Dataflow service. For other jobs, start_time
692 # always equals create_time and is immutable and set by the Cloud Dataflow
693 # service.
694 &quot;stepsLocation&quot;: &quot;A String&quot;, # The GCS location where the steps are stored.
695 &quot;labels&quot;: { # User-defined labels for this job.
696 #
697 # The labels map can contain no more than 64 entries. Entries of the labels
698 # map are UTF-8 strings that comply with the following restrictions:
699 #
700 # * Keys must conform to regexp: \p{Ll}\p{Lo}{0,62}
701 # * Values must conform to regexp: [\p{Ll}\p{Lo}\p{N}_-]{0,63}
702 # * Both keys and values are additionally constrained to be &lt;= 128 bytes in
703 # size.
704 &quot;a_key&quot;: &quot;A String&quot;,
Bu Sun Kim715bd7f2019-06-14 16:50:42 -0700705 },
Bu Sun Kim65020912020-05-20 12:08:20 -0700706 &quot;createTime&quot;: &quot;A String&quot;, # The timestamp when the job was initially created. Immutable and set by the
707 # Cloud Dataflow service.
708 &quot;requestedState&quot;: &quot;A String&quot;, # The job&#x27;s requested state.
709 #
710 # `UpdateJob` may be used to switch between the `JOB_STATE_STOPPED` and
711 # `JOB_STATE_RUNNING` states, by setting requested_state. `UpdateJob` may
712 # also be used to directly set a job&#x27;s requested state to
713 # `JOB_STATE_CANCELLED` or `JOB_STATE_DONE`, irrevocably terminating the
714 # job if it has not already reached a terminal state.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -0700715}
716
Bu Sun Kim65020912020-05-20 12:08:20 -0700717 replaceJobId: string, Deprecated. This field is now in the Job message.
Bu Sun Kim4ed7d3f2020-05-27 12:20:54 -0700718 view: string, The level of information requested in the response.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -0700719 x__xgafv: string, V1 error format.
720 Allowed values
721 1 - v1 error format
722 2 - v2 error format
Bu Sun Kim715bd7f2019-06-14 16:50:42 -0700723
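The `name` field described above must match the regular expression `[a-z]([-a-z0-9]{0,38}[a-z0-9])?`. As a minimal sketch (the helper below is illustrative and not part of the generated client), a caller can pre-validate a candidate job name before submitting the request:

```python
import re

# Pattern documented for the Job "name" field: a lowercase letter, optionally
# followed by up to 39 more characters of [-a-z0-9], ending in [a-z0-9].
JOB_NAME_RE = re.compile(r"[a-z]([-a-z0-9]{0,38}[a-z0-9])?")

def is_valid_job_name(name):
    """Return True if `name` satisfies the documented job-name pattern."""
    return JOB_NAME_RE.fullmatch(name) is not None
```

This only checks the name's syntax; the service still rejects a create request whose name collides with an existing job (the attempt returns the existing Job instead).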
724Returns:
725 An object of the form:
726
727 { # Defines a job to be run by the Cloud Dataflow service.
Bu Sun Kim65020912020-05-20 12:08:20 -0700728 &quot;clientRequestId&quot;: &quot;A String&quot;, # The client&#x27;s unique identifier of the job, re-used across retried attempts.
729 # If this field is set, the service will ensure its uniqueness.
730 # The request to create a job will fail if the service has knowledge of a
731 # previously submitted job with the same client&#x27;s ID and job name.
732 # The caller may use this field to ensure idempotence of job
733 # creation across retried attempts to create a job.
734 # By default, the field is empty and, in that case, the service ignores it.
735 &quot;id&quot;: &quot;A String&quot;, # The unique ID of this job.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -0700736 #
737 # This field is set by the Cloud Dataflow service when the Job is
738 # created, and is immutable for the life of the job.
Bu Sun Kim65020912020-05-20 12:08:20 -0700739 &quot;currentStateTime&quot;: &quot;A String&quot;, # The timestamp associated with the current state.
740 &quot;transformNameMapping&quot;: { # The map of transform name prefixes of the job to be replaced to the
Sai Cheemalapatic30d2b52017-03-13 12:12:03 -0400741 # corresponding name prefixes of the new job.
Bu Sun Kim65020912020-05-20 12:08:20 -0700742 &quot;a_key&quot;: &quot;A String&quot;,
Jon Wayne Parrott692617a2017-01-06 09:58:29 -0800743 },
Bu Sun Kim65020912020-05-20 12:08:20 -0700744 &quot;environment&quot;: { # Describes the environment in which a Dataflow Job runs. # The environment for the job.
Bu Sun Kim65020912020-05-20 12:08:20 -0700745 &quot;internalExperiments&quot;: { # Experimental settings.
746 &quot;a_key&quot;: &quot;&quot;, # Properties of the object. Contains field @type with type URL.
747 },
748 &quot;workerRegion&quot;: &quot;A String&quot;, # The Compute Engine region
749 # (https://cloud.google.com/compute/docs/regions-zones/regions-zones) in
750 # which worker processing should occur, e.g. &quot;us-west1&quot;. Mutually exclusive
751 # with worker_zone. If neither worker_region nor worker_zone is specified,
752 # default to the control plane&#x27;s region.
753 &quot;serviceKmsKeyName&quot;: &quot;A String&quot;, # If set, contains the Cloud KMS key identifier used to encrypt data
754 # at rest, AKA a Customer Managed Encryption Key (CMEK).
755 #
756 # Format:
757 # projects/PROJECT_ID/locations/LOCATION/keyRings/KEY_RING/cryptoKeys/KEY
758 &quot;userAgent&quot;: { # A description of the process that generated the request.
759 &quot;a_key&quot;: &quot;&quot;, # Properties of the object.
760 },
761 &quot;workerZone&quot;: &quot;A String&quot;, # The Compute Engine zone
762 # (https://cloud.google.com/compute/docs/regions-zones/regions-zones) in
763 # which worker processing should occur, e.g. &quot;us-west1-a&quot;. Mutually exclusive
764 # with worker_region. If neither worker_region nor worker_zone is specified,
765 # a zone in the control plane&#x27;s region is chosen based on available capacity.
766 &quot;clusterManagerApiService&quot;: &quot;A String&quot;, # The type of cluster manager API to use. If unknown or
Dan O'Mearadd494642020-05-01 07:42:23 -0700767 # unspecified, the service will attempt to choose a reasonable
768 # default. This should be in the form of the API service name,
Bu Sun Kim65020912020-05-20 12:08:20 -0700769 # e.g. &quot;compute.googleapis.com&quot;.
770 &quot;tempStoragePrefix&quot;: &quot;A String&quot;, # The prefix of the resources the system should use for temporary
771 # storage. The system will append the suffix &quot;/temp-{JOBNAME}&quot; to
Bu Sun Kim715bd7f2019-06-14 16:50:42 -0700772 # this resource prefix, where {JOBNAME} is the value of the
773 # job_name field. The resulting bucket and object prefix is used
774 # as the prefix of the resources used to store temporary data
775 # needed during the job execution. NOTE: This will override the
776 # value in taskrunner_settings.
777 # The supported resource type is:
778 #
779 # Google Cloud Storage:
780 #
781 # storage.googleapis.com/{bucket}/{object}
782 # bucket.storage.googleapis.com/{object}
Bu Sun Kim65020912020-05-20 12:08:20 -0700783 &quot;experiments&quot;: [ # The list of experiments to enable.
784 &quot;A String&quot;,
785 ],
786 &quot;version&quot;: { # A structure describing which components and their versions of the service
787 # are required in order to run the job.
788 &quot;a_key&quot;: &quot;&quot;, # Properties of the object.
789 },
790 &quot;serviceAccountEmail&quot;: &quot;A String&quot;, # Identity to run virtual machines as. Defaults to the default account.
Bu Sun Kim4ed7d3f2020-05-27 12:20:54 -0700791 &quot;sdkPipelineOptions&quot;: { # The Cloud Dataflow SDK pipeline options specified by the user. These
792 # options are passed through the service and are used to recreate the
793 # SDK pipeline options on the worker in a language agnostic and platform
794 # independent way.
795 &quot;a_key&quot;: &quot;&quot;, # Properties of the object.
796 },
797 &quot;flexResourceSchedulingGoal&quot;: &quot;A String&quot;, # Which Flexible Resource Scheduling mode to run in.
798 &quot;workerPools&quot;: [ # The worker pools. At least one &quot;harness&quot; worker pool must be
799 # specified in order for the job to have workers.
800 { # Describes one particular pool of Cloud Dataflow workers to be
801 # instantiated by the Cloud Dataflow service in order to perform the
802 # computations required by a job. Note that a workflow job may use
803 # multiple pools, in order to match the various computational
804 # requirements of the various stages of the job.
805 &quot;numThreadsPerWorker&quot;: 42, # The number of threads per worker harness. If empty or unspecified, the
806 # service will choose a number of threads (according to the number of cores
807 # on the selected machine type for batch, or 1 by convention for streaming).
808 &quot;numWorkers&quot;: 42, # Number of Google Compute Engine workers in this pool needed to
809 # execute the job. If zero or unspecified, the service will
810 # attempt to choose a reasonable default.
811 &quot;zone&quot;: &quot;A String&quot;, # Zone to run the worker pools in. If empty or unspecified, the service
812 # will attempt to choose a reasonable default.
813 &quot;diskSourceImage&quot;: &quot;A String&quot;, # Fully qualified source image for disks.
814 &quot;packages&quot;: [ # Packages to be installed on workers.
815 { # The packages that must be installed in order for a worker to run the
816 # steps of the Cloud Dataflow job that will be assigned to its worker
817 # pool.
818 #
819 # This is the mechanism by which the Cloud Dataflow SDK causes code to
820 # be loaded onto the workers. For example, the Cloud Dataflow Java SDK
821 # might use this to install jars containing the user&#x27;s code and all of the
822 # various dependencies (libraries, data files, etc.) required in order
823 # for that code to run.
824 &quot;name&quot;: &quot;A String&quot;, # The name of the package.
825 &quot;location&quot;: &quot;A String&quot;, # The resource to read the package from. The supported resource type is:
826 #
827 # Google Cloud Storage:
828 #
829 # storage.googleapis.com/{bucket}
830 # bucket.storage.googleapis.com/
831 },
832 ],
833 &quot;teardownPolicy&quot;: &quot;A String&quot;, # Sets the policy for determining when to turn down the worker pool.
834 # Allowed values are: `TEARDOWN_ALWAYS`, `TEARDOWN_ON_SUCCESS`, and
835 # `TEARDOWN_NEVER`.
836 # `TEARDOWN_ALWAYS` means workers are always torn down regardless of whether
837 # the job succeeds. `TEARDOWN_ON_SUCCESS` means workers are torn down
838 # if the job succeeds. `TEARDOWN_NEVER` means the workers are never torn
839 # down.
840 #
841 # If the workers are not torn down by the service, they will
842 # continue to run and use Google Compute Engine VM resources in the
843 # user&#x27;s project until they are explicitly terminated by the user.
844 # Because of this, Google recommends using the `TEARDOWN_ALWAYS`
845 # policy except for small, manually supervised test jobs.
846 #
847 # If unknown or unspecified, the service will attempt to choose a reasonable
848 # default.
849 &quot;onHostMaintenance&quot;: &quot;A String&quot;, # The action to take on host maintenance, as defined by the Google
850 # Compute Engine API.
851 &quot;poolArgs&quot;: { # Extra arguments for this worker pool.
852 &quot;a_key&quot;: &quot;&quot;, # Properties of the object. Contains field @type with type URL.
853 },
854 &quot;diskSizeGb&quot;: 42, # Size of root disk for VMs, in GB. If zero or unspecified, the service will
855 # attempt to choose a reasonable default.
856 &quot;workerHarnessContainerImage&quot;: &quot;A String&quot;, # Required. Docker container image that executes the Cloud Dataflow worker
857 # harness, residing in Google Container Registry.
858 #
859 # Deprecated for the Fn API path. Use sdk_harness_container_images instead.
860 &quot;diskType&quot;: &quot;A String&quot;, # Type of root disk for VMs. If empty or unspecified, the service will
861 # attempt to choose a reasonable default.
862 &quot;machineType&quot;: &quot;A String&quot;, # Machine type (e.g. &quot;n1-standard-1&quot;). If empty or unspecified, the
863 # service will attempt to choose a reasonable default.
864 &quot;kind&quot;: &quot;A String&quot;, # The kind of the worker pool; currently only `harness` and `shuffle`
865 # are supported.
866 &quot;sdkHarnessContainerImages&quot;: [ # Set of SDK harness containers needed to execute this pipeline. This will
867 # only be set in the Fn API path. For non-cross-language pipelines this
868 # should have only one entry. Cross-language pipelines will have two or more
869 # entries.
870 { # Defines an SDK harness container for executing Dataflow pipelines.
871 &quot;containerImage&quot;: &quot;A String&quot;, # A Docker container image that resides in Google Container Registry.
872 &quot;useSingleCorePerContainer&quot;: True or False, # If true, recommends that the Dataflow service use only one core per SDK
873 # container instance with this image. If false (or unset), recommends using
874 # more than one core per SDK container instance with this image for
875 # efficiency. Note that the Dataflow service may choose to override this
876 # property if needed.
877 },
878 ],
879 &quot;dataDisks&quot;: [ # Data disks that are used by a VM in this workflow.
880 { # Describes the data disk used by a workflow job.
881 &quot;diskType&quot;: &quot;A String&quot;, # Disk storage type, as defined by Google Compute Engine. This
882 # must be a disk type appropriate to the project and zone in which
883 # the workers will run. If unknown or unspecified, the service
884 # will attempt to choose a reasonable default.
885 #
886 # For example, the standard persistent disk type is a resource name
887 # typically ending in &quot;pd-standard&quot;. If SSD persistent disks are
888 # available, the resource name typically ends with &quot;pd-ssd&quot;. The
889 # actual valid values are defined by the Google Compute Engine API,
890 # not by the Cloud Dataflow API; consult the Google Compute Engine
891 # documentation for more information about determining the set of
892 # available disk types for a particular project and zone.
893 #
894 # Google Compute Engine Disk types are local to a particular
895 # project in a particular zone, and so the resource name will
896 # typically look something like this:
897 #
898 # compute.googleapis.com/projects/project-id/zones/zone/diskTypes/pd-standard
899 &quot;sizeGb&quot;: 42, # Size of disk in GB. If zero or unspecified, the service will
900 # attempt to choose a reasonable default.
901 &quot;mountPoint&quot;: &quot;A String&quot;, # Directory in a VM where disk is mounted.
902 },
903 ],
904 &quot;subnetwork&quot;: &quot;A String&quot;, # Subnetwork to which VMs will be assigned, if desired. Expected to be of
905 # the form &quot;regions/REGION/subnetworks/SUBNETWORK&quot;.
906 &quot;ipConfiguration&quot;: &quot;A String&quot;, # Configuration for VM IPs.
907 &quot;taskrunnerSettings&quot;: { # Taskrunner configuration settings. # Settings passed through to Google Compute Engine workers when
908 # using the standard Dataflow task runner. Users should ignore
909 # this field.
910 &quot;alsologtostderr&quot;: True or False, # Whether to also send taskrunner log info to stderr.
911 &quot;taskGroup&quot;: &quot;A String&quot;, # The UNIX group ID on the worker VM to use for tasks launched by
912 # taskrunner; e.g. &quot;wheel&quot;.
913 &quot;harnessCommand&quot;: &quot;A String&quot;, # The command to launch the worker harness.
914 &quot;logDir&quot;: &quot;A String&quot;, # The directory on the VM to store logs.
915 &quot;oauthScopes&quot;: [ # The OAuth2 scopes to be requested by the taskrunner in order to
916 # access the Cloud Dataflow API.
917 &quot;A String&quot;,
918 ],
919 &quot;dataflowApiVersion&quot;: &quot;A String&quot;, # The API version of the endpoint, e.g. &quot;v1b3&quot;.
920 &quot;logUploadLocation&quot;: &quot;A String&quot;, # Indicates where to put logs. If this is not specified, the logs
921 # will not be uploaded.
922 #
923 # The supported resource type is:
924 #
925 # Google Cloud Storage:
926 # storage.googleapis.com/{bucket}/{object}
927 # bucket.storage.googleapis.com/{object}
928 &quot;streamingWorkerMainClass&quot;: &quot;A String&quot;, # The streaming worker main class name.
929 &quot;workflowFileName&quot;: &quot;A String&quot;, # The file to store the workflow in.
930 &quot;languageHint&quot;: &quot;A String&quot;, # The suggested backend language.
931 &quot;commandlinesFileName&quot;: &quot;A String&quot;, # The file to store preprocessing commands in.
932 &quot;baseTaskDir&quot;: &quot;A String&quot;, # The location on the worker for task-specific subdirectories.
933 &quot;tempStoragePrefix&quot;: &quot;A String&quot;, # The prefix of the resources the taskrunner should use for
934 # temporary storage.
935 #
936 # The supported resource type is:
937 #
938 # Google Cloud Storage:
939 # storage.googleapis.com/{bucket}/{object}
940 # bucket.storage.googleapis.com/{object}
941 &quot;baseUrl&quot;: &quot;A String&quot;, # The base URL for the taskrunner to use when accessing Google Cloud APIs.
942 #
943 # When workers access Google Cloud APIs, they logically do so via
944 # relative URLs. If this field is specified, it supplies the base
945 # URL to use for resolving these relative URLs. The normative
946 # algorithm used is defined by RFC 1808, &quot;Relative Uniform Resource
947 # Locators&quot;.
948 #
949 # If not specified, the default value is &quot;http://www.googleapis.com/&quot;
950 &quot;logToSerialconsole&quot;: True or False, # Whether to send taskrunner log info to Google Compute Engine VM serial
951 # console.
952 &quot;continueOnException&quot;: True or False, # Whether to continue taskrunner if an exception is hit.
953 &quot;parallelWorkerSettings&quot;: { # Provides data to pass through to the worker harness. # The settings to pass to the parallel worker harness.
954 &quot;tempStoragePrefix&quot;: &quot;A String&quot;, # The prefix of the resources the system should use for temporary
955 # storage.
956 #
957 # The supported resource type is:
958 #
959 # Google Cloud Storage:
960 #
961 # storage.googleapis.com/{bucket}/{object}
962 # bucket.storage.googleapis.com/{object}
963 &quot;reportingEnabled&quot;: True or False, # Whether to send work progress updates to the service.
964 &quot;baseUrl&quot;: &quot;A String&quot;, # The base URL for accessing Google Cloud APIs.
965 #
966 # When workers access Google Cloud APIs, they logically do so via
967 # relative URLs. If this field is specified, it supplies the base
968 # URL to use for resolving these relative URLs. The normative
969 # algorithm used is defined by RFC 1808, &quot;Relative Uniform Resource
970 # Locators&quot;.
971 #
972 # If not specified, the default value is &quot;http://www.googleapis.com/&quot;
973 &quot;servicePath&quot;: &quot;A String&quot;, # The Cloud Dataflow service path relative to the root URL, for example,
974 # &quot;dataflow/v1b3/projects&quot;.
975 &quot;shuffleServicePath&quot;: &quot;A String&quot;, # The Shuffle service path relative to the root URL, for example,
976 # &quot;shuffle/v1beta1&quot;.
977 &quot;workerId&quot;: &quot;A String&quot;, # The ID of the worker running this pipeline.
978 },
979 &quot;taskUser&quot;: &quot;A String&quot;, # The UNIX user ID on the worker VM to use for tasks launched by
980 # taskrunner; e.g. &quot;root&quot;.
981 &quot;vmId&quot;: &quot;A String&quot;, # The ID string of the VM.
982 },
983 &quot;autoscalingSettings&quot;: { # Settings for WorkerPool autoscaling. # Settings for autoscaling of this WorkerPool.
984 &quot;algorithm&quot;: &quot;A String&quot;, # The algorithm to use for autoscaling.
985 &quot;maxNumWorkers&quot;: 42, # The maximum number of workers to cap scaling at.
986 },
987 &quot;metadata&quot;: { # Metadata to set on the Google Compute Engine VMs.
988 &quot;a_key&quot;: &quot;A String&quot;,
989 },
990 &quot;defaultPackageSet&quot;: &quot;A String&quot;, # The default package set to install. This allows the service to
991 # select a default set of packages which are useful to worker
992 # harnesses written in a particular language.
993 &quot;network&quot;: &quot;A String&quot;, # Network to which VMs will be assigned. If empty or unspecified,
994 # the service will use the network &quot;default&quot;.
995 },
996 ],
997 &quot;dataset&quot;: &quot;A String&quot;, # The dataset for the current project where various workflow
998 # related tables are stored.
999 #
1000 # The supported resource type is:
1001 #
1002 # Google BigQuery:
1003 # bigquery.googleapis.com/{dataset}
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07001004 },
Bu Sun Kim65020912020-05-20 12:08:20 -07001005 &quot;stageStates&quot;: [ # This field may be mutated by the Cloud Dataflow service;
1006 # callers cannot mutate it.
1007 { # A message describing the state of a particular execution stage.
Bu Sun Kim65020912020-05-20 12:08:20 -07001008 &quot;currentStateTime&quot;: &quot;A String&quot;, # The time at which the stage transitioned to this state.
1009 &quot;executionStageState&quot;: &quot;A String&quot;, # Execution stage states allow the same set of values as JobState.
Bu Sun Kim4ed7d3f2020-05-27 12:20:54 -07001010 &quot;executionStageName&quot;: &quot;A String&quot;, # The name of the execution stage.
Bu Sun Kim65020912020-05-20 12:08:20 -07001011 },
1012 ],
1013 &quot;jobMetadata&quot;: { # Metadata available primarily for filtering jobs. Will be included in the # This field is populated by the Dataflow service to support filtering jobs
1014 # by the metadata values provided here. Populated for ListJobs and all GetJob
1015 # views SUMMARY and higher.
1016 # ListJob response and Job SUMMARY view.
Bu Sun Kim65020912020-05-20 12:08:20 -07001017 &quot;datastoreDetails&quot;: [ # Identification of a Datastore source used in the Dataflow job.
1018 { # Metadata for a Datastore connector used by the job.
Bu Sun Kim65020912020-05-20 12:08:20 -07001019 &quot;namespace&quot;: &quot;A String&quot;, # Namespace used in the connection.
Bu Sun Kim4ed7d3f2020-05-27 12:20:54 -07001020 &quot;projectId&quot;: &quot;A String&quot;, # ProjectId accessed in the connection.
Bu Sun Kim65020912020-05-20 12:08:20 -07001021 },
1022 ],
1023 &quot;sdkVersion&quot;: { # The version of the SDK used to run the job. # The SDK version used to run the job.
Bu Sun Kim65020912020-05-20 12:08:20 -07001024 &quot;version&quot;: &quot;A String&quot;, # The version of the SDK used to run the job.
Bu Sun Kim4ed7d3f2020-05-27 12:20:54 -07001025 &quot;sdkSupportStatus&quot;: &quot;A String&quot;, # The support status for this SDK version.
1026 &quot;versionDisplayName&quot;: &quot;A String&quot;, # A readable string describing the version of the SDK.
Bu Sun Kim65020912020-05-20 12:08:20 -07001027 },
1028 &quot;bigqueryDetails&quot;: [ # Identification of a BigQuery source used in the Dataflow job.
1029 { # Metadata for a BigQuery connector used by the job.
1030 &quot;table&quot;: &quot;A String&quot;, # Table accessed in the connection.
1031 &quot;dataset&quot;: &quot;A String&quot;, # Dataset accessed in the connection.
Bu Sun Kim65020912020-05-20 12:08:20 -07001032 &quot;query&quot;: &quot;A String&quot;, # Query used to access data in the connection.
Bu Sun Kim4ed7d3f2020-05-27 12:20:54 -07001033 &quot;projectId&quot;: &quot;A String&quot;, # Project accessed in the connection.
Bu Sun Kim65020912020-05-20 12:08:20 -07001034 },
1035 ],
1036 &quot;fileDetails&quot;: [ # Identification of a File source used in the Dataflow job.
1037 { # Metadata for a File connector used by the job.
1038 &quot;filePattern&quot;: &quot;A String&quot;, # File Pattern used to access files by the connector.
1039 },
1040 ],
1041 &quot;pubsubDetails&quot;: [ # Identification of a PubSub source used in the Dataflow job.
1042 { # Metadata for a PubSub connector used by the job.
Bu Sun Kim65020912020-05-20 12:08:20 -07001043 &quot;topic&quot;: &quot;A String&quot;, # Topic accessed in the connection.
Bu Sun Kim4ed7d3f2020-05-27 12:20:54 -07001044 &quot;subscription&quot;: &quot;A String&quot;, # Subscription used in the connection.
1045 },
1046 ],
1047 &quot;bigTableDetails&quot;: [ # Identification of a BigTable source used in the Dataflow job.
1048 { # Metadata for a BigTable connector used by the job.
1049 &quot;projectId&quot;: &quot;A String&quot;, # ProjectId accessed in the connection.
1050 &quot;instanceId&quot;: &quot;A String&quot;, # InstanceId accessed in the connection.
1051 &quot;tableId&quot;: &quot;A String&quot;, # TableId accessed in the connection.
1052 },
1053 ],
1054 &quot;spannerDetails&quot;: [ # Identification of a Spanner source used in the Dataflow job.
1055 { # Metadata for a Spanner connector used by the job.
1056 &quot;instanceId&quot;: &quot;A String&quot;, # InstanceId accessed in the connection.
1057 &quot;projectId&quot;: &quot;A String&quot;, # ProjectId accessed in the connection.
1058 &quot;databaseId&quot;: &quot;A String&quot;, # DatabaseId accessed in the connection.
Bu Sun Kim65020912020-05-20 12:08:20 -07001059 },
1060 ],
1061 },
Bu Sun Kim4ed7d3f2020-05-27 12:20:54 -07001062 &quot;type&quot;: &quot;A String&quot;, # The type of Cloud Dataflow job.
1063 &quot;projectId&quot;: &quot;A String&quot;, # The ID of the Cloud Platform project that the job belongs to.
Bu Sun Kim65020912020-05-20 12:08:20 -07001064 &quot;createdFromSnapshotId&quot;: &quot;A String&quot;, # If this is specified, the job&#x27;s initial state is populated from the given
1065 # snapshot.
Bu Sun Kim65020912020-05-20 12:08:20 -07001066 &quot;pipelineDescription&quot;: { # A descriptive representation of submitted pipeline as well as the executed # Preliminary field: The format of this data may change at any time.
1067 # A description of the user pipeline and stages through which it is executed.
1068 # Created by Cloud Dataflow service. Only retrieved with
1069 # JOB_VIEW_DESCRIPTION or JOB_VIEW_ALL.
1070 # form. This data is provided by the Dataflow service for ease of visualizing
1071 # the pipeline and interpreting Dataflow provided metrics.
1072 &quot;executionPipelineStage&quot;: [ # Description of each stage of execution of the pipeline.
1073 { # Description of the composing transforms, names/ids, and input/outputs of a
1074 # stage of execution. Some composing transforms and sources may have been
1075 # generated by the Dataflow service during execution planning.
Bu Sun Kim4ed7d3f2020-05-27 12:20:54 -07001076 &quot;outputSource&quot;: [ # Output sources for this stage.
1077 { # Description of an input or output of an execution stage.
1078 &quot;sizeBytes&quot;: &quot;A String&quot;, # Size of the source, if measurable.
1079 &quot;name&quot;: &quot;A String&quot;, # Dataflow service generated name for this source.
1080 &quot;userName&quot;: &quot;A String&quot;, # Human-readable name for this source; may be user or system generated.
1081 &quot;originalTransformOrCollection&quot;: &quot;A String&quot;, # User name for the original user transform or collection with which this
1082 # source is most closely associated.
1083 },
1084 ],
1085 &quot;name&quot;: &quot;A String&quot;, # Dataflow service generated name for this stage.
1086 &quot;inputSource&quot;: [ # Input sources for this stage.
1087 { # Description of an input or output of an execution stage.
1088 &quot;sizeBytes&quot;: &quot;A String&quot;, # Size of the source, if measurable.
1089 &quot;name&quot;: &quot;A String&quot;, # Dataflow service generated name for this source.
1090 &quot;userName&quot;: &quot;A String&quot;, # Human-readable name for this source; may be user or system generated.
1091 &quot;originalTransformOrCollection&quot;: &quot;A String&quot;, # User name for the original user transform or collection with which this
1092 # source is most closely associated.
1093 },
1094 ],
Bu Sun Kim65020912020-05-20 12:08:20 -07001095 &quot;id&quot;: &quot;A String&quot;, # Dataflow service generated id for this stage.
1096 &quot;componentTransform&quot;: [ # Transforms that comprise this execution stage.
1097 { # Description of a transform executed as part of an execution stage.
1098 &quot;originalTransform&quot;: &quot;A String&quot;, # User name for the original user transform with which this transform is
1099 # most closely associated.
1100 &quot;name&quot;: &quot;A String&quot;, # Dataflow service generated name for this source.
1101 &quot;userName&quot;: &quot;A String&quot;, # Human-readable name for this transform; may be user or system generated.
1102 },
1103 ],
1104 &quot;componentSource&quot;: [ # Collections produced and consumed by component transforms of this stage.
1105 { # Description of an interstitial value between transforms in an execution
1106 # stage.
1107 &quot;name&quot;: &quot;A String&quot;, # Dataflow service generated name for this source.
1108 &quot;userName&quot;: &quot;A String&quot;, # Human-readable name for this transform; may be user or system generated.
1109 &quot;originalTransformOrCollection&quot;: &quot;A String&quot;, # User name for the original user transform or collection with which this
1110 # source is most closely associated.
1111 },
1112 ],
          &quot;kind&quot;: &quot;A String&quot;, # Type of transform this stage is executing.
        },
      ],
      &quot;originalPipelineTransform&quot;: [ # Description of each transform in the pipeline and collections between them.
        { # Description of the type, names/ids, and input/outputs for a transform.
          &quot;kind&quot;: &quot;A String&quot;, # Type of transform.
          &quot;inputCollectionName&quot;: [ # User names for all collection inputs to this transform.
            &quot;A String&quot;,
          ],
          &quot;name&quot;: &quot;A String&quot;, # User provided name for this transform instance.
          &quot;id&quot;: &quot;A String&quot;, # SDK generated id of this transform instance.
          &quot;displayData&quot;: [ # Transform-specific display data.
            { # Data provided with a pipeline or transform to provide descriptive info.
              &quot;durationValue&quot;: &quot;A String&quot;, # Contains value if the data is of duration type.
              &quot;int64Value&quot;: &quot;A String&quot;, # Contains value if the data is of int64 type.
              &quot;namespace&quot;: &quot;A String&quot;, # The namespace for the key. This is usually a class name or programming
                  # language namespace (i.e. python module) which defines the display data.
                  # This allows a dax monitoring system to specially handle the data
                  # and perform custom rendering.
              &quot;floatValue&quot;: 3.14, # Contains value if the data is of float type.
              &quot;key&quot;: &quot;A String&quot;, # The key identifying the display data.
                  # This is intended to be used as a label for the display data
                  # when viewed in a dax monitoring system.
              &quot;shortStrValue&quot;: &quot;A String&quot;, # A possible additional shorter value to display.
                  # For example a java_class_name_value of com.mypackage.MyDoFn
                  # will be stored with MyDoFn as the short_str_value and
                  # com.mypackage.MyDoFn as the java_class_name value.
                  # short_str_value can be displayed and java_class_name_value
                  # will be displayed as a tooltip.
              &quot;url&quot;: &quot;A String&quot;, # An optional full URL.
              &quot;label&quot;: &quot;A String&quot;, # An optional label to display in a dax UI for the element.
              &quot;timestampValue&quot;: &quot;A String&quot;, # Contains value if the data is of timestamp type.
              &quot;boolValue&quot;: True or False, # Contains value if the data is of a boolean type.
              &quot;javaClassValue&quot;: &quot;A String&quot;, # Contains value if the data is of java class type.
              &quot;strValue&quot;: &quot;A String&quot;, # Contains value if the data is of string type.
            },
          ],
          &quot;outputCollectionName&quot;: [ # User names for all collection outputs to this transform.
            &quot;A String&quot;,
          ],
        },
      ],
      &quot;displayData&quot;: [ # Pipeline level display data.
        { # Data provided with a pipeline or transform to provide descriptive info.
          &quot;durationValue&quot;: &quot;A String&quot;, # Contains value if the data is of duration type.
          &quot;int64Value&quot;: &quot;A String&quot;, # Contains value if the data is of int64 type.
          &quot;namespace&quot;: &quot;A String&quot;, # The namespace for the key. This is usually a class name or programming
              # language namespace (i.e. python module) which defines the display data.
              # This allows a dax monitoring system to specially handle the data
              # and perform custom rendering.
          &quot;floatValue&quot;: 3.14, # Contains value if the data is of float type.
          &quot;key&quot;: &quot;A String&quot;, # The key identifying the display data.
              # This is intended to be used as a label for the display data
              # when viewed in a dax monitoring system.
          &quot;shortStrValue&quot;: &quot;A String&quot;, # A possible additional shorter value to display.
              # For example a java_class_name_value of com.mypackage.MyDoFn
              # will be stored with MyDoFn as the short_str_value and
              # com.mypackage.MyDoFn as the java_class_name value.
              # short_str_value can be displayed and java_class_name_value
              # will be displayed as a tooltip.
          &quot;url&quot;: &quot;A String&quot;, # An optional full URL.
          &quot;label&quot;: &quot;A String&quot;, # An optional label to display in a dax UI for the element.
          &quot;timestampValue&quot;: &quot;A String&quot;, # Contains value if the data is of timestamp type.
          &quot;boolValue&quot;: True or False, # Contains value if the data is of a boolean type.
          &quot;javaClassValue&quot;: &quot;A String&quot;, # Contains value if the data is of java class type.
          &quot;strValue&quot;: &quot;A String&quot;, # Contains value if the data is of string type.
        },
      ],
    },
    &quot;replaceJobId&quot;: &quot;A String&quot;, # If this job is an update of an existing job, this field is the job ID
        # of the job it replaced.
        #
        # When sending a `CreateJobRequest`, you can update a job by specifying it
        # here. The job named here is stopped, and its intermediate state is
        # transferred to this job.
    &quot;tempFiles&quot;: [ # A set of files the system should be aware of that are used
        # for temporary storage. These temporary files will be
        # removed on job completion.
        # No duplicates are allowed.
        # No file patterns are supported.
        #
        # The supported files are:
        #
        # Google Cloud Storage:
        #
        #   storage.googleapis.com/{bucket}/{object}
        #   bucket.storage.googleapis.com/{object}
      &quot;A String&quot;,
    ],
    &quot;name&quot;: &quot;A String&quot;, # The user-specified Cloud Dataflow job name.
        #
        # Only one Job with a given name may exist in a project at any
        # given time. If a caller attempts to create a Job with the same
        # name as an already-existing Job, the attempt returns the
        # existing Job.
        #
        # The name must match the regular expression
        # `[a-z]([-a-z0-9]{0,38}[a-z0-9])?`
    &quot;steps&quot;: [ # Exactly one of step or steps_location should be specified.
        #
        # The top-level steps that constitute the entire job.
      { # Defines a particular step within a Cloud Dataflow job.
          #
          # A job consists of multiple steps, each of which performs some
          # specific operation as part of the overall job. Data is typically
          # passed from one step to another as part of the job.
          #
          # Here&#x27;s an example of a sequence of steps which together implement a
          # Map-Reduce job:
          #
          #   * Read a collection of data from some source, parsing the
          #     collection&#x27;s elements.
          #
          #   * Validate the elements.
          #
          #   * Apply a user-defined function to map each element to some value
          #     and extract an element-specific key value.
          #
          #   * Group elements with the same key into a single element with
          #     that key, transforming a multiply-keyed collection into a
          #     uniquely-keyed collection.
          #
          #   * Write the elements out to some data sink.
          #
          # Note that the Cloud Dataflow service may be used to run many different
          # types of jobs, not just Map-Reduce.
        &quot;name&quot;: &quot;A String&quot;, # The name that identifies the step. This must be unique for each
            # step with respect to all other steps in the Cloud Dataflow job.
        &quot;kind&quot;: &quot;A String&quot;, # The kind of step in the Cloud Dataflow job.
        &quot;properties&quot;: { # Named properties associated with the step. Each kind of
            # predefined step has its own required set of properties.
            # Must be provided on Create. Only retrieved with JOB_VIEW_ALL.
          &quot;a_key&quot;: &quot;&quot;, # Properties of the object.
        },
      },
    ],
    &quot;replacedByJobId&quot;: &quot;A String&quot;, # If another job is an update of this job (and thus, this job is in
        # `JOB_STATE_UPDATED`), this field contains the ID of that job.
    &quot;executionInfo&quot;: { # Additional information about how a Cloud Dataflow job will be executed that # Deprecated.
        # isn&#x27;t contained in the submitted job.
      &quot;stages&quot;: { # A mapping from each stage to the information about that stage.
        &quot;a_key&quot;: { # Contains information about how a particular
            # google.dataflow.v1beta3.Step will be executed.
          &quot;stepName&quot;: [ # The steps associated with the execution stage.
              # Note that stages may have several steps, and that a given step
              # might be run by more than one stage.
            &quot;A String&quot;,
          ],
        },
      },
    },
    &quot;currentState&quot;: &quot;A String&quot;, # The current state of the job.
        #
        # Jobs are created in the `JOB_STATE_STOPPED` state unless otherwise
        # specified.
        #
        # A job in the `JOB_STATE_RUNNING` state may asynchronously enter a
        # terminal state. After a job has reached a terminal state, no
        # further state updates may be made.
        #
        # This field may be mutated by the Cloud Dataflow service;
        # callers cannot mutate it.
    &quot;location&quot;: &quot;A String&quot;, # The [regional endpoint]
        # (https://cloud.google.com/dataflow/docs/concepts/regional-endpoints) that
        # contains this job.
    &quot;startTime&quot;: &quot;A String&quot;, # The timestamp when the job was started (transitioned to JOB_STATE_PENDING).
        # Flexible resource scheduling jobs are started with some delay after job
        # creation, so start_time is unset before start and is updated when the
        # job is started by the Cloud Dataflow service. For other jobs, start_time
        # always equals to create_time and is immutable and set by the Cloud Dataflow
        # service.
    &quot;stepsLocation&quot;: &quot;A String&quot;, # The GCS location where the steps are stored.
    &quot;labels&quot;: { # User-defined labels for this job.
        #
        # The labels map can contain no more than 64 entries. Entries of the labels
        # map are UTF8 strings that comply with the following restrictions:
        #
        # * Keys must conform to regexp: \p{Ll}\p{Lo}{0,62}
        # * Values must conform to regexp: [\p{Ll}\p{Lo}\p{N}_-]{0,63}
        # * Both keys and values are additionally constrained to be &lt;= 128 bytes in
        # size.
      &quot;a_key&quot;: &quot;A String&quot;,
    },
    &quot;createTime&quot;: &quot;A String&quot;, # The timestamp when the job was initially created. Immutable and set by the
        # Cloud Dataflow service.
    &quot;requestedState&quot;: &quot;A String&quot;, # The job&#x27;s requested state.
        #
        # `UpdateJob` may be used to switch between the `JOB_STATE_STOPPED` and
        # `JOB_STATE_RUNNING` states, by setting requested_state. `UpdateJob` may
        # also be used to directly set a job&#x27;s requested state to
        # `JOB_STATE_CANCELLED` or `JOB_STATE_DONE`, irrevocably terminating the
        # job if it has not already reached a terminal state.
  }</pre>
</div>

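A minimal sketch of how a caller might reach the `get` method documented below, using google-api-python-client. The discovery-client call needs credentials and network access, so it is shown in comments; the standalone helper mirrors the REST path the method resolves to. Project, location, and job IDs are placeholders.

```python
from typing import Optional

DATAFLOW_BASE = "https://dataflow.googleapis.com/v1b3"

def jobs_get_url(project_id: str, location: str, job_id: str,
                 view: Optional[str] = None) -> str:
    """Build the REST URL for projects.locations.jobs.get.

    Mirrors: GET /v1b3/projects/{projectId}/locations/{location}/jobs/{jobId}
    """
    url = f"{DATAFLOW_BASE}/projects/{project_id}/locations/{location}/jobs/{job_id}"
    if view:
        url += f"?view={view}"
    return url

# With the discovery client (requires credentials; sketch only):
# from googleapiclient.discovery import build
# service = build("dataflow", "v1b3")
# job = service.projects().locations().jobs().get(
#     projectId="my-project", location="us-central1",
#     jobId="my-job-id", view="JOB_VIEW_SUMMARY",
# ).execute()
# print(job["currentState"])
```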
<div class="method">
    <code class="details" id="get">get(projectId, location, jobId, view=None, x__xgafv=None)</code>
  <pre>Gets the state of the specified Cloud Dataflow job.

To get the state of a job, we recommend using `projects.locations.jobs.get`
with a [regional endpoint]
(https://cloud.google.com/dataflow/docs/concepts/regional-endpoints). Using
`projects.jobs.get` is not recommended, as you can only get the state of
jobs that are running in `us-central1`.

Args:
  projectId: string, The ID of the Cloud Platform project that the job belongs to. (required)
  location: string, The [regional endpoint]
(https://cloud.google.com/dataflow/docs/concepts/regional-endpoints) that
contains this job. (required)
  jobId: string, The job ID. (required)
  view: string, The level of information requested in response.
  x__xgafv: string, V1 error format.
    Allowed values
      1 - v1 error format
      2 - v2 error format

Returns:
  An object of the form:

  { # Defines a job to be run by the Cloud Dataflow service.
    &quot;clientRequestId&quot;: &quot;A String&quot;, # The client&#x27;s unique identifier of the job, re-used across retried attempts.
        # If this field is set, the service will ensure its uniqueness.
        # The request to create a job will fail if the service has knowledge of a
        # previously submitted job with the same client&#x27;s ID and job name.
        # The caller may use this field to ensure idempotence of job
        # creation across retried attempts to create a job.
        # By default, the field is empty and, in that case, the service ignores it.
    &quot;id&quot;: &quot;A String&quot;, # The unique ID of this job.
        #
        # This field is set by the Cloud Dataflow service when the Job is
        # created, and is immutable for the life of the job.
    &quot;currentStateTime&quot;: &quot;A String&quot;, # The timestamp associated with the current state.
    &quot;transformNameMapping&quot;: { # The map of transform name prefixes of the job to be replaced to the
        # corresponding name prefixes of the new job.
      &quot;a_key&quot;: &quot;A String&quot;,
    },
    &quot;environment&quot;: { # Describes the environment in which a Dataflow Job runs. # The environment for the job.
      &quot;internalExperiments&quot;: { # Experimental settings.
        &quot;a_key&quot;: &quot;&quot;, # Properties of the object. Contains field @type with type URL.
      },
      &quot;workerRegion&quot;: &quot;A String&quot;, # The Compute Engine region
          # (https://cloud.google.com/compute/docs/regions-zones/regions-zones) in
          # which worker processing should occur, e.g. &quot;us-west1&quot;. Mutually exclusive
          # with worker_zone. If neither worker_region nor worker_zone is specified,
          # default to the control plane&#x27;s region.
      &quot;serviceKmsKeyName&quot;: &quot;A String&quot;, # If set, contains the Cloud KMS key identifier used to encrypt data
          # at rest, AKA a Customer Managed Encryption Key (CMEK).
          #
          # Format:
          #   projects/PROJECT_ID/locations/LOCATION/keyRings/KEY_RING/cryptoKeys/KEY
      &quot;userAgent&quot;: { # A description of the process that generated the request.
        &quot;a_key&quot;: &quot;&quot;, # Properties of the object.
      },
      &quot;workerZone&quot;: &quot;A String&quot;, # The Compute Engine zone
          # (https://cloud.google.com/compute/docs/regions-zones/regions-zones) in
          # which worker processing should occur, e.g. &quot;us-west1-a&quot;. Mutually exclusive
          # with worker_region. If neither worker_region nor worker_zone is specified,
          # a zone in the control plane&#x27;s region is chosen based on available capacity.
      &quot;clusterManagerApiService&quot;: &quot;A String&quot;, # The type of cluster manager API to use. If unknown or
          # unspecified, the service will attempt to choose a reasonable
          # default. This should be in the form of the API service name,
          # e.g. &quot;compute.googleapis.com&quot;.
      &quot;tempStoragePrefix&quot;: &quot;A String&quot;, # The prefix of the resources the system should use for temporary
          # storage. The system will append the suffix &quot;/temp-{JOBNAME} to
          # this resource prefix, where {JOBNAME} is the value of the
          # job_name field. The resulting bucket and object prefix is used
          # as the prefix of the resources used to store temporary data
          # needed during the job execution. NOTE: This will override the
          # value in taskrunner_settings.
          # The supported resource type is:
          #
          # Google Cloud Storage:
          #
          #   storage.googleapis.com/{bucket}/{object}
          #   bucket.storage.googleapis.com/{object}
      &quot;experiments&quot;: [ # The list of experiments to enable.
        &quot;A String&quot;,
      ],
      &quot;version&quot;: { # A structure describing which components and their versions of the service
          # are required in order to run the job.
        &quot;a_key&quot;: &quot;&quot;, # Properties of the object.
      },
      &quot;serviceAccountEmail&quot;: &quot;A String&quot;, # Identity to run virtual machines as. Defaults to the default account.
      &quot;sdkPipelineOptions&quot;: { # The Cloud Dataflow SDK pipeline options specified by the user. These
          # options are passed through the service and are used to recreate the
          # SDK pipeline options on the worker in a language agnostic and platform
          # independent way.
        &quot;a_key&quot;: &quot;&quot;, # Properties of the object.
      },
      &quot;flexResourceSchedulingGoal&quot;: &quot;A String&quot;, # Which Flexible Resource Scheduling mode to run in.
      &quot;workerPools&quot;: [ # The worker pools. At least one &quot;harness&quot; worker pool must be
          # specified in order for the job to have workers.
        { # Describes one particular pool of Cloud Dataflow workers to be
            # instantiated by the Cloud Dataflow service in order to perform the
            # computations required by a job. Note that a workflow job may use
            # multiple pools, in order to match the various computational
            # requirements of the various stages of the job.
          &quot;numThreadsPerWorker&quot;: 42, # The number of threads per worker harness. If empty or unspecified, the
              # service will choose a number of threads (according to the number of cores
              # on the selected machine type for batch, or 1 by convention for streaming).
          &quot;numWorkers&quot;: 42, # Number of Google Compute Engine workers in this pool needed to
              # execute the job. If zero or unspecified, the service will
              # attempt to choose a reasonable default.
          &quot;zone&quot;: &quot;A String&quot;, # Zone to run the worker pools in. If empty or unspecified, the service
              # will attempt to choose a reasonable default.
          &quot;diskSourceImage&quot;: &quot;A String&quot;, # Fully qualified source image for disks.
          &quot;packages&quot;: [ # Packages to be installed on workers.
            { # The packages that must be installed in order for a worker to run the
                # steps of the Cloud Dataflow job that will be assigned to its worker
                # pool.
                #
                # This is the mechanism by which the Cloud Dataflow SDK causes code to
                # be loaded onto the workers. For example, the Cloud Dataflow Java SDK
                # might use this to install jars containing the user&#x27;s code and all of the
                # various dependencies (libraries, data files, etc.) required in order
                # for that code to run.
              &quot;name&quot;: &quot;A String&quot;, # The name of the package.
              &quot;location&quot;: &quot;A String&quot;, # The resource to read the package from. The supported resource type is:
                  #
                  # Google Cloud Storage:
                  #
                  #   storage.googleapis.com/{bucket}
                  #   bucket.storage.googleapis.com/
            },
          ],
          &quot;teardownPolicy&quot;: &quot;A String&quot;, # Sets the policy for determining when to turn down the worker pool.
              # Allowed values are: `TEARDOWN_ALWAYS`, `TEARDOWN_ON_SUCCESS`, and
              # `TEARDOWN_NEVER`.
              # `TEARDOWN_ALWAYS` means workers are always torn down regardless of whether
              # the job succeeds. `TEARDOWN_ON_SUCCESS` means workers are torn down
              # if the job succeeds. `TEARDOWN_NEVER` means the workers are never torn
              # down.
              #
              # If the workers are not torn down by the service, they will
              # continue to run and use Google Compute Engine VM resources in the
              # user&#x27;s project until they are explicitly terminated by the user.
              # Because of this, Google recommends using the `TEARDOWN_ALWAYS`
              # policy except for small, manually supervised test jobs.
              #
              # If unknown or unspecified, the service will attempt to choose a reasonable
              # default.
          &quot;onHostMaintenance&quot;: &quot;A String&quot;, # The action to take on host maintenance, as defined by the Google
              # Compute Engine API.
          &quot;poolArgs&quot;: { # Extra arguments for this worker pool.
            &quot;a_key&quot;: &quot;&quot;, # Properties of the object. Contains field @type with type URL.
          },
          &quot;diskSizeGb&quot;: 42, # Size of root disk for VMs, in GB. If zero or unspecified, the service will
              # attempt to choose a reasonable default.
          &quot;workerHarnessContainerImage&quot;: &quot;A String&quot;, # Required. Docker container image that executes the Cloud Dataflow worker
              # harness, residing in Google Container Registry.
              #
              # Deprecated for the Fn API path. Use sdk_harness_container_images instead.
          &quot;diskType&quot;: &quot;A String&quot;, # Type of root disk for VMs. If empty or unspecified, the service will
              # attempt to choose a reasonable default.
          &quot;machineType&quot;: &quot;A String&quot;, # Machine type (e.g. &quot;n1-standard-1&quot;). If empty or unspecified, the
              # service will attempt to choose a reasonable default.
          &quot;kind&quot;: &quot;A String&quot;, # The kind of the worker pool; currently only `harness` and `shuffle`
              # are supported.
          &quot;sdkHarnessContainerImages&quot;: [ # Set of SDK harness containers needed to execute this pipeline. This will
              # only be set in the Fn API path. For non-cross-language pipelines this
              # should have only one entry. Cross-language pipelines will have two or more
              # entries.
            { # Defines a SDK harness container for executing Dataflow pipelines.
              &quot;containerImage&quot;: &quot;A String&quot;, # A docker container image that resides in Google Container Registry.
              &quot;useSingleCorePerContainer&quot;: True or False, # If true, recommends the Dataflow service to use only one core per SDK
                  # container instance with this image. If false (or unset) recommends using
                  # more than one core per SDK container instance with this image for
                  # efficiency. Note that Dataflow service may choose to override this property
                  # if needed.
            },
          ],
          &quot;dataDisks&quot;: [ # Data disks that are used by a VM in this workflow.
            { # Describes the data disk used by a workflow job.
              &quot;diskType&quot;: &quot;A String&quot;, # Disk storage type, as defined by Google Compute Engine. This
                  # must be a disk type appropriate to the project and zone in which
                  # the workers will run. If unknown or unspecified, the service
                  # will attempt to choose a reasonable default.
                  #
                  # For example, the standard persistent disk type is a resource name
                  # typically ending in &quot;pd-standard&quot;. If SSD persistent disks are
                  # available, the resource name typically ends with &quot;pd-ssd&quot;. The
                  # actual valid values are defined by the Google Compute Engine API,
                  # not by the Cloud Dataflow API; consult the Google Compute Engine
                  # documentation for more information about determining the set of
                  # available disk types for a particular project and zone.
                  #
                  # Google Compute Engine Disk types are local to a particular
                  # project in a particular zone, and so the resource name will
                  # typically look something like this:
                  #
                  #   compute.googleapis.com/projects/project-id/zones/zone/diskTypes/pd-standard
              &quot;sizeGb&quot;: 42, # Size of disk in GB. If zero or unspecified, the service will
                  # attempt to choose a reasonable default.
              &quot;mountPoint&quot;: &quot;A String&quot;, # Directory in a VM where disk is mounted.
            },
          ],
          &quot;subnetwork&quot;: &quot;A String&quot;, # Subnetwork to which VMs will be assigned, if desired. Expected to be of
              # the form &quot;regions/REGION/subnetworks/SUBNETWORK&quot;.
          &quot;ipConfiguration&quot;: &quot;A String&quot;, # Configuration for VM IPs.
          &quot;taskrunnerSettings&quot;: { # Taskrunner configuration settings. # Settings passed through to Google Compute Engine workers when
              # using the standard Dataflow task runner. Users should ignore
              # this field.
            &quot;alsologtostderr&quot;: True or False, # Whether to also send taskrunner log info to stderr.
            &quot;taskGroup&quot;: &quot;A String&quot;, # The UNIX group ID on the worker VM to use for tasks launched by
                # taskrunner; e.g. &quot;wheel&quot;.
            &quot;harnessCommand&quot;: &quot;A String&quot;, # The command to launch the worker harness.
            &quot;logDir&quot;: &quot;A String&quot;, # The directory on the VM to store logs.
            &quot;oauthScopes&quot;: [ # The OAuth2 scopes to be requested by the taskrunner in order to
                # access the Cloud Dataflow API.
              &quot;A String&quot;,
            ],
1525 &quot;dataflowApiVersion&quot;: &quot;A String&quot;, # The API version of endpoint, e.g. &quot;v1b3&quot;
1526 &quot;logUploadLocation&quot;: &quot;A String&quot;, # Indicates where to put logs. If this is not specified, the logs
1527 # will not be uploaded.
1528 #
1529 # The supported resource type is:
1530 #
1531 # Google Cloud Storage:
1532 # storage.googleapis.com/{bucket}/{object}
1533 # bucket.storage.googleapis.com/{object}
1534 &quot;streamingWorkerMainClass&quot;: &quot;A String&quot;, # The streaming worker main class name.
1535 &quot;workflowFileName&quot;: &quot;A String&quot;, # The file to store the workflow in.
1536 &quot;languageHint&quot;: &quot;A String&quot;, # The suggested backend language.
1537 &quot;commandlinesFileName&quot;: &quot;A String&quot;, # The file to store preprocessing commands in.
1538 &quot;baseTaskDir&quot;: &quot;A String&quot;, # The location on the worker for task-specific subdirectories.
1539 &quot;tempStoragePrefix&quot;: &quot;A String&quot;, # The prefix of the resources the taskrunner should use for
1540 # temporary storage.
1541 #
1542 # The supported resource type is:
1543 #
1544 # Google Cloud Storage:
1545 # storage.googleapis.com/{bucket}/{object}
1546 # bucket.storage.googleapis.com/{object}
1547 &quot;baseUrl&quot;: &quot;A String&quot;, # The base URL for the taskrunner to use when accessing Google Cloud APIs.
1548 #
1549 # When workers access Google Cloud APIs, they logically do so via
1550 # relative URLs. If this field is specified, it supplies the base
1551 # URL to use for resolving these relative URLs. The normative
1552 # algorithm used is defined by RFC 1808, &quot;Relative Uniform Resource
1553 # Locators&quot;.
1554 #
1555 # If not specified, the default value is &quot;http://www.googleapis.com/&quot;
1556 &quot;logToSerialconsole&quot;: True or False, # Whether to send taskrunner log info to Google Compute Engine VM serial
1557 # console.
1558 &quot;continueOnException&quot;: True or False, # Whether to continue taskrunner if an exception is hit.
1559 &quot;parallelWorkerSettings&quot;: { # Provides data to pass through to the worker harness. # The settings to pass to the parallel worker harness.
1560 &quot;tempStoragePrefix&quot;: &quot;A String&quot;, # The prefix of the resources the system should use for temporary
1561 # storage.
1562 #
1563 # The supported resource type is:
1564 #
1565 # Google Cloud Storage:
1566 #
1567 # storage.googleapis.com/{bucket}/{object}
1568 # bucket.storage.googleapis.com/{object}
1569 &quot;reportingEnabled&quot;: True or False, # Whether to send work progress updates to the service.
1570 &quot;baseUrl&quot;: &quot;A String&quot;, # The base URL for accessing Google Cloud APIs.
1571 #
1572 # When workers access Google Cloud APIs, they logically do so via
1573 # relative URLs. If this field is specified, it supplies the base
1574 # URL to use for resolving these relative URLs. The normative
1575 # algorithm used is defined by RFC 1808, &quot;Relative Uniform Resource
1576 # Locators&quot;.
1577 #
1578 # If not specified, the default value is &quot;http://www.googleapis.com/&quot;
1579 &quot;servicePath&quot;: &quot;A String&quot;, # The Cloud Dataflow service path relative to the root URL, for example,
1580 # &quot;dataflow/v1b3/projects&quot;.
1581 &quot;shuffleServicePath&quot;: &quot;A String&quot;, # The Shuffle service path relative to the root URL, for example,
1582 # &quot;shuffle/v1beta1&quot;.
1583 &quot;workerId&quot;: &quot;A String&quot;, # The ID of the worker running this pipeline.
1584 },
1585 &quot;taskUser&quot;: &quot;A String&quot;, # The UNIX user ID on the worker VM to use for tasks launched by
1586 # taskrunner; e.g. &quot;root&quot;.
1587 &quot;vmId&quot;: &quot;A String&quot;, # The ID string of the VM.
1588 },
1589 &quot;autoscalingSettings&quot;: { # Settings for WorkerPool autoscaling. # Settings for autoscaling of this WorkerPool.
1590 &quot;algorithm&quot;: &quot;A String&quot;, # The algorithm to use for autoscaling.
1591 &quot;maxNumWorkers&quot;: 42, # The maximum number of workers to cap scaling at.
1592 },
1593 &quot;metadata&quot;: { # Metadata to set on the Google Compute Engine VMs.
1594 &quot;a_key&quot;: &quot;A String&quot;,
1595 },
1596 &quot;defaultPackageSet&quot;: &quot;A String&quot;, # The default package set to install. This allows the service to
1597 # select a default set of packages which are useful to worker
1598 # harnesses written in a particular language.
1599 &quot;network&quot;: &quot;A String&quot;, # Network to which VMs will be assigned. If empty or unspecified,
1600 # the service will use the network &quot;default&quot;.
1601 },
1602 ],
1603 &quot;dataset&quot;: &quot;A String&quot;, # The dataset for the current project where various workflow
1604 # related tables are stored.
1605 #
1606 # The supported resource type is:
1607 #
1608 # Google BigQuery:
1609 # bigquery.googleapis.com/{dataset}
1610 },
1611 &quot;stageStates&quot;: [ # This field may be mutated by the Cloud Dataflow service;
1612 # callers cannot mutate it.
1613 { # A message describing the state of a particular execution stage.
1614 &quot;currentStateTime&quot;: &quot;A String&quot;, # The time at which the stage transitioned to this state.
1615 &quot;executionStageState&quot;: &quot;A String&quot;, # Execution stage states allow the same set of values as JobState.
1616 &quot;executionStageName&quot;: &quot;A String&quot;, # The name of the execution stage.
1617 },
1618 ],
1619 &quot;jobMetadata&quot;: { # Metadata available primarily for filtering jobs. Will be included in the # This field is populated by the Dataflow service to support filtering jobs
1620 # by the metadata values provided here. Populated for ListJobs and all GetJob
1621 # views SUMMARY and higher.
1622 # ListJob response and Job SUMMARY view.
1623 &quot;datastoreDetails&quot;: [ # Identification of a Datastore source used in the Dataflow job.
1624 { # Metadata for a Datastore connector used by the job.
1625 &quot;namespace&quot;: &quot;A String&quot;, # Namespace used in the connection.
1626 &quot;projectId&quot;: &quot;A String&quot;, # ProjectId accessed in the connection.
1627 },
1628 ],
1629 &quot;sdkVersion&quot;: { # The version of the SDK used to run the job. # The SDK version used to run the job.
1630 &quot;version&quot;: &quot;A String&quot;, # The version of the SDK used to run the job.
1631 &quot;sdkSupportStatus&quot;: &quot;A String&quot;, # The support status for this SDK version.
1632 &quot;versionDisplayName&quot;: &quot;A String&quot;, # A readable string describing the version of the SDK.
1633 },
1634 &quot;bigqueryDetails&quot;: [ # Identification of a BigQuery source used in the Dataflow job.
1635 { # Metadata for a BigQuery connector used by the job.
1636 &quot;table&quot;: &quot;A String&quot;, # Table accessed in the connection.
1637 &quot;dataset&quot;: &quot;A String&quot;, # Dataset accessed in the connection.
1638 &quot;query&quot;: &quot;A String&quot;, # Query used to access data in the connection.
1639 &quot;projectId&quot;: &quot;A String&quot;, # Project accessed in the connection.
1640 },
1641 ],
1642 &quot;fileDetails&quot;: [ # Identification of a File source used in the Dataflow job.
1643 { # Metadata for a File connector used by the job.
1644 &quot;filePattern&quot;: &quot;A String&quot;, # File Pattern used to access files by the connector.
1645 },
1646 ],
1647 &quot;pubsubDetails&quot;: [ # Identification of a PubSub source used in the Dataflow job.
1648 { # Metadata for a PubSub connector used by the job.
1649 &quot;topic&quot;: &quot;A String&quot;, # Topic accessed in the connection.
1650 &quot;subscription&quot;: &quot;A String&quot;, # Subscription used in the connection.
1651 },
1652 ],
1653 &quot;bigTableDetails&quot;: [ # Identification of a BigTable source used in the Dataflow job.
1654 { # Metadata for a BigTable connector used by the job.
1655 &quot;projectId&quot;: &quot;A String&quot;, # ProjectId accessed in the connection.
1656 &quot;instanceId&quot;: &quot;A String&quot;, # InstanceId accessed in the connection.
1657 &quot;tableId&quot;: &quot;A String&quot;, # TableId accessed in the connection.
1658 },
1659 ],
1660 &quot;spannerDetails&quot;: [ # Identification of a Spanner source used in the Dataflow job.
1661 { # Metadata for a Spanner connector used by the job.
1662 &quot;instanceId&quot;: &quot;A String&quot;, # InstanceId accessed in the connection.
1663 &quot;projectId&quot;: &quot;A String&quot;, # ProjectId accessed in the connection.
1664 &quot;databaseId&quot;: &quot;A String&quot;, # DatabaseId accessed in the connection.
1665 },
1666 ],
1667 },
1668 &quot;type&quot;: &quot;A String&quot;, # The type of Cloud Dataflow job.
1669 &quot;projectId&quot;: &quot;A String&quot;, # The ID of the Cloud Platform project that the job belongs to.
1670 &quot;createdFromSnapshotId&quot;: &quot;A String&quot;, # If this is specified, the job&#x27;s initial state is populated from the given
1671 # snapshot.
1672 &quot;pipelineDescription&quot;: { # A descriptive representation of submitted pipeline as well as the executed # Preliminary field: The format of this data may change at any time.
1673 # A description of the user pipeline and stages through which it is executed.
1674 # Created by Cloud Dataflow service. Only retrieved with
1675 # JOB_VIEW_DESCRIPTION or JOB_VIEW_ALL.
1676 # form. This data is provided by the Dataflow service for ease of visualizing
1677 # the pipeline and interpreting Dataflow provided metrics.
1678 &quot;executionPipelineStage&quot;: [ # Description of each stage of execution of the pipeline.
1679 { # Description of the composing transforms, names/ids, and input/outputs of a
1680 # stage of execution. Some composing transforms and sources may have been
1681 # generated by the Dataflow service during execution planning.
1682 &quot;outputSource&quot;: [ # Output sources for this stage.
1683 { # Description of an input or output of an execution stage.
1684 &quot;sizeBytes&quot;: &quot;A String&quot;, # Size of the source, if measurable.
1685 &quot;name&quot;: &quot;A String&quot;, # Dataflow service generated name for this source.
1686 &quot;userName&quot;: &quot;A String&quot;, # Human-readable name for this source; may be user or system generated.
1687 &quot;originalTransformOrCollection&quot;: &quot;A String&quot;, # User name for the original user transform or collection with which this
1688 # source is most closely associated.
1689 },
1690 ],
1691 &quot;name&quot;: &quot;A String&quot;, # Dataflow service generated name for this stage.
1692 &quot;inputSource&quot;: [ # Input sources for this stage.
1693 { # Description of an input or output of an execution stage.
1694 &quot;sizeBytes&quot;: &quot;A String&quot;, # Size of the source, if measurable.
1695 &quot;name&quot;: &quot;A String&quot;, # Dataflow service generated name for this source.
1696 &quot;userName&quot;: &quot;A String&quot;, # Human-readable name for this source; may be user or system generated.
1697 &quot;originalTransformOrCollection&quot;: &quot;A String&quot;, # User name for the original user transform or collection with which this
1698 # source is most closely associated.
1699 },
1700 ],
1701 &quot;id&quot;: &quot;A String&quot;, # Dataflow service generated id for this stage.
1702 &quot;componentTransform&quot;: [ # Transforms that comprise this execution stage.
1703 { # Description of a transform executed as part of an execution stage.
1704 &quot;originalTransform&quot;: &quot;A String&quot;, # User name for the original user transform with which this transform is
1705 # most closely associated.
1706 &quot;name&quot;: &quot;A String&quot;, # Dataflow service generated name for this source.
1707 &quot;userName&quot;: &quot;A String&quot;, # Human-readable name for this transform; may be user or system generated.
1708 },
1709 ],
1710 &quot;componentSource&quot;: [ # Collections produced and consumed by component transforms of this stage.
1711 { # Description of an interstitial value between transforms in an execution
1712 # stage.
1713 &quot;name&quot;: &quot;A String&quot;, # Dataflow service generated name for this source.
1714 &quot;userName&quot;: &quot;A String&quot;, # Human-readable name for this transform; may be user or system generated.
1715 &quot;originalTransformOrCollection&quot;: &quot;A String&quot;, # User name for the original user transform or collection with which this
1716 # source is most closely associated.
1717 },
1718 ],
1719 &quot;kind&quot;: &quot;A String&quot;, # Type of transform this stage is executing.
1720 },
1721 ],
1722 &quot;originalPipelineTransform&quot;: [ # Description of each transform in the pipeline and collections between them.
1723 { # Description of the type, names/ids, and input/outputs for a transform.
1724 &quot;kind&quot;: &quot;A String&quot;, # Type of transform.
1725 &quot;inputCollectionName&quot;: [ # User names for all collection inputs to this transform.
1726 &quot;A String&quot;,
1727 ],
1728 &quot;name&quot;: &quot;A String&quot;, # User provided name for this transform instance.
1729 &quot;id&quot;: &quot;A String&quot;, # SDK generated id of this transform instance.
1730 &quot;displayData&quot;: [ # Transform-specific display data.
1731 { # Data provided with a pipeline or transform to provide descriptive info.
1732 &quot;durationValue&quot;: &quot;A String&quot;, # Contains value if the data is of duration type.
1733 &quot;int64Value&quot;: &quot;A String&quot;, # Contains value if the data is of int64 type.
1734 &quot;namespace&quot;: &quot;A String&quot;, # The namespace for the key. This is usually a class name or programming
1735 # language namespace (i.e. python module) which defines the display data.
1736 # This allows a dax monitoring system to specially handle the data
1737 # and perform custom rendering.
1738 &quot;floatValue&quot;: 3.14, # Contains value if the data is of float type.
1739 &quot;key&quot;: &quot;A String&quot;, # The key identifying the display data.
1740 # This is intended to be used as a label for the display data
1741 # when viewed in a dax monitoring system.
1742 &quot;shortStrValue&quot;: &quot;A String&quot;, # A possible additional shorter value to display.
1743 # For example a java_class_name_value of com.mypackage.MyDoFn
1744 # will be stored with MyDoFn as the short_str_value and
1745 # com.mypackage.MyDoFn as the java_class_name value.
1746 # short_str_value can be displayed and java_class_name_value
1747 # will be displayed as a tooltip.
1748 &quot;url&quot;: &quot;A String&quot;, # An optional full URL.
1749 &quot;label&quot;: &quot;A String&quot;, # An optional label to display in a dax UI for the element.
1750 &quot;timestampValue&quot;: &quot;A String&quot;, # Contains value if the data is of timestamp type.
1751 &quot;boolValue&quot;: True or False, # Contains value if the data is of a boolean type.
1752 &quot;javaClassValue&quot;: &quot;A String&quot;, # Contains value if the data is of java class type.
1753 &quot;strValue&quot;: &quot;A String&quot;, # Contains value if the data is of string type.
1754 },
1755 ],
1756 &quot;outputCollectionName&quot;: [ # User names for all collection outputs to this transform.
1757 &quot;A String&quot;,
1758 ],
1759 },
1760 ],
1761 &quot;displayData&quot;: [ # Pipeline level display data.
1762 { # Data provided with a pipeline or transform to provide descriptive info.
1763 &quot;durationValue&quot;: &quot;A String&quot;, # Contains value if the data is of duration type.
1764 &quot;int64Value&quot;: &quot;A String&quot;, # Contains value if the data is of int64 type.
1765 &quot;namespace&quot;: &quot;A String&quot;, # The namespace for the key. This is usually a class name or programming
1766 # language namespace (i.e. python module) which defines the display data.
1767 # This allows a dax monitoring system to specially handle the data
1768 # and perform custom rendering.
1769 &quot;floatValue&quot;: 3.14, # Contains value if the data is of float type.
1770 &quot;key&quot;: &quot;A String&quot;, # The key identifying the display data.
1771 # This is intended to be used as a label for the display data
1772 # when viewed in a dax monitoring system.
1773 &quot;shortStrValue&quot;: &quot;A String&quot;, # A possible additional shorter value to display.
1774 # For example a java_class_name_value of com.mypackage.MyDoFn
1775 # will be stored with MyDoFn as the short_str_value and
1776 # com.mypackage.MyDoFn as the java_class_name value.
1777 # short_str_value can be displayed and java_class_name_value
1778 # will be displayed as a tooltip.
1779 &quot;url&quot;: &quot;A String&quot;, # An optional full URL.
1780 &quot;label&quot;: &quot;A String&quot;, # An optional label to display in a dax UI for the element.
1781 &quot;timestampValue&quot;: &quot;A String&quot;, # Contains value if the data is of timestamp type.
1782 &quot;boolValue&quot;: True or False, # Contains value if the data is of a boolean type.
1783 &quot;javaClassValue&quot;: &quot;A String&quot;, # Contains value if the data is of java class type.
1784 &quot;strValue&quot;: &quot;A String&quot;, # Contains value if the data is of string type.
Bu Sun Kim65020912020-05-20 12:08:20 -07001785 },
1786 ],
1787 },
1788 &quot;replaceJobId&quot;: &quot;A String&quot;, # If this job is an update of an existing job, this field is the job ID
1789 # of the job it replaced.
1790 #
1791 # When sending a `CreateJobRequest`, you can update a job by specifying it
1792 # here. The job named here is stopped, and its intermediate state is
1793 # transferred to this job.
1794 &quot;tempFiles&quot;: [ # A set of files the system should be aware of that are used
1795 # for temporary storage. These temporary files will be
1796 # removed on job completion.
1797 # No duplicates are allowed.
1798 # No file patterns are supported.
1799 #
1800 # The supported files are:
1801 #
1802 # Google Cloud Storage:
1803 #
1804 # storage.googleapis.com/{bucket}/{object}
1805 # bucket.storage.googleapis.com/{object}
1806 &quot;A String&quot;,
1807 ],
1808 &quot;name&quot;: &quot;A String&quot;, # The user-specified Cloud Dataflow job name.
1809 #
1810 # Only one Job with a given name may exist in a project at any
1811 # given time. If a caller attempts to create a Job with the same
1812 # name as an already-existing Job, the attempt returns the
1813 # existing Job.
1814 #
1815 # The name must match the regular expression
1816 # `[a-z]([-a-z0-9]{0,38}[a-z0-9])?`
1817 &quot;steps&quot;: [ # Exactly one of step or steps_location should be specified.
1818 #
1819 # The top-level steps that constitute the entire job.
1820 { # Defines a particular step within a Cloud Dataflow job.
1821 #
1822 # A job consists of multiple steps, each of which performs some
1823 # specific operation as part of the overall job. Data is typically
1824 # passed from one step to another as part of the job.
1825 #
1826 # Here&#x27;s an example of a sequence of steps which together implement a
1827 # Map-Reduce job:
1828 #
1829 # * Read a collection of data from some source, parsing the
1830 # collection&#x27;s elements.
1831 #
1832 # * Validate the elements.
1833 #
1834 # * Apply a user-defined function to map each element to some value
1835 # and extract an element-specific key value.
1836 #
1837 # * Group elements with the same key into a single element with
1838 # that key, transforming a multiply-keyed collection into a
1839 # uniquely-keyed collection.
1840 #
1841 # * Write the elements out to some data sink.
1842 #
1843 # Note that the Cloud Dataflow service may be used to run many different
1844 # types of jobs, not just Map-Reduce.
1845 &quot;name&quot;: &quot;A String&quot;, # The name that identifies the step. This must be unique for each
1846 # step with respect to all other steps in the Cloud Dataflow job.
1847 &quot;kind&quot;: &quot;A String&quot;, # The kind of step in the Cloud Dataflow job.
1848 &quot;properties&quot;: { # Named properties associated with the step. Each kind of
1849 # predefined step has its own required set of properties.
1850 # Must be provided on Create. Only retrieved with JOB_VIEW_ALL.
1851 &quot;a_key&quot;: &quot;&quot;, # Properties of the object.
1852 },
1853 },
1854 ],
1855 &quot;replacedByJobId&quot;: &quot;A String&quot;, # If another job is an update of this job (and thus, this job is in
1856 # `JOB_STATE_UPDATED`), this field contains the ID of that job.
1857 &quot;executionInfo&quot;: { # Additional information about how a Cloud Dataflow job will be executed that # Deprecated.
1858 # isn&#x27;t contained in the submitted job.
1859 &quot;stages&quot;: { # A mapping from each stage to the information about that stage.
1860 &quot;a_key&quot;: { # Contains information about how a particular
1861 # google.dataflow.v1beta3.Step will be executed.
1862 &quot;stepName&quot;: [ # The steps associated with the execution stage.
1863 # Note that stages may have several steps, and that a given step
1864 # might be run by more than one stage.
1865 &quot;A String&quot;,
1866 ],
1867 },
1868 },
1869 },
1870 &quot;currentState&quot;: &quot;A String&quot;, # The current state of the job.
1871 #
1872 # Jobs are created in the `JOB_STATE_STOPPED` state unless otherwise
1873 # specified.
1874 #
1875 # A job in the `JOB_STATE_RUNNING` state may asynchronously enter a
1876 # terminal state. After a job has reached a terminal state, no
1877 # further state updates may be made.
1878 #
1879 # This field may be mutated by the Cloud Dataflow service;
1880 # callers cannot mutate it.
1881 &quot;location&quot;: &quot;A String&quot;, # The [regional endpoint]
1882 # (https://cloud.google.com/dataflow/docs/concepts/regional-endpoints) that
1883 # contains this job.
1884 &quot;startTime&quot;: &quot;A String&quot;, # The timestamp when the job was started (transitioned to JOB_STATE_PENDING).
1885 # Flexible resource scheduling jobs are started with some delay after job
1886 # creation, so start_time is unset before start and is updated when the
1887 # job is started by the Cloud Dataflow service. For other jobs, start_time
1888 # always equals create_time and is immutable and set by the Cloud Dataflow
1889 # service.
1890 &quot;stepsLocation&quot;: &quot;A String&quot;, # The GCS location where the steps are stored.
1891 &quot;labels&quot;: { # User-defined labels for this job.
1892 #
1893 # The labels map can contain no more than 64 entries. Entries of the labels
1894 # map are UTF8 strings that comply with the following restrictions:
1895 #
1896 # * Keys must conform to regexp: \p{Ll}\p{Lo}{0,62}
1897 # * Values must conform to regexp: [\p{Ll}\p{Lo}\p{N}_-]{0,63}
1898 # * Both keys and values are additionally constrained to be &lt;= 128 bytes in
1899 # size.
1900 &quot;a_key&quot;: &quot;A String&quot;,
1901 },
1902 &quot;createTime&quot;: &quot;A String&quot;, # The timestamp when the job was initially created. Immutable and set by the
1903 # Cloud Dataflow service.
1904 &quot;requestedState&quot;: &quot;A String&quot;, # The job&#x27;s requested state.
1905 #
1906 # `UpdateJob` may be used to switch between the `JOB_STATE_STOPPED` and
1907 # `JOB_STATE_RUNNING` states, by setting requested_state. `UpdateJob` may
1908 # also be used to directly set a job&#x27;s requested state to
1909 # `JOB_STATE_CANCELLED` or `JOB_STATE_DONE`, irrevocably terminating the
1910 # job if it has not already reached a terminal state.
1911 }</pre>
1912</div>
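A minimal sketch, not part of the generated reference: how the Job resource above might be polled from Python. It assumes the google-api-python-client package and application-default credentials are available; the project, region, and job IDs are placeholders. The terminal-state set follows the currentState documentation above, under which a job that reaches a terminal state receives no further state updates.

```python
# Sketch: check whether a Dataflow job has reached a terminal state.
# Assumptions: google-api-python-client is installed and application-default
# credentials are configured; project/region/job_id values are placeholders.

# Per the currentState docs above, no further state updates may be made
# once a job reaches a terminal state.
TERMINAL_STATES = {
    "JOB_STATE_DONE",
    "JOB_STATE_FAILED",
    "JOB_STATE_CANCELLED",
    "JOB_STATE_UPDATED",
    "JOB_STATE_DRAINED",
}

def is_terminal(state):
    """Return True if a job in this state can no longer change state."""
    return state in TERMINAL_STATES

def get_job_state(project, region, job_id):
    """Fetch a job via the regional endpoint and return its currentState."""
    from googleapiclient.discovery import build
    dataflow = build("dataflow", "v1b3")
    job = dataflow.projects().locations().jobs().get(
        projectId=project, location=region, jobId=job_id).execute()
    return job["currentState"]
```

A caller would typically loop on get_job_state with a sleep until is_terminal returns True.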
1913
1914<div class="method">
1915 <code class="details" id="getMetrics">getMetrics(projectId, location, jobId, startTime=None, x__xgafv=None)</code>
1916 <pre>Request the job status.
1917
1918To request the status of a job, we recommend using
1919`projects.locations.jobs.getMetrics` with a [regional endpoint]
1920(https://cloud.google.com/dataflow/docs/concepts/regional-endpoints). Using
1921`projects.jobs.getMetrics` is not recommended, as you can only request the
1922status of jobs that are running in `us-central1`.
1923
1924Args:
1925 projectId: string, A project id. (required)
1926 location: string, The [regional endpoint]
1927(https://cloud.google.com/dataflow/docs/concepts/regional-endpoints) that
1928contains the job specified by job_id. (required)
1929 jobId: string, The job to get messages for. (required)
1930 startTime: string, Return only metric data that has changed since this time.
1931Default is to return all information about all metrics for the job.
1932 x__xgafv: string, V1 error format.
1933 Allowed values
1934 1 - v1 error format
1935 2 - v2 error format
1936
1937Returns:
1938 An object of the form:
1939
1940 { # JobMetrics contains a collection of metrics describing the detailed progress
1941 # of a Dataflow job. Metrics correspond to user-defined and system-defined
1942 # metrics in the job.
1943 #
1944 # This resource captures only the most recent values of each metric;
1945 # time-series data can be queried for them (under the same metric names)
1946 # from Cloud Monitoring.
1947 &quot;metrics&quot;: [ # All metrics for this job.
1948 { # Describes the state of a metric.
1949 &quot;set&quot;: &quot;&quot;, # Worker-computed aggregate value for the &quot;Set&quot; aggregation kind. The only
1950 # possible value type is a list of Values whose type can be Long, Double,
1951 # or String, according to the metric&#x27;s type. All Values in the list must
1952 # be of the same type.
1953 &quot;gauge&quot;: &quot;&quot;, # A struct value describing properties of a Gauge.
1954 # Metrics of gauge type show the value of a metric across time, and is
1955 # aggregated based on the newest value.
1956 &quot;cumulative&quot;: True or False, # True if this metric is reported as the total cumulative aggregate
1957 # value accumulated since the worker started working on this WorkItem.
1958 # By default this is false, indicating that this metric is reported
1959 # as a delta that is not associated with any WorkItem.
1960 &quot;internal&quot;: &quot;&quot;, # Worker-computed aggregate value for internal use by the Dataflow
1961 # service.
1962 &quot;kind&quot;: &quot;A String&quot;, # Metric aggregation kind. The possible metric aggregation kinds are
1963 # &quot;Sum&quot;, &quot;Max&quot;, &quot;Min&quot;, &quot;Mean&quot;, &quot;Set&quot;, &quot;And&quot;, &quot;Or&quot;, and &quot;Distribution&quot;.
1964 # The specified aggregation kind is case-insensitive.
1965 #
1966 # If omitted, this is not an aggregated value but instead
1967 # a single metric sample value.
1968 &quot;scalar&quot;: &quot;&quot;, # Worker-computed aggregate value for aggregation kinds &quot;Sum&quot;, &quot;Max&quot;, &quot;Min&quot;,
1969 # &quot;And&quot;, and &quot;Or&quot;. The possible value types are Long, Double, and Boolean.
1970 &quot;meanCount&quot;: &quot;&quot;, # Worker-computed aggregate value for the &quot;Mean&quot; aggregation kind.
1971 # This holds the count of the aggregated values and is used in combination
1972 # with mean_sum above to obtain the actual mean aggregate value.
1973 # The only possible value type is Long.
1974 &quot;meanSum&quot;: &quot;&quot;, # Worker-computed aggregate value for the &quot;Mean&quot; aggregation kind.
1975 # This holds the sum of the aggregated values and is used in combination
1976 # with mean_count below to obtain the actual mean aggregate value.
1977 # The only possible value types are Long and Double.
Bu Sun Kim65020912020-05-20 12:08:20 -07001978 &quot;updateTime&quot;: &quot;A String&quot;, # Timestamp associated with the metric value. Optional when workers are
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07001979 # reporting work progress; it will be filled in responses from the
1980 # metrics API.
Bu Sun Kim65020912020-05-20 12:08:20 -07001981 &quot;name&quot;: { # Identifies a metric, by describing the source which generated the # Name of the metric.
1982 # metric.
Bu Sun Kim4ed7d3f2020-05-27 12:20:54 -07001983 &quot;name&quot;: &quot;A String&quot;, # Worker-defined metric name.
1984 &quot;origin&quot;: &quot;A String&quot;, # Origin (namespace) of metric name. May be blank for user-defined metrics;
1985 # will be &quot;dataflow&quot; for metrics defined by the Dataflow service or SDK.
Bu Sun Kim65020912020-05-20 12:08:20 -07001986 &quot;context&quot;: { # Zero or more labeled fields which identify the part of the job this
1987 # metric is associated with, such as the name of a step or collection.
1988 #
1989 # For example, built-in counters associated with steps will have
1990 # context[&#x27;step&#x27;] = &lt;step-name&gt;. Counters associated with PCollections
1991 # in the SDK will have context[&#x27;pcollection&#x27;] = &lt;pcollection-name&gt;.
1992 &quot;a_key&quot;: &quot;A String&quot;,
1993 },
Bu Sun Kim65020912020-05-20 12:08:20 -07001994 },
1995 &quot;distribution&quot;: &quot;&quot;, # A struct value describing properties of a distribution of numeric values.
Jon Wayne Parrott692617a2017-01-06 09:58:29 -08001996 },
1997 ],
Bu Sun Kim4ed7d3f2020-05-27 12:20:54 -07001998 &quot;metricTime&quot;: &quot;A String&quot;, # Timestamp as of which metric values are current.
Jon Wayne Parrott692617a2017-01-06 09:58:29 -08001999 }</pre>
2000</div>
2001
2002<div class="method">
Bu Sun Kim65020912020-05-20 12:08:20 -07002003 <code class="details" id="list">list(projectId, location, filter=None, pageToken=None, pageSize=None, view=None, x__xgafv=None)</code>
Sai Cheemalapatic30d2b52017-03-13 12:12:03 -04002004 <pre>List the jobs of a project.
Jon Wayne Parrott692617a2017-01-06 09:58:29 -08002005
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07002006To list the jobs of a project in a region, we recommend using
2007`projects.locations.jobs.list` with a [regional endpoint]
2008(https://cloud.google.com/dataflow/docs/concepts/regional-endpoints). To
2009list all jobs across all regions, use `projects.jobs.aggregated`. Using
2010`projects.jobs.list` is not recommended, as you can only get the list of
2011jobs that are running in `us-central1`.
2012
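The pagination described below (`pageToken` set from a previous response's `next_page_token`) can be sketched with the google-api-python-client, which wraps that loop in `list_next`. This is a hedged sketch: the helper name `list_all_jobs` and the project/region values are ours, not part of the API.

```python
# Sketch: list every Dataflow job in one region, following pagination.
# Assumes google-api-python-client is installed and credentials are set up;
# the project ID and region used by callers are placeholders.

def list_all_jobs(jobs_resource, project_id, location, view="JOB_VIEW_SUMMARY"):
    """Collect all jobs by chaining pages with list_next()."""
    jobs = []
    request = jobs_resource.list(projectId=project_id, location=location, view=view)
    while request is not None:
        response = request.execute()
        jobs.extend(response.get("jobs", []))
        # list_next returns None once the response carries no next_page_token.
        request = jobs_resource.list_next(request, response)
    return jobs

# Live usage (requires credentials), sketched:
#   from googleapiclient.discovery import build
#   service = build("dataflow", "v1b3")
#   jobs = list_all_jobs(service.projects().locations().jobs(),
#                        "my-project", "us-central1")
```

Passing the `jobs()` collection in as a parameter keeps the paging logic separate from service construction, so it works the same against any regional endpoint.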
Jon Wayne Parrott692617a2017-01-06 09:58:29 -08002013Args:
2014 projectId: string, The project which owns the jobs. (required)
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07002015 location: string, The [regional endpoint]
2016(https://cloud.google.com/dataflow/docs/concepts/regional-endpoints) that
2017contains this job. (required)
Bu Sun Kim65020912020-05-20 12:08:20 -07002018 filter: string, The kind of filter to use.
2019 pageToken: string, Set this to the &#x27;next_page_token&#x27; field of a previous response
2020to request additional results in a long list.
Sai Cheemalapatic30d2b52017-03-13 12:12:03 -04002021 pageSize: integer, If there are many jobs, limit response to at most this many.
2022The actual number of jobs returned will be the lesser of max_responses
2023and an unspecified server-defined limit.
Bu Sun Kim65020912020-05-20 12:08:20 -07002024 view: string, Level of information requested in response. Default is `JOB_VIEW_SUMMARY`.
Jon Wayne Parrott692617a2017-01-06 09:58:29 -08002025 x__xgafv: string, V1 error format.
Sai Cheemalapatic30d2b52017-03-13 12:12:03 -04002026 Allowed values
2027 1 - v1 error format
2028 2 - v2 error format
Jon Wayne Parrott692617a2017-01-06 09:58:29 -08002029
2030Returns:
2031 An object of the form:
2032
Dan O'Mearadd494642020-05-01 07:42:23 -07002033 { # Response to a request to list Cloud Dataflow jobs in a project. This might
2034 # be a partial response, depending on the page size in the ListJobsRequest.
2035 # However, if the project does not have any jobs, an instance of
Bu Sun Kim65020912020-05-20 12:08:20 -07002036 # ListJobsResponse is not returned and the request&#x27;s response
Dan O'Mearadd494642020-05-01 07:42:23 -07002037 # body is empty {}.
Bu Sun Kim65020912020-05-20 12:08:20 -07002038 &quot;jobs&quot;: [ # A subset of the requested job information.
Sai Cheemalapatic30d2b52017-03-13 12:12:03 -04002039 { # Defines a job to be run by the Cloud Dataflow service.
Bu Sun Kim65020912020-05-20 12:08:20 -07002040 &quot;clientRequestId&quot;: &quot;A String&quot;, # The client&#x27;s unique identifier of the job, re-used across retried attempts.
2041 # If this field is set, the service will ensure its uniqueness.
2042 # The request to create a job will fail if the service has knowledge of a
2043 # previously submitted job with the same client&#x27;s ID and job name.
2044 # The caller may use this field to ensure idempotence of job
2045 # creation across retried attempts to create a job.
2046 # By default, the field is empty and, in that case, the service ignores it.
2047 &quot;id&quot;: &quot;A String&quot;, # The unique ID of this job.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07002048 #
2049 # This field is set by the Cloud Dataflow service when the Job is
2050 # created, and is immutable for the life of the job.
Bu Sun Kim65020912020-05-20 12:08:20 -07002051 &quot;currentStateTime&quot;: &quot;A String&quot;, # The timestamp associated with the current state.
2052 &quot;transformNameMapping&quot;: { # The map of transform name prefixes of the job to be replaced to the
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07002053 # corresponding name prefixes of the new job.
Bu Sun Kim65020912020-05-20 12:08:20 -07002054 &quot;a_key&quot;: &quot;A String&quot;,
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07002055 },
Bu Sun Kim65020912020-05-20 12:08:20 -07002056 &quot;environment&quot;: { # Describes the environment in which a Dataflow Job runs. # The environment for the job.
Bu Sun Kim65020912020-05-20 12:08:20 -07002057 &quot;internalExperiments&quot;: { # Experimental settings.
2058 &quot;a_key&quot;: &quot;&quot;, # Properties of the object. Contains field @type with type URL.
2059 },
2060 &quot;workerRegion&quot;: &quot;A String&quot;, # The Compute Engine region
2061 # (https://cloud.google.com/compute/docs/regions-zones/regions-zones) in
2062 # which worker processing should occur, e.g. &quot;us-west1&quot;. Mutually exclusive
2063 # with worker_zone. If neither worker_region nor worker_zone is specified,
2064 # default to the control plane&#x27;s region.
2065 &quot;serviceKmsKeyName&quot;: &quot;A String&quot;, # If set, contains the Cloud KMS key identifier used to encrypt data
2066 # at rest, AKA a Customer Managed Encryption Key (CMEK).
2067 #
2068 # Format:
2069 # projects/PROJECT_ID/locations/LOCATION/keyRings/KEY_RING/cryptoKeys/KEY
2070 &quot;userAgent&quot;: { # A description of the process that generated the request.
2071 &quot;a_key&quot;: &quot;&quot;, # Properties of the object.
2072 },
2073 &quot;workerZone&quot;: &quot;A String&quot;, # The Compute Engine zone
2074 # (https://cloud.google.com/compute/docs/regions-zones/regions-zones) in
2075 # which worker processing should occur, e.g. &quot;us-west1-a&quot;. Mutually exclusive
2076 # with worker_region. If neither worker_region nor worker_zone is specified,
2077 # a zone in the control plane&#x27;s region is chosen based on available capacity.
2078 &quot;clusterManagerApiService&quot;: &quot;A String&quot;, # The type of cluster manager API to use. If unknown or
Dan O'Mearadd494642020-05-01 07:42:23 -07002079 # unspecified, the service will attempt to choose a reasonable
2080 # default. This should be in the form of the API service name,
Bu Sun Kim65020912020-05-20 12:08:20 -07002081 # e.g. &quot;compute.googleapis.com&quot;.
2082 &quot;tempStoragePrefix&quot;: &quot;A String&quot;, # The prefix of the resources the system should use for temporary
2083 # storage. The system will append the suffix &quot;/temp-{JOBNAME}&quot; to
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07002084 # this resource prefix, where {JOBNAME} is the value of the
2085 # job_name field. The resulting bucket and object prefix is used
2086 # as the prefix of the resources used to store temporary data
2087 # needed during the job execution. NOTE: This will override the
2088 # value in taskrunner_settings.
2089 # The supported resource type is:
Sai Cheemalapatic30d2b52017-03-13 12:12:03 -04002090 #
2091 # Google Cloud Storage:
2092 #
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07002093 # storage.googleapis.com/{bucket}/{object}
2094 # bucket.storage.googleapis.com/{object}
Bu Sun Kim65020912020-05-20 12:08:20 -07002095 &quot;experiments&quot;: [ # The list of experiments to enable.
2096 &quot;A String&quot;,
2097 ],
2098 &quot;version&quot;: { # A structure describing which components and their versions of the service
2099 # are required in order to run the job.
2100 &quot;a_key&quot;: &quot;&quot;, # Properties of the object.
2101 },
2102 &quot;serviceAccountEmail&quot;: &quot;A String&quot;, # Identity to run virtual machines as. Defaults to the default account.
Bu Sun Kim4ed7d3f2020-05-27 12:20:54 -07002103 &quot;sdkPipelineOptions&quot;: { # The Cloud Dataflow SDK pipeline options specified by the user. These
2104 # options are passed through the service and are used to recreate the
2105 # SDK pipeline options on the worker in a language agnostic and platform
2106 # independent way.
2107 &quot;a_key&quot;: &quot;&quot;, # Properties of the object.
2108 },
2109 &quot;flexResourceSchedulingGoal&quot;: &quot;A String&quot;, # Which Flexible Resource Scheduling mode to run in.
2110 &quot;workerPools&quot;: [ # The worker pools. At least one &quot;harness&quot; worker pool must be
2111 # specified in order for the job to have workers.
2112 { # Describes one particular pool of Cloud Dataflow workers to be
2113 # instantiated by the Cloud Dataflow service in order to perform the
2114 # computations required by a job. Note that a workflow job may use
2115 # multiple pools, in order to match the various computational
2116 # requirements of the various stages of the job.
2117 &quot;numThreadsPerWorker&quot;: 42, # The number of threads per worker harness. If empty or unspecified, the
2118 # service will choose a number of threads (according to the number of cores
2119 # on the selected machine type for batch, or 1 by convention for streaming).
2120 &quot;numWorkers&quot;: 42, # Number of Google Compute Engine workers in this pool needed to
2121 # execute the job. If zero or unspecified, the service will
2122 # attempt to choose a reasonable default.
2123 &quot;zone&quot;: &quot;A String&quot;, # Zone to run the worker pools in. If empty or unspecified, the service
2124 # will attempt to choose a reasonable default.
2125 &quot;diskSourceImage&quot;: &quot;A String&quot;, # Fully qualified source image for disks.
2126 &quot;packages&quot;: [ # Packages to be installed on workers.
2127 { # The packages that must be installed in order for a worker to run the
2128 # steps of the Cloud Dataflow job that will be assigned to its worker
2129 # pool.
2130 #
2131 # This is the mechanism by which the Cloud Dataflow SDK causes code to
2132 # be loaded onto the workers. For example, the Cloud Dataflow Java SDK
2133 # might use this to install jars containing the user&#x27;s code and all of the
2134 # various dependencies (libraries, data files, etc.) required in order
2135 # for that code to run.
2136 &quot;name&quot;: &quot;A String&quot;, # The name of the package.
2137 &quot;location&quot;: &quot;A String&quot;, # The resource to read the package from. The supported resource type is:
2138 #
2139 # Google Cloud Storage:
2140 #
2141 # storage.googleapis.com/{bucket}
2142 # bucket.storage.googleapis.com/
2143 },
2144 ],
2145 &quot;teardownPolicy&quot;: &quot;A String&quot;, # Sets the policy for determining when to turn down the worker pool.
2146 # Allowed values are: `TEARDOWN_ALWAYS`, `TEARDOWN_ON_SUCCESS`, and
2147 # `TEARDOWN_NEVER`.
2148 # `TEARDOWN_ALWAYS` means workers are always torn down regardless of whether
2149 # the job succeeds. `TEARDOWN_ON_SUCCESS` means workers are torn down
2150 # if the job succeeds. `TEARDOWN_NEVER` means the workers are never torn
2151 # down.
2152 #
2153 # If the workers are not torn down by the service, they will
2154 # continue to run and use Google Compute Engine VM resources in the
2155 # user&#x27;s project until they are explicitly terminated by the user.
2156 # Because of this, Google recommends using the `TEARDOWN_ALWAYS`
2157 # policy except for small, manually supervised test jobs.
2158 #
2159 # If unknown or unspecified, the service will attempt to choose a reasonable
2160 # default.
2161 &quot;onHostMaintenance&quot;: &quot;A String&quot;, # The action to take on host maintenance, as defined by the Google
2162 # Compute Engine API.
2163 &quot;poolArgs&quot;: { # Extra arguments for this worker pool.
2164 &quot;a_key&quot;: &quot;&quot;, # Properties of the object. Contains field @type with type URL.
2165 },
2166 &quot;diskSizeGb&quot;: 42, # Size of root disk for VMs, in GB. If zero or unspecified, the service will
2167 # attempt to choose a reasonable default.
2168 &quot;workerHarnessContainerImage&quot;: &quot;A String&quot;, # Required. Docker container image that executes the Cloud Dataflow worker
2169 # harness, residing in Google Container Registry.
2170 #
2171 # Deprecated for the Fn API path. Use sdk_harness_container_images instead.
2172 &quot;diskType&quot;: &quot;A String&quot;, # Type of root disk for VMs. If empty or unspecified, the service will
2173 # attempt to choose a reasonable default.
2174 &quot;machineType&quot;: &quot;A String&quot;, # Machine type (e.g. &quot;n1-standard-1&quot;). If empty or unspecified, the
2175 # service will attempt to choose a reasonable default.
2176 &quot;kind&quot;: &quot;A String&quot;, # The kind of the worker pool; currently only `harness` and `shuffle`
2177 # are supported.
2178 &quot;sdkHarnessContainerImages&quot;: [ # Set of SDK harness containers needed to execute this pipeline. This will
2179 # only be set in the Fn API path. For non-cross-language pipelines this
2180 # should have only one entry. Cross-language pipelines will have two or more
2181 # entries.
2182 { # Defines an SDK harness container for executing Dataflow pipelines.
2183 &quot;containerImage&quot;: &quot;A String&quot;, # A docker container image that resides in Google Container Registry.
2184 &quot;useSingleCorePerContainer&quot;: True or False, # If true, recommends that the Dataflow service use only one core per SDK
2185 # container instance with this image. If false (or unset) recommends using
2186 # more than one core per SDK container instance with this image for
2187 # efficiency. Note that Dataflow service may choose to override this property
2188 # if needed.
2189 },
2190 ],
2191 &quot;dataDisks&quot;: [ # Data disks that are used by a VM in this workflow.
2192 { # Describes the data disk used by a workflow job.
2193 &quot;diskType&quot;: &quot;A String&quot;, # Disk storage type, as defined by Google Compute Engine. This
2194 # must be a disk type appropriate to the project and zone in which
2195 # the workers will run. If unknown or unspecified, the service
2196 # will attempt to choose a reasonable default.
2197 #
2198 # For example, the standard persistent disk type is a resource name
2199 # typically ending in &quot;pd-standard&quot;. If SSD persistent disks are
2200 # available, the resource name typically ends with &quot;pd-ssd&quot;. The
2201 # actual valid values are defined by the Google Compute Engine API,
2202 # not by the Cloud Dataflow API; consult the Google Compute Engine
2203 # documentation for more information about determining the set of
2204 # available disk types for a particular project and zone.
2205 #
2206 # Google Compute Engine Disk types are local to a particular
2207 # project in a particular zone, and so the resource name will
2208 # typically look something like this:
2209 #
2210 # compute.googleapis.com/projects/project-id/zones/zone/diskTypes/pd-standard
2211 &quot;sizeGb&quot;: 42, # Size of disk in GB. If zero or unspecified, the service will
2212 # attempt to choose a reasonable default.
2213 &quot;mountPoint&quot;: &quot;A String&quot;, # Directory in a VM where disk is mounted.
2214 },
2215 ],
2216 &quot;subnetwork&quot;: &quot;A String&quot;, # Subnetwork to which VMs will be assigned, if desired. Expected to be of
2217 # the form &quot;regions/REGION/subnetworks/SUBNETWORK&quot;.
2218 &quot;ipConfiguration&quot;: &quot;A String&quot;, # Configuration for VM IPs.
2219 &quot;taskrunnerSettings&quot;: { # Taskrunner configuration settings. # Settings passed through to Google Compute Engine workers when
2220 # using the standard Dataflow task runner. Users should ignore
2221 # this field.
2222 &quot;alsologtostderr&quot;: True or False, # Whether to also send taskrunner log info to stderr.
2223 &quot;taskGroup&quot;: &quot;A String&quot;, # The UNIX group ID on the worker VM to use for tasks launched by
2224 # taskrunner; e.g. &quot;wheel&quot;.
2225 &quot;harnessCommand&quot;: &quot;A String&quot;, # The command to launch the worker harness.
2226 &quot;logDir&quot;: &quot;A String&quot;, # The directory on the VM to store logs.
2227 &quot;oauthScopes&quot;: [ # The OAuth2 scopes to be requested by the taskrunner in order to
2228 # access the Cloud Dataflow API.
2229 &quot;A String&quot;,
2230 ],
2231 &quot;dataflowApiVersion&quot;: &quot;A String&quot;, # The API version of endpoint, e.g. &quot;v1b3&quot;
2232 &quot;logUploadLocation&quot;: &quot;A String&quot;, # Indicates where to put logs. If this is not specified, the logs
2233 # will not be uploaded.
2234 #
2235 # The supported resource type is:
2236 #
2237 # Google Cloud Storage:
2238 # storage.googleapis.com/{bucket}/{object}
2239 # bucket.storage.googleapis.com/{object}
2240 &quot;streamingWorkerMainClass&quot;: &quot;A String&quot;, # The streaming worker main class name.
2241 &quot;workflowFileName&quot;: &quot;A String&quot;, # The file to store the workflow in.
2242 &quot;languageHint&quot;: &quot;A String&quot;, # The suggested backend language.
2243 &quot;commandlinesFileName&quot;: &quot;A String&quot;, # The file to store preprocessing commands in.
2244 &quot;baseTaskDir&quot;: &quot;A String&quot;, # The location on the worker for task-specific subdirectories.
2245 &quot;tempStoragePrefix&quot;: &quot;A String&quot;, # The prefix of the resources the taskrunner should use for
2246 # temporary storage.
2247 #
2248 # The supported resource type is:
2249 #
2250 # Google Cloud Storage:
2251 # storage.googleapis.com/{bucket}/{object}
2252 # bucket.storage.googleapis.com/{object}
2253 &quot;baseUrl&quot;: &quot;A String&quot;, # The base URL for the taskrunner to use when accessing Google Cloud APIs.
2254 #
2255 # When workers access Google Cloud APIs, they logically do so via
2256 # relative URLs. If this field is specified, it supplies the base
2257 # URL to use for resolving these relative URLs. The normative
2258 # algorithm used is defined by RFC 1808, &quot;Relative Uniform Resource
2259 # Locators&quot;.
2260 #
2261 # If not specified, the default value is &quot;http://www.googleapis.com/&quot;
2262 &quot;logToSerialconsole&quot;: True or False, # Whether to send taskrunner log info to Google Compute Engine VM serial
2263 # console.
2264 &quot;continueOnException&quot;: True or False, # Whether to continue taskrunner if an exception is hit.
2265 &quot;parallelWorkerSettings&quot;: { # Provides data to pass through to the worker harness. # The settings to pass to the parallel worker harness.
2266 &quot;tempStoragePrefix&quot;: &quot;A String&quot;, # The prefix of the resources the system should use for temporary
2267 # storage.
2268 #
2269 # The supported resource type is:
2270 #
2271 # Google Cloud Storage:
2272 #
2273 # storage.googleapis.com/{bucket}/{object}
2274 # bucket.storage.googleapis.com/{object}
2275 &quot;reportingEnabled&quot;: True or False, # Whether to send work progress updates to the service.
2276 &quot;baseUrl&quot;: &quot;A String&quot;, # The base URL for accessing Google Cloud APIs.
2277 #
2278 # When workers access Google Cloud APIs, they logically do so via
2279 # relative URLs. If this field is specified, it supplies the base
2280 # URL to use for resolving these relative URLs. The normative
2281 # algorithm used is defined by RFC 1808, &quot;Relative Uniform Resource
2282 # Locators&quot;.
2283 #
2284 # If not specified, the default value is &quot;http://www.googleapis.com/&quot;
2285 &quot;servicePath&quot;: &quot;A String&quot;, # The Cloud Dataflow service path relative to the root URL, for example,
2286 # &quot;dataflow/v1b3/projects&quot;.
2287 &quot;shuffleServicePath&quot;: &quot;A String&quot;, # The Shuffle service path relative to the root URL, for example,
2288 # &quot;shuffle/v1beta1&quot;.
2289 &quot;workerId&quot;: &quot;A String&quot;, # The ID of the worker running this pipeline.
2290 },
2291 &quot;taskUser&quot;: &quot;A String&quot;, # The UNIX user ID on the worker VM to use for tasks launched by
2292 # taskrunner; e.g. &quot;root&quot;.
2293 &quot;vmId&quot;: &quot;A String&quot;, # The ID string of the VM.
2294 },
2295 &quot;autoscalingSettings&quot;: { # Settings for WorkerPool autoscaling. # Settings for autoscaling of this WorkerPool.
2296 &quot;algorithm&quot;: &quot;A String&quot;, # The algorithm to use for autoscaling.
2297 &quot;maxNumWorkers&quot;: 42, # The maximum number of workers to cap scaling at.
2298 },
2299 &quot;metadata&quot;: { # Metadata to set on the Google Compute Engine VMs.
2300 &quot;a_key&quot;: &quot;A String&quot;,
2301 },
2302 &quot;defaultPackageSet&quot;: &quot;A String&quot;, # The default package set to install. This allows the service to
2303 # select a default set of packages which are useful to worker
2304 # harnesses written in a particular language.
2305 &quot;network&quot;: &quot;A String&quot;, # Network to which VMs will be assigned. If empty or unspecified,
2306 # the service will use the network &quot;default&quot;.
2307 },
2308 ],
2309 &quot;dataset&quot;: &quot;A String&quot;, # The dataset for the current project where various workflow
2310 # related tables are stored.
2311 #
2312 # The supported resource type is:
2313 #
2314 # Google BigQuery:
2315 # bigquery.googleapis.com/{dataset}
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07002316 },
Bu Sun Kim65020912020-05-20 12:08:20 -07002317 &quot;stageStates&quot;: [ # This field may be mutated by the Cloud Dataflow service;
2318 # callers cannot mutate it.
2319 { # A message describing the state of a particular execution stage.
Bu Sun Kim65020912020-05-20 12:08:20 -07002320 &quot;currentStateTime&quot;: &quot;A String&quot;, # The time at which the stage transitioned to this state.
2321 &quot;executionStageState&quot;: &quot;A String&quot;, # Executions stage states allow the same set of values as JobState.
Bu Sun Kim4ed7d3f2020-05-27 12:20:54 -07002322 &quot;executionStageName&quot;: &quot;A String&quot;, # The name of the execution stage.
Bu Sun Kim65020912020-05-20 12:08:20 -07002323 },
2324 ],
2325 &quot;jobMetadata&quot;: { # Metadata available primarily for filtering jobs. Will be included in the # This field is populated by the Dataflow service to support filtering jobs
2326 # by the metadata values provided here. Populated for ListJobs and all GetJob
2327 # views SUMMARY and higher.
2328 # ListJob response and Job SUMMARY view.
Bu Sun Kim65020912020-05-20 12:08:20 -07002329 &quot;datastoreDetails&quot;: [ # Identification of a Datastore source used in the Dataflow job.
2330 { # Metadata for a Datastore connector used by the job.
Bu Sun Kim65020912020-05-20 12:08:20 -07002331 &quot;namespace&quot;: &quot;A String&quot;, # Namespace used in the connection.
Bu Sun Kim4ed7d3f2020-05-27 12:20:54 -07002332 &quot;projectId&quot;: &quot;A String&quot;, # ProjectId accessed in the connection.
Bu Sun Kim65020912020-05-20 12:08:20 -07002333 },
2334 ],
2335 &quot;sdkVersion&quot;: { # The version of the SDK used to run the job. # The SDK version used to run the job.
Bu Sun Kim65020912020-05-20 12:08:20 -07002336 &quot;version&quot;: &quot;A String&quot;, # The version of the SDK used to run the job.
Bu Sun Kim4ed7d3f2020-05-27 12:20:54 -07002337 &quot;sdkSupportStatus&quot;: &quot;A String&quot;, # The support status for this SDK version.
2338 &quot;versionDisplayName&quot;: &quot;A String&quot;, # A readable string describing the version of the SDK.
Bu Sun Kim65020912020-05-20 12:08:20 -07002339 },
2340 &quot;bigqueryDetails&quot;: [ # Identification of a BigQuery source used in the Dataflow job.
2341 { # Metadata for a BigQuery connector used by the job.
2342 &quot;table&quot;: &quot;A String&quot;, # Table accessed in the connection.
2343 &quot;dataset&quot;: &quot;A String&quot;, # Dataset accessed in the connection.
Bu Sun Kim65020912020-05-20 12:08:20 -07002344 &quot;query&quot;: &quot;A String&quot;, # Query used to access data in the connection.
Bu Sun Kim4ed7d3f2020-05-27 12:20:54 -07002345 &quot;projectId&quot;: &quot;A String&quot;, # Project accessed in the connection.
Bu Sun Kim65020912020-05-20 12:08:20 -07002346 },
2347 ],
2348 &quot;fileDetails&quot;: [ # Identification of a File source used in the Dataflow job.
2349 { # Metadata for a File connector used by the job.
2350 &quot;filePattern&quot;: &quot;A String&quot;, # File Pattern used to access files by the connector.
2351 },
2352 ],
2353 &quot;pubsubDetails&quot;: [ # Identification of a PubSub source used in the Dataflow job.
2354 { # Metadata for a PubSub connector used by the job.
Bu Sun Kim65020912020-05-20 12:08:20 -07002355 &quot;topic&quot;: &quot;A String&quot;, # Topic accessed in the connection.
Bu Sun Kim4ed7d3f2020-05-27 12:20:54 -07002356 &quot;subscription&quot;: &quot;A String&quot;, # Subscription used in the connection.
2357 },
2358 ],
2359 &quot;bigTableDetails&quot;: [ # Identification of a BigTable source used in the Dataflow job.
2360 { # Metadata for a BigTable connector used by the job.
2361 &quot;projectId&quot;: &quot;A String&quot;, # ProjectId accessed in the connection.
2362 &quot;instanceId&quot;: &quot;A String&quot;, # InstanceId accessed in the connection.
2363 &quot;tableId&quot;: &quot;A String&quot;, # TableId accessed in the connection.
2364 },
2365 ],
2366 &quot;spannerDetails&quot;: [ # Identification of a Spanner source used in the Dataflow job.
2367 { # Metadata for a Spanner connector used by the job.
2368 &quot;instanceId&quot;: &quot;A String&quot;, # InstanceId accessed in the connection.
2369 &quot;projectId&quot;: &quot;A String&quot;, # ProjectId accessed in the connection.
2370 &quot;databaseId&quot;: &quot;A String&quot;, # DatabaseId accessed in the connection.
Bu Sun Kim65020912020-05-20 12:08:20 -07002371 },
2372 ],
2373 },
Bu Sun Kim4ed7d3f2020-05-27 12:20:54 -07002374 &quot;type&quot;: &quot;A String&quot;, # The type of Cloud Dataflow job.
2375 &quot;projectId&quot;: &quot;A String&quot;, # The ID of the Cloud Platform project that the job belongs to.
Bu Sun Kim65020912020-05-20 12:08:20 -07002376 &quot;createdFromSnapshotId&quot;: &quot;A String&quot;, # If this is specified, the job&#x27;s initial state is populated from the given
2377 # snapshot.
Bu Sun Kim65020912020-05-20 12:08:20 -07002378 &quot;pipelineDescription&quot;: { # A descriptive representation of submitted pipeline as well as the executed # Preliminary field: The format of this data may change at any time.
2379 # A description of the user pipeline and stages through which it is executed.
2380 # Created by Cloud Dataflow service. Only retrieved with
2381 # JOB_VIEW_DESCRIPTION or JOB_VIEW_ALL.
2382 # form. This data is provided by the Dataflow service for ease of visualizing
2383 # the pipeline and interpreting Dataflow provided metrics.
2384 &quot;executionPipelineStage&quot;: [ # Description of each stage of execution of the pipeline.
2385 { # Description of the composing transforms, names/ids, and input/outputs of a
2386 # stage of execution. Some composing transforms and sources may have been
2387 # generated by the Dataflow service during execution planning.
Bu Sun Kim4ed7d3f2020-05-27 12:20:54 -07002388 &quot;outputSource&quot;: [ # Output sources for this stage.
2389 { # Description of an input or output of an execution stage.
2390 &quot;sizeBytes&quot;: &quot;A String&quot;, # Size of the source, if measurable.
2391 &quot;name&quot;: &quot;A String&quot;, # Dataflow service generated name for this source.
2392 &quot;userName&quot;: &quot;A String&quot;, # Human-readable name for this source; may be user or system generated.
2393 &quot;originalTransformOrCollection&quot;: &quot;A String&quot;, # User name for the original user transform or collection with which this
2394 # source is most closely associated.
2395 },
2396 ],
2397 &quot;name&quot;: &quot;A String&quot;, # Dataflow service generated name for this stage.
2398 &quot;inputSource&quot;: [ # Input sources for this stage.
2399 { # Description of an input or output of an execution stage.
2400 &quot;sizeBytes&quot;: &quot;A String&quot;, # Size of the source, if measurable.
2401 &quot;name&quot;: &quot;A String&quot;, # Dataflow service generated name for this source.
2402 &quot;userName&quot;: &quot;A String&quot;, # Human-readable name for this source; may be user or system generated.
2403 &quot;originalTransformOrCollection&quot;: &quot;A String&quot;, # User name for the original user transform or collection with which this
2404 # source is most closely associated.
2405 },
2406 ],
Bu Sun Kim65020912020-05-20 12:08:20 -07002407 &quot;id&quot;: &quot;A String&quot;, # Dataflow service generated id for this stage.
2408 &quot;componentTransform&quot;: [ # Transforms that comprise this execution stage.
2409 { # Description of a transform executed as part of an execution stage.
2410 &quot;originalTransform&quot;: &quot;A String&quot;, # User name for the original user transform with which this transform is
2411 # most closely associated.
2412 &quot;name&quot;: &quot;A String&quot;, # Dataflow service generated name for this source.
2413 &quot;userName&quot;: &quot;A String&quot;, # Human-readable name for this transform; may be user or system generated.
2414 },
2415 ],
2416 &quot;componentSource&quot;: [ # Collections produced and consumed by component transforms of this stage.
2417 { # Description of an interstitial value between transforms in an execution
2418 # stage.
2419 &quot;name&quot;: &quot;A String&quot;, # Dataflow service generated name for this source.
2420 &quot;userName&quot;: &quot;A String&quot;, # Human-readable name for this transform; may be user or system generated.
2421 &quot;originalTransformOrCollection&quot;: &quot;A String&quot;, # User name for the original user transform or collection with which this
2422 # source is most closely associated.
2423 },
2424 ],
2425 &quot;kind&quot;: &quot;A String&quot;, # Type of transform this stage is executing.
              },
            ],
            &quot;originalPipelineTransform&quot;: [ # Description of each transform in the pipeline and collections between them.
              { # Description of the type, names/ids, and input/outputs for a transform.
                &quot;kind&quot;: &quot;A String&quot;, # Type of transform.
                &quot;inputCollectionName&quot;: [ # User names for all collection inputs to this transform.
                  &quot;A String&quot;,
                ],
                &quot;name&quot;: &quot;A String&quot;, # User provided name for this transform instance.
                &quot;id&quot;: &quot;A String&quot;, # SDK generated id of this transform instance.
                &quot;displayData&quot;: [ # Transform-specific display data.
                  { # Data provided with a pipeline or transform to provide descriptive info.
                    &quot;durationValue&quot;: &quot;A String&quot;, # Contains value if the data is of duration type.
                    &quot;int64Value&quot;: &quot;A String&quot;, # Contains value if the data is of int64 type.
                    &quot;namespace&quot;: &quot;A String&quot;, # The namespace for the key. This is usually a class name or programming
                        # language namespace (i.e. python module) which defines the display data.
                        # This allows a dax monitoring system to specially handle the data
                        # and perform custom rendering.
                    &quot;floatValue&quot;: 3.14, # Contains value if the data is of float type.
                    &quot;key&quot;: &quot;A String&quot;, # The key identifying the display data.
                        # This is intended to be used as a label for the display data
                        # when viewed in a dax monitoring system.
                    &quot;shortStrValue&quot;: &quot;A String&quot;, # A possible additional shorter value to display.
                        # For example a java_class_name_value of com.mypackage.MyDoFn
                        # will be stored with MyDoFn as the short_str_value and
                        # com.mypackage.MyDoFn as the java_class_name value.
                        # short_str_value can be displayed and java_class_name_value
                        # will be displayed as a tooltip.
                    &quot;url&quot;: &quot;A String&quot;, # An optional full URL.
                    &quot;label&quot;: &quot;A String&quot;, # An optional label to display in a dax UI for the element.
                    &quot;timestampValue&quot;: &quot;A String&quot;, # Contains value if the data is of timestamp type.
                    &quot;boolValue&quot;: True or False, # Contains value if the data is of a boolean type.
                    &quot;javaClassValue&quot;: &quot;A String&quot;, # Contains value if the data is of java class type.
                    &quot;strValue&quot;: &quot;A String&quot;, # Contains value if the data is of string type.
                  },
                ],
                &quot;outputCollectionName&quot;: [ # User names for all collection outputs to this transform.
                  &quot;A String&quot;,
                ],
              },
            ],
            &quot;displayData&quot;: [ # Pipeline level display data.
              { # Data provided with a pipeline or transform to provide descriptive info.
                &quot;durationValue&quot;: &quot;A String&quot;, # Contains value if the data is of duration type.
                &quot;int64Value&quot;: &quot;A String&quot;, # Contains value if the data is of int64 type.
                &quot;namespace&quot;: &quot;A String&quot;, # The namespace for the key. This is usually a class name or programming
                    # language namespace (i.e. python module) which defines the display data.
                    # This allows a dax monitoring system to specially handle the data
                    # and perform custom rendering.
                &quot;floatValue&quot;: 3.14, # Contains value if the data is of float type.
                &quot;key&quot;: &quot;A String&quot;, # The key identifying the display data.
                    # This is intended to be used as a label for the display data
                    # when viewed in a dax monitoring system.
                &quot;shortStrValue&quot;: &quot;A String&quot;, # A possible additional shorter value to display.
                    # For example a java_class_name_value of com.mypackage.MyDoFn
                    # will be stored with MyDoFn as the short_str_value and
                    # com.mypackage.MyDoFn as the java_class_name value.
                    # short_str_value can be displayed and java_class_name_value
                    # will be displayed as a tooltip.
                &quot;url&quot;: &quot;A String&quot;, # An optional full URL.
                &quot;label&quot;: &quot;A String&quot;, # An optional label to display in a dax UI for the element.
                &quot;timestampValue&quot;: &quot;A String&quot;, # Contains value if the data is of timestamp type.
                &quot;boolValue&quot;: True or False, # Contains value if the data is of a boolean type.
                &quot;javaClassValue&quot;: &quot;A String&quot;, # Contains value if the data is of java class type.
                &quot;strValue&quot;: &quot;A String&quot;, # Contains value if the data is of string type.
              },
            ],
          },
          &quot;replaceJobId&quot;: &quot;A String&quot;, # If this job is an update of an existing job, this field is the job ID
              # of the job it replaced.
              #
              # When sending a `CreateJobRequest`, you can update a job by specifying it
              # here. The job named here is stopped, and its intermediate state is
              # transferred to this job.
          &quot;tempFiles&quot;: [ # A set of files the system should be aware of that are used
              # for temporary storage. These temporary files will be
              # removed on job completion.
              # No duplicates are allowed.
              # No file patterns are supported.
              #
              # The supported files are:
              #
              # Google Cloud Storage:
              #
              #   storage.googleapis.com/{bucket}/{object}
              #   bucket.storage.googleapis.com/{object}
            &quot;A String&quot;,
          ],
          &quot;name&quot;: &quot;A String&quot;, # The user-specified Cloud Dataflow job name.
              #
              # Only one Job with a given name may exist in a project at any
              # given time. If a caller attempts to create a Job with the same
              # name as an already-existing Job, the attempt returns the
              # existing Job.
              #
              # The name must match the regular expression
              # `[a-z]([-a-z0-9]{0,38}[a-z0-9])?`
          &quot;steps&quot;: [ # Exactly one of step or steps_location should be specified.
              #
              # The top-level steps that constitute the entire job.
            { # Defines a particular step within a Cloud Dataflow job.
                #
                # A job consists of multiple steps, each of which performs some
                # specific operation as part of the overall job. Data is typically
                # passed from one step to another as part of the job.
                #
                # Here&#x27;s an example of a sequence of steps which together implement a
                # Map-Reduce job:
                #
                # * Read a collection of data from some source, parsing the
                #   collection&#x27;s elements.
                #
                # * Validate the elements.
                #
                # * Apply a user-defined function to map each element to some value
                #   and extract an element-specific key value.
                #
                # * Group elements with the same key into a single element with
                #   that key, transforming a multiply-keyed collection into a
                #   uniquely-keyed collection.
                #
                # * Write the elements out to some data sink.
                #
                # Note that the Cloud Dataflow service may be used to run many different
                # types of jobs, not just Map-Reduce.
              &quot;name&quot;: &quot;A String&quot;, # The name that identifies the step. This must be unique for each
                  # step with respect to all other steps in the Cloud Dataflow job.
              &quot;kind&quot;: &quot;A String&quot;, # The kind of step in the Cloud Dataflow job.
              &quot;properties&quot;: { # Named properties associated with the step. Each kind of
                  # predefined step has its own required set of properties.
                  # Must be provided on Create. Only retrieved with JOB_VIEW_ALL.
                &quot;a_key&quot;: &quot;&quot;, # Properties of the object.
              },
            },
          ],
          &quot;replacedByJobId&quot;: &quot;A String&quot;, # If another job is an update of this job (and thus, this job is in
              # `JOB_STATE_UPDATED`), this field contains the ID of that job.
          &quot;executionInfo&quot;: { # Additional information about how a Cloud Dataflow job will be executed that # Deprecated.
              # isn&#x27;t contained in the submitted job.
            &quot;stages&quot;: { # A mapping from each stage to the information about that stage.
              &quot;a_key&quot;: { # Contains information about how a particular
                  # google.dataflow.v1beta3.Step will be executed.
                &quot;stepName&quot;: [ # The steps associated with the execution stage.
                    # Note that stages may have several steps, and that a given step
                    # might be run by more than one stage.
                  &quot;A String&quot;,
                ],
              },
            },
          },
          &quot;currentState&quot;: &quot;A String&quot;, # The current state of the job.
              #
              # Jobs are created in the `JOB_STATE_STOPPED` state unless otherwise
              # specified.
              #
              # A job in the `JOB_STATE_RUNNING` state may asynchronously enter a
              # terminal state. After a job has reached a terminal state, no
              # further state updates may be made.
              #
              # This field may be mutated by the Cloud Dataflow service;
              # callers cannot mutate it.
          &quot;location&quot;: &quot;A String&quot;, # The [regional endpoint]
              # (https://cloud.google.com/dataflow/docs/concepts/regional-endpoints) that
              # contains this job.
          &quot;startTime&quot;: &quot;A String&quot;, # The timestamp when the job was started (transitioned to JOB_STATE_PENDING).
              # Flexible resource scheduling jobs are started with some delay after job
              # creation, so start_time is unset before start and is updated when the
              # job is started by the Cloud Dataflow service. For other jobs, start_time
              # always equals create_time and is immutable and set by the Cloud Dataflow
              # service.
          &quot;stepsLocation&quot;: &quot;A String&quot;, # The GCS location where the steps are stored.
          &quot;labels&quot;: { # User-defined labels for this job.
              #
              # The labels map can contain no more than 64 entries. Entries of the labels
              # map are UTF8 strings that comply with the following restrictions:
              #
              # * Keys must conform to regexp: \p{Ll}\p{Lo}{0,62}
              # * Values must conform to regexp: [\p{Ll}\p{Lo}\p{N}_-]{0,63}
              # * Both keys and values are additionally constrained to be &lt;= 128 bytes in
              #   size.
            &quot;a_key&quot;: &quot;A String&quot;,
          },
          &quot;createTime&quot;: &quot;A String&quot;, # The timestamp when the job was initially created. Immutable and set by the
              # Cloud Dataflow service.
          &quot;requestedState&quot;: &quot;A String&quot;, # The job&#x27;s requested state.
              #
              # `UpdateJob` may be used to switch between the `JOB_STATE_STOPPED` and
              # `JOB_STATE_RUNNING` states, by setting requested_state. `UpdateJob` may
              # also be used to directly set a job&#x27;s requested state to
              # `JOB_STATE_CANCELLED` or `JOB_STATE_DONE`, irrevocably terminating the
              # job if it has not already reached a terminal state.
        },
      ],
      &quot;failedLocation&quot;: [ # Zero or more messages describing the [regional endpoints]
          # (https://cloud.google.com/dataflow/docs/concepts/regional-endpoints) that
          # failed to respond.
        { # Indicates which [regional endpoint]
            # (https://cloud.google.com/dataflow/docs/concepts/regional-endpoints) failed
            # to respond to a request for data.
          &quot;name&quot;: &quot;A String&quot;, # The name of the [regional endpoint]
              # (https://cloud.google.com/dataflow/docs/concepts/regional-endpoints) that
              # failed to respond.
        },
      ],
      &quot;nextPageToken&quot;: &quot;A String&quot;, # Set if there may be more results than fit in this response.
    }</pre>
</div>

<div class="method">
    <code class="details" id="list_next">list_next(previous_request, previous_response)</code>
  <pre>Retrieves the next page of results.

Args:
  previous_request: The request for the previous page. (required)
  previous_response: The response from the request for the previous page. (required)

Returns:
  A request object that you can call &#x27;execute()&#x27; on to request the next
  page. Returns None if there are no more items in the collection.
  </pre>
</div>
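For orientation, here is a minimal sketch (not part of the generated reference) of draining every result page with the list/list_next convention; jobs_api stands in for the object returned by dataflow.projects().locations().jobs(), an assumption for illustration.

```python
# Sketch: iterate every job across all result pages using the
# list()/list_next() convention of google-api-python-client.
# "jobs_api" is assumed to be the resource object returned by
# dataflow.projects().locations().jobs().
def iter_all_jobs(jobs_api, project_id, location):
    """Yield each job dict from every page of results."""
    request = jobs_api.list(projectId=project_id, location=location)
    while request is not None:
        response = request.execute()
        for job in response.get("jobs", []):
            yield job
        # list_next returns None once nextPageToken is absent.
        request = jobs_api.list_next(previous_request=request,
                                     previous_response=response)
```

This keeps pagination state out of the caller entirely; the loop terminates because list_next returns None when the response carries no nextPageToken.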

<div class="method">
    <code class="details" id="snapshot">snapshot(projectId, location, jobId, body=None, x__xgafv=None)</code>
  <pre>Snapshot the state of a streaming job.

Args:
  projectId: string, The project which owns the job to be snapshotted. (required)
  location: string, The location that contains this job. (required)
  jobId: string, The job to be snapshotted. (required)
  body: object, The request body.
    The object takes the form of:

{ # Request to create a snapshot of a job.
    &quot;description&quot;: &quot;A String&quot;, # User-specified description of the snapshot. May be empty.
    &quot;snapshotSources&quot;: True or False, # If true, perform snapshots for sources which support this.
    &quot;ttl&quot;: &quot;A String&quot;, # TTL for the snapshot.
    &quot;location&quot;: &quot;A String&quot;, # The location that contains this job.
  }

  x__xgafv: string, V1 error format.
    Allowed values
      1 - v1 error format
      2 - v2 error format

Returns:
  An object of the form:

    { # Represents a snapshot of a job.
      &quot;pubsubMetadata&quot;: [ # PubSub snapshot metadata.
        { # Represents a Pubsub snapshot.
          &quot;snapshotName&quot;: &quot;A String&quot;, # The name of the Pubsub snapshot.
          &quot;topicName&quot;: &quot;A String&quot;, # The name of the Pubsub topic.
          &quot;expireTime&quot;: &quot;A String&quot;, # The expire time of the Pubsub snapshot.
        },
      ],
      &quot;creationTime&quot;: &quot;A String&quot;, # The time this snapshot was created.
      &quot;sourceJobId&quot;: &quot;A String&quot;, # The job this snapshot was created from.
      &quot;state&quot;: &quot;A String&quot;, # State of the snapshot.
      &quot;projectId&quot;: &quot;A String&quot;, # The project this snapshot belongs to.
      &quot;ttl&quot;: &quot;A String&quot;, # The time after which this snapshot will be automatically deleted.
      &quot;id&quot;: &quot;A String&quot;, # The unique ID of this snapshot.
      &quot;description&quot;: &quot;A String&quot;, # User-specified description of the snapshot. May be empty.
      &quot;diskSizeBytes&quot;: &quot;A String&quot;, # The disk byte size of the snapshot. Only available for snapshots in READY
          # state.
    }</pre>
</div>
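A hedged sketch (not part of the generated reference) of assembling the SnapshotJobRequest body documented above before passing it as the body= argument of snapshot(); the TTL duration string is an illustrative placeholder.

```python
# Sketch: build a SnapshotJobRequest body matching the schema above.
# The ttl value ("604800s", i.e. 7 days) and location are placeholders.
def make_snapshot_body(description, ttl="604800s",
                       snapshot_sources=True, location="us-central1"):
    """Return a dict suitable for the body= argument of snapshot()."""
    return {
        "description": description,
        "snapshotSources": snapshot_sources,
        "ttl": ttl,
        "location": location,
    }
```

Keeping the body construction in one helper makes it easy to validate the field set in tests before issuing the request.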

<div class="method">
    <code class="details" id="update">update(projectId, location, jobId, body=None, x__xgafv=None)</code>
  <pre>Updates the state of an existing Cloud Dataflow job.

To update the state of an existing job, we recommend using
`projects.locations.jobs.update` with a [regional endpoint]
(https://cloud.google.com/dataflow/docs/concepts/regional-endpoints). Using
`projects.jobs.update` is not recommended, as you can only update the state
of jobs that are running in `us-central1`.

Args:
  projectId: string, The ID of the Cloud Platform project that the job belongs to. (required)
  location: string, The [regional endpoint]
(https://cloud.google.com/dataflow/docs/concepts/regional-endpoints) that
contains this job. (required)
  jobId: string, The job ID. (required)
  body: object, The request body.
    The object takes the form of:

{ # Defines a job to be run by the Cloud Dataflow service.
    &quot;clientRequestId&quot;: &quot;A String&quot;, # The client&#x27;s unique identifier of the job, re-used across retried attempts.
        # If this field is set, the service will ensure its uniqueness.
        # The request to create a job will fail if the service has knowledge of a
        # previously submitted job with the same client&#x27;s ID and job name.
        # The caller may use this field to ensure idempotence of job
        # creation across retried attempts to create a job.
        # By default, the field is empty and, in that case, the service ignores it.
    &quot;id&quot;: &quot;A String&quot;, # The unique ID of this job.
        #
        # This field is set by the Cloud Dataflow service when the Job is
        # created, and is immutable for the life of the job.
    &quot;currentStateTime&quot;: &quot;A String&quot;, # The timestamp associated with the current state.
    &quot;transformNameMapping&quot;: { # The map of transform name prefixes of the job to be replaced to the
        # corresponding name prefixes of the new job.
      &quot;a_key&quot;: &quot;A String&quot;,
    },
    &quot;environment&quot;: { # Describes the environment in which a Dataflow Job runs. # The environment for the job.
      &quot;internalExperiments&quot;: { # Experimental settings.
        &quot;a_key&quot;: &quot;&quot;, # Properties of the object. Contains field @type with type URL.
      },
      &quot;workerRegion&quot;: &quot;A String&quot;, # The Compute Engine region
          # (https://cloud.google.com/compute/docs/regions-zones/regions-zones) in
          # which worker processing should occur, e.g. &quot;us-west1&quot;. Mutually exclusive
          # with worker_zone. If neither worker_region nor worker_zone is specified,
          # default to the control plane&#x27;s region.
      &quot;serviceKmsKeyName&quot;: &quot;A String&quot;, # If set, contains the Cloud KMS key identifier used to encrypt data
          # at rest, AKA a Customer Managed Encryption Key (CMEK).
          #
          # Format:
          #   projects/PROJECT_ID/locations/LOCATION/keyRings/KEY_RING/cryptoKeys/KEY
      &quot;userAgent&quot;: { # A description of the process that generated the request.
        &quot;a_key&quot;: &quot;&quot;, # Properties of the object.
      },
      &quot;workerZone&quot;: &quot;A String&quot;, # The Compute Engine zone
          # (https://cloud.google.com/compute/docs/regions-zones/regions-zones) in
          # which worker processing should occur, e.g. &quot;us-west1-a&quot;. Mutually exclusive
          # with worker_region. If neither worker_region nor worker_zone is specified,
          # a zone in the control plane&#x27;s region is chosen based on available capacity.
      &quot;clusterManagerApiService&quot;: &quot;A String&quot;, # The type of cluster manager API to use. If unknown or
          # unspecified, the service will attempt to choose a reasonable
          # default. This should be in the form of the API service name,
          # e.g. &quot;compute.googleapis.com&quot;.
      &quot;tempStoragePrefix&quot;: &quot;A String&quot;, # The prefix of the resources the system should use for temporary
          # storage. The system will append the suffix &quot;/temp-{JOBNAME} to
          # this resource prefix, where {JOBNAME} is the value of the
          # job_name field. The resulting bucket and object prefix is used
          # as the prefix of the resources used to store temporary data
          # needed during the job execution. NOTE: This will override the
          # value in taskrunner_settings.
          # The supported resource type is:
          #
          # Google Cloud Storage:
          #
          #   storage.googleapis.com/{bucket}/{object}
          #   bucket.storage.googleapis.com/{object}
      &quot;experiments&quot;: [ # The list of experiments to enable.
        &quot;A String&quot;,
      ],
      &quot;version&quot;: { # A structure describing which components and their versions of the service
          # are required in order to run the job.
        &quot;a_key&quot;: &quot;&quot;, # Properties of the object.
      },
      &quot;serviceAccountEmail&quot;: &quot;A String&quot;, # Identity to run virtual machines as. Defaults to the default account.
      &quot;sdkPipelineOptions&quot;: { # The Cloud Dataflow SDK pipeline options specified by the user. These
          # options are passed through the service and are used to recreate the
          # SDK pipeline options on the worker in a language agnostic and platform
          # independent way.
        &quot;a_key&quot;: &quot;&quot;, # Properties of the object.
      },
      &quot;flexResourceSchedulingGoal&quot;: &quot;A String&quot;, # Which Flexible Resource Scheduling mode to run in.
      &quot;workerPools&quot;: [ # The worker pools. At least one &quot;harness&quot; worker pool must be
          # specified in order for the job to have workers.
        { # Describes one particular pool of Cloud Dataflow workers to be
            # instantiated by the Cloud Dataflow service in order to perform the
            # computations required by a job. Note that a workflow job may use
            # multiple pools, in order to match the various computational
            # requirements of the various stages of the job.
          &quot;numThreadsPerWorker&quot;: 42, # The number of threads per worker harness. If empty or unspecified, the
              # service will choose a number of threads (according to the number of cores
              # on the selected machine type for batch, or 1 by convention for streaming).
          &quot;numWorkers&quot;: 42, # Number of Google Compute Engine workers in this pool needed to
              # execute the job. If zero or unspecified, the service will
              # attempt to choose a reasonable default.
          &quot;zone&quot;: &quot;A String&quot;, # Zone to run the worker pools in. If empty or unspecified, the service
              # will attempt to choose a reasonable default.
          &quot;diskSourceImage&quot;: &quot;A String&quot;, # Fully qualified source image for disks.
          &quot;packages&quot;: [ # Packages to be installed on workers.
            { # The packages that must be installed in order for a worker to run the
                # steps of the Cloud Dataflow job that will be assigned to its worker
                # pool.
                #
                # This is the mechanism by which the Cloud Dataflow SDK causes code to
                # be loaded onto the workers. For example, the Cloud Dataflow Java SDK
                # might use this to install jars containing the user&#x27;s code and all of the
                # various dependencies (libraries, data files, etc.) required in order
                # for that code to run.
              &quot;name&quot;: &quot;A String&quot;, # The name of the package.
              &quot;location&quot;: &quot;A String&quot;, # The resource to read the package from. The supported resource type is:
                  #
                  # Google Cloud Storage:
                  #
                  #   storage.googleapis.com/{bucket}
                  #   bucket.storage.googleapis.com/
            },
          ],
          &quot;teardownPolicy&quot;: &quot;A String&quot;, # Sets the policy for determining when to turn down the worker pool.
              # Allowed values are: `TEARDOWN_ALWAYS`, `TEARDOWN_ON_SUCCESS`, and
              # `TEARDOWN_NEVER`.
              # `TEARDOWN_ALWAYS` means workers are always torn down regardless of whether
              # the job succeeds. `TEARDOWN_ON_SUCCESS` means workers are torn down
              # if the job succeeds. `TEARDOWN_NEVER` means the workers are never torn
              # down.
              #
              # If the workers are not torn down by the service, they will
              # continue to run and use Google Compute Engine VM resources in the
              # user&#x27;s project until they are explicitly terminated by the user.
              # Because of this, Google recommends using the `TEARDOWN_ALWAYS`
              # policy except for small, manually supervised test jobs.
              #
              # If unknown or unspecified, the service will attempt to choose a reasonable
              # default.
          &quot;onHostMaintenance&quot;: &quot;A String&quot;, # The action to take on host maintenance, as defined by the Google
              # Compute Engine API.
          &quot;poolArgs&quot;: { # Extra arguments for this worker pool.
            &quot;a_key&quot;: &quot;&quot;, # Properties of the object. Contains field @type with type URL.
          },
          &quot;diskSizeGb&quot;: 42, # Size of root disk for VMs, in GB. If zero or unspecified, the service will
              # attempt to choose a reasonable default.
          &quot;workerHarnessContainerImage&quot;: &quot;A String&quot;, # Required. Docker container image that executes the Cloud Dataflow worker
              # harness, residing in Google Container Registry.
              #
              # Deprecated for the Fn API path. Use sdk_harness_container_images instead.
          &quot;diskType&quot;: &quot;A String&quot;, # Type of root disk for VMs. If empty or unspecified, the service will
              # attempt to choose a reasonable default.
          &quot;machineType&quot;: &quot;A String&quot;, # Machine type (e.g. &quot;n1-standard-1&quot;). If empty or unspecified, the
              # service will attempt to choose a reasonable default.
          &quot;kind&quot;: &quot;A String&quot;, # The kind of the worker pool; currently only `harness` and `shuffle`
              # are supported.
          &quot;sdkHarnessContainerImages&quot;: [ # Set of SDK harness containers needed to execute this pipeline. This will
              # only be set in the Fn API path. For non-cross-language pipelines this
              # should have only one entry. Cross-language pipelines will have two or more
              # entries.
            { # Defines a SDK harness container for executing Dataflow pipelines.
              &quot;containerImage&quot;: &quot;A String&quot;, # A docker container image that resides in Google Container Registry.
              &quot;useSingleCorePerContainer&quot;: True or False, # If true, recommends the Dataflow service to use only one core per SDK
                  # container instance with this image. If false (or unset) recommends using
                  # more than one core per SDK container instance with this image for
                  # efficiency. Note that Dataflow service may choose to override this property
                  # if needed.
            },
          ],
          &quot;dataDisks&quot;: [ # Data disks that are used by a VM in this workflow.
            { # Describes the data disk used by a workflow job.
              &quot;diskType&quot;: &quot;A String&quot;, # Disk storage type, as defined by Google Compute Engine. This
                  # must be a disk type appropriate to the project and zone in which
                  # the workers will run. If unknown or unspecified, the service
                  # will attempt to choose a reasonable default.
                  #
                  # For example, the standard persistent disk type is a resource name
                  # typically ending in &quot;pd-standard&quot;. If SSD persistent disks are
                  # available, the resource name typically ends with &quot;pd-ssd&quot;. The
                  # actual valid values are defined by the Google Compute Engine API,
                  # not by the Cloud Dataflow API; consult the Google Compute Engine
                  # documentation for more information about determining the set of
                  # available disk types for a particular project and zone.
                  #
                  # Google Compute Engine Disk types are local to a particular
                  # project in a particular zone, and so the resource name will
                  # typically look something like this:
                  #
                  # compute.googleapis.com/projects/project-id/zones/zone/diskTypes/pd-standard
              &quot;sizeGb&quot;: 42, # Size of disk in GB. If zero or unspecified, the service will
                  # attempt to choose a reasonable default.
              &quot;mountPoint&quot;: &quot;A String&quot;, # Directory in a VM where disk is mounted.
            },
          ],
          &quot;subnetwork&quot;: &quot;A String&quot;, # Subnetwork to which VMs will be assigned, if desired. Expected to be of
              # the form &quot;regions/REGION/subnetworks/SUBNETWORK&quot;.
          &quot;ipConfiguration&quot;: &quot;A String&quot;, # Configuration for VM IPs.
          &quot;taskrunnerSettings&quot;: { # Taskrunner configuration settings. # Settings passed through to Google Compute Engine workers when
              # using the standard Dataflow task runner. Users should ignore
              # this field.
            &quot;alsologtostderr&quot;: True or False, # Whether to also send taskrunner log info to stderr.
            &quot;taskGroup&quot;: &quot;A String&quot;, # The UNIX group ID on the worker VM to use for tasks launched by
                # taskrunner; e.g. &quot;wheel&quot;.
            &quot;harnessCommand&quot;: &quot;A String&quot;, # The command to launch the worker harness.
            &quot;logDir&quot;: &quot;A String&quot;, # The directory on the VM to store logs.
            &quot;oauthScopes&quot;: [ # The OAuth2 scopes to be requested by the taskrunner in order to
                # access the Cloud Dataflow API.
              &quot;A String&quot;,
            ],
            &quot;dataflowApiVersion&quot;: &quot;A String&quot;, # The API version of endpoint, e.g. &quot;v1b3&quot;
            &quot;logUploadLocation&quot;: &quot;A String&quot;, # Indicates where to put logs. If this is not specified, the logs
                # will not be uploaded.
                #
                # The supported resource type is:
                #
                # Google Cloud Storage:
                #   storage.googleapis.com/{bucket}/{object}
                #   bucket.storage.googleapis.com/{object}
            &quot;streamingWorkerMainClass&quot;: &quot;A String&quot;, # The streaming worker main class name.
            &quot;workflowFileName&quot;: &quot;A String&quot;, # The file to store the workflow in.
            &quot;languageHint&quot;: &quot;A String&quot;, # The suggested backend language.
            &quot;commandlinesFileName&quot;: &quot;A String&quot;, # The file to store preprocessing commands in.
            &quot;baseTaskDir&quot;: &quot;A String&quot;, # The location on the worker for task-specific subdirectories.
            &quot;tempStoragePrefix&quot;: &quot;A String&quot;, # The prefix of the resources the taskrunner should use for
                # temporary storage.
                #
                # The supported resource type is:
                #
                # Google Cloud Storage:
                #   storage.googleapis.com/{bucket}/{object}
                #   bucket.storage.googleapis.com/{object}
            &quot;baseUrl&quot;: &quot;A String&quot;, # The base URL for the taskrunner to use when accessing Google Cloud APIs.
                #
                # When workers access Google Cloud APIs, they logically do so via
                # relative URLs. If this field is specified, it supplies the base
                # URL to use for resolving these relative URLs. The normative
                # algorithm used is defined by RFC 1808, &quot;Relative Uniform Resource
                # Locators&quot;.
                #
                # If not specified, the default value is &quot;http://www.googleapis.com/&quot;
            &quot;logToSerialconsole&quot;: True or False, # Whether to send taskrunner log info to Google Compute Engine VM serial
                # console.
            &quot;continueOnException&quot;: True or False, # Whether to continue taskrunner if an exception is hit.
            &quot;parallelWorkerSettings&quot;: { # Provides data to pass through to the worker harness. # The settings to pass to the parallel worker harness.
              &quot;tempStoragePrefix&quot;: &quot;A String&quot;, # The prefix of the resources the system should use for temporary
                  # storage.
                  #
                  # The supported resource type is:
                  #
                  # Google Cloud Storage:
                  #
                  #   storage.googleapis.com/{bucket}/{object}
                  #   bucket.storage.googleapis.com/{object}
              &quot;reportingEnabled&quot;: True or False, # Whether to send work progress updates to the service.
              &quot;baseUrl&quot;: &quot;A String&quot;, # The base URL for accessing Google Cloud APIs.
                  #
                  # When workers access Google Cloud APIs, they logically do so via
                  # relative URLs. If this field is specified, it supplies the base
                  # URL to use for resolving these relative URLs. The normative
                  # algorithm used is defined by RFC 1808, &quot;Relative Uniform Resource
                  # Locators&quot;.
                  #
                  # If not specified, the default value is &quot;http://www.googleapis.com/&quot;
              &quot;servicePath&quot;: &quot;A String&quot;, # The Cloud Dataflow service path relative to the root URL, for example,
                  # &quot;dataflow/v1b3/projects&quot;.
              &quot;shuffleServicePath&quot;: &quot;A String&quot;, # The Shuffle service path relative to the root URL, for example,
                  # &quot;shuffle/v1beta1&quot;.
              &quot;workerId&quot;: &quot;A String&quot;, # The ID of the worker running this pipeline.
2964 },
2965 &quot;taskUser&quot;: &quot;A String&quot;, # The UNIX user ID on the worker VM to use for tasks launched by
2966 # taskrunner; e.g. &quot;root&quot;.
2967 &quot;vmId&quot;: &quot;A String&quot;, # The ID string of the VM.
2968 },
2969 &quot;autoscalingSettings&quot;: { # Settings for WorkerPool autoscaling. # Settings for autoscaling of this WorkerPool.
2970 &quot;algorithm&quot;: &quot;A String&quot;, # The algorithm to use for autoscaling.
2971 &quot;maxNumWorkers&quot;: 42, # The maximum number of workers to cap scaling at.
2972 },
2973 &quot;metadata&quot;: { # Metadata to set on the Google Compute Engine VMs.
2974 &quot;a_key&quot;: &quot;A String&quot;,
2975 },
2976 &quot;defaultPackageSet&quot;: &quot;A String&quot;, # The default package set to install. This allows the service to
2977 # select a default set of packages which are useful to worker
2978 # harnesses written in a particular language.
2979 &quot;network&quot;: &quot;A String&quot;, # Network to which VMs will be assigned. If empty or unspecified,
2980 # the service will use the network &quot;default&quot;.
2981 },
2982 ],
2983 &quot;dataset&quot;: &quot;A String&quot;, # The dataset for the current project where various workflow
2984 # related tables are stored.
2985 #
2986 # The supported resource type is:
2987 #
2988 # Google BigQuery:
2989 # bigquery.googleapis.com/{dataset}
  },
  &quot;stageStates&quot;: [ # This field may be mutated by the Cloud Dataflow service;
      # callers cannot mutate it.
    { # A message describing the state of a particular execution stage.
      &quot;currentStateTime&quot;: &quot;A String&quot;, # The time at which the stage transitioned to this state.
      &quot;executionStageState&quot;: &quot;A String&quot;, # Execution stage states allow the same set of values as JobState.
      &quot;executionStageName&quot;: &quot;A String&quot;, # The name of the execution stage.
    },
  ],
  &quot;jobMetadata&quot;: { # Metadata available primarily for filtering jobs. Will be included in the # This field is populated by the Dataflow service to support filtering jobs
      # by the metadata values provided here. Populated for ListJobs and all GetJob
      # views SUMMARY and higher.
      # ListJob response and Job SUMMARY view.
    &quot;datastoreDetails&quot;: [ # Identification of a Datastore source used in the Dataflow job.
      { # Metadata for a Datastore connector used by the job.
        &quot;namespace&quot;: &quot;A String&quot;, # Namespace used in the connection.
        &quot;projectId&quot;: &quot;A String&quot;, # ProjectId accessed in the connection.
      },
    ],
    &quot;sdkVersion&quot;: { # The version of the SDK used to run the job. # The SDK version used to run the job.
      &quot;version&quot;: &quot;A String&quot;, # The version of the SDK used to run the job.
      &quot;sdkSupportStatus&quot;: &quot;A String&quot;, # The support status for this SDK version.
      &quot;versionDisplayName&quot;: &quot;A String&quot;, # A readable string describing the version of the SDK.
    },
    &quot;bigqueryDetails&quot;: [ # Identification of a BigQuery source used in the Dataflow job.
      { # Metadata for a BigQuery connector used by the job.
        &quot;table&quot;: &quot;A String&quot;, # Table accessed in the connection.
        &quot;dataset&quot;: &quot;A String&quot;, # Dataset accessed in the connection.
        &quot;query&quot;: &quot;A String&quot;, # Query used to access data in the connection.
        &quot;projectId&quot;: &quot;A String&quot;, # Project accessed in the connection.
      },
    ],
    &quot;fileDetails&quot;: [ # Identification of a File source used in the Dataflow job.
      { # Metadata for a File connector used by the job.
        &quot;filePattern&quot;: &quot;A String&quot;, # File Pattern used to access files by the connector.
      },
    ],
    &quot;pubsubDetails&quot;: [ # Identification of a PubSub source used in the Dataflow job.
      { # Metadata for a PubSub connector used by the job.
        &quot;topic&quot;: &quot;A String&quot;, # Topic accessed in the connection.
        &quot;subscription&quot;: &quot;A String&quot;, # Subscription used in the connection.
      },
    ],
    &quot;bigTableDetails&quot;: [ # Identification of a BigTable source used in the Dataflow job.
      { # Metadata for a BigTable connector used by the job.
        &quot;projectId&quot;: &quot;A String&quot;, # ProjectId accessed in the connection.
        &quot;instanceId&quot;: &quot;A String&quot;, # InstanceId accessed in the connection.
        &quot;tableId&quot;: &quot;A String&quot;, # TableId accessed in the connection.
      },
    ],
    &quot;spannerDetails&quot;: [ # Identification of a Spanner source used in the Dataflow job.
      { # Metadata for a Spanner connector used by the job.
        &quot;instanceId&quot;: &quot;A String&quot;, # InstanceId accessed in the connection.
        &quot;projectId&quot;: &quot;A String&quot;, # ProjectId accessed in the connection.
        &quot;databaseId&quot;: &quot;A String&quot;, # DatabaseId accessed in the connection.
      },
    ],
  },
  &quot;type&quot;: &quot;A String&quot;, # The type of Cloud Dataflow job.
  &quot;projectId&quot;: &quot;A String&quot;, # The ID of the Cloud Platform project that the job belongs to.
  &quot;createdFromSnapshotId&quot;: &quot;A String&quot;, # If this is specified, the job&#x27;s initial state is populated from the given
      # snapshot.
  &quot;pipelineDescription&quot;: { # A descriptive representation of submitted pipeline as well as the executed # Preliminary field: The format of this data may change at any time.
      # A description of the user pipeline and stages through which it is executed.
      # Created by Cloud Dataflow service. Only retrieved with
      # JOB_VIEW_DESCRIPTION or JOB_VIEW_ALL.
      # form. This data is provided by the Dataflow service for ease of visualizing
      # the pipeline and interpreting Dataflow provided metrics.
    &quot;executionPipelineStage&quot;: [ # Description of each stage of execution of the pipeline.
      { # Description of the composing transforms, names/ids, and input/outputs of a
          # stage of execution. Some composing transforms and sources may have been
          # generated by the Dataflow service during execution planning.
        &quot;outputSource&quot;: [ # Output sources for this stage.
          { # Description of an input or output of an execution stage.
            &quot;sizeBytes&quot;: &quot;A String&quot;, # Size of the source, if measurable.
            &quot;name&quot;: &quot;A String&quot;, # Dataflow service generated name for this source.
            &quot;userName&quot;: &quot;A String&quot;, # Human-readable name for this source; may be user or system generated.
            &quot;originalTransformOrCollection&quot;: &quot;A String&quot;, # User name for the original user transform or collection with which this
                # source is most closely associated.
          },
        ],
        &quot;name&quot;: &quot;A String&quot;, # Dataflow service generated name for this stage.
        &quot;inputSource&quot;: [ # Input sources for this stage.
          { # Description of an input or output of an execution stage.
            &quot;sizeBytes&quot;: &quot;A String&quot;, # Size of the source, if measurable.
            &quot;name&quot;: &quot;A String&quot;, # Dataflow service generated name for this source.
            &quot;userName&quot;: &quot;A String&quot;, # Human-readable name for this source; may be user or system generated.
            &quot;originalTransformOrCollection&quot;: &quot;A String&quot;, # User name for the original user transform or collection with which this
                # source is most closely associated.
          },
        ],
        &quot;id&quot;: &quot;A String&quot;, # Dataflow service generated id for this stage.
        &quot;componentTransform&quot;: [ # Transforms that comprise this execution stage.
          { # Description of a transform executed as part of an execution stage.
            &quot;originalTransform&quot;: &quot;A String&quot;, # User name for the original user transform with which this transform is
                # most closely associated.
            &quot;name&quot;: &quot;A String&quot;, # Dataflow service generated name for this source.
            &quot;userName&quot;: &quot;A String&quot;, # Human-readable name for this transform; may be user or system generated.
          },
        ],
        &quot;componentSource&quot;: [ # Collections produced and consumed by component transforms of this stage.
          { # Description of an interstitial value between transforms in an execution
              # stage.
            &quot;name&quot;: &quot;A String&quot;, # Dataflow service generated name for this source.
            &quot;userName&quot;: &quot;A String&quot;, # Human-readable name for this transform; may be user or system generated.
            &quot;originalTransformOrCollection&quot;: &quot;A String&quot;, # User name for the original user transform or collection with which this
                # source is most closely associated.
          },
        ],
        &quot;kind&quot;: &quot;A String&quot;, # Type of transform this stage is executing.
      },
    ],
    &quot;originalPipelineTransform&quot;: [ # Description of each transform in the pipeline and collections between them.
      { # Description of the type, names/ids, and input/outputs for a transform.
        &quot;kind&quot;: &quot;A String&quot;, # Type of transform.
        &quot;inputCollectionName&quot;: [ # User names for all collection inputs to this transform.
          &quot;A String&quot;,
        ],
        &quot;name&quot;: &quot;A String&quot;, # User provided name for this transform instance.
        &quot;id&quot;: &quot;A String&quot;, # SDK generated id of this transform instance.
        &quot;displayData&quot;: [ # Transform-specific display data.
          { # Data provided with a pipeline or transform to provide descriptive info.
            &quot;durationValue&quot;: &quot;A String&quot;, # Contains value if the data is of duration type.
            &quot;int64Value&quot;: &quot;A String&quot;, # Contains value if the data is of int64 type.
            &quot;namespace&quot;: &quot;A String&quot;, # The namespace for the key. This is usually a class name or programming
                # language namespace (i.e. python module) which defines the display data.
                # This allows a dax monitoring system to specially handle the data
                # and perform custom rendering.
            &quot;floatValue&quot;: 3.14, # Contains value if the data is of float type.
            &quot;key&quot;: &quot;A String&quot;, # The key identifying the display data.
                # This is intended to be used as a label for the display data
                # when viewed in a dax monitoring system.
            &quot;shortStrValue&quot;: &quot;A String&quot;, # A possible additional shorter value to display.
                # For example a java_class_name_value of com.mypackage.MyDoFn
                # will be stored with MyDoFn as the short_str_value and
                # com.mypackage.MyDoFn as the java_class_name value.
                # short_str_value can be displayed and java_class_name_value
                # will be displayed as a tooltip.
            &quot;url&quot;: &quot;A String&quot;, # An optional full URL.
            &quot;label&quot;: &quot;A String&quot;, # An optional label to display in a dax UI for the element.
            &quot;timestampValue&quot;: &quot;A String&quot;, # Contains value if the data is of timestamp type.
            &quot;boolValue&quot;: True or False, # Contains value if the data is of a boolean type.
            &quot;javaClassValue&quot;: &quot;A String&quot;, # Contains value if the data is of java class type.
            &quot;strValue&quot;: &quot;A String&quot;, # Contains value if the data is of string type.
          },
        ],
        &quot;outputCollectionName&quot;: [ # User names for all collection outputs to this transform.
          &quot;A String&quot;,
        ],
      },
    ],
    &quot;displayData&quot;: [ # Pipeline level display data.
      { # Data provided with a pipeline or transform to provide descriptive info.
        &quot;durationValue&quot;: &quot;A String&quot;, # Contains value if the data is of duration type.
        &quot;int64Value&quot;: &quot;A String&quot;, # Contains value if the data is of int64 type.
        &quot;namespace&quot;: &quot;A String&quot;, # The namespace for the key. This is usually a class name or programming
            # language namespace (i.e. python module) which defines the display data.
            # This allows a dax monitoring system to specially handle the data
            # and perform custom rendering.
        &quot;floatValue&quot;: 3.14, # Contains value if the data is of float type.
        &quot;key&quot;: &quot;A String&quot;, # The key identifying the display data.
            # This is intended to be used as a label for the display data
            # when viewed in a dax monitoring system.
        &quot;shortStrValue&quot;: &quot;A String&quot;, # A possible additional shorter value to display.
            # For example a java_class_name_value of com.mypackage.MyDoFn
            # will be stored with MyDoFn as the short_str_value and
            # com.mypackage.MyDoFn as the java_class_name value.
            # short_str_value can be displayed and java_class_name_value
            # will be displayed as a tooltip.
        &quot;url&quot;: &quot;A String&quot;, # An optional full URL.
        &quot;label&quot;: &quot;A String&quot;, # An optional label to display in a dax UI for the element.
        &quot;timestampValue&quot;: &quot;A String&quot;, # Contains value if the data is of timestamp type.
        &quot;boolValue&quot;: True or False, # Contains value if the data is of a boolean type.
        &quot;javaClassValue&quot;: &quot;A String&quot;, # Contains value if the data is of java class type.
        &quot;strValue&quot;: &quot;A String&quot;, # Contains value if the data is of string type.
      },
    ],
  },
  &quot;replaceJobId&quot;: &quot;A String&quot;, # If this job is an update of an existing job, this field is the job ID
      # of the job it replaced.
      #
      # When sending a `CreateJobRequest`, you can update a job by specifying it
      # here. The job named here is stopped, and its intermediate state is
      # transferred to this job.
  &quot;tempFiles&quot;: [ # A set of files the system should be aware of that are used
      # for temporary storage. These temporary files will be
      # removed on job completion.
      # No duplicates are allowed.
      # No file patterns are supported.
      #
      # The supported files are:
      #
      # Google Cloud Storage:
      #
      # storage.googleapis.com/{bucket}/{object}
      # bucket.storage.googleapis.com/{object}
    &quot;A String&quot;,
  ],
  &quot;name&quot;: &quot;A String&quot;, # The user-specified Cloud Dataflow job name.
      #
      # Only one Job with a given name may exist in a project at any
      # given time. If a caller attempts to create a Job with the same
      # name as an already-existing Job, the attempt returns the
      # existing Job.
      #
      # The name must match the regular expression
      # `[a-z]([-a-z0-9]{0,38}[a-z0-9])?`
  &quot;steps&quot;: [ # Exactly one of step or steps_location should be specified.
      #
      # The top-level steps that constitute the entire job.
    { # Defines a particular step within a Cloud Dataflow job.
        #
        # A job consists of multiple steps, each of which performs some
        # specific operation as part of the overall job. Data is typically
        # passed from one step to another as part of the job.
        #
        # Here&#x27;s an example of a sequence of steps which together implement a
        # Map-Reduce job:
        #
        # * Read a collection of data from some source, parsing the
        # collection&#x27;s elements.
        #
        # * Validate the elements.
        #
        # * Apply a user-defined function to map each element to some value
        # and extract an element-specific key value.
        #
        # * Group elements with the same key into a single element with
        # that key, transforming a multiply-keyed collection into a
        # uniquely-keyed collection.
        #
        # * Write the elements out to some data sink.
        #
        # Note that the Cloud Dataflow service may be used to run many different
        # types of jobs, not just Map-Reduce.
      &quot;name&quot;: &quot;A String&quot;, # The name that identifies the step. This must be unique for each
          # step with respect to all other steps in the Cloud Dataflow job.
      &quot;kind&quot;: &quot;A String&quot;, # The kind of step in the Cloud Dataflow job.
      &quot;properties&quot;: { # Named properties associated with the step. Each kind of
          # predefined step has its own required set of properties.
          # Must be provided on Create. Only retrieved with JOB_VIEW_ALL.
        &quot;a_key&quot;: &quot;&quot;, # Properties of the object.
      },
    },
  ],
  &quot;replacedByJobId&quot;: &quot;A String&quot;, # If another job is an update of this job (and thus, this job is in
      # `JOB_STATE_UPDATED`), this field contains the ID of that job.
  &quot;executionInfo&quot;: { # Additional information about how a Cloud Dataflow job will be executed that # Deprecated.
      # isn&#x27;t contained in the submitted job.
    &quot;stages&quot;: { # A mapping from each stage to the information about that stage.
      &quot;a_key&quot;: { # Contains information about how a particular
          # google.dataflow.v1beta3.Step will be executed.
        &quot;stepName&quot;: [ # The steps associated with the execution stage.
            # Note that stages may have several steps, and that a given step
            # might be run by more than one stage.
          &quot;A String&quot;,
        ],
      },
    },
  },
  &quot;currentState&quot;: &quot;A String&quot;, # The current state of the job.
      #
      # Jobs are created in the `JOB_STATE_STOPPED` state unless otherwise
      # specified.
      #
      # A job in the `JOB_STATE_RUNNING` state may asynchronously enter a
      # terminal state. After a job has reached a terminal state, no
      # further state updates may be made.
      #
      # This field may be mutated by the Cloud Dataflow service;
      # callers cannot mutate it.
  &quot;location&quot;: &quot;A String&quot;, # The [regional endpoint]
      # (https://cloud.google.com/dataflow/docs/concepts/regional-endpoints) that
      # contains this job.
  &quot;startTime&quot;: &quot;A String&quot;, # The timestamp when the job was started (transitioned to JOB_STATE_PENDING).
      # Flexible resource scheduling jobs are started with some delay after job
      # creation, so start_time is unset before start and is updated when the
      # job is started by the Cloud Dataflow service. For other jobs, start_time
      # always equals create_time and is immutable and set by the Cloud Dataflow
      # service.
  &quot;stepsLocation&quot;: &quot;A String&quot;, # The GCS location where the steps are stored.
  &quot;labels&quot;: { # User-defined labels for this job.
      #
      # The labels map can contain no more than 64 entries. Entries of the labels
      # map are UTF8 strings that comply with the following restrictions:
      #
      # * Keys must conform to regexp: \p{Ll}\p{Lo}{0,62}
      # * Values must conform to regexp: [\p{Ll}\p{Lo}\p{N}_-]{0,63}
      # * Both keys and values are additionally constrained to be &lt;= 128 bytes in
      # size.
    &quot;a_key&quot;: &quot;A String&quot;,
  },
  &quot;createTime&quot;: &quot;A String&quot;, # The timestamp when the job was initially created. Immutable and set by the
      # Cloud Dataflow service.
  &quot;requestedState&quot;: &quot;A String&quot;, # The job&#x27;s requested state.
      #
      # `UpdateJob` may be used to switch between the `JOB_STATE_STOPPED` and
      # `JOB_STATE_RUNNING` states, by setting requested_state. `UpdateJob` may
      # also be used to directly set a job&#x27;s requested state to
      # `JOB_STATE_CANCELLED` or `JOB_STATE_DONE`, irrevocably terminating the
      # job if it has not already reached a terminal state.
}

  x__xgafv: string, V1 error format.
    Allowed values
      1 - v1 error format
      2 - v2 error format

Returns:
  An object of the form:

    { # Defines a job to be run by the Cloud Dataflow service.
Bu Sun Kim65020912020-05-20 12:08:20 -07003302 &quot;clientRequestId&quot;: &quot;A String&quot;, # The client&#x27;s unique identifier of the job, re-used across retried attempts.
3303 # If this field is set, the service will ensure its uniqueness.
3304 # The request to create a job will fail if the service has knowledge of a
3305 # previously submitted job with the same client&#x27;s ID and job name.
3306 # The caller may use this field to ensure idempotence of job
3307 # creation across retried attempts to create a job.
3308 # By default, the field is empty and, in that case, the service ignores it.
3309 &quot;id&quot;: &quot;A String&quot;, # The unique ID of this job.
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07003310 #
3311 # This field is set by the Cloud Dataflow service when the Job is
3312 # created, and is immutable for the life of the job.
Bu Sun Kim65020912020-05-20 12:08:20 -07003313 &quot;currentStateTime&quot;: &quot;A String&quot;, # The timestamp associated with the current state.
3314 &quot;transformNameMapping&quot;: { # The map of transform name prefixes of the job to be replaced to the
Sai Cheemalapatic30d2b52017-03-13 12:12:03 -04003315 # corresponding name prefixes of the new job.
Bu Sun Kim65020912020-05-20 12:08:20 -07003316 &quot;a_key&quot;: &quot;A String&quot;,
Jon Wayne Parrott692617a2017-01-06 09:58:29 -08003317 },
Bu Sun Kim65020912020-05-20 12:08:20 -07003318 &quot;environment&quot;: { # Describes the environment in which a Dataflow Job runs. # The environment for the job.
Bu Sun Kim65020912020-05-20 12:08:20 -07003319 &quot;internalExperiments&quot;: { # Experimental settings.
3320 &quot;a_key&quot;: &quot;&quot;, # Properties of the object. Contains field @type with type URL.
3321 },
3322 &quot;workerRegion&quot;: &quot;A String&quot;, # The Compute Engine region
3323 # (https://cloud.google.com/compute/docs/regions-zones/regions-zones) in
3324 # which worker processing should occur, e.g. &quot;us-west1&quot;. Mutually exclusive
3325 # with worker_zone. If neither worker_region nor worker_zone is specified,
3326 # default to the control plane&#x27;s region.
3327 &quot;serviceKmsKeyName&quot;: &quot;A String&quot;, # If set, contains the Cloud KMS key identifier used to encrypt data
3328 # at rest, AKA a Customer Managed Encryption Key (CMEK).
3329 #
3330 # Format:
3331 # projects/PROJECT_ID/locations/LOCATION/keyRings/KEY_RING/cryptoKeys/KEY
3332 &quot;userAgent&quot;: { # A description of the process that generated the request.
3333 &quot;a_key&quot;: &quot;&quot;, # Properties of the object.
3334 },
3335 &quot;workerZone&quot;: &quot;A String&quot;, # The Compute Engine zone
3336 # (https://cloud.google.com/compute/docs/regions-zones/regions-zones) in
3337 # which worker processing should occur, e.g. &quot;us-west1-a&quot;. Mutually exclusive
3338 # with worker_region. If neither worker_region nor worker_zone is specified,
3339 # a zone in the control plane&#x27;s region is chosen based on available capacity.
3340 &quot;clusterManagerApiService&quot;: &quot;A String&quot;, # The type of cluster manager API to use. If unknown or
Dan O'Mearadd494642020-05-01 07:42:23 -07003341 # unspecified, the service will attempt to choose a reasonable
3342 # default. This should be in the form of the API service name,
Bu Sun Kim65020912020-05-20 12:08:20 -07003343 # e.g. &quot;compute.googleapis.com&quot;.
3344 &quot;tempStoragePrefix&quot;: &quot;A String&quot;, # The prefix of the resources the system should use for temporary
3345 # storage. The system will append the suffix &quot;/temp-{JOBNAME} to
Bu Sun Kim715bd7f2019-06-14 16:50:42 -07003346 # this resource prefix, where {JOBNAME} is the value of the
3347 # job_name field. The resulting bucket and object prefix is used
3348 # as the prefix of the resources used to store temporary data
3349 # needed during the job execution. NOTE: This will override the
3350 # value in taskrunner_settings.
3351 # The supported resource type is:
3352 #
3353 # Google Cloud Storage:
3354 #
3355 # storage.googleapis.com/{bucket}/{object}
3356 # bucket.storage.googleapis.com/{object}
Bu Sun Kim65020912020-05-20 12:08:20 -07003357 &quot;experiments&quot;: [ # The list of experiments to enable.
3358 &quot;A String&quot;,
3359 ],
3360 &quot;version&quot;: { # A structure describing which components and their versions of the service
3361 # are required in order to run the job.
3362 &quot;a_key&quot;: &quot;&quot;, # Properties of the object.
3363 },
3364 &quot;serviceAccountEmail&quot;: &quot;A String&quot;, # Identity to run virtual machines as. Defaults to the default account.
Bu Sun Kim4ed7d3f2020-05-27 12:20:54 -07003365 &quot;sdkPipelineOptions&quot;: { # The Cloud Dataflow SDK pipeline options specified by the user. These
3366 # options are passed through the service and are used to recreate the
3367 # SDK pipeline options on the worker in a language agnostic and platform
3368 # independent way.
3369 &quot;a_key&quot;: &quot;&quot;, # Properties of the object.
3370 },
3371 &quot;flexResourceSchedulingGoal&quot;: &quot;A String&quot;, # Which Flexible Resource Scheduling mode to run in.
3372 &quot;workerPools&quot;: [ # The worker pools. At least one &quot;harness&quot; worker pool must be
3373 # specified in order for the job to have workers.
3374 { # Describes one particular pool of Cloud Dataflow workers to be
3375 # instantiated by the Cloud Dataflow service in order to perform the
3376 # computations required by a job. Note that a workflow job may use
3377 # multiple pools, in order to match the various computational
3378 # requirements of the various stages of the job.
3379 &quot;numThreadsPerWorker&quot;: 42, # The number of threads per worker harness. If empty or unspecified, the
3380 # service will choose a number of threads (according to the number of cores
3381 # on the selected machine type for batch, or 1 by convention for streaming).
3382 &quot;numWorkers&quot;: 42, # Number of Google Compute Engine workers in this pool needed to
3383 # execute the job. If zero or unspecified, the service will
3384 # attempt to choose a reasonable default.
3385 &quot;zone&quot;: &quot;A String&quot;, # Zone to run the worker pools in. If empty or unspecified, the service
3386 # will attempt to choose a reasonable default.
3387 &quot;diskSourceImage&quot;: &quot;A String&quot;, # Fully qualified source image for disks.
3388 &quot;packages&quot;: [ # Packages to be installed on workers.
3389 { # The packages that must be installed in order for a worker to run the
3390 # steps of the Cloud Dataflow job that will be assigned to its worker
3391 # pool.
3392 #
3393 # This is the mechanism by which the Cloud Dataflow SDK causes code to
3394 # be loaded onto the workers. For example, the Cloud Dataflow Java SDK
3395 # might use this to install jars containing the user&#x27;s code and all of the
3396 # various dependencies (libraries, data files, etc.) required in order
3397 # for that code to run.
3398 &quot;name&quot;: &quot;A String&quot;, # The name of the package.
3399 &quot;location&quot;: &quot;A String&quot;, # The resource to read the package from. The supported resource type is:
3400 #
3401 # Google Cloud Storage:
3402 #
3403 # storage.googleapis.com/{bucket}
3404 # bucket.storage.googleapis.com/
3405 },
3406 ],
        &quot;teardownPolicy&quot;: &quot;A String&quot;, # Sets the policy for determining when to tear down the worker pool.
            # Allowed values are: `TEARDOWN_ALWAYS`, `TEARDOWN_ON_SUCCESS`, and
            # `TEARDOWN_NEVER`.
            # `TEARDOWN_ALWAYS` means workers are always torn down regardless of whether
            # the job succeeds. `TEARDOWN_ON_SUCCESS` means workers are torn down
            # if the job succeeds. `TEARDOWN_NEVER` means the workers are never torn
            # down.
            #
            # If the workers are not torn down by the service, they will
            # continue to run and use Google Compute Engine VM resources in the
            # user&#x27;s project until they are explicitly terminated by the user.
            # Because of this, Google recommends using the `TEARDOWN_ALWAYS`
            # policy except for small, manually supervised test jobs.
            #
            # If unknown or unspecified, the service will attempt to choose a reasonable
            # default.
        &quot;onHostMaintenance&quot;: &quot;A String&quot;, # The action to take on host maintenance, as defined by the Google
            # Compute Engine API.
        &quot;poolArgs&quot;: { # Extra arguments for this worker pool.
          &quot;a_key&quot;: &quot;&quot;, # Properties of the object. Contains field @type with type URL.
        },
        &quot;diskSizeGb&quot;: 42, # Size of root disk for VMs, in GB. If zero or unspecified, the service will
            # attempt to choose a reasonable default.
        &quot;workerHarnessContainerImage&quot;: &quot;A String&quot;, # Required. Docker container image that executes the Cloud Dataflow worker
            # harness, residing in Google Container Registry.
            #
            # Deprecated for the Fn API path. Use sdk_harness_container_images instead.
        &quot;diskType&quot;: &quot;A String&quot;, # Type of root disk for VMs. If empty or unspecified, the service will
            # attempt to choose a reasonable default.
        &quot;machineType&quot;: &quot;A String&quot;, # Machine type (e.g. &quot;n1-standard-1&quot;). If empty or unspecified, the
            # service will attempt to choose a reasonable default.
        &quot;kind&quot;: &quot;A String&quot;, # The kind of the worker pool; currently only `harness` and `shuffle`
            # are supported.
        &quot;sdkHarnessContainerImages&quot;: [ # Set of SDK harness containers needed to execute this pipeline. This will
            # only be set in the Fn API path. For non-cross-language pipelines this
            # should have only one entry. Cross-language pipelines will have two or more
            # entries.
          { # Defines an SDK harness container for executing Dataflow pipelines.
            &quot;containerImage&quot;: &quot;A String&quot;, # A docker container image that resides in Google Container Registry.
            &quot;useSingleCorePerContainer&quot;: True or False, # If true, recommends the Dataflow service to use only one core per SDK
                # container instance with this image. If false (or unset), recommends using
                # more than one core per SDK container instance with this image for
                # efficiency. Note that the Dataflow service may choose to override this property
                # if needed.
          },
        ],
        &quot;dataDisks&quot;: [ # Data disks that are used by a VM in this workflow.
          { # Describes the data disk used by a workflow job.
            &quot;diskType&quot;: &quot;A String&quot;, # Disk storage type, as defined by Google Compute Engine. This
                # must be a disk type appropriate to the project and zone in which
                # the workers will run. If unknown or unspecified, the service
                # will attempt to choose a reasonable default.
                #
                # For example, the standard persistent disk type is a resource name
                # typically ending in &quot;pd-standard&quot;. If SSD persistent disks are
                # available, the resource name typically ends with &quot;pd-ssd&quot;. The
                # actual valid values are defined by the Google Compute Engine API,
                # not by the Cloud Dataflow API; consult the Google Compute Engine
                # documentation for more information about determining the set of
                # available disk types for a particular project and zone.
                #
                # Google Compute Engine Disk types are local to a particular
                # project in a particular zone, and so the resource name will
                # typically look something like this:
                #
                #   compute.googleapis.com/projects/project-id/zones/zone/diskTypes/pd-standard
            &quot;sizeGb&quot;: 42, # Size of disk in GB. If zero or unspecified, the service will
                # attempt to choose a reasonable default.
            &quot;mountPoint&quot;: &quot;A String&quot;, # Directory in a VM where disk is mounted.
          },
        ],
        &quot;subnetwork&quot;: &quot;A String&quot;, # Subnetwork to which VMs will be assigned, if desired. Expected to be of
            # the form &quot;regions/REGION/subnetworks/SUBNETWORK&quot;.
        &quot;ipConfiguration&quot;: &quot;A String&quot;, # Configuration for VM IPs.
        &quot;taskrunnerSettings&quot;: { # Taskrunner configuration settings. # Settings passed through to Google Compute Engine workers when
            # using the standard Dataflow task runner. Users should ignore
            # this field.
          &quot;alsologtostderr&quot;: True or False, # Whether to also send taskrunner log info to stderr.
          &quot;taskGroup&quot;: &quot;A String&quot;, # The UNIX group ID on the worker VM to use for tasks launched by
              # taskrunner; e.g. &quot;wheel&quot;.
          &quot;harnessCommand&quot;: &quot;A String&quot;, # The command to launch the worker harness.
          &quot;logDir&quot;: &quot;A String&quot;, # The directory on the VM to store logs.
          &quot;oauthScopes&quot;: [ # The OAuth2 scopes to be requested by the taskrunner in order to
              # access the Cloud Dataflow API.
            &quot;A String&quot;,
          ],
          &quot;dataflowApiVersion&quot;: &quot;A String&quot;, # The API version of the endpoint, e.g. &quot;v1b3&quot;.
          &quot;logUploadLocation&quot;: &quot;A String&quot;, # Indicates where to put logs. If this is not specified, the logs
              # will not be uploaded.
              #
              # The supported resource type is:
              #
              # Google Cloud Storage:
              #   storage.googleapis.com/{bucket}/{object}
              #   bucket.storage.googleapis.com/{object}
          &quot;streamingWorkerMainClass&quot;: &quot;A String&quot;, # The streaming worker main class name.
          &quot;workflowFileName&quot;: &quot;A String&quot;, # The file to store the workflow in.
          &quot;languageHint&quot;: &quot;A String&quot;, # The suggested backend language.
          &quot;commandlinesFileName&quot;: &quot;A String&quot;, # The file to store preprocessing commands in.
          &quot;baseTaskDir&quot;: &quot;A String&quot;, # The location on the worker for task-specific subdirectories.
          &quot;tempStoragePrefix&quot;: &quot;A String&quot;, # The prefix of the resources the taskrunner should use for
              # temporary storage.
              #
              # The supported resource type is:
              #
              # Google Cloud Storage:
              #   storage.googleapis.com/{bucket}/{object}
              #   bucket.storage.googleapis.com/{object}
          &quot;baseUrl&quot;: &quot;A String&quot;, # The base URL for the taskrunner to use when accessing Google Cloud APIs.
              #
              # When workers access Google Cloud APIs, they logically do so via
              # relative URLs. If this field is specified, it supplies the base
              # URL to use for resolving these relative URLs. The normative
              # algorithm used is defined by RFC 1808, &quot;Relative Uniform Resource
              # Locators&quot;.
              #
              # If not specified, the default value is &quot;http://www.googleapis.com/&quot;.
          &quot;logToSerialconsole&quot;: True or False, # Whether to send taskrunner log info to the Google Compute Engine VM serial
              # console.
          &quot;continueOnException&quot;: True or False, # Whether to continue taskrunner if an exception is hit.
          &quot;parallelWorkerSettings&quot;: { # Provides data to pass through to the worker harness. # The settings to pass to the parallel worker harness.
            &quot;tempStoragePrefix&quot;: &quot;A String&quot;, # The prefix of the resources the system should use for temporary
                # storage.
                #
                # The supported resource type is:
                #
                # Google Cloud Storage:
                #
                #   storage.googleapis.com/{bucket}/{object}
                #   bucket.storage.googleapis.com/{object}
            &quot;reportingEnabled&quot;: True or False, # Whether to send work progress updates to the service.
            &quot;baseUrl&quot;: &quot;A String&quot;, # The base URL for accessing Google Cloud APIs.
                #
                # When workers access Google Cloud APIs, they logically do so via
                # relative URLs. If this field is specified, it supplies the base
                # URL to use for resolving these relative URLs. The normative
                # algorithm used is defined by RFC 1808, &quot;Relative Uniform Resource
                # Locators&quot;.
                #
                # If not specified, the default value is &quot;http://www.googleapis.com/&quot;.
            &quot;servicePath&quot;: &quot;A String&quot;, # The Cloud Dataflow service path relative to the root URL, for example,
                # &quot;dataflow/v1b3/projects&quot;.
            &quot;shuffleServicePath&quot;: &quot;A String&quot;, # The Shuffle service path relative to the root URL, for example,
                # &quot;shuffle/v1beta1&quot;.
            &quot;workerId&quot;: &quot;A String&quot;, # The ID of the worker running this pipeline.
          },
          &quot;taskUser&quot;: &quot;A String&quot;, # The UNIX user ID on the worker VM to use for tasks launched by
              # taskrunner; e.g. &quot;root&quot;.
          &quot;vmId&quot;: &quot;A String&quot;, # The ID string of the VM.
        },
3557 &quot;autoscalingSettings&quot;: { # Settings for WorkerPool autoscaling. # Settings for autoscaling of this WorkerPool.
3558 &quot;algorithm&quot;: &quot;A String&quot;, # The algorithm to use for autoscaling.
3559 &quot;maxNumWorkers&quot;: 42, # The maximum number of workers to cap scaling at.
3560 },
3561 &quot;metadata&quot;: { # Metadata to set on the Google Compute Engine VMs.
3562 &quot;a_key&quot;: &quot;A String&quot;,
3563 },
3564 &quot;defaultPackageSet&quot;: &quot;A String&quot;, # The default package set to install. This allows the service to
3565 # select a default set of packages which are useful to worker
3566 # harnesses written in a particular language.
3567 &quot;network&quot;: &quot;A String&quot;, # Network to which VMs will be assigned. If empty or unspecified,
3568 # the service will use the network &quot;default&quot;.
3569 },
3570 ],
3571 &quot;dataset&quot;: &quot;A String&quot;, # The dataset for the current project where various workflow
3572 # related tables are stored.
3573 #
3574 # The supported resource type is:
3575 #
3576 # Google BigQuery:
3577 # bigquery.googleapis.com/{dataset}
Jon Wayne Parrott692617a2017-01-06 09:58:29 -08003578 },
  &quot;stageStates&quot;: [ # This field may be mutated by the Cloud Dataflow service;
      # callers cannot mutate it.
    { # A message describing the state of a particular execution stage.
      &quot;currentStateTime&quot;: &quot;A String&quot;, # The time at which the stage transitioned to this state.
      &quot;executionStageState&quot;: &quot;A String&quot;, # Execution stage states allow the same set of values as JobState.
      &quot;executionStageName&quot;: &quot;A String&quot;, # The name of the execution stage.
    },
  ],
  &quot;jobMetadata&quot;: { # Metadata available primarily for filtering jobs. Will be included in the # This field is populated by the Dataflow service to support filtering jobs
      # by the metadata values provided here. Populated for ListJobs and all GetJob
      # views SUMMARY and higher.
      # ListJob response and Job SUMMARY view.
    &quot;datastoreDetails&quot;: [ # Identification of a Datastore source used in the Dataflow job.
      { # Metadata for a Datastore connector used by the job.
        &quot;namespace&quot;: &quot;A String&quot;, # Namespace used in the connection.
        &quot;projectId&quot;: &quot;A String&quot;, # ProjectId accessed in the connection.
      },
    ],
    &quot;sdkVersion&quot;: { # The version of the SDK used to run the job. # The SDK version used to run the job.
      &quot;version&quot;: &quot;A String&quot;, # The version of the SDK used to run the job.
      &quot;sdkSupportStatus&quot;: &quot;A String&quot;, # The support status for this SDK version.
      &quot;versionDisplayName&quot;: &quot;A String&quot;, # A readable string describing the version of the SDK.
    },
    &quot;bigqueryDetails&quot;: [ # Identification of a BigQuery source used in the Dataflow job.
      { # Metadata for a BigQuery connector used by the job.
        &quot;table&quot;: &quot;A String&quot;, # Table accessed in the connection.
        &quot;dataset&quot;: &quot;A String&quot;, # Dataset accessed in the connection.
        &quot;query&quot;: &quot;A String&quot;, # Query used to access data in the connection.
        &quot;projectId&quot;: &quot;A String&quot;, # Project accessed in the connection.
      },
    ],
    &quot;fileDetails&quot;: [ # Identification of a File source used in the Dataflow job.
      { # Metadata for a File connector used by the job.
        &quot;filePattern&quot;: &quot;A String&quot;, # File Pattern used to access files by the connector.
      },
    ],
    &quot;pubsubDetails&quot;: [ # Identification of a PubSub source used in the Dataflow job.
      { # Metadata for a PubSub connector used by the job.
        &quot;topic&quot;: &quot;A String&quot;, # Topic accessed in the connection.
        &quot;subscription&quot;: &quot;A String&quot;, # Subscription used in the connection.
      },
    ],
    &quot;bigTableDetails&quot;: [ # Identification of a BigTable source used in the Dataflow job.
      { # Metadata for a BigTable connector used by the job.
        &quot;projectId&quot;: &quot;A String&quot;, # ProjectId accessed in the connection.
        &quot;instanceId&quot;: &quot;A String&quot;, # InstanceId accessed in the connection.
        &quot;tableId&quot;: &quot;A String&quot;, # TableId accessed in the connection.
      },
    ],
    &quot;spannerDetails&quot;: [ # Identification of a Spanner source used in the Dataflow job.
      { # Metadata for a Spanner connector used by the job.
        &quot;instanceId&quot;: &quot;A String&quot;, # InstanceId accessed in the connection.
        &quot;projectId&quot;: &quot;A String&quot;, # ProjectId accessed in the connection.
        &quot;databaseId&quot;: &quot;A String&quot;, # DatabaseId accessed in the connection.
      },
    ],
  },
  &quot;type&quot;: &quot;A String&quot;, # The type of Cloud Dataflow job.
  &quot;projectId&quot;: &quot;A String&quot;, # The ID of the Cloud Platform project that the job belongs to.
  &quot;createdFromSnapshotId&quot;: &quot;A String&quot;, # If this is specified, the job&#x27;s initial state is populated from the given
      # snapshot.
  &quot;pipelineDescription&quot;: { # A descriptive representation of submitted pipeline as well as the executed # Preliminary field: The format of this data may change at any time.
      # A description of the user pipeline and stages through which it is executed.
      # Created by Cloud Dataflow service. Only retrieved with
      # JOB_VIEW_DESCRIPTION or JOB_VIEW_ALL.
      # form. This data is provided by the Dataflow service for ease of visualizing
      # the pipeline and interpreting Dataflow provided metrics.
    &quot;executionPipelineStage&quot;: [ # Description of each stage of execution of the pipeline.
      { # Description of the composing transforms, names/ids, and input/outputs of a
          # stage of execution. Some composing transforms and sources may have been
          # generated by the Dataflow service during execution planning.
        &quot;outputSource&quot;: [ # Output sources for this stage.
          { # Description of an input or output of an execution stage.
            &quot;sizeBytes&quot;: &quot;A String&quot;, # Size of the source, if measurable.
            &quot;name&quot;: &quot;A String&quot;, # Dataflow service generated name for this source.
            &quot;userName&quot;: &quot;A String&quot;, # Human-readable name for this source; may be user or system generated.
            &quot;originalTransformOrCollection&quot;: &quot;A String&quot;, # User name for the original user transform or collection with which this
                # source is most closely associated.
          },
        ],
        &quot;name&quot;: &quot;A String&quot;, # Dataflow service generated name for this stage.
        &quot;inputSource&quot;: [ # Input sources for this stage.
          { # Description of an input or output of an execution stage.
            &quot;sizeBytes&quot;: &quot;A String&quot;, # Size of the source, if measurable.
            &quot;name&quot;: &quot;A String&quot;, # Dataflow service generated name for this source.
            &quot;userName&quot;: &quot;A String&quot;, # Human-readable name for this source; may be user or system generated.
            &quot;originalTransformOrCollection&quot;: &quot;A String&quot;, # User name for the original user transform or collection with which this
                # source is most closely associated.
          },
        ],
        &quot;id&quot;: &quot;A String&quot;, # Dataflow service generated id for this stage.
        &quot;componentTransform&quot;: [ # Transforms that comprise this execution stage.
          { # Description of a transform executed as part of an execution stage.
            &quot;originalTransform&quot;: &quot;A String&quot;, # User name for the original user transform with which this transform is
                # most closely associated.
            &quot;name&quot;: &quot;A String&quot;, # Dataflow service generated name for this source.
            &quot;userName&quot;: &quot;A String&quot;, # Human-readable name for this transform; may be user or system generated.
          },
        ],
        &quot;componentSource&quot;: [ # Collections produced and consumed by component transforms of this stage.
          { # Description of an interstitial value between transforms in an execution
              # stage.
            &quot;name&quot;: &quot;A String&quot;, # Dataflow service generated name for this source.
            &quot;userName&quot;: &quot;A String&quot;, # Human-readable name for this transform; may be user or system generated.
            &quot;originalTransformOrCollection&quot;: &quot;A String&quot;, # User name for the original user transform or collection with which this
                # source is most closely associated.
          },
        ],
        &quot;kind&quot;: &quot;A String&quot;, # Type of transform this stage is executing.
      },
    ],
    &quot;originalPipelineTransform&quot;: [ # Description of each transform in the pipeline and collections between them.
      { # Description of the type, names/ids, and input/outputs for a transform.
        &quot;kind&quot;: &quot;A String&quot;, # Type of transform.
        &quot;inputCollectionName&quot;: [ # User names for all collection inputs to this transform.
          &quot;A String&quot;,
        ],
        &quot;name&quot;: &quot;A String&quot;, # User provided name for this transform instance.
        &quot;id&quot;: &quot;A String&quot;, # SDK generated id of this transform instance.
        &quot;displayData&quot;: [ # Transform-specific display data.
          { # Data provided with a pipeline or transform to provide descriptive info.
            &quot;durationValue&quot;: &quot;A String&quot;, # Contains value if the data is of duration type.
            &quot;int64Value&quot;: &quot;A String&quot;, # Contains value if the data is of int64 type.
            &quot;namespace&quot;: &quot;A String&quot;, # The namespace for the key. This is usually a class name or programming
                # language namespace (e.g. a Python module) which defines the display data.
                # This allows a dax monitoring system to specially handle the data
                # and perform custom rendering.
            &quot;floatValue&quot;: 3.14, # Contains value if the data is of float type.
            &quot;key&quot;: &quot;A String&quot;, # The key identifying the display data.
                # This is intended to be used as a label for the display data
                # when viewed in a dax monitoring system.
            &quot;shortStrValue&quot;: &quot;A String&quot;, # A possible additional shorter value to display.
                # For example, a java_class_name_value of com.mypackage.MyDoFn
                # will be stored with MyDoFn as the short_str_value and
                # com.mypackage.MyDoFn as the java_class_name value.
                # short_str_value can be displayed and java_class_name_value
                # will be displayed as a tooltip.
            &quot;url&quot;: &quot;A String&quot;, # An optional full URL.
            &quot;label&quot;: &quot;A String&quot;, # An optional label to display in a dax UI for the element.
            &quot;timestampValue&quot;: &quot;A String&quot;, # Contains value if the data is of timestamp type.
            &quot;boolValue&quot;: True or False, # Contains value if the data is of a boolean type.
            &quot;javaClassValue&quot;: &quot;A String&quot;, # Contains value if the data is of java class type.
            &quot;strValue&quot;: &quot;A String&quot;, # Contains value if the data is of string type.
          },
        ],
        &quot;outputCollectionName&quot;: [ # User names for all collection outputs of this transform.
          &quot;A String&quot;,
        ],
      },
    ],
    &quot;displayData&quot;: [ # Pipeline level display data.
      { # Data provided with a pipeline or transform to provide descriptive info.
        &quot;durationValue&quot;: &quot;A String&quot;, # Contains value if the data is of duration type.
        &quot;int64Value&quot;: &quot;A String&quot;, # Contains value if the data is of int64 type.
        &quot;namespace&quot;: &quot;A String&quot;, # The namespace for the key. This is usually a class name or programming
            # language namespace (e.g. a Python module) which defines the display data.
            # This allows a dax monitoring system to specially handle the data
            # and perform custom rendering.
        &quot;floatValue&quot;: 3.14, # Contains value if the data is of float type.
        &quot;key&quot;: &quot;A String&quot;, # The key identifying the display data.
            # This is intended to be used as a label for the display data
            # when viewed in a dax monitoring system.
        &quot;shortStrValue&quot;: &quot;A String&quot;, # A possible additional shorter value to display.
            # For example, a java_class_name_value of com.mypackage.MyDoFn
            # will be stored with MyDoFn as the short_str_value and
            # com.mypackage.MyDoFn as the java_class_name value.
            # short_str_value can be displayed and java_class_name_value
            # will be displayed as a tooltip.
        &quot;url&quot;: &quot;A String&quot;, # An optional full URL.
        &quot;label&quot;: &quot;A String&quot;, # An optional label to display in a dax UI for the element.
        &quot;timestampValue&quot;: &quot;A String&quot;, # Contains value if the data is of timestamp type.
        &quot;boolValue&quot;: True or False, # Contains value if the data is of a boolean type.
        &quot;javaClassValue&quot;: &quot;A String&quot;, # Contains value if the data is of java class type.
        &quot;strValue&quot;: &quot;A String&quot;, # Contains value if the data is of string type.
      },
    ],
  },
  &quot;replaceJobId&quot;: &quot;A String&quot;, # If this job is an update of an existing job, this field is the job ID
      # of the job it replaced.
      #
      # When sending a `CreateJobRequest`, you can update a job by specifying it
      # here. The job named here is stopped, and its intermediate state is
      # transferred to this job.
  &quot;tempFiles&quot;: [ # A set of files the system should be aware of that are used
      # for temporary storage. These temporary files will be
      # removed on job completion.
      # No duplicates are allowed.
      # No file patterns are supported.
      #
      # The supported files are:
      #
      # Google Cloud Storage:
      #
      #   storage.googleapis.com/{bucket}/{object}
      #   bucket.storage.googleapis.com/{object}
    &quot;A String&quot;,
  ],
  &quot;name&quot;: &quot;A String&quot;, # The user-specified Cloud Dataflow job name.
      #
      # Only one Job with a given name may exist in a project at any
      # given time. If a caller attempts to create a Job with the same
      # name as an already-existing Job, the attempt returns the
      # existing Job.
      #
      # The name must match the regular expression
      # `[a-z]([-a-z0-9]{0,38}[a-z0-9])?`
  &quot;steps&quot;: [ # Exactly one of step or steps_location should be specified.
      #
      # The top-level steps that constitute the entire job.
    { # Defines a particular step within a Cloud Dataflow job.
        #
        # A job consists of multiple steps, each of which performs some
        # specific operation as part of the overall job. Data is typically
        # passed from one step to another as part of the job.
        #
        # Here&#x27;s an example of a sequence of steps which together implement a
        # Map-Reduce job:
        #
        # * Read a collection of data from some source, parsing the
        #   collection&#x27;s elements.
        #
        # * Validate the elements.
        #
        # * Apply a user-defined function to map each element to some value
        #   and extract an element-specific key value.
        #
        # * Group elements with the same key into a single element with
        #   that key, transforming a multiply-keyed collection into a
        #   uniquely-keyed collection.
        #
        # * Write the elements out to some data sink.
        #
        # Note that the Cloud Dataflow service may be used to run many different
        # types of jobs, not just Map-Reduce.
      &quot;name&quot;: &quot;A String&quot;, # The name that identifies the step. This must be unique for each
          # step with respect to all other steps in the Cloud Dataflow job.
      &quot;kind&quot;: &quot;A String&quot;, # The kind of step in the Cloud Dataflow job.
      &quot;properties&quot;: { # Named properties associated with the step. Each kind of
          # predefined step has its own required set of properties.
          # Must be provided on Create. Only retrieved with JOB_VIEW_ALL.
        &quot;a_key&quot;: &quot;&quot;, # Properties of the object.
      },
    },
  ],
  &quot;replacedByJobId&quot;: &quot;A String&quot;, # If another job is an update of this job (and thus, this job is in
      # `JOB_STATE_UPDATED`), this field contains the ID of that job.
  &quot;executionInfo&quot;: { # Additional information about how a Cloud Dataflow job will be executed that # Deprecated.
      # isn&#x27;t contained in the submitted job.
    &quot;stages&quot;: { # A mapping from each stage to the information about that stage.
      &quot;a_key&quot;: { # Contains information about how a particular
          # google.dataflow.v1beta3.Step will be executed.
        &quot;stepName&quot;: [ # The steps associated with the execution stage.
            # Note that stages may have several steps, and that a given step
            # might be run by more than one stage.
          &quot;A String&quot;,
        ],
      },
    },
  },
  &quot;currentState&quot;: &quot;A String&quot;, # The current state of the job.
      #
      # Jobs are created in the `JOB_STATE_STOPPED` state unless otherwise
      # specified.
      #
      # A job in the `JOB_STATE_RUNNING` state may asynchronously enter a
      # terminal state. After a job has reached a terminal state, no
      # further state updates may be made.
      #
      # This field may be mutated by the Cloud Dataflow service;
      # callers cannot mutate it.
  &quot;location&quot;: &quot;A String&quot;, # The [regional endpoint]
      # (https://cloud.google.com/dataflow/docs/concepts/regional-endpoints) that
      # contains this job.
  &quot;startTime&quot;: &quot;A String&quot;, # The timestamp when the job was started (transitioned to JOB_STATE_PENDING).
      # Flexible resource scheduling jobs are started with some delay after job
      # creation, so start_time is unset before start and is updated when the
      # job is started by the Cloud Dataflow service. For other jobs, start_time
      # always equals create_time and is immutable and set by the Cloud Dataflow
      # service.
  &quot;stepsLocation&quot;: &quot;A String&quot;, # The GCS location where the steps are stored.
  &quot;labels&quot;: { # User-defined labels for this job.
      #
      # The labels map can contain no more than 64 entries. Entries of the labels
      # map are UTF8 strings that comply with the following restrictions:
      #
      # * Keys must conform to regexp: \p{Ll}\p{Lo}{0,62}
      # * Values must conform to regexp: [\p{Ll}\p{Lo}\p{N}_-]{0,63}
      # * Both keys and values are additionally constrained to be &lt;= 128 bytes in
      #   size.
    &quot;a_key&quot;: &quot;A String&quot;,
  },
  &quot;createTime&quot;: &quot;A String&quot;, # The timestamp when the job was initially created. Immutable and set by the
      # Cloud Dataflow service.
  &quot;requestedState&quot;: &quot;A String&quot;, # The job&#x27;s requested state.
      #
      # `UpdateJob` may be used to switch between the `JOB_STATE_STOPPED` and
      # `JOB_STATE_RUNNING` states, by setting requested_state. `UpdateJob` may
      # also be used to directly set a job&#x27;s requested state to
      # `JOB_STATE_CANCELLED` or `JOB_STATE_DONE`, irrevocably terminating the
      # job if it has not already reached a terminal state.
}</pre>
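<p>The schema above can be used to build a Job request body by hand. The sketch below, a minimal example rather than a service-recommended configuration, validates the documented job-name regular expression and assembles a small body with the <code>name</code>, <code>type</code>, and <code>environment.workerPools</code> fields shown above; the default values passed to <code>make_job_body</code> are illustrative assumptions.</p>
<pre>
```python
import re

# Job names must match the regular expression documented above:
# [a-z]([-a-z0-9]{0,38}[a-z0-9])?
JOB_NAME_RE = re.compile(r"^[a-z]([-a-z0-9]{0,38}[a-z0-9])?$")


def make_job_body(name, num_workers=3, machine_type="n1-standard-1"):
    """Builds a minimal Job request body using fields from the schema above.

    Field names come from the schema; the specific default values here are
    illustrative only.
    """
    if not JOB_NAME_RE.match(name):
        raise ValueError("invalid Dataflow job name: %r" % name)
    return {
        "name": name,
        "type": "JOB_TYPE_BATCH",
        "environment": {
            "workerPools": [
                {
                    "kind": "harness",
                    "numWorkers": num_workers,
                    "machineType": machine_type,
                    # TEARDOWN_ALWAYS is the policy the docs recommend
                    # except for small, manually supervised test jobs.
                    "teardownPolicy": "TEARDOWN_ALWAYS",
                }
            ],
        },
    }
```
</pre>
<p>Passing a name such as <code>&quot;WordCount&quot;</code> (uppercase) or one starting with a hyphen raises <code>ValueError</code> before any request is made, which surfaces naming mistakes locally instead of as an API error.</p>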
</div>
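<p>To read the <code>currentState</code> field documented above, a caller typically fetches the Job through the generated client (<code>googleapiclient.discovery.build(&quot;dataflow&quot;, &quot;v1b3&quot;)</code>) and the <code>projects().locations().jobs().get(...)</code> chain this page describes. The helper below is a sketch: the <code>service</code> argument is assumed to be an authorized client object, and the set of terminal states is an assumption drawn from the state names used in this reference, not an exhaustive list.</p>
<pre>
```python
# Terminal JobState values; this set is an assumption based on the state
# names mentioned in this reference and may not be exhaustive.
TERMINAL_STATES = {
    "JOB_STATE_DONE",
    "JOB_STATE_FAILED",
    "JOB_STATE_CANCELLED",
    "JOB_STATE_UPDATED",
}


def get_current_state(service, project_id, location, job_id):
    """Fetches a Job and returns (currentState, is_terminal).

    `service` is expected to behave like the client returned by
    googleapiclient.discovery.build("dataflow", "v1b3").
    """
    job = (
        service.projects()
        .locations()
        .jobs()
        .get(projectId=project_id, location=location, jobId=job_id)
        .execute()
    )
    state = job.get("currentState", "JOB_STATE_UNKNOWN")
    return state, state in TERMINAL_STATES
```
</pre>
<p>Because the function only depends on the <code>projects().locations().jobs().get(...).execute()</code> call chain, it can be exercised against a stub in tests and against the real client in production.</p>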

</body></html>