Blame - docs/dyn/dataflow_v1b3.projects.locations.jobs.html - platform/external/python/google-api-python-client

2017-01-06 09:58:29 -0800

<h2>Instance Methods</h2>
<p class="toc_element">
  <code><a href="dataflow_v1b3.projects.locations.jobs.debug.html">debug()</a></code>
</p>
<p class="firstline">Returns the debug Resource.</p>

<p class="toc_element">
  <code><a href="dataflow_v1b3.projects.locations.jobs.messages.html">messages()</a></code>
</p>
<p class="firstline">Returns the messages Resource.</p>

<p class="toc_element">
  <code><a href="dataflow_v1b3.projects.locations.jobs.snapshots.html">snapshots()</a></code>
</p>
<p class="firstline">Returns the snapshots Resource.</p>

<p class="toc_element">
  <code><a href="dataflow_v1b3.projects.locations.jobs.workItems.html">workItems()</a></code>
</p>
<p class="firstline">Returns the workItems Resource.</p>

<p class="toc_element">
  <code><a href="#create">create(projectId, location, body=None, view=None, replaceJobId=None, x__xgafv=None)</a></code></p>
<p class="firstline">Creates a Cloud Dataflow job.</p>
<p class="toc_element">
  <code><a href="#get">get(projectId, location, jobId, view=None, x__xgafv=None)</a></code></p>
<p class="firstline">Gets the state of the specified Cloud Dataflow job.</p>
<p class="toc_element">
  <code><a href="#getMetrics">getMetrics(projectId, location, jobId, startTime=None, x__xgafv=None)</a></code></p>
<p class="firstline">Request the job status.</p>
<p class="toc_element">
  <code><a href="#list">list(projectId, location, filter=None, pageToken=None, pageSize=None, view=None, x__xgafv=None)</a></code></p>
<p class="firstline">List the jobs of a project.</p>
<p class="toc_element">
  <code><a href="#list_next">list_next(previous_request, previous_response)</a></code></p>
<p class="firstline">Retrieves the next page of results.</p>
<p class="toc_element">
  <code><a href="#snapshot">snapshot(projectId, location, jobId, body=None, x__xgafv=None)</a></code></p>
<p class="firstline">Snapshot the state of a streaming job.</p>
<p class="toc_element">
  <code><a href="#update">update(projectId, location, jobId, body=None, x__xgafv=None)</a></code></p>
<p class="firstline">Updates the state of an existing Cloud Dataflow job.</p>
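The list() and list_next() methods listed above are meant to be used together to walk through paged results. The following is a minimal sketch of that loop, not the library's own code. It assumes that the list response carries its jobs under a "jobs" key and that list_next() returns None once no further page exists, which is how google-api-python-client paging generally behaves. FakeRequest and FakeJobs are hypothetical stand-ins so the sketch runs without credentials or network access; iter_jobs is an illustrative helper name, not part of the API.

```python
class FakeRequest:
    """Hypothetical stand-in for a request object; returns a canned page."""

    def __init__(self, pages, index):
        self.pages = pages
        self.index = index

    def execute(self):
        return self.pages[self.index]


class FakeJobs:
    """Hypothetical stand-in for the jobs() resource, for offline use."""

    def __init__(self, pages):
        self.pages = pages

    def list(self, projectId, location, pageSize=None):
        return FakeRequest(self.pages, 0)

    def list_next(self, previous_request, previous_response):
        # Assumed contract: no nextPageToken means there is no next page.
        if "nextPageToken" not in previous_response:
            return None
        return FakeRequest(self.pages, previous_request.index + 1)


def iter_jobs(jobs_resource, project_id, location, page_size=None):
    """Yield every job dict across all pages of a list() call."""
    request = jobs_resource.list(
        projectId=project_id, location=location, pageSize=page_size
    )
    while request is not None:
        response = request.execute()
        # A page with no jobs may omit the "jobs" key entirely.
        for job in response.get("jobs", []):
            yield job
        request = jobs_resource.list_next(
            previous_request=request, previous_response=response
        )
```

With a real client, jobs_resource would be the object returned by service.projects().locations().jobs(), where service comes from googleapiclient.discovery.build("dataflow", "v1b3").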

<h3>Method Details</h3>
<code class="details" id="create">create(projectId, location, body=None, view=None, replaceJobId=None, x__xgafv=None)</code>
<pre>Creates a Cloud Dataflow job.

To create a job, we recommend using `projects.locations.jobs.create` with a
[regional endpoint]
(https://cloud.google.com/dataflow/docs/concepts/regional-endpoints). Using
`projects.jobs.create` is not recommended, as your job will always start
in `us-central1`.
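
The recommendation above amounts to always passing an explicit location to create(). The following is a minimal sketch of assembling the keyword arguments for such a call, assuming only fields that appear in the request body described under Args. The helper name build_create_kwargs is hypothetical, the "JOB_TYPE_BATCH" default is an assumed value for the "type" field, and a real job needs more than this skeleton (for example its pipeline definition).

```python
import uuid


def build_create_kwargs(project_id, location, temp_prefix,
                        job_type="JOB_TYPE_BATCH", client_request_id=None):
    """Build keyword arguments for jobs().create() on a regional endpoint.

    Hypothetical helper: it only assembles a dict and calls no API.
    """
    if not location:
        raise ValueError("location (a regional endpoint) is required")
    body = {
        "projectId": project_id,
        "type": job_type,  # assumed enum value; check the Job reference
        # Reusing the same ID on a retry keeps job creation idempotent.
        "clientRequestId": client_request_id or uuid.uuid4().hex,
        "environment": {
            # e.g. storage.googleapis.com/{bucket}/{object}
            "tempStoragePrefix": temp_prefix,
        },
    }
    return {"projectId": project_id, "location": location, "body": body}
```

With a real client the result would be passed as service.projects().locations().jobs().create(**kwargs).execute(). Generate the clientRequestId once and reuse it on every retry of the same submission.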

Args:
  projectId: string, The ID of the Cloud Platform project that the job belongs to. (required)
  location: string, The [regional endpoint]
(https://cloud.google.com/dataflow/docs/concepts/regional-endpoints) that
contains this job. (required)
  body: object, The request body.
    The object takes the form of:

{ # Defines a job to be run by the Cloud Dataflow service.
  "clientRequestId": "A String", # The client's unique identifier of the job, re-used across retried attempts.
      # If this field is set, the service will ensure its uniqueness.
      # The request to create a job will fail if the service has knowledge of a
      # previously submitted job with the same client's ID and job name.
      # The caller may use this field to ensure idempotence of job
      # creation across retried attempts to create a job.
      # By default, the field is empty and, in that case, the service ignores it.
  "id": "A String", # The unique ID of this job.
      #
      # This field is set by the Cloud Dataflow service when the Job is
      # created, and is immutable for the life of the job.
  "currentStateTime": "A String", # The timestamp associated with the current state.
  "transformNameMapping": { # The map of transform name prefixes of the job to be replaced to the
      # corresponding name prefixes of the new job.
    "a_key": "A String",
  },
  "environment": { # Describes the environment in which a Dataflow Job runs. # The environment for the job.
    "sdkPipelineOptions": { # The Cloud Dataflow SDK pipeline options specified by the user. These
        # options are passed through the service and are used to recreate the
        # SDK pipeline options on the worker in a language agnostic and platform
        # independent way.
      "a_key": "", # Properties of the object.
    },
    "flexResourceSchedulingGoal": "A String", # Which Flexible Resource Scheduling mode to run in.
    "workerPools": [ # The worker pools. At least one "harness" worker pool must be
        # specified in order for the job to have workers.
      { # Describes one particular pool of Cloud Dataflow workers to be
          # instantiated by the Cloud Dataflow service in order to perform the
          # computations required by a job. Note that a workflow job may use
          # multiple pools, in order to match the various computational
          # requirements of the various stages of the job.
        "defaultPackageSet": "A String", # The default package set to install. This allows the service to
            # select a default set of packages which are useful to worker
            # harnesses written in a particular language.
        "network": "A String", # Network to which VMs will be assigned. If empty or unspecified,
            # the service will use the network "default".
        "zone": "A String", # Zone to run the worker pools in. If empty or unspecified, the service
            # will attempt to choose a reasonable default.
        "numWorkers": 42, # Number of Google Compute Engine workers in this pool needed to
            # execute the job. If zero or unspecified, the service will
            # attempt to choose a reasonable default.
        "numThreadsPerWorker": 42, # The number of threads per worker harness. If empty or unspecified, the
            # service will choose a number of threads (according to the number of cores
            # on the selected machine type for batch, or 1 by convention for streaming).
        "diskSourceImage": "A String", # Fully qualified source image for disks.
        "packages": [ # Packages to be installed on workers.
          { # The packages that must be installed in order for a worker to run the
              # steps of the Cloud Dataflow job that will be assigned to its worker
              # pool.
              #
              # This is the mechanism by which the Cloud Dataflow SDK causes code to
              # be loaded onto the workers. For example, the Cloud Dataflow Java SDK
              # might use this to install jars containing the user's code and all of the
              # various dependencies (libraries, data files, etc.) required in order
              # for that code to run.
            "location": "A String", # The resource to read the package from. The supported resource type is:
                #
                # Google Cloud Storage:
                #
                # storage.googleapis.com/{bucket}
                # bucket.storage.googleapis.com/
            "name": "A String", # The name of the package.
          },
        ],
        "teardownPolicy": "A String", # Sets the policy for determining when to tear down the worker pool.
            # Allowed values are: `TEARDOWN_ALWAYS`, `TEARDOWN_ON_SUCCESS`, and
            # `TEARDOWN_NEVER`.
            # `TEARDOWN_ALWAYS` means workers are always torn down regardless of whether
            # the job succeeds. `TEARDOWN_ON_SUCCESS` means workers are torn down
            # if the job succeeds. `TEARDOWN_NEVER` means the workers are never torn
            # down.
            #
            # If the workers are not torn down by the service, they will
            # continue to run and use Google Compute Engine VM resources in the
            # user's project until they are explicitly terminated by the user.
            # Because of this, Google recommends using the `TEARDOWN_ALWAYS`
            # policy except for small, manually supervised test jobs.
            #
            # If unknown or unspecified, the service will attempt to choose a reasonable
            # default.
        "onHostMaintenance": "A String", # The action to take on host maintenance, as defined by the Google
            # Compute Engine API.
        "poolArgs": { # Extra arguments for this worker pool.
          "a_key": "", # Properties of the object. Contains field @type with type URL.
        },
        "diskSizeGb": 42, # Size of root disk for VMs, in GB. If zero or unspecified, the service will
            # attempt to choose a reasonable default.
        "workerHarnessContainerImage": "A String", # Required. Docker container image that executes the Cloud Dataflow worker
            # harness, residing in Google Container Registry.
            #
            # Deprecated for the Fn API path. Use sdk_harness_container_images instead.
        "diskType": "A String", # Type of root disk for VMs. If empty or unspecified, the service will
            # attempt to choose a reasonable default.
        "machineType": "A String", # Machine type (e.g. "n1-standard-1"). If empty or unspecified, the
            # service will attempt to choose a reasonable default.
        "kind": "A String", # The kind of the worker pool; currently only `harness` and `shuffle`
            # are supported.
        "dataDisks": [ # Data disks that are used by a VM in this workflow.
          { # Describes the data disk used by a workflow job.
            "sizeGb": 42, # Size of disk in GB. If zero or unspecified, the service will
                # attempt to choose a reasonable default.
            "diskType": "A String", # Disk storage type, as defined by Google Compute Engine. This
                # must be a disk type appropriate to the project and zone in which
                # the workers will run. If unknown or unspecified, the service
                # will attempt to choose a reasonable default.
                #
                # For example, the standard persistent disk type is a resource name
                # typically ending in "pd-standard". If SSD persistent disks are
                # available, the resource name typically ends with "pd-ssd". The
                # actual valid values are defined by the Google Compute Engine API,
                # not by the Cloud Dataflow API; consult the Google Compute Engine
                # documentation for more information about determining the set of
                # available disk types for a particular project and zone.
                #
                # Google Compute Engine Disk types are local to a particular
                # project in a particular zone, and so the resource name will
                # typically look something like this:
                #
                # compute.googleapis.com/projects/project-id/zones/zone/diskTypes/pd-standard
            "mountPoint": "A String", # Directory in a VM where disk is mounted.
          },
        ],
        "sdkHarnessContainerImages": [ # Set of SDK harness containers needed to execute this pipeline. This will
            # only be set in the Fn API path. For non-cross-language pipelines this
            # should have only one entry. Cross-language pipelines will have two or more
            # entries.
          { # Defines an SDK harness container for executing Dataflow pipelines.
            "containerImage": "A String", # A docker container image that resides in Google Container Registry.
            "useSingleCorePerContainer": True or False, # If true, recommends the Dataflow service to use only one core per SDK
                # container instance with this image. If false (or unset) recommends using
                # more than one core per SDK container instance with this image for
                # efficiency. Note that Dataflow service may choose to override this property
                # if needed.
          },
        ],
        "subnetwork": "A String", # Subnetwork to which VMs will be assigned, if desired. Expected to be of
            # the form "regions/REGION/subnetworks/SUBNETWORK".
        "ipConfiguration": "A String", # Configuration for VM IPs.
        "taskrunnerSettings": { # Taskrunner configuration settings. # Settings passed through to Google Compute Engine workers when
            # using the standard Dataflow task runner. Users should ignore
            # this field.
          "alsologtostderr": True or False, # Whether to also send taskrunner log info to stderr.
          "taskGroup": "A String", # The UNIX group ID on the worker VM to use for tasks launched by
              # taskrunner; e.g. "wheel".
          "harnessCommand": "A String", # The command to launch the worker harness.
          "logDir": "A String", # The directory on the VM to store logs.
          "oauthScopes": [ # The OAuth2 scopes to be requested by the taskrunner in order to
              # access the Cloud Dataflow API.
            "A String",
          ],
          "dataflowApiVersion": "A String", # The API version of endpoint, e.g. "v1b3"
          "logUploadLocation": "A String", # Indicates where to put logs. If this is not specified, the logs
              # will not be uploaded.
              #
              # The supported resource type is:
              #
              # Google Cloud Storage:
              # storage.googleapis.com/{bucket}/{object}
              # bucket.storage.googleapis.com/{object}
          "streamingWorkerMainClass": "A String", # The streaming worker main class name.
          "workflowFileName": "A String", # The file to store the workflow in.
          "baseTaskDir": "A String", # The location on the worker for task-specific subdirectories.
          "tempStoragePrefix": "A String", # The prefix of the resources the taskrunner should use for
              # temporary storage.
              #
              # The supported resource type is:
              #
              # Google Cloud Storage:
              # storage.googleapis.com/{bucket}/{object}
              # bucket.storage.googleapis.com/{object}
          "commandlinesFileName": "A String", # The file to store preprocessing commands in.
          "languageHint": "A String", # The suggested backend language.
          "baseUrl": "A String", # The base URL for the taskrunner to use when accessing Google Cloud APIs.
              #
              # When workers access Google Cloud APIs, they logically do so via
              # relative URLs. If this field is specified, it supplies the base
              # URL to use for resolving these relative URLs. The normative
              # algorithm used is defined by RFC 1808, "Relative Uniform Resource
              # Locators".
              #
              # If not specified, the default value is "http://www.googleapis.com/"
          "logToSerialconsole": True or False, # Whether to send taskrunner log info to Google Compute Engine VM serial
              # console.
          "continueOnException": True or False, # Whether to continue taskrunner if an exception is hit.
          "parallelWorkerSettings": { # Provides data to pass through to the worker harness. # The settings to pass to the parallel worker harness.
            "baseUrl": "A String", # The base URL for accessing Google Cloud APIs.
                #
                # When workers access Google Cloud APIs, they logically do so via
                # relative URLs. If this field is specified, it supplies the base
                # URL to use for resolving these relative URLs. The normative
                # algorithm used is defined by RFC 1808, "Relative Uniform Resource
                # Locators".
                #
                # If not specified, the default value is "http://www.googleapis.com/"
            "reportingEnabled": True or False, # Whether to send work progress updates to the service.
            "servicePath": "A String", # The Cloud Dataflow service path relative to the root URL, for example,
                # "dataflow/v1b3/projects".
            "shuffleServicePath": "A String", # The Shuffle service path relative to the root URL, for example,
                # "shuffle/v1beta1".
            "workerId": "A String", # The ID of the worker running this pipeline.
            "tempStoragePrefix": "A String", # The prefix of the resources the system should use for temporary
                # storage.
                #
                # The supported resource type is:
                #
                # Google Cloud Storage:
                #
                # storage.googleapis.com/{bucket}/{object}
                # bucket.storage.googleapis.com/{object}
          },
          "vmId": "A String", # The ID string of the VM.
          "taskUser": "A String", # The UNIX user ID on the worker VM to use for tasks launched by
              # taskrunner; e.g. "root".
        },
        "autoscalingSettings": { # Settings for WorkerPool autoscaling. # Settings for autoscaling of this WorkerPool.
          "maxNumWorkers": 42, # The maximum number of workers to cap scaling at.
          "algorithm": "A String", # The algorithm to use for autoscaling.
        },
        "metadata": { # Metadata to set on the Google Compute Engine VMs.
          "a_key": "A String",
        },
      },
    ],
    "dataset": "A String", # The dataset for the current project where various workflow
        # related tables are stored.
        #
        # The supported resource type is:
        #
        # Google BigQuery:
        # bigquery.googleapis.com/{dataset}
    "internalExperiments": { # Experimental settings.
      "a_key": "", # Properties of the object. Contains field @type with type URL.
    },
    "workerRegion": "A String", # The Compute Engine region
        # (https://cloud.google.com/compute/docs/regions-zones/regions-zones) in
        # which worker processing should occur, e.g. "us-west1". Mutually exclusive
        # with worker_zone. If neither worker_region nor worker_zone is specified,
        # default to the control plane's region.
    "serviceKmsKeyName": "A String", # If set, contains the Cloud KMS key identifier used to encrypt data
        # at rest, AKA a Customer Managed Encryption Key (CMEK).
        #
        # Format:
        # projects/PROJECT_ID/locations/LOCATION/keyRings/KEY_RING/cryptoKeys/KEY
    "userAgent": { # A description of the process that generated the request.
      "a_key": "", # Properties of the object.
    },
    "workerZone": "A String", # The Compute Engine zone
        # (https://cloud.google.com/compute/docs/regions-zones/regions-zones) in
        # which worker processing should occur, e.g. "us-west1-a". Mutually exclusive
        # with worker_region. If neither worker_region nor worker_zone is specified,
        # a zone in the control plane's region is chosen based on available capacity.
    "clusterManagerApiService": "A String", # The type of cluster manager API to use. If unknown or
        # unspecified, the service will attempt to choose a reasonable
        # default. This should be in the form of the API service name,
        # e.g. "compute.googleapis.com".
    "tempStoragePrefix": "A String", # The prefix of the resources the system should use for temporary
        # storage. The system will append the suffix "/temp-{JOBNAME}" to
        # this resource prefix, where {JOBNAME} is the value of the
        # job_name field. The resulting bucket and object prefix is used
        # as the prefix of the resources used to store temporary data
        # needed during the job execution. NOTE: This will override the
        # value in taskrunner_settings.
        # The supported resource type is:
        #
        # Google Cloud Storage:
        #
        # storage.googleapis.com/{bucket}/{object}
        # bucket.storage.googleapis.com/{object}
    "experiments": [ # The list of experiments to enable.
      "A String",
    ],
    "version": { # A structure describing which components and their versions of the service
        # are required in order to run the job.
      "a_key": "", # Properties of the object.
    },
    "serviceAccountEmail": "A String", # Identity to run virtual machines as. Defaults to the default account.
  },
  "stageStates": [ # This field may be mutated by the Cloud Dataflow service;
      # callers cannot mutate it.
    { # A message describing the state of a particular execution stage.
      "executionStageName": "A String", # The name of the execution stage.
      "currentStateTime": "A String", # The time at which the stage transitioned to this state.
      "executionStageState": "A String", # Execution stage states allow the same set of values as JobState.
    },
  ],
  "jobMetadata": { # Metadata available primarily for filtering jobs. Will be included in the
      # ListJob response and Job SUMMARY view.
      #
      # This field is populated by the Dataflow service to support filtering jobs
      # by the metadata values provided here. Populated for ListJobs and all GetJob
      # views SUMMARY and higher.
    "bigTableDetails": [ # Identification of a BigTable source used in the Dataflow job.
      { # Metadata for a BigTable connector used by the job.
        "tableId": "A String", # TableId accessed in the connection.
        "projectId": "A String", # ProjectId accessed in the connection.
        "instanceId": "A String", # InstanceId accessed in the connection.
      },
    ],
    "spannerDetails": [ # Identification of a Spanner source used in the Dataflow job.
      { # Metadata for a Spanner connector used by the job.
        "databaseId": "A String", # DatabaseId accessed in the connection.
        "instanceId": "A String", # InstanceId accessed in the connection.
        "projectId": "A String", # ProjectId accessed in the connection.
      },
    ],
    "datastoreDetails": [ # Identification of a Datastore source used in the Dataflow job.
      { # Metadata for a Datastore connector used by the job.
        "projectId": "A String", # ProjectId accessed in the connection.
        "namespace": "A String", # Namespace used in the connection.
      },
    ],
    "sdkVersion": { # The version of the SDK used to run the job. # The SDK version used to run the job.
      "versionDisplayName": "A String", # A readable string describing the version of the SDK.
      "sdkSupportStatus": "A String", # The support status for this SDK version.
      "version": "A String", # The version of the SDK used to run the job.
    },
    "bigqueryDetails": [ # Identification of a BigQuery source used in the Dataflow job.
      { # Metadata for a BigQuery connector used by the job.
        "table": "A String", # Table accessed in the connection.
        "dataset": "A String", # Dataset accessed in the connection.
        "projectId": "A String", # Project accessed in the connection.
        "query": "A String", # Query used to access data in the connection.
      },
    ],
    "fileDetails": [ # Identification of a File source used in the Dataflow job.
      { # Metadata for a File connector used by the job.
        "filePattern": "A String", # File Pattern used to access files by the connector.
      },
    ],
    "pubsubDetails": [ # Identification of a PubSub source used in the Dataflow job.
      { # Metadata for a PubSub connector used by the job.
        "subscription": "A String", # Subscription used in the connection.
        "topic": "A String", # Topic accessed in the connection.
      },
    ],
  },
  "createdFromSnapshotId": "A String", # If this is specified, the job's initial state is populated from the given
      # snapshot.
  "projectId": "A String", # The ID of the Cloud Platform project that the job belongs to.
  "type": "A String", # The type of Cloud Dataflow job.
  "pipelineDescription": { # A descriptive representation of submitted pipeline as well as the executed
      # form. This data is provided by the Dataflow service for ease of visualizing
      # the pipeline and interpreting Dataflow provided metrics.
      #
      # Preliminary field: The format of this data may change at any time.
      # A description of the user pipeline and stages through which it is executed.
      # Created by Cloud Dataflow service. Only retrieved with
      # JOB_VIEW_DESCRIPTION or JOB_VIEW_ALL.
    "executionPipelineStage": [ # Description of each stage of execution of the pipeline.
      { # Description of the composing transforms, names/ids, and input/outputs of a
          # stage of execution. Some composing transforms and sources may have been
          # generated by the Dataflow service during execution planning.
        "id": "A String", # Dataflow service generated id for this stage.
        "componentTransform": [ # Transforms that comprise this execution stage.
          { # Description of a transform executed as part of an execution stage.
            "originalTransform": "A String", # User name for the original user transform with which this transform is
                # most closely associated.
            "name": "A String", # Dataflow service generated name for this source.
            "userName": "A String", # Human-readable name for this transform; may be user or system generated.
          },
        ],
        "componentSource": [ # Collections produced and consumed by component transforms of this stage.
          { # Description of an interstitial value between transforms in an execution
              # stage.
            "name": "A String", # Dataflow service generated name for this source.
            "userName": "A String", # Human-readable name for this transform; may be user or system generated.
            "originalTransformOrCollection": "A String", # User name for the original user transform or collection with which this
                # source is most closely associated.
          },
        ],
        "kind": "A String", # Type of transform this stage is executing.
        "outputSource": [ # Output sources for this stage.
          { # Description of an input or output of an execution stage.
            "originalTransformOrCollection": "A String", # User name for the original user transform or collection with which this
                # source is most closely associated.
            "name": "A String", # Dataflow service generated name for this source.
            "sizeBytes": "A String", # Size of the source, if measurable.
            "userName": "A String", # Human-readable name for this source; may be user or system generated.
          },
        ],
        "name": "A String", # Dataflow service generated name for this stage.
        "inputSource": [ # Input sources for this stage.
          { # Description of an input or output of an execution stage.
            "originalTransformOrCollection": "A String", # User name for the original user transform or collection with which this
                # source is most closely associated.
            "name": "A String", # Dataflow service generated name for this source.
            "sizeBytes": "A String", # Size of the source, if measurable.
            "userName": "A String", # Human-readable name for this source; may be user or system generated.
          },
        ],
      },
],

"originalPipelineTransform": [ # Description of each transform in the pipeline and collections between them.

527

{ # Description of the type, names/ids, and input/outputs for a transform.

528

"kind": "A String", # Type of transform.

529

"inputCollectionName": [ # User names for all collection inputs to this transform.

530

"A String",

531

],

532

"name": "A String", # User provided name for this transform instance.

533

"id": "A String", # SDK generated id of this transform instance.

534

"displayData": [ # Transform-specific display data.

535

{ # Data provided with a pipeline or transform to provide descriptive info.

536

"timestampValue": "A String", # Contains value if the data is of timestamp type.

537

"boolValue": True or False, # Contains value if the data is of a boolean type.

538

"javaClassValue": "A String", # Contains value if the data is of java class type.

539

"strValue": "A String", # Contains value if the data is of string type.

540

"int64Value": "A String", # Contains value if the data is of int64 type.

541

"durationValue": "A String", # Contains value if the data is of duration type.

542

"namespace": "A String", # The namespace for the key. This is usually a class name or programming

543

# language namespace (i.e. python module) which defines the display data.

544

# This allows a dax monitoring system to specially handle the data

545

# and perform custom rendering.

546

"floatValue": 3.14, # Contains value if the data is of float type.

547

"key": "A String", # The key identifying the display data.

548

# This is intended to be used as a label for the display data

549

# when viewed in a dax monitoring system.

550

"shortStrValue": "A String", # A possible additional shorter value to display.

551

# For example a java_class_name_value of com.mypackage.MyDoFn

552

# will be stored with MyDoFn as the short_str_value and

553

# com.mypackage.MyDoFn as the java_class_name_value.

554

# short_str_value can be displayed and java_class_name_value

555

# will be displayed as a tooltip.

556

"url": "A String", # An optional full URL.

557

"label": "A String", # An optional label to display in a dax UI for the element.

558

},

559

],

560

"outputCollectionName": [ # User names for all collection outputs to this transform.

"A String",

],

},

],

"displayData": [ # Pipeline level display data.

566

{ # Data provided with a pipeline or transform to provide descriptive info.

567

"timestampValue": "A String", # Contains value if the data is of timestamp type.

568

"boolValue": True or False, # Contains value if the data is of a boolean type.

569

"javaClassValue": "A String", # Contains value if the data is of java class type.

570

"strValue": "A String", # Contains value if the data is of string type.

571

"int64Value": "A String", # Contains value if the data is of int64 type.

572

"durationValue": "A String", # Contains value if the data is of duration type.

573

"namespace": "A String", # The namespace for the key. This is usually a class name or programming

574

# language namespace (e.g. a Python module) that defines the display data.

575

# This allows a dax monitoring system to specially handle the data

576

# and perform custom rendering.

577

"floatValue": 3.14, # Contains value if the data is of float type.

578

"key": "A String", # The key identifying the display data.

579

# This is intended to be used as a label for the display data

580

# when viewed in a dax monitoring system.

581

"shortStrValue": "A String", # A possible additional shorter value to display.

582

# For example a java_class_name_value of com.mypackage.MyDoFn

583

# will be stored with MyDoFn as the short_str_value and

584

# com.mypackage.MyDoFn as the java_class_name_value.

585

# short_str_value can be displayed and java_class_name_value

586

# will be displayed as a tooltip.

587

"url": "A String", # An optional full URL.

588

"label": "A String", # An optional label to display in a dax UI for the element.

},

],

},

"replaceJobId": "A String", # If this job is an update of an existing job, this field is the job ID

593

# of the job it replaced.

594

#

595

# When sending a `CreateJobRequest`, you can update a job by specifying it

596

# here. The job named here is stopped, and its intermediate state is

597

# transferred to this job.

598

"tempFiles": [ # A set of files the system should be aware of that are used

# for temporary storage. These temporary files will be

600

# removed on job completion.

601

# No duplicates are allowed.

602

# No file patterns are supported.

603

#

604

# The supported files are:

605

#

606

# Google Cloud Storage:

607

#

608

# storage.googleapis.com/{bucket}/{object}

609

# bucket.storage.googleapis.com/{object}

"A String",

],

"name": "A String", # The user-specified Cloud Dataflow job name.

#

614

# Only one Job with a given name may exist in a project at any

615

# given time. If a caller attempts to create a Job with the same

616

# name as an already-existing Job, the attempt returns the

617

# existing Job.

618

#

619

# The name must match the regular expression

620

# `[a-z]([-a-z0-9]{0,38}[a-z0-9])?`

"steps": [ # Exactly one of steps or steps_location should be specified.

#

623

# The top-level steps that constitute the entire job.

624

{ # Defines a particular step within a Cloud Dataflow job.

625

#

626

# A job consists of multiple steps, each of which performs some

627

# specific operation as part of the overall job. Data is typically

628

# passed from one step to another as part of the job.

629

#

# Here's an example of a sequence of steps which together implement a

# Map-Reduce job:

632

#

633

# * Read a collection of data from some source, parsing the

# collection's elements.

#

636

# * Validate the elements.

637

#

638

# * Apply a user-defined function to map each element to some value

639

# and extract an element-specific key value.

640

#

641

# * Group elements with the same key into a single element with

642

# that key, transforming a multiply-keyed collection into a

643

# uniquely-keyed collection.

644

#

645

# * Write the elements out to some data sink.

646

#

647

# Note that the Cloud Dataflow service may be used to run many different

648

# types of jobs, not just Map-Reduce.

"name": "A String", # The name that identifies the step. This must be unique for each

# step with respect to all other steps in the Cloud Dataflow job.

"kind": "A String", # The kind of step in the Cloud Dataflow job.

652

"properties": { # Named properties associated with the step. Each kind of

# predefined step has its own required set of properties.

654

# Must be provided on Create. Only retrieved with JOB_VIEW_ALL.

"a_key": "", # Properties of the object.

},

},

658

],

"replacedByJobId": "A String", # If another job is an update of this job (and thus, this job is in

660

# `JOB_STATE_UPDATED`), this field contains the ID of that job.

"executionInfo": { # Deprecated. Additional information about how a Cloud Dataflow job will be executed that

# isn't contained in the submitted job.

663

"stages": { # A mapping from each stage to the information about that stage.

664

"a_key": { # Contains information about how a particular

665

# google.dataflow.v1beta3.Step will be executed.

666

"stepName": [ # The steps associated with the execution stage.

667

# Note that stages may have several steps, and that a given step

668

# might be run by more than one stage.

"A String",

],

},

},

},

"currentState": "A String", # The current state of the job.

#

676

# Jobs are created in the `JOB_STATE_STOPPED` state unless otherwise

677

# specified.

678

#

679

# A job in the `JOB_STATE_RUNNING` state may asynchronously enter a

680

# terminal state. After a job has reached a terminal state, no

681

# further state updates may be made.

682

#

683

# This field may be mutated by the Cloud Dataflow service;

684

# callers cannot mutate it.

"location": "A String", # The [regional endpoint]

686

# (https://cloud.google.com/dataflow/docs/concepts/regional-endpoints) that

687

# contains this job.

688

"startTime": "A String", # The timestamp when the job was started (transitioned to JOB_STATE_PENDING).

689

# Flexible resource scheduling jobs are started with some delay after job

690

# creation, so start_time is unset before start and is updated when the

691

# job is started by the Cloud Dataflow service. For other jobs, start_time

692

# always equals create_time and is immutable and set by the Cloud Dataflow

693

# service.

694

"stepsLocation": "A String", # The GCS location where the steps are stored.

695

"labels": { # User-defined labels for this job.

696

#

697

# The labels map can contain no more than 64 entries. Entries of the labels

698

# map are UTF8 strings that comply with the following restrictions:

699

#

700

# * Keys must conform to regexp: \p{Ll}\p{Lo}{0,62}

701

# * Values must conform to regexp: [\p{Ll}\p{Lo}\p{N}_-]{0,63}

702

# * Both keys and values are additionally constrained to be <= 128 bytes in

703

# size.

704

"a_key": "A String",

},

"createTime": "A String", # The timestamp when the job was initially created. Immutable and set by the

707

# Cloud Dataflow service.

708

"requestedState": "A String", # The job's requested state.

709

#

710

# `UpdateJob` may be used to switch between the `JOB_STATE_STOPPED` and

711

# `JOB_STATE_RUNNING` states, by setting requested_state. `UpdateJob` may

712

# also be used to directly set a job's requested state to

713

# `JOB_STATE_CANCELLED` or `JOB_STATE_DONE`, irrevocably terminating the

714

# job if it has not already reached a terminal state.

}
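
The `name` and `labels` constraints in the request body above can be checked on the client before a request is sent. The following is a minimal sketch, not part of the client library: `is_valid_job_name` and `check_labels` are hypothetical helper names. It applies the documented name pattern and the documented label limits (at most 64 entries, each key and value at most 128 bytes). The Unicode-class label regexps are left out because Python's standard `re` module does not support `\p{...}` classes.

```python
import re

# Pattern copied from the description of the "name" field.
JOB_NAME_PATTERN = re.compile(r"[a-z]([-a-z0-9]{0,38}[a-z0-9])?")

MAX_LABELS = 64        # documented cap on entries in the labels map
MAX_LABEL_BYTES = 128  # documented cap on the size of each key and value


def is_valid_job_name(name):
    """Return True if the whole string matches the documented pattern."""
    return JOB_NAME_PATTERN.fullmatch(name) is not None


def check_labels(labels):
    """Return a list of problems found; an empty list means none were found."""
    problems = []
    if len(labels) > MAX_LABELS:
        problems.append(
            "labels has %d entries; the limit is %d" % (len(labels), MAX_LABELS)
        )
    for key, value in labels.items():
        for role, text in (("key", key), ("value", value)):
            size = len(text.encode("utf-8"))
            if size > MAX_LABEL_BYTES:
                problems.append(
                    "%s %r is %d bytes; the limit is %d"
                    % (role, text, size, MAX_LABEL_BYTES)
                )
    return problems
```

Passing these checks does not guarantee that the service accepts the job; they only catch the constraints spelled out in the field descriptions.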

view: string, The level of information requested in response.

718

replaceJobId: string, Deprecated. This field is now in the Job message.

x__xgafv: string, V1 error format.

720

Allowed values

721

1 - v1 error format

722

2 - v2 error format
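
A call to `create` with the arguments listed above might be assembled as follows. This is a sketch under assumptions: `build_job_body` and `create_job` are hypothetical helpers, and this page does not say whether the service accepts a body this small. Because `replaceJobId` is deprecated as a method argument, the sketch sets it in the body instead.

```python
def build_job_body(name, steps_location, replace_job_id=None, labels=None):
    """Assemble a small request body using field names from the Job schema."""
    body = {"name": name, "stepsLocation": steps_location}
    if replace_job_id is not None:
        # The method-level replaceJobId argument is deprecated; the field
        # now lives in the Job message itself.
        body["replaceJobId"] = replace_job_id
    if labels:
        body["labels"] = dict(labels)
    return body


def create_job(project_id, location, body, view=None):
    """Send the create request; needs google-api-python-client and credentials."""
    # Imported lazily so build_job_body can be used without the library.
    from googleapiclient.discovery import build

    service = build("dataflow", "v1b3")
    request = service.projects().locations().jobs().create(
        projectId=project_id, location=location, body=body, view=view
    )
    return request.execute()
```

For example, `create_job("my-project", "us-central1", build_job_body("wordcount", "gs://example-bucket/steps.json"))` would issue the request; the project, location, and bucket path are placeholders.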

Returns:

725

An object of the form:

726

727

{ # Defines a job to be run by the Cloud Dataflow service.

"clientRequestId": "A String", # The client's unique identifier of the job, re-used across retried attempts.

729

# If this field is set, the service will ensure its uniqueness.

730

# The request to create a job will fail if the service has knowledge of a

731

# previously submitted job with the same client's ID and job name.

732

# The caller may use this field to ensure idempotence of job

733

# creation across retried attempts to create a job.

734

# By default, the field is empty and, in that case, the service ignores it.

735

"id": "A String", # The unique ID of this job.

#

737

# This field is set by the Cloud Dataflow service when the Job is

738

# created, and is immutable for the life of the job.

"currentStateTime": "A String", # The timestamp associated with the current state.

740

"transformNameMapping": { # The map of transform name prefixes of the job to be replaced to the

# corresponding name prefixes of the new job.

"a_key": "A String",

},

"environment": { # Describes the environment in which a Dataflow Job runs. # The environment for the job.

745

"sdkPipelineOptions": { # The Cloud Dataflow SDK pipeline options specified by the user. These

# options are passed through the service and are used to recreate the

747

# SDK pipeline options on the worker in a language agnostic and platform

748

# independent way.

"a_key": "", # Properties of the object.

},

"flexResourceSchedulingGoal": "A String", # Which Flexible Resource Scheduling mode to run in.

752

"workerPools": [ # The worker pools. At least one "harness" worker pool must be

# specified in order for the job to have workers.

754

{ # Describes one particular pool of Cloud Dataflow workers to be

755

# instantiated by the Cloud Dataflow service in order to perform the

756

# computations required by a job. Note that a workflow job may use

757

# multiple pools, in order to match the various computational

758

# requirements of the various stages of the job.

"defaultPackageSet": "A String", # The default package set to install. This allows the service to

760

# select a default set of packages which are useful to worker

761

# harnesses written in a particular language.

762

"network": "A String", # Network to which VMs will be assigned. If empty or unspecified,

763

# the service will use the network "default".

764

"zone": "A String", # Zone to run the worker pools in. If empty or unspecified, the service

# will attempt to choose a reasonable default.

"numWorkers": 42, # Number of Google Compute Engine workers in this pool needed to

767

# execute the job. If zero or unspecified, the service will

768

# attempt to choose a reasonable default.

769

"numThreadsPerWorker": 42, # The number of threads per worker harness. If empty or unspecified, the

# service will choose a number of threads (according to the number of cores

771

# on the selected machine type for batch, or 1 by convention for streaming).

"diskSourceImage": "A String", # Fully qualified source image for disks.

773

"packages": [ # Packages to be installed on workers.

{ # The packages that must be installed in order for a worker to run the

775

# steps of the Cloud Dataflow job that will be assigned to its worker

776

# pool.

777

#

778

# This is the mechanism by which the Cloud Dataflow SDK causes code to

779

# be loaded onto the workers. For example, the Cloud Dataflow Java SDK

# might use this to install jars containing the user's code and all of the

# various dependencies (libraries, data files, etc.) required in order

782

# for that code to run.

"location": "A String", # The resource to read the package from. The supported resource type is:

#

785

# Google Cloud Storage:

786

#

787

# storage.googleapis.com/{bucket}

788

# bucket.storage.googleapis.com/

"name": "A String", # The name of the package.

},

791

],

"teardownPolicy": "A String", # Sets the policy for determining when to tear down the worker pool.

# Allowed values are: `TEARDOWN_ALWAYS`, `TEARDOWN_ON_SUCCESS`, and

794

# `TEARDOWN_NEVER`.

795

# `TEARDOWN_ALWAYS` means workers are always torn down regardless of whether

796

# the job succeeds. `TEARDOWN_ON_SUCCESS` means workers are torn down

797

# if the job succeeds. `TEARDOWN_NEVER` means the workers are never torn

798

# down.

799

#

800

# If the workers are not torn down by the service, they will

801

# continue to run and use Google Compute Engine VM resources in the

# user's project until they are explicitly terminated by the user.

# Because of this, Google recommends using the `TEARDOWN_ALWAYS`

804

# policy except for small, manually supervised test jobs.

805

#

806

# If unknown or unspecified, the service will attempt to choose a reasonable

807

# default.

"onHostMaintenance": "A String", # The action to take on host maintenance, as defined by the Google

809

# Compute Engine API.

810

"poolArgs": { # Extra arguments for this worker pool.

811

"a_key": "", # Properties of the object. Contains field @type with type URL.

812

},

813

"diskSizeGb": 42, # Size of root disk for VMs, in GB. If zero or unspecified, the service will

# attempt to choose a reasonable default.

"workerHarnessContainerImage": "A String", # Required. Docker container image that executes the Cloud Dataflow worker

816

# harness, residing in Google Container Registry.

817

#

818

# Deprecated for the Fn API path. Use sdk_harness_container_images instead.

819

"diskType": "A String", # Type of root disk for VMs. If empty or unspecified, the service will

# attempt to choose a reasonable default.

"machineType": "A String", # Machine type (e.g. "n1-standard-1"). If empty or unspecified, the

822

# service will attempt to choose a reasonable default.

823

"kind": "A String", # The kind of the worker pool; currently only `harness` and `shuffle`

824

# are supported.

825

"dataDisks": [ # Data disks that are used by a VM in this workflow.

{ # Describes the data disk used by a workflow job.

"sizeGb": 42, # Size of disk in GB. If zero or unspecified, the service will

# attempt to choose a reasonable default.

"diskType": "A String", # Disk storage type, as defined by Google Compute Engine. This

# must be a disk type appropriate to the project and zone in which

831

# the workers will run. If unknown or unspecified, the service

832

# will attempt to choose a reasonable default.

#

# For example, the standard persistent disk type is a resource name

# typically ending in "pd-standard". If SSD persistent disks are

836

# available, the resource name typically ends with "pd-ssd". The

# actual valid values are defined by the Google Compute Engine API,

838

# not by the Cloud Dataflow API; consult the Google Compute Engine

839

# documentation for more information about determining the set of

840

# available disk types for a particular project and zone.

#

# Google Compute Engine Disk types are local to a particular

843

# project in a particular zone, and so the resource name will

844

# typically look something like this:

845

#

846

# compute.googleapis.com/projects/project-id/zones/zone/diskTypes/pd-standard

"mountPoint": "A String", # Directory in a VM where disk is mounted.

},

849

],

"sdkHarnessContainerImages": [ # Set of SDK harness containers needed to execute this pipeline. This will

# only be set in the Fn API path. For non-cross-language pipelines this

852

# should have only one entry. Cross-language pipelines will have two or more

853

# entries.

854

{ # Defines an SDK harness container for executing Dataflow pipelines.

"containerImage": "A String", # A docker container image that resides in Google Container Registry.

"useSingleCorePerContainer": True or False, # If true, recommends that the Dataflow service use only one core per SDK

# container instance with this image. If false (or unset) recommends using

858

# more than one core per SDK container instance with this image for

859

# efficiency. Note that the Dataflow service may choose to override this property

860

# if needed.

861

},

862

],

"subnetwork": "A String", # Subnetwork to which VMs will be assigned, if desired. Expected to be of

864

# the form "regions/REGION/subnetworks/SUBNETWORK".

865

"ipConfiguration": "A String", # Configuration for VM IPs.

866

"taskrunnerSettings": { # Taskrunner configuration settings. # Settings passed through to Google Compute Engine workers when

867

# using the standard Dataflow task runner. Users should ignore

868

# this field.

869

"alsologtostderr": True or False, # Whether to also send taskrunner log info to stderr.

870

"taskGroup": "A String", # The UNIX group ID on the worker VM to use for tasks launched by

871

# taskrunner; e.g. "wheel".

872

"harnessCommand": "A String", # The command to launch the worker harness.

873

"logDir": "A String", # The directory on the VM to store logs.

874

"oauthScopes": [ # The OAuth2 scopes to be requested by the taskrunner in order to

875

# access the Cloud Dataflow API.

876

"A String",

877

],

878

"dataflowApiVersion": "A String", # The API version of endpoint, e.g. "v1b3"

879

"logUploadLocation": "A String", # Indicates where to put logs. If this is not specified, the logs

880

# will not be uploaded.

881

#

882

# The supported resource type is:

883

#

884

# Google Cloud Storage:

885

# storage.googleapis.com/{bucket}/{object}

886

# bucket.storage.googleapis.com/{object}

887

"streamingWorkerMainClass": "A String", # The streaming worker main class name.

888

"workflowFileName": "A String", # The file to store the workflow in.

889

"baseTaskDir": "A String", # The location on the worker for task-specific subdirectories.

890

"tempStoragePrefix": "A String", # The prefix of the resources the taskrunner should use for

891

# temporary storage.

892

#

893

# The supported resource type is:

894

#

895

# Google Cloud Storage:

896

# storage.googleapis.com/{bucket}/{object}

897

# bucket.storage.googleapis.com/{object}

898

"commandlinesFileName": "A String", # The file to store preprocessing commands in.

899

"languageHint": "A String", # The suggested backend language.

900

"baseUrl": "A String", # The base URL for the taskrunner to use when accessing Google Cloud APIs.

901

#

902

# When workers access Google Cloud APIs, they logically do so via

903

# relative URLs. If this field is specified, it supplies the base

904

# URL to use for resolving these relative URLs. The normative

905

# algorithm used is defined by RFC 1808, "Relative Uniform Resource

906

# Locators".

907

#

908

# If not specified, the default value is "http://www.googleapis.com/"

909

"logToSerialconsole": True or False, # Whether to send taskrunner log info to Google Compute Engine VM serial

910

# console.

911

"continueOnException": True or False, # Whether to continue taskrunner if an exception is hit.

912

"parallelWorkerSettings": { # Provides data to pass through to the worker harness. # The settings to pass to the parallel worker harness.

913

"baseUrl": "A String", # The base URL for accessing Google Cloud APIs.

914

#

915

# When workers access Google Cloud APIs, they logically do so via

916

# relative URLs. If this field is specified, it supplies the base

917

# URL to use for resolving these relative URLs. The normative

918

# algorithm used is defined by RFC 1808, "Relative Uniform Resource

919

# Locators".

920

#

921

# If not specified, the default value is "http://www.googleapis.com/"

922

"reportingEnabled": True or False, # Whether to send work progress updates to the service.

923

"servicePath": "A String", # The Cloud Dataflow service path relative to the root URL, for example,

924

# "dataflow/v1b3/projects".

925

"shuffleServicePath": "A String", # The Shuffle service path relative to the root URL, for example,

926

# "shuffle/v1beta1".

927

"workerId": "A String", # The ID of the worker running this pipeline.

928

"tempStoragePrefix": "A String", # The prefix of the resources the system should use for temporary

929

# storage.

930

#

931

# The supported resource type is:

932

#

933

# Google Cloud Storage:

934

#

935

# storage.googleapis.com/{bucket}/{object}

936

# bucket.storage.googleapis.com/{object}

937

},

938

"vmId": "A String", # The ID string of the VM.

939

"taskUser": "A String", # The UNIX user ID on the worker VM to use for tasks launched by

940

# taskrunner; e.g. "root".

941

},

942

"autoscalingSettings": { # Settings for WorkerPool autoscaling. # Settings for autoscaling of this WorkerPool.

943

"maxNumWorkers": 42, # The maximum number of workers to cap scaling at.

944

"algorithm": "A String", # The algorithm to use for autoscaling.

945

},

946

"metadata": { # Metadata to set on the Google Compute Engine VMs.

947

"a_key": "A String",

948

},

},

950

],

"dataset": "A String", # The dataset for the current project where various workflow

952

# related tables are stored.

953

#

954

# The supported resource type is:

955

#

956

# Google BigQuery:

957

# bigquery.googleapis.com/{dataset}

958

"internalExperiments": { # Experimental settings.

959

"a_key": "", # Properties of the object. Contains field @type with type URL.

960

},

961

"workerRegion": "A String", # The Compute Engine region

962

# (https://cloud.google.com/compute/docs/regions-zones/regions-zones) in

963

# which worker processing should occur, e.g. "us-west1". Mutually exclusive

964

# with worker_zone. If neither worker_region nor worker_zone is specified,

965

# default to the control plane's region.

966

"serviceKmsKeyName": "A String", # If set, contains the Cloud KMS key identifier used to encrypt data

967

# at rest, AKA a Customer Managed Encryption Key (CMEK).

968

#

969

# Format:

970

# projects/PROJECT_ID/locations/LOCATION/keyRings/KEY_RING/cryptoKeys/KEY

971

"userAgent": { # A description of the process that generated the request.

972

"a_key": "", # Properties of the object.

973

},

974

"workerZone": "A String", # The Compute Engine zone

975

# (https://cloud.google.com/compute/docs/regions-zones/regions-zones) in

976

# which worker processing should occur, e.g. "us-west1-a". Mutually exclusive

977

# with worker_region. If neither worker_region nor worker_zone is specified,

978

# a zone in the control plane's region is chosen based on available capacity.

979

"clusterManagerApiService": "A String", # The type of cluster manager API to use. If unknown or

# unspecified, the service will attempt to choose a reasonable

981

# default. This should be in the form of the API service name,

# e.g. "compute.googleapis.com".

983

"tempStoragePrefix": "A String", # The prefix of the resources the system should use for temporary

984

# storage. The system will append the suffix "/temp-{JOBNAME}" to

# this resource prefix, where {JOBNAME} is the value of the

986

# job_name field. The resulting bucket and object prefix is used

987

# as the prefix of the resources used to store temporary data

988

# needed during the job execution. NOTE: This will override the

989

# value in taskrunner_settings.

990

# The supported resource type is:

991

#

992

# Google Cloud Storage:

993

#

994

# storage.googleapis.com/{bucket}/{object}

995

# bucket.storage.googleapis.com/{object}

"experiments": [ # The list of experiments to enable.

997

"A String",

998

],

999

"version": { # A structure describing which components and their versions of the service

1000

# are required in order to run the job.

1001

"a_key": "", # Properties of the object.

1002

},

1003

"serviceAccountEmail": "A String", # Identity to run virtual machines as. Defaults to the default account.

},

"stageStates": [ # This field may be mutated by the Cloud Dataflow service;

1006

# callers cannot mutate it.

1007

{ # A message describing the state of a particular execution stage.

1008

"executionStageName": "A String", # The name of the execution stage.

1009

"currentStateTime": "A String", # The time at which the stage transitioned to this state.

"executionStageState": "A String", # Execution stage states allow the same set of values as JobState.

1011

},

1012

],

"jobMetadata": { # Metadata available primarily for filtering jobs. Will be included in the

# ListJob response and Job SUMMARY view.

#

# This field is populated by the Dataflow service to support filtering jobs

# by the metadata values provided here. Populated for ListJobs and all GetJob

# views SUMMARY and higher.

1017

"bigTableDetails": [ # Identification of a BigTable source used in the Dataflow job.

1018

{ # Metadata for a BigTable connector used by the job.

1019

"tableId": "A String", # TableId accessed in the connection.

1020

"projectId": "A String", # ProjectId accessed in the connection.

1021

"instanceId": "A String", # InstanceId accessed in the connection.

1022

},

1023

],

1024

"spannerDetails": [ # Identification of a Spanner source used in the Dataflow job.

1025

{ # Metadata for a Spanner connector used by the job.

1026

"databaseId": "A String", # DatabaseId accessed in the connection.

1027

"instanceId": "A String", # InstanceId accessed in the connection.

1028

"projectId": "A String", # ProjectId accessed in the connection.

1029

},

1030

],

1031

"datastoreDetails": [ # Identification of a Datastore source used in the Dataflow job.

1032

{ # Metadata for a Datastore connector used by the job.

1033

"projectId": "A String", # ProjectId accessed in the connection.

1034

"namespace": "A String", # Namespace used in the connection.

1035

},

1036

],

"sdkVersion": { # The version of the SDK used to run the job. # The SDK version used to run the job.
"versionDisplayName": "A String", # A readable string describing the version of the SDK.
"sdkSupportStatus": "A String", # The support status for this SDK version.
"version": "A String", # The version of the SDK used to run the job.
},
"bigqueryDetails": [ # Identification of a BigQuery source used in the Dataflow job.
{ # Metadata for a BigQuery connector used by the job.
"table": "A String", # Table accessed in the connection.
"dataset": "A String", # Dataset accessed in the connection.
"projectId": "A String", # Project accessed in the connection.
"query": "A String", # Query used to access data in the connection.
},
],
"fileDetails": [ # Identification of a File source used in the Dataflow job.
{ # Metadata for a File connector used by the job.
"filePattern": "A String", # File Pattern used to access files by the connector.
},
],
"pubsubDetails": [ # Identification of a PubSub source used in the Dataflow job.
{ # Metadata for a PubSub connector used by the job.
"subscription": "A String", # Subscription used in the connection.
"topic": "A String", # Topic accessed in the connection.
},
],
},
"createdFromSnapshotId": "A String", # If this is specified, the job's initial state is populated from the given
# snapshot.
"projectId": "A String", # The ID of the Cloud Platform project that the job belongs to.
"type": "A String", # The type of Cloud Dataflow job.
"pipelineDescription": { # A descriptive representation of submitted pipeline as well as the executed
# form. This data is provided by the Dataflow service for ease of visualizing
# the pipeline and interpreting Dataflow provided metrics.
# Preliminary field: The format of this data may change at any time.
# A description of the user pipeline and stages through which it is executed.
# Created by Cloud Dataflow service. Only retrieved with
# JOB_VIEW_DESCRIPTION or JOB_VIEW_ALL.
"executionPipelineStage": [ # Description of each stage of execution of the pipeline.
{ # Description of the composing transforms, names/ids, and input/outputs of a
# stage of execution. Some composing transforms and sources may have been
# generated by the Dataflow service during execution planning.
"id": "A String", # Dataflow service generated id for this stage.
"componentTransform": [ # Transforms that comprise this execution stage.
{ # Description of a transform executed as part of an execution stage.
"originalTransform": "A String", # User name for the original user transform with which this transform is
# most closely associated.
"name": "A String", # Dataflow service generated name for this source.
"userName": "A String", # Human-readable name for this transform; may be user or system generated.
},
],
"componentSource": [ # Collections produced and consumed by component transforms of this stage.
{ # Description of an interstitial value between transforms in an execution
# stage.
"name": "A String", # Dataflow service generated name for this source.
"userName": "A String", # Human-readable name for this transform; may be user or system generated.
"originalTransformOrCollection": "A String", # User name for the original user transform or collection with which this
# source is most closely associated.
},
],
"kind": "A String", # Type of transform this stage is executing.
"outputSource": [ # Output sources for this stage.
{ # Description of an input or output of an execution stage.
"originalTransformOrCollection": "A String", # User name for the original user transform or collection with which this
# source is most closely associated.
"name": "A String", # Dataflow service generated name for this source.
"sizeBytes": "A String", # Size of the source, if measurable.
"userName": "A String", # Human-readable name for this source; may be user or system generated.
},
],
"name": "A String", # Dataflow service generated name for this stage.
"inputSource": [ # Input sources for this stage.
{ # Description of an input or output of an execution stage.
"originalTransformOrCollection": "A String", # User name for the original user transform or collection with which this
# source is most closely associated.
"name": "A String", # Dataflow service generated name for this source.
"sizeBytes": "A String", # Size of the source, if measurable.
"userName": "A String", # Human-readable name for this source; may be user or system generated.
},
],
},
],
"originalPipelineTransform": [ # Description of each transform in the pipeline and collections between them.
{ # Description of the type, names/ids, and input/outputs for a transform.
"kind": "A String", # Type of transform.
"inputCollectionName": [ # User names for all collection inputs to this transform.
"A String",
],
"name": "A String", # User provided name for this transform instance.
"id": "A String", # SDK generated id of this transform instance.
"displayData": [ # Transform-specific display data.
{ # Data provided with a pipeline or transform to provide descriptive info.
"timestampValue": "A String", # Contains value if the data is of timestamp type.
"boolValue": True or False, # Contains value if the data is of a boolean type.
"javaClassValue": "A String", # Contains value if the data is of java class type.
"strValue": "A String", # Contains value if the data is of string type.
"int64Value": "A String", # Contains value if the data is of int64 type.
"durationValue": "A String", # Contains value if the data is of duration type.
"namespace": "A String", # The namespace for the key. This is usually a class name or programming
# language namespace (e.g. a Python module) which defines the display data.
# This allows a dax monitoring system to specially handle the data
# and perform custom rendering.
"floatValue": 3.14, # Contains value if the data is of float type.
"key": "A String", # The key identifying the display data.
# This is intended to be used as a label for the display data
# when viewed in a dax monitoring system.
"shortStrValue": "A String", # A possible additional shorter value to display.
# For example a java_class_name_value of com.mypackage.MyDoFn
# will be stored with MyDoFn as the short_str_value and
# com.mypackage.MyDoFn as the java_class_name value.
# short_str_value can be displayed and java_class_name_value
# will be displayed as a tooltip.
"url": "A String", # An optional full URL.
"label": "A String", # An optional label to display in a dax UI for the element.
},
],
"outputCollectionName": [ # User names for all collection outputs to this transform.
"A String",
],
},
],
"displayData": [ # Pipeline level display data.
{ # Data provided with a pipeline or transform to provide descriptive info.
"timestampValue": "A String", # Contains value if the data is of timestamp type.
"boolValue": True or False, # Contains value if the data is of a boolean type.
"javaClassValue": "A String", # Contains value if the data is of java class type.
"strValue": "A String", # Contains value if the data is of string type.
"int64Value": "A String", # Contains value if the data is of int64 type.
"durationValue": "A String", # Contains value if the data is of duration type.
"namespace": "A String", # The namespace for the key. This is usually a class name or programming
# language namespace (e.g. a Python module) which defines the display data.
# This allows a dax monitoring system to specially handle the data
# and perform custom rendering.
"floatValue": 3.14, # Contains value if the data is of float type.
"key": "A String", # The key identifying the display data.
# This is intended to be used as a label for the display data
# when viewed in a dax monitoring system.
"shortStrValue": "A String", # A possible additional shorter value to display.
# For example a java_class_name_value of com.mypackage.MyDoFn
# will be stored with MyDoFn as the short_str_value and
# com.mypackage.MyDoFn as the java_class_name value.
# short_str_value can be displayed and java_class_name_value
# will be displayed as a tooltip.
"url": "A String", # An optional full URL.
"label": "A String", # An optional label to display in a dax UI for the element.
},
],
},
"replaceJobId": "A String", # If this job is an update of an existing job, this field is the job ID
# of the job it replaced.
#
# When sending a `CreateJobRequest`, you can update a job by specifying it
# here. The job named here is stopped, and its intermediate state is
# transferred to this job.
"tempFiles": [ # A set of files the system should be aware of that are used
# for temporary storage. These temporary files will be
# removed on job completion.
# No duplicates are allowed.
# No file patterns are supported.
#
# The supported files are:
#
# Google Cloud Storage:
#
# storage.googleapis.com/{bucket}/{object}
# bucket.storage.googleapis.com/{object}
"A String",
],
"name": "A String", # The user-specified Cloud Dataflow job name.
#
# Only one Job with a given name may exist in a project at any
# given time. If a caller attempts to create a Job with the same
# name as an already-existing Job, the attempt returns the
# existing Job.
#
# The name must match the regular expression
# `[a-z]([-a-z0-9]{0,38}[a-z0-9])?`
"steps": [ # Exactly one of step or steps_location should be specified.
#
# The top-level steps that constitute the entire job.
{ # Defines a particular step within a Cloud Dataflow job.
#
# A job consists of multiple steps, each of which performs some
# specific operation as part of the overall job. Data is typically
# passed from one step to another as part of the job.
#
# Here's an example of a sequence of steps which together implement a
# Map-Reduce job:
#
# * Read a collection of data from some source, parsing the
# collection's elements.
#
# * Validate the elements.
#
# * Apply a user-defined function to map each element to some value
# and extract an element-specific key value.
#
# * Group elements with the same key into a single element with
# that key, transforming a multiply-keyed collection into a
# uniquely-keyed collection.
#
# * Write the elements out to some data sink.
#
# Note that the Cloud Dataflow service may be used to run many different
# types of jobs, not just Map-Reduce.
"name": "A String", # The name that identifies the step. This must be unique for each
# step with respect to all other steps in the Cloud Dataflow job.
"kind": "A String", # The kind of step in the Cloud Dataflow job.
"properties": { # Named properties associated with the step. Each kind of
# predefined step has its own required set of properties.
# Must be provided on Create. Only retrieved with JOB_VIEW_ALL.
"a_key": "", # Properties of the object.
},
},
],
"replacedByJobId": "A String", # If another job is an update of this job (and thus, this job is in
# `JOB_STATE_UPDATED`), this field contains the ID of that job.
"executionInfo": { # Additional information about how a Cloud Dataflow job will be executed that
# isn't contained in the submitted job.
# Deprecated.
"stages": { # A mapping from each stage to the information about that stage.
"a_key": { # Contains information about how a particular
# google.dataflow.v1beta3.Step will be executed.
"stepName": [ # The steps associated with the execution stage.
# Note that stages may have several steps, and that a given step
# might be run by more than one stage.
"A String",
],
},
},
},
"currentState": "A String", # The current state of the job.
#
# Jobs are created in the `JOB_STATE_STOPPED` state unless otherwise
# specified.
#
# A job in the `JOB_STATE_RUNNING` state may asynchronously enter a
# terminal state. After a job has reached a terminal state, no
# further state updates may be made.
#
# This field may be mutated by the Cloud Dataflow service;
# callers cannot mutate it.
"location": "A String", # The [regional endpoint]
# (https://cloud.google.com/dataflow/docs/concepts/regional-endpoints) that
# contains this job.
"startTime": "A String", # The timestamp when the job was started (transitioned to JOB_STATE_PENDING).
# Flexible resource scheduling jobs are started with some delay after job
# creation, so start_time is unset before start and is updated when the
# job is started by the Cloud Dataflow service. For other jobs, start_time
# always equals create_time and is immutable and set by the Cloud Dataflow
# service.
"stepsLocation": "A String", # The GCS location where the steps are stored.
"labels": { # User-defined labels for this job.
#
# The labels map can contain no more than 64 entries. Entries of the labels
# map are UTF8 strings that comply with the following restrictions:
#
# * Keys must conform to regexp: \p{Ll}\p{Lo}{0,62}
# * Values must conform to regexp: [\p{Ll}\p{Lo}\p{N}_-]{0,63}
# * Both keys and values are additionally constrained to be <= 128 bytes in
# size.
"a_key": "A String",
},
"createTime": "A String", # The timestamp when the job was initially created. Immutable and set by the
# Cloud Dataflow service.
"requestedState": "A String", # The job's requested state.
#
# `UpdateJob` may be used to switch between the `JOB_STATE_STOPPED` and
# `JOB_STATE_RUNNING` states, by setting requested_state. `UpdateJob` may
# also be used to directly set a job's requested state to
# `JOB_STATE_CANCELLED` or `JOB_STATE_DONE`, irrevocably terminating the
# job if it has not already reached a terminal state.
}</pre>
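The `name` field above must match the regular expression `[a-z]([-a-z0-9]{0,38}[a-z0-9])?`. A minimal sketch of a client-side pre-check follows, assuming the pattern is meant to match the whole name. The helper `is_valid_job_name` is hypothetical and not part of the client library, and the service remains the authority on what it accepts.

```python
import re

# Pattern quoted from the `name` field description; anchoring it to the
# whole string via fullmatch() is an assumption of this sketch.
JOB_NAME_PATTERN = re.compile(r"[a-z]([-a-z0-9]{0,38}[a-z0-9])?")


def is_valid_job_name(name: str) -> bool:
    """Return True if `name` matches the documented job-name pattern in full."""
    return JOB_NAME_PATTERN.fullmatch(name) is not None
```

Under this reading a name is 1 to 40 characters long, starts with a lowercase letter, and does not end with a hyphen.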

</div>
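The `labels` field above limits the map to 64 entries and each key and value to 128 bytes. The sketch below checks only those two size rules. It deliberately skips the `\p{Ll}`-style patterns, because Python's standard `re` module does not support Unicode property classes. The helper `label_size_problems` is hypothetical and exists only for illustration.

```python
from typing import Dict, List

MAX_LABELS = 64        # "can contain no more than 64 entries"
MAX_LABEL_BYTES = 128  # keys and values "<= 128 bytes in size"


def label_size_problems(labels: Dict[str, str]) -> List[str]:
    """Return human-readable problems for the size rules only (partial check)."""
    problems = []
    if len(labels) > MAX_LABELS:
        problems.append(f"{len(labels)} labels exceeds the limit of {MAX_LABELS}")
    for key, value in labels.items():
        for kind, text in (("key", key), ("value", value)):
            # The limit is stated in bytes, so measure the UTF-8 encoding.
            size = len(text.encode("utf-8"))
            if size > MAX_LABEL_BYTES:
                problems.append(f"label {kind} starting {text[:16]!r} is {size} bytes")
    return problems
```

An empty list means the size rules pass; it says nothing about the character-class rules.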


<code class="details" id="get">get(projectId, location, jobId, view=None, x__xgafv=None)</code>
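As a rough illustration of the signature above, the following sketch wraps the call in a small helper. It assumes `service` is a Dataflow `v1b3` client object, typically obtained with `googleapiclient.discovery.build("dataflow", "v1b3")` and working credentials. The helper name `fetch_job` and its parameter names are hypothetical.

```python
from typing import Any, Dict, Optional


def fetch_job(service: Any, project_id: str, location: str, job_id: str,
              view: Optional[str] = None) -> Dict[str, Any]:
    """Call projects.locations.jobs.get and return the Job resource as a dict."""
    kwargs = {"projectId": project_id, "location": location, "jobId": job_id}
    if view is not None:
        # Only send `view` when the caller asked for a specific level of detail.
        kwargs["view"] = view
    request = service.projects().locations().jobs().get(**kwargs)
    return request.execute()
```

A call might look like `fetch_job(service, "my-project", "us-central1", job_id, view="JOB_VIEW_ALL")`, where the project and region are placeholders.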


<pre>Gets the state of the specified Cloud Dataflow job.

To get the state of a job, we recommend using `projects.locations.jobs.get`
with a [regional endpoint]
(https://cloud.google.com/dataflow/docs/concepts/regional-endpoints). Using
`projects.jobs.get` is not recommended, as you can only get the state of
jobs that are running in `us-central1`.

Args:
projectId: string, The ID of the Cloud Platform project that the job belongs to. (required)
location: string, The [regional endpoint]
(https://cloud.google.com/dataflow/docs/concepts/regional-endpoints) that
contains this job. (required)
jobId: string, The job ID. (required)
view: string, The level of information requested in response.
x__xgafv: string, V1 error format.
Allowed values
1 - v1 error format
2 - v2 error format

Returns:
An object of the form:

{ # Defines a job to be run by the Cloud Dataflow service.
"clientRequestId": "A String", # The client's unique identifier of the job, re-used across retried attempts.
# If this field is set, the service will ensure its uniqueness.
# The request to create a job will fail if the service has knowledge of a
# previously submitted job with the same client's ID and job name.
# The caller may use this field to ensure idempotence of job
# creation across retried attempts to create a job.
# By default, the field is empty and, in that case, the service ignores it.
"id": "A String", # The unique ID of this job.
#
# This field is set by the Cloud Dataflow service when the Job is
# created, and is immutable for the life of the job.
"currentStateTime": "A String", # The timestamp associated with the current state.
"transformNameMapping": { # The map of transform name prefixes of the job to be replaced to the
# corresponding name prefixes of the new job.
"a_key": "A String",
},
"environment": { # Describes the environment in which a Dataflow Job runs. # The environment for the job.
"sdkPipelineOptions": { # The Cloud Dataflow SDK pipeline options specified by the user. These
# options are passed through the service and are used to recreate the
# SDK pipeline options on the worker in a language agnostic and platform
# independent way.
"a_key": "", # Properties of the object.
},
"flexResourceSchedulingGoal": "A String", # Which Flexible Resource Scheduling mode to run in.
"workerPools": [ # The worker pools. At least one "harness" worker pool must be
# specified in order for the job to have workers.
{ # Describes one particular pool of Cloud Dataflow workers to be
# instantiated by the Cloud Dataflow service in order to perform the
# computations required by a job. Note that a workflow job may use
# multiple pools, in order to match the various computational
# requirements of the various stages of the job.
"defaultPackageSet": "A String", # The default package set to install. This allows the service to
# select a default set of packages which are useful to worker
# harnesses written in a particular language.
"network": "A String", # Network to which VMs will be assigned. If empty or unspecified,
# the service will use the network "default".
"zone": "A String", # Zone to run the worker pools in. If empty or unspecified, the service
# will attempt to choose a reasonable default.
"numWorkers": 42, # Number of Google Compute Engine workers in this pool needed to
# execute the job. If zero or unspecified, the service will
# attempt to choose a reasonable default.
"numThreadsPerWorker": 42, # The number of threads per worker harness. If empty or unspecified, the
# service will choose a number of threads (according to the number of cores
# on the selected machine type for batch, or 1 by convention for streaming).
"diskSourceImage": "A String", # Fully qualified source image for disks.
"packages": [ # Packages to be installed on workers.
{ # The packages that must be installed in order for a worker to run the
# steps of the Cloud Dataflow job that will be assigned to its worker
# pool.
#
# This is the mechanism by which the Cloud Dataflow SDK causes code to
# be loaded onto the workers. For example, the Cloud Dataflow Java SDK
# might use this to install jars containing the user's code and all of the
# various dependencies (libraries, data files, etc.) required in order
# for that code to run.
"location": "A String", # The resource to read the package from. The supported resource type is:
#
# Google Cloud Storage:
#
# storage.googleapis.com/{bucket}
# bucket.storage.googleapis.com/
"name": "A String", # The name of the package.
},
],
"teardownPolicy": "A String", # Sets the policy for determining when to tear down the worker pool.
# Allowed values are: `TEARDOWN_ALWAYS`, `TEARDOWN_ON_SUCCESS`, and
# `TEARDOWN_NEVER`.
# `TEARDOWN_ALWAYS` means workers are always torn down regardless of whether
# the job succeeds. `TEARDOWN_ON_SUCCESS` means workers are torn down
# if the job succeeds. `TEARDOWN_NEVER` means the workers are never torn
# down.
#
# If the workers are not torn down by the service, they will
# continue to run and use Google Compute Engine VM resources in the
# user's project until they are explicitly terminated by the user.
# Because of this, Google recommends using the `TEARDOWN_ALWAYS`
# policy except for small, manually supervised test jobs.
#
# If unknown or unspecified, the service will attempt to choose a reasonable
# default.
"onHostMaintenance": "A String", # The action to take on host maintenance, as defined by the Google
# Compute Engine API.
"poolArgs": { # Extra arguments for this worker pool.
"a_key": "", # Properties of the object. Contains field @type with type URL.
},
"diskSizeGb": 42, # Size of root disk for VMs, in GB. If zero or unspecified, the service will
# attempt to choose a reasonable default.
"workerHarnessContainerImage": "A String", # Required. Docker container image that executes the Cloud Dataflow worker
# harness, residing in Google Container Registry.
#
# Deprecated for the Fn API path. Use sdk_harness_container_images instead.
"diskType": "A String", # Type of root disk for VMs. If empty or unspecified, the service will
# attempt to choose a reasonable default.
"machineType": "A String", # Machine type (e.g. "n1-standard-1"). If empty or unspecified, the
# service will attempt to choose a reasonable default.
"kind": "A String", # The kind of the worker pool; currently only `harness` and `shuffle`
# are supported.
"dataDisks": [ # Data disks that are used by a VM in this workflow.
{ # Describes the data disk used by a workflow job.
"sizeGb": 42, # Size of disk in GB. If zero or unspecified, the service will
# attempt to choose a reasonable default.
"diskType": "A String", # Disk storage type, as defined by Google Compute Engine. This
# must be a disk type appropriate to the project and zone in which
# the workers will run. If unknown or unspecified, the service
# will attempt to choose a reasonable default.
#
# For example, the standard persistent disk type is a resource name
# typically ending in "pd-standard". If SSD persistent disks are
# available, the resource name typically ends with "pd-ssd". The
# actual valid values are defined by the Google Compute Engine API,
# not by the Cloud Dataflow API; consult the Google Compute Engine
# documentation for more information about determining the set of
# available disk types for a particular project and zone.
#
# Google Compute Engine Disk types are local to a particular
# project in a particular zone, and so the resource name will
# typically look something like this:
#
# compute.googleapis.com/projects/project-id/zones/zone/diskTypes/pd-standard
"mountPoint": "A String", # Directory in a VM where disk is mounted.
},
],
"sdkHarnessContainerImages": [ # Set of SDK harness containers needed to execute this pipeline. This will
# only be set in the Fn API path. For non-cross-language pipelines this
# should have only one entry. Cross-language pipelines will have two or more
# entries.
{ # Defines an SDK harness container for executing Dataflow pipelines.
"containerImage": "A String", # A docker container image that resides in Google Container Registry.
"useSingleCorePerContainer": True or False, # If true, recommends the Dataflow service to use only one core per SDK
# container instance with this image. If false (or unset) recommends using
# more than one core per SDK container instance with this image for
# efficiency. Note that Dataflow service may choose to override this property
# if needed.
},
],
"subnetwork": "A String", # Subnetwork to which VMs will be assigned, if desired. Expected to be of
# the form "regions/REGION/subnetworks/SUBNETWORK".
"ipConfiguration": "A String", # Configuration for VM IPs.
"taskrunnerSettings": { # Taskrunner configuration settings. # Settings passed through to Google Compute Engine workers when
# using the standard Dataflow task runner. Users should ignore
# this field.
"alsologtostderr": True or False, # Whether to also send taskrunner log info to stderr.
"taskGroup": "A String", # The UNIX group ID on the worker VM to use for tasks launched by
# taskrunner; e.g. "wheel".
"harnessCommand": "A String", # The command to launch the worker harness.
"logDir": "A String", # The directory on the VM to store logs.
"oauthScopes": [ # The OAuth2 scopes to be requested by the taskrunner in order to
# access the Cloud Dataflow API.
"A String",
],
"dataflowApiVersion": "A String", # The API version of endpoint, e.g. "v1b3"
"logUploadLocation": "A String", # Indicates where to put logs. If this is not specified, the logs
# will not be uploaded.
#
# The supported resource type is:
#
# Google Cloud Storage:
# storage.googleapis.com/{bucket}/{object}
# bucket.storage.googleapis.com/{object}
"streamingWorkerMainClass": "A String", # The streaming worker main class name.
"workflowFileName": "A String", # The file to store the workflow in.
"baseTaskDir": "A String", # The location on the worker for task-specific subdirectories.
"tempStoragePrefix": "A String", # The prefix of the resources the taskrunner should use for
# temporary storage.
#
# The supported resource type is:
#
# Google Cloud Storage:
# storage.googleapis.com/{bucket}/{object}
# bucket.storage.googleapis.com/{object}
"commandlinesFileName": "A String", # The file to store preprocessing commands in.
"languageHint": "A String", # The suggested backend language.
"baseUrl": "A String", # The base URL for the taskrunner to use when accessing Google Cloud APIs.
#
# When workers access Google Cloud APIs, they logically do so via
# relative URLs. If this field is specified, it supplies the base
# URL to use for resolving these relative URLs. The normative
# algorithm used is defined by RFC 1808, "Relative Uniform Resource
# Locators".
#
# If not specified, the default value is "http://www.googleapis.com/"
"logToSerialconsole": True or False, # Whether to send taskrunner log info to Google Compute Engine VM serial
# console.
"continueOnException": True or False, # Whether to continue taskrunner if an exception is hit.
"parallelWorkerSettings": { # Provides data to pass through to the worker harness. # The settings to pass to the parallel worker harness.
"baseUrl": "A String", # The base URL for accessing Google Cloud APIs.
#
# When workers access Google Cloud APIs, they logically do so via
# relative URLs. If this field is specified, it supplies the base
# URL to use for resolving these relative URLs. The normative
# algorithm used is defined by RFC 1808, "Relative Uniform Resource
# Locators".
#
# If not specified, the default value is "http://www.googleapis.com/"
"reportingEnabled": True or False, # Whether to send work progress updates to the service.
"servicePath": "A String", # The Cloud Dataflow service path relative to the root URL, for example,
# "dataflow/v1b3/projects".
"shuffleServicePath": "A String", # The Shuffle service path relative to the root URL, for example,
# "shuffle/v1beta1".
"workerId": "A String", # The ID of the worker running this pipeline.
"tempStoragePrefix": "A String", # The prefix of the resources the system should use for temporary
# storage.
#
# The supported resource type is:
#
# Google Cloud Storage:
#
# storage.googleapis.com/{bucket}/{object}
# bucket.storage.googleapis.com/{object}
},
"vmId": "A String", # The ID string of the VM.
"taskUser": "A String", # The UNIX user ID on the worker VM to use for tasks launched by
# taskrunner; e.g. "root".
},
1548

"autoscalingSettings": { # Settings for WorkerPool autoscaling. # Settings for autoscaling of this WorkerPool.

1549

"maxNumWorkers": 42, # The maximum number of workers to cap scaling at.

1550

"algorithm": "A String", # The algorithm to use for autoscaling.

1551

},

1552

"metadata": { # Metadata to set on the Google Compute Engine VMs.

1553

"a_key": "A String",

1554

},

Sai Cheemalapati

2017-03-13 12:12:03 -0400

[diff] [blame]

1555

},

1556

],

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame^]

1557

"dataset": "A String", # The dataset for the current project where various workflow

1558

# related tables are stored.

1559

#

1560

# The supported resource type is:

1561

#

1562

# Google BigQuery:

1563

# bigquery.googleapis.com/{dataset}

1564

"internalExperiments": { # Experimental settings.

1565

"a_key": "", # Properties of the object. Contains field @type with type URL.

1566

},

1567

"workerRegion": "A String", # The Compute Engine region

1568

# (https://cloud.google.com/compute/docs/regions-zones/regions-zones) in

1569

# which worker processing should occur, e.g. "us-west1". Mutually exclusive

1570

# with worker_zone. If neither worker_region nor worker_zone is specified,

1571

# default to the control plane's region.

1572

"serviceKmsKeyName": "A String", # If set, contains the Cloud KMS key identifier used to encrypt data

1573

# at rest, AKA a Customer Managed Encryption Key (CMEK).

1574

#

1575

# Format:

1576

# projects/PROJECT_ID/locations/LOCATION/keyRings/KEY_RING/cryptoKeys/KEY

1577

"userAgent": { # A description of the process that generated the request.

1578

"a_key": "", # Properties of the object.

1579

},

1580

"workerZone": "A String", # The Compute Engine zone

1581

# (https://cloud.google.com/compute/docs/regions-zones/regions-zones) in

1582

# which worker processing should occur, e.g. "us-west1-a". Mutually exclusive

1583

# with worker_region. If neither worker_region nor worker_zone is specified,

1584

# a zone in the control plane's region is chosen based on available capacity.

1585

"clusterManagerApiService": "A String", # The type of cluster manager API to use. If unknown or

Dan O'Meara

2020-05-01 07:42:23 -0700

[diff] [blame]

1586

# unspecified, the service will attempt to choose a reasonable

1587

# default. This should be in the form of the API service name,

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame^]

1588

# e.g. "compute.googleapis.com".

1589

"tempStoragePrefix": "A String", # The prefix of the resources the system should use for temporary

1590

# storage. The system will append the suffix "/temp-{JOBNAME}" to

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

1591

# this resource prefix, where {JOBNAME} is the value of the

1592

# job_name field. The resulting bucket and object prefix is used

1593

# as the prefix of the resources used to store temporary data

1594

# needed during the job execution. NOTE: This will override the

1595

# value in taskrunner_settings.

1596

# The supported resource type is:

1597

#

1598

# Google Cloud Storage:

1599

#

1600

# storage.googleapis.com/{bucket}/{object}

1601

# bucket.storage.googleapis.com/{object}

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame^]

1602

"experiments": [ # The list of experiments to enable.

1603

"A String",

1604

],

1605

"version": { # A structure describing which components and their versions of the service

1606

# are required in order to run the job.

1607

"a_key": "", # Properties of the object.

1608

},

1609

"serviceAccountEmail": "A String", # Identity to run virtual machines as. Defaults to the default account.

Sai Cheemalapati

2017-03-13 12:12:03 -0400

[diff] [blame]

1610

},

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame^]

1611

"stageStates": [ # This field may be mutated by the Cloud Dataflow service;

1612

# callers cannot mutate it.

1613

{ # A message describing the state of a particular execution stage.

1614

"executionStageName": "A String", # The name of the execution stage.

1615

"currentStateTime": "A String", # The time at which the stage transitioned to this state.

1616

"executionStageState": "A String", # Execution stage states allow the same set of values as JobState.

1617

},

1618

],

1619

"jobMetadata": { # Metadata available primarily for filtering jobs. Will be included in the # This field is populated by the Dataflow service to support filtering jobs

1620

# by the metadata values provided here. Populated for ListJobs and all GetJob

1621

# views SUMMARY and higher.

1622

# ListJob response and Job SUMMARY view.

1623

"bigTableDetails": [ # Identification of a BigTable source used in the Dataflow job.

1624

{ # Metadata for a BigTable connector used by the job.

1625

"tableId": "A String", # TableId accessed in the connection.

1626

"projectId": "A String", # ProjectId accessed in the connection.

1627

"instanceId": "A String", # InstanceId accessed in the connection.

1628

},

1629

],

1630

"spannerDetails": [ # Identification of a Spanner source used in the Dataflow job.

1631

{ # Metadata for a Spanner connector used by the job.

1632

"databaseId": "A String", # DatabaseId accessed in the connection.

1633

"instanceId": "A String", # InstanceId accessed in the connection.

1634

"projectId": "A String", # ProjectId accessed in the connection.

1635

},

1636

],

1637

"datastoreDetails": [ # Identification of a Datastore source used in the Dataflow job.

1638

{ # Metadata for a Datastore connector used by the job.

1639

"projectId": "A String", # ProjectId accessed in the connection.

1640

"namespace": "A String", # Namespace used in the connection.

1641

},

1642

],

1643

"sdkVersion": { # The version of the SDK used to run the job. # The SDK version used to run the job.

1644

"versionDisplayName": "A String", # A readable string describing the version of the SDK.

1645

"sdkSupportStatus": "A String", # The support status for this SDK version.

1646

"version": "A String", # The version of the SDK used to run the job.

1647

},

1648

"bigqueryDetails": [ # Identification of a BigQuery source used in the Dataflow job.

1649

{ # Metadata for a BigQuery connector used by the job.

1650

"table": "A String", # Table accessed in the connection.

1651

"dataset": "A String", # Dataset accessed in the connection.

1652

"projectId": "A String", # Project accessed in the connection.

1653

"query": "A String", # Query used to access data in the connection.

1654

},

1655

],

1656

"fileDetails": [ # Identification of a File source used in the Dataflow job.

1657

{ # Metadata for a File connector used by the job.

1658

"filePattern": "A String", # File Pattern used to access files by the connector.

1659

},

1660

],

1661

"pubsubDetails": [ # Identification of a PubSub source used in the Dataflow job.

1662

{ # Metadata for a PubSub connector used by the job.

1663

"subscription": "A String", # Subscription used in the connection.

1664

"topic": "A String", # Topic accessed in the connection.

},

],

},

"createdFromSnapshotId": "A String", # If this is specified, the job's initial state is populated from the given

1669

# snapshot.

1670

"projectId": "A String", # The ID of the Cloud Platform project that the job belongs to.

1671

"type": "A String", # The type of Cloud Dataflow job.

1672

"pipelineDescription": { # A descriptive representation of submitted pipeline as well as the executed # Preliminary field: The format of this data may change at any time.

1673

# A description of the user pipeline and stages through which it is executed.

1674

# Created by Cloud Dataflow service. Only retrieved with

1675

# JOB_VIEW_DESCRIPTION or JOB_VIEW_ALL.

1676

# form. This data is provided by the Dataflow service for ease of visualizing

1677

# the pipeline and interpreting Dataflow provided metrics.

1678

"executionPipelineStage": [ # Description of each stage of execution of the pipeline.

1679

{ # Description of the composing transforms, names/ids, and input/outputs of a

1680

# stage of execution. Some composing transforms and sources may have been

1681

# generated by the Dataflow service during execution planning.

1682

"id": "A String", # Dataflow service generated id for this stage.

1683

"componentTransform": [ # Transforms that comprise this execution stage.

1684

{ # Description of a transform executed as part of an execution stage.

1685

"originalTransform": "A String", # User name for the original user transform with which this transform is

1686

# most closely associated.

1687

"name": "A String", # Dataflow service generated name for this source.

1688

"userName": "A String", # Human-readable name for this transform; may be user or system generated.

1689

},

1690

],

1691

"componentSource": [ # Collections produced and consumed by component transforms of this stage.

1692

{ # Description of an interstitial value between transforms in an execution

1693

# stage.

1694

"name": "A String", # Dataflow service generated name for this source.

1695

"userName": "A String", # Human-readable name for this transform; may be user or system generated.

1696

"originalTransformOrCollection": "A String", # User name for the original user transform or collection with which this

1697

# source is most closely associated.

1698

},

1699

],

1700

"kind": "A String", # Type of transform this stage is executing.

1701

"outputSource": [ # Output sources for this stage.

1702

{ # Description of an input or output of an execution stage.

1703

"originalTransformOrCollection": "A String", # User name for the original user transform or collection with which this

1704

# source is most closely associated.

1705

"name": "A String", # Dataflow service generated name for this source.

1706

"sizeBytes": "A String", # Size of the source, if measurable.

1707

"userName": "A String", # Human-readable name for this source; may be user or system generated.

1708

},

1709

],

1710

"name": "A String", # Dataflow service generated name for this stage.

1711

"inputSource": [ # Input sources for this stage.

1712

{ # Description of an input or output of an execution stage.

1713

"originalTransformOrCollection": "A String", # User name for the original user transform or collection with which this

1714

# source is most closely associated.

1715

"name": "A String", # Dataflow service generated name for this source.

1716

"sizeBytes": "A String", # Size of the source, if measurable.

1717

"userName": "A String", # Human-readable name for this source; may be user or system generated.

},

],

},

],

"originalPipelineTransform": [ # Description of each transform in the pipeline and collections between them.

1723

{ # Description of the type, names/ids, and input/outputs for a transform.

1724

"kind": "A String", # Type of transform.

1725

"inputCollectionName": [ # User names for all collection inputs to this transform.

1726

"A String",

1727

],

1728

"name": "A String", # User provided name for this transform instance.

1729

"id": "A String", # SDK generated id of this transform instance.

1730

"displayData": [ # Transform-specific display data.

1731

{ # Data provided with a pipeline or transform to provide descriptive info.

1732

"timestampValue": "A String", # Contains value if the data is of timestamp type.

1733

"boolValue": True or False, # Contains value if the data is of a boolean type.

1734

"javaClassValue": "A String", # Contains value if the data is of java class type.

1735

"strValue": "A String", # Contains value if the data is of string type.

1736

"int64Value": "A String", # Contains value if the data is of int64 type.

1737

"durationValue": "A String", # Contains value if the data is of duration type.

1738

"namespace": "A String", # The namespace for the key. This is usually a class name or programming

1739

# language namespace (e.g. a Python module) that defines the display data.

1740

# This allows a dax monitoring system to specially handle the data

1741

# and perform custom rendering.

1742

"floatValue": 3.14, # Contains value if the data is of float type.

1743

"key": "A String", # The key identifying the display data.

1744

# This is intended to be used as a label for the display data

1745

# when viewed in a dax monitoring system.

1746

"shortStrValue": "A String", # A possible additional shorter value to display.

1747

# For example a java_class_name_value of com.mypackage.MyDoFn

1748

# will be stored with MyDoFn as the short_str_value and

1749

# com.mypackage.MyDoFn as the java_class_name value.

1750

# short_str_value can be displayed and java_class_name_value

1751

# will be displayed as a tooltip.

1752

"url": "A String", # An optional full URL.

1753

"label": "A String", # An optional label to display in a dax UI for the element.

1754

},

1755

],

1756

"outputCollectionName": [ # User names for all collection outputs to this transform.

"A String",

],

},

],

"displayData": [ # Pipeline level display data.

1762

{ # Data provided with a pipeline or transform to provide descriptive info.

1763

"timestampValue": "A String", # Contains value if the data is of timestamp type.

1764

"boolValue": True or False, # Contains value if the data is of a boolean type.

1765

"javaClassValue": "A String", # Contains value if the data is of java class type.

1766

"strValue": "A String", # Contains value if the data is of string type.

1767

"int64Value": "A String", # Contains value if the data is of int64 type.

1768

"durationValue": "A String", # Contains value if the data is of duration type.

1769

"namespace": "A String", # The namespace for the key. This is usually a class name or programming

1770

# language namespace (e.g. a Python module) that defines the display data.

1771

# This allows a dax monitoring system to specially handle the data

1772

# and perform custom rendering.

1773

"floatValue": 3.14, # Contains value if the data is of float type.

1774

"key": "A String", # The key identifying the display data.

1775

# This is intended to be used as a label for the display data

1776

# when viewed in a dax monitoring system.

1777

"shortStrValue": "A String", # A possible additional shorter value to display.

1778

# For example a java_class_name_value of com.mypackage.MyDoFn

1779

# will be stored with MyDoFn as the short_str_value and

1780

# com.mypackage.MyDoFn as the java_class_name value.

1781

# short_str_value can be displayed and java_class_name_value

1782

# will be displayed as a tooltip.

1783

"url": "A String", # An optional full URL.

1784

"label": "A String", # An optional label to display in a dax UI for the element.

},

],

},

"replaceJobId": "A String", # If this job is an update of an existing job, this field is the job ID

1789

# of the job it replaced.

1790

#

1791

# When sending a `CreateJobRequest`, you can update a job by specifying it

1792

# here. The job named here is stopped, and its intermediate state is

1793

# transferred to this job.

1794

"tempFiles": [ # A set of files the system should be aware of that are used

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

1795

# for temporary storage. These temporary files will be

1796

# removed on job completion.

1797

# No duplicates are allowed.

1798

# No file patterns are supported.

1799

#

1800

# The supported files are:

1801

#

1802

# Google Cloud Storage:

1803

#

1804

# storage.googleapis.com/{bucket}/{object}

1805

# bucket.storage.googleapis.com/{object}

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame^]

1806

"A String",

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

1807

],

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame^]

1808

"name": "A String", # The user-specified Cloud Dataflow job name.

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

1809

#

1810

# Only one Job with a given name may exist in a project at any

1811

# given time. If a caller attempts to create a Job with the same

1812

# name as an already-existing Job, the attempt returns the

1813

# existing Job.

1814

#

1815

# The name must match the regular expression

1816

# `[a-z]([-a-z0-9]{0,38}[a-z0-9])?`

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame^]

1817

"steps": [ # Exactly one of step or steps_location should be specified.

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

1818

#

1819

# The top-level steps that constitute the entire job.

Sai Cheemalapati

2017-03-13 12:12:03 -0400

[diff] [blame]

1820

{ # Defines a particular step within a Cloud Dataflow job.

1821

#

1822

# A job consists of multiple steps, each of which performs some

1823

# specific operation as part of the overall job. Data is typically

1824

# passed from one step to another as part of the job.

1825

#

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame^]

1826

# Here's an example of a sequence of steps which together implement a

Sai Cheemalapati

2017-03-13 12:12:03 -0400

[diff] [blame]

1827

# Map-Reduce job:

1828

#

1829

# * Read a collection of data from some source, parsing the

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame^]

1830

# collection's elements.

Sai Cheemalapati

2017-03-13 12:12:03 -0400

[diff] [blame]

1831

#

1832

# * Validate the elements.

1833

#

1834

# * Apply a user-defined function to map each element to some value

1835

# and extract an element-specific key value.

1836

#

1837

# * Group elements with the same key into a single element with

1838

# that key, transforming a multiply-keyed collection into a

1839

# uniquely-keyed collection.

1840

#

1841

# * Write the elements out to some data sink.

1842

#

1843

# Note that the Cloud Dataflow service may be used to run many different

1844

# types of jobs, not just Map-Reduce.

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame^]

1845

"name": "A String", # The name that identifies the step. This must be unique for each

Dan O'Meara

2020-05-01 07:42:23 -0700

[diff] [blame]

1846

# step with respect to all other steps in the Cloud Dataflow job.

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame^]

1847

"kind": "A String", # The kind of step in the Cloud Dataflow job.

1848

"properties": { # Named properties associated with the step. Each kind of

Sai Cheemalapati

2017-03-13 12:12:03 -0400

[diff] [blame]

1849

# predefined step has its own required set of properties.

1850

# Must be provided on Create. Only retrieved with JOB_VIEW_ALL.

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame^]

1851

"a_key": "", # Properties of the object.

Jon Wayne Parrott

2017-01-06 09:58:29 -0800

[diff] [blame]

1852

},

1853

},

1854

],

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame^]

1855

"replacedByJobId": "A String", # If another job is an update of this job (and thus, this job is in

1856

# `JOB_STATE_UPDATED`), this field contains the ID of that job.

1857

"executionInfo": { # Additional information about how a Cloud Dataflow job will be executed that # Deprecated.

1858

# isn't contained in the submitted job.

1859

"stages": { # A mapping from each stage to the information about that stage.

1860

"a_key": { # Contains information about how a particular

1861

# google.dataflow.v1beta3.Step will be executed.

1862

"stepName": [ # The steps associated with the execution stage.

1863

# Note that stages may have several steps, and that a given step

1864

# might be run by more than one stage.

"A String",

],

},

},

},

"currentState": "A String", # The current state of the job.

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

1871

#

1872

# Jobs are created in the `JOB_STATE_STOPPED` state unless otherwise

1873

# specified.

1874

#

1875

# A job in the `JOB_STATE_RUNNING` state may asynchronously enter a

1876

# terminal state. After a job has reached a terminal state, no

1877

# further state updates may be made.

1878

#

1879

# This field may be mutated by the Cloud Dataflow service;

1880

# callers cannot mutate it.

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame^]

1881

"location": "A String", # The [regional endpoint]

1882

# (https://cloud.google.com/dataflow/docs/concepts/regional-endpoints) that

1883

# contains this job.

1884

"startTime": "A String", # The timestamp when the job was started (transitioned to JOB_STATE_PENDING).

1885

# Flexible resource scheduling jobs are started with some delay after job

1886

# creation, so start_time is unset before start and is updated when the

1887

# job is started by the Cloud Dataflow service. For other jobs, start_time

1888

# always equals create_time and is immutable and set by the Cloud Dataflow

1889

# service.

1890

"stepsLocation": "A String", # The GCS location where the steps are stored.

1891

"labels": { # User-defined labels for this job.

1892

#

1893

# The labels map can contain no more than 64 entries. Entries of the labels

1894

# map are UTF8 strings that comply with the following restrictions:

1895

#

1896

# * Keys must conform to regexp: \p{Ll}\p{Lo}{0,62}

1897

# * Values must conform to regexp: [\p{Ll}\p{Lo}\p{N}_-]{0,63}

1898

# * Both keys and values are additionally constrained to be <= 128 bytes in

1899

# size.

1900

"a_key": "A String",

Jon Wayne Parrott

2017-01-06 09:58:29 -0800

[diff] [blame]

1901

},

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame^]

1902

"createTime": "A String", # The timestamp when the job was initially created. Immutable and set by the

1903

# Cloud Dataflow service.

1904

"requestedState": "A String", # The job's requested state.

1905

#

1906

# `UpdateJob` may be used to switch between the `JOB_STATE_STOPPED` and

1907

# `JOB_STATE_RUNNING` states, by setting requested_state. `UpdateJob` may

1908

# also be used to directly set a job's requested state to

1909

# `JOB_STATE_CANCELLED` or `JOB_STATE_DONE`, irrevocably terminating the

1910

# job if it has not already reached a terminal state.

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

1911

}</pre>

Jon Wayne Parrott

2017-01-06 09:58:29 -0800

[diff] [blame]

</div>
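The Job fields described above carry a few constraints that can be checked on the client before a request is sent. The sketch below is only an illustration: `check_job_body` and `JOB_NAME_RE` are hypothetical names, not part of the client library, and the service remains the authority on what it accepts. The name pattern, the 64-entry label limit, the 128-byte limit, and the "exactly one of steps or steps location" rule are taken from the field comments. The Unicode-class label patterns are not checked, because Python's standard `re` module does not support `\p{...}` classes.

```python
import re

# Pattern quoted in the "name" field description.
JOB_NAME_RE = re.compile(r"[a-z]([-a-z0-9]{0,38}[a-z0-9])?")


def check_job_body(body):
    """Return a list of problems found in a Job body (hypothetical helper)."""
    problems = []

    # The whole name must match the documented regular expression.
    if not JOB_NAME_RE.fullmatch(body.get("name", "")):
        problems.append("name does not match the documented pattern")

    # The labels map can contain no more than 64 entries.
    labels = body.get("labels", {})
    if len(labels) > 64:
        problems.append("labels has more than 64 entries")

    # Keys and values are each limited to 128 bytes.
    for key, value in labels.items():
        if len(key.encode("utf-8")) > 128 or len(value.encode("utf-8")) > 128:
            problems.append("label %r exceeds 128 bytes" % key)

    # Exactly one of steps or stepsLocation should be specified.
    if bool(body.get("steps")) == bool(body.get("stepsLocation")):
        problems.append("exactly one of steps or stepsLocation should be set")

    return problems
```

An empty list means none of these particular checks failed; it does not guarantee that the service will accept the job.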

<code class="details" id="getMetrics">getMetrics(projectId, location, jobId, startTime=None, x__xgafv=None)</code>

1916

<pre>Request the job status.

1917

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

1918

To request the status of a job, we recommend using

1919

`projects.locations.jobs.getMetrics` with a [regional endpoint]

1920

(https://cloud.google.com/dataflow/docs/concepts/regional-endpoints). Using

1921

`projects.jobs.getMetrics` is not recommended, as you can only request the

1922

status of jobs that are running in `us-central1`.

1923

Jon Wayne Parrott

2017-01-06 09:58:29 -0800

[diff] [blame]

1924

Args:

1925

projectId: string, A project id. (required)

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

1926

location: string, The [regional endpoint]

1927

(https://cloud.google.com/dataflow/docs/concepts/regional-endpoints) that

1928

contains the job specified by job_id. (required)

Jon Wayne Parrott

2017-01-06 09:58:29 -0800

[diff] [blame]

1929

jobId: string, The job to get metrics for. (required)

Sai Cheemalapati

2017-03-13 12:12:03 -0400

[diff] [blame]

1930

startTime: string, Return only metric data that has changed since this time.

1931

Default is to return all information about all metrics for the job.

Jon Wayne Parrott

2017-01-06 09:58:29 -0800

[diff] [blame]

1932

x__xgafv: string, V1 error format.

Sai Cheemalapati

2017-03-13 12:12:03 -0400

[diff] [blame]

1933

Allowed values

1934

1 - v1 error format

1935

2 - v2 error format

Jon Wayne Parrott

2017-01-06 09:58:29 -0800

[diff] [blame]

1936

1937

Returns:

1938

An object of the form:

1939

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

1940

{ # JobMetrics contains a collection of metrics describing the detailed progress

Sai Cheemalapati

2017-03-13 12:12:03 -0400

[diff] [blame]

1941

# of a Dataflow job. Metrics correspond to user-defined and system-defined

1942

# metrics in the job.

1943

#

1944

# This resource captures only the most recent values of each metric;

1945

# time-series data can be queried for them (under the same metric names)

1946

# from Cloud Monitoring.

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame^]

1947

"metricTime": "A String", # Timestamp as of which metric values are current.

1948

"metrics": [ # All metrics for this job.

Jon Wayne Parrott

2017-01-06 09:58:29 -0800

[diff] [blame]

1949

{ # Describes the state of a metric.

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame^]

1950

"set": "", # Worker-computed aggregate value for the "Set" aggregation kind. The only

1951

# possible value type is a list of Values whose type can be Long, Double,

1952

# or String, according to the metric's type. All Values in the list must

1953

# be of the same type.

1954

"gauge": "", # A struct value describing properties of a Gauge.

1955

# Metrics of gauge type show the value of a metric across time, and are

1956

# aggregated based on the newest value.

1957

"cumulative": True or False, # True if this metric is reported as the total cumulative aggregate

1958

# value accumulated since the worker started working on this WorkItem.

1959

# By default this is false, indicating that this metric is reported

1960

# as a delta that is not associated with any WorkItem.

1961

"internal": "", # Worker-computed aggregate value for internal use by the Dataflow

1962

# service.

1963

"kind": "A String", # Metric aggregation kind. The possible metric aggregation kinds are

1964

# "Sum", "Max", "Min", "Mean", "Set", "And", "Or", and "Distribution".

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

1965

# The specified aggregation kind is case-insensitive.

1966

#

1967

# If omitted, this is not an aggregated value but instead

1968

# a single metric sample value.

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame^]

1969

"scalar": "", # Worker-computed aggregate value for aggregation kinds "Sum", "Max", "Min",

1970

# "And", and "Or". The possible value types are Long, Double, and Boolean.

1971

"meanCount": "", # Worker-computed aggregate value for the "Mean" aggregation kind.

1972

# This holds the count of the aggregated values and is used in combination

1973

# with mean_sum to obtain the actual mean aggregate value.

1974

# The only possible value type is Long.

1975

"meanSum": "", # Worker-computed aggregate value for the "Mean" aggregation kind.

Sai Cheemalapati

2017-06-06 18:46:08 -0400

[diff] [blame]

1976

# This holds the sum of the aggregated values and is used in combination

1977

# with mean_count to obtain the actual mean aggregate value.

1978

# The only possible value types are Long and Double.

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame^]

1979

"updateTime": "A String", # Timestamp associated with the metric value. Optional when workers are

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

1980

# reporting work progress; it will be filled in responses from the

1981

# metrics API.

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame^]

1982

"name": { # Identifies a metric, by describing the source which generated the # Name of the metric.

1983

# metric.

1984

"context": { # Zero or more labeled fields which identify the part of the job this

1985

# metric is associated with, such as the name of a step or collection.

1986

#

1987

# For example, built-in counters associated with steps will have

1988

# context['step'] = <step-name>. Counters associated with PCollections

1989

# in the SDK will have context['pcollection'] = <pcollection-name>.

1990

"a_key": "A String",

1991

},

1992

"origin": "A String", # Origin (namespace) of metric name. May be blank for user-defined metrics;

1993

# will be "dataflow" for metrics defined by the Dataflow service or SDK.

1994

"name": "A String", # Worker-defined metric name.

1995

},

1996

"distribution": "", # A struct value describing properties of a distribution of numeric values.

Jon Wayne Parrott

2017-01-06 09:58:29 -0800

[diff] [blame]

1997

},

1998

],

Jon Wayne Parrott

2017-01-06 09:58:29 -0800

[diff] [blame]

}</pre>

</div>
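As a rough illustration of how the JobMetrics object above might be consumed, the sketch below reduces a response to one value per metric name. `summarize_metrics` and `fetch_job_metrics` are hypothetical helper names. The mean is computed as `meanSum / meanCount`, as the field comments describe, and both values are passed through `float()` on the assumption that Long values may arrive as strings. `fetch_job_metrics` assumes google-api-python-client is installed and that default credentials are available; it is not exercised here.

```python
def summarize_metrics(job_metrics):
    """Map each metric name to its scalar or computed mean (hypothetical helper)."""
    summary = {}
    for metric in job_metrics.get("metrics", []):
        name = metric.get("name", {}).get("name", "")
        # The aggregation kind is documented as case-insensitive.
        kind = (metric.get("kind") or "").lower()
        if kind == "mean":
            count = float(metric.get("meanCount", 0))
            if count:
                summary[name] = float(metric.get("meanSum", 0)) / count
        elif "scalar" in metric:
            summary[name] = metric["scalar"]
    return summary


def fetch_job_metrics(project_id, location, job_id):
    """Call getMetrics with the signature documented above."""
    # Imported lazily so the summarizer can be used without the client library.
    from googleapiclient.discovery import build

    service = build("dataflow", "v1b3")
    request = service.projects().locations().jobs().getMetrics(
        projectId=project_id, location=location, jobId=job_id)
    return request.execute()
```

Metrics that share a worker-defined name but differ in `context` overwrite one another in this sketch; keying on the context as well would keep them apart.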

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame^]

2003

<code class="details" id="list">list(projectId, location, filter=None, pageToken=None, pageSize=None, view=None, x__xgafv=None)</code>

Sai Cheemalapati

2017-03-13 12:12:03 -0400

[diff] [blame]

2004

<pre>List the jobs of a project.

Jon Wayne Parrott

2017-01-06 09:58:29 -0800

[diff] [blame]

2005

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

2006

To list the jobs of a project in a region, we recommend using

2007

`projects.locations.jobs.list` with a [regional endpoint]

2008

(https://cloud.google.com/dataflow/docs/concepts/regional-endpoints). To

2009

list all jobs across all regions, use `projects.jobs.aggregated`. Using

2010

`projects.jobs.list` is not recommended, as you can only get the list of

2011

jobs that are running in `us-central1`.

2012

Jon Wayne Parrott

2017-01-06 09:58:29 -0800

[diff] [blame]

2013

Args:

2014

projectId: string, The project which owns the jobs. (required)

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

2015

location: string, The [regional endpoint]

2016

(https://cloud.google.com/dataflow/docs/concepts/regional-endpoints) that

2017

contains this job. (required)

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame^]

2018

filter: string, The kind of filter to use.

2019

pageToken: string, Set this to the 'next_page_token' field of a previous response

2020

to request additional results in a long list.

Sai Cheemalapati

2017-03-13 12:12:03 -0400

[diff] [blame]

2021

pageSize: integer, If there are many jobs, limit response to at most this many.

2022

The actual number of jobs returned will be the lesser of max_responses

2023

and an unspecified server-defined limit.

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame^]

2024

view: string, Level of information requested in response. Default is `JOB_VIEW_SUMMARY`.

Jon Wayne Parrott

2017-01-06 09:58:29 -0800

[diff] [blame]

2025

x__xgafv: string, V1 error format.

Sai Cheemalapati

2017-03-13 12:12:03 -0400

[diff] [blame]

2026

Allowed values

2027

1 - v1 error format

2028

2 - v2 error format

Returns:
  An object of the form:

    { # Response to a request to list Cloud Dataflow jobs in a project. This might
      # be a partial response, depending on the page size in the ListJobsRequest.
      # However, if the project does not have any jobs, an instance of
      # ListJobsResponse is not returned and the request's response
      # body is empty {}.


2038

"nextPageToken": "A String", # Set if there may be more results than fit in this response.

2039

"failedLocation": [ # Zero or more messages describing the [regional endpoints]


2040

# (https://cloud.google.com/dataflow/docs/concepts/regional-endpoints) that

2041

# failed to respond.

2042

{ # Indicates which [regional endpoint]

2043

# (https://cloud.google.com/dataflow/docs/concepts/regional-endpoints) failed

2044

# to respond to a request for data.


2045

"name": "A String", # The name of the [regional endpoint]


2046

# (https://cloud.google.com/dataflow/docs/concepts/regional-endpoints) that

2047

# failed to respond.


2048

},

2049

],


2050

"jobs": [ # A subset of the requested job information.


2051

{ # Defines a job to be run by the Cloud Dataflow service.


2052

"clientRequestId": "A String", # The client's unique identifier of the job, re-used across retried attempts.

2053

# If this field is set, the service will ensure its uniqueness.

2054

# The request to create a job will fail if the service has knowledge of a

2055

# previously submitted job with the same client's ID and job name.

2056

# The caller may use this field to ensure idempotence of job

2057

# creation across retried attempts to create a job.

2058

# By default, the field is empty and, in that case, the service ignores it.

2059

"id": "A String", # The unique ID of this job.


2060

#

2061

# This field is set by the Cloud Dataflow service when the Job is

2062

# created, and is immutable for the life of the job.


2063

"currentStateTime": "A String", # The timestamp associated with the current state.

2064

"transformNameMapping": { # The map of transform name prefixes of the job to be replaced to the


2065

# corresponding name prefixes of the new job.


2066

"a_key": "A String",


2067

},


2068

"environment": { # Describes the environment in which a Dataflow Job runs. # The environment for the job.

2069

"sdkPipelineOptions": { # The Cloud Dataflow SDK pipeline options specified by the user. These


2070

# options are passed through the service and are used to recreate the

2071

# SDK pipeline options on the worker in a language agnostic and platform

2072

# independent way.


2073

"a_key": "", # Properties of the object.


2074

},


2075

"flexResourceSchedulingGoal": "A String", # Which Flexible Resource Scheduling mode to run in.

2076

"workerPools": [ # The worker pools. At least one "harness" worker pool must be


2077

# specified in order for the job to have workers.

2078

{ # Describes one particular pool of Cloud Dataflow workers to be

2079

# instantiated by the Cloud Dataflow service in order to perform the

2080

# computations required by a job. Note that a workflow job may use

2081

# multiple pools, in order to match the various computational

2082

# requirements of the various stages of the job.


2083

"defaultPackageSet": "A String", # The default package set to install. This allows the service to

2084

# select a default set of packages which are useful to worker

2085

# harnesses written in a particular language.

2086

"network": "A String", # Network to which VMs will be assigned. If empty or unspecified,

2087

# the service will use the network "default".

2088

"zone": "A String", # Zone to run the worker pools in. If empty or unspecified, the service


2089

# will attempt to choose a reasonable default.


2090

"numWorkers": 42, # Number of Google Compute Engine workers in this pool needed to

2091

# execute the job. If zero or unspecified, the service will

2092

# attempt to choose a reasonable default.

2093

"numThreadsPerWorker": 42, # The number of threads per worker harness. If empty or unspecified, the


2094

# service will choose a number of threads (according to the number of cores

2095

# on the selected machine type for batch, or 1 by convention for streaming).


2096

"diskSourceImage": "A String", # Fully qualified source image for disks.

2097

"packages": [ # Packages to be installed on workers.


2098

{ # The packages that must be installed in order for a worker to run the

2099

# steps of the Cloud Dataflow job that will be assigned to its worker

2100

# pool.

2101

#

2102

# This is the mechanism by which the Cloud Dataflow SDK causes code to

2103

# be loaded onto the workers. For example, the Cloud Dataflow Java SDK


2104

# might use this to install jars containing the user's code and all of the


2105

# various dependencies (libraries, data files, etc.) required in order

2106

# for that code to run.


2107

"location": "A String", # The resource to read the package from. The supported resource type is:


2108

#


2109

# Google Cloud Storage:

2110

#

2111

# storage.googleapis.com/{bucket}

2112

# bucket.storage.googleapis.com/


2113

"name": "A String", # The name of the package.


2114

},


2115

],


"teardownPolicy": "A String", # Sets the policy for determining when to turn down the worker pool.


2117

# Allowed values are: `TEARDOWN_ALWAYS`, `TEARDOWN_ON_SUCCESS`, and

2118

# `TEARDOWN_NEVER`.

2119

# `TEARDOWN_ALWAYS` means workers are always torn down regardless of whether

2120

# the job succeeds. `TEARDOWN_ON_SUCCESS` means workers are torn down

2121

# if the job succeeds. `TEARDOWN_NEVER` means the workers are never torn

2122

# down.

2123

#

2124

# If the workers are not torn down by the service, they will

2125

# continue to run and use Google Compute Engine VM resources in the


2126

# user's project until they are explicitly terminated by the user.


2127

# Because of this, Google recommends using the `TEARDOWN_ALWAYS`

2128

# policy except for small, manually supervised test jobs.

2129

#

2130

# If unknown or unspecified, the service will attempt to choose a reasonable

2131

# default.


2132

"onHostMaintenance": "A String", # The action to take on host maintenance, as defined by the Google

2133

# Compute Engine API.

2134

"poolArgs": { # Extra arguments for this worker pool.

2135

"a_key": "", # Properties of the object. Contains field @type with type URL.

2136

},

2137

"diskSizeGb": 42, # Size of root disk for VMs, in GB. If zero or unspecified, the service will


2138

# attempt to choose a reasonable default.


2139

"workerHarnessContainerImage": "A String", # Required. Docker container image that executes the Cloud Dataflow worker

2140

# harness, residing in Google Container Registry.

2141

#

2142

# Deprecated for the Fn API path. Use sdk_harness_container_images instead.

2143

"diskType": "A String", # Type of root disk for VMs. If empty or unspecified, the service will


2144

# attempt to choose a reasonable default.


2145

"machineType": "A String", # Machine type (e.g. "n1-standard-1"). If empty or unspecified, the

2146

# service will attempt to choose a reasonable default.

2147

"kind": "A String", # The kind of the worker pool; currently only `harness` and `shuffle`

2148

# are supported.

2149

"dataDisks": [ # Data disks that are used by a VM in this workflow.


2150

{ # Describes the data disk used by a workflow job.


2151

"sizeGb": 42, # Size of disk in GB. If zero or unspecified, the service will


2152

# attempt to choose a reasonable default.


2153

"diskType": "A String", # Disk storage type, as defined by Google Compute Engine. This


2154

# must be a disk type appropriate to the project and zone in which

2155

# the workers will run. If unknown or unspecified, the service

2156

# will attempt to choose a reasonable default.

2157

#

2158

# For example, the standard persistent disk type is a resource name


2159

# typically ending in "pd-standard". If SSD persistent disks are

2160

# available, the resource name typically ends with "pd-ssd". The


2161

# actual valid values are defined by the Google Compute Engine API,

2162

# not by the Cloud Dataflow API; consult the Google Compute Engine

2163

# documentation for more information about determining the set of

2164

# available disk types for a particular project and zone.

2165

#

2166

# Google Compute Engine Disk types are local to a particular

2167

# project in a particular zone, and so the resource name will

2168

# typically look something like this:

2169

#

2170

# compute.googleapis.com/projects/project-id/zones/zone/diskTypes/pd-standard


2171

"mountPoint": "A String", # Directory in a VM where disk is mounted.


2172

},


2173

],


2174

"sdkHarnessContainerImages": [ # Set of SDK harness containers needed to execute this pipeline. This will


2175

# only be set in the Fn API path. For non-cross-language pipelines this

2176

# should have only one entry. Cross-language pipelines will have two or more

2177

# entries.

2178

{ # Defines a SDK harness container for executing Dataflow pipelines.


2179

"containerImage": "A String", # A docker container image that resides in Google Container Registry.

2180

"useSingleCorePerContainer": True or False, # If true, recommends the Dataflow service to use only one core per SDK


2181

# container instance with this image. If false (or unset) recommends using

2182

# more than one core per SDK container instance with this image for

2183

# efficiency. Note that Dataflow service may choose to override this property

2184

# if needed.

2185

},

2186

],


2187

"subnetwork": "A String", # Subnetwork to which VMs will be assigned, if desired. Expected to be of

2188

# the form "regions/REGION/subnetworks/SUBNETWORK".

2189

"ipConfiguration": "A String", # Configuration for VM IPs.

2190

"taskrunnerSettings": { # Taskrunner configuration settings. # Settings passed through to Google Compute Engine workers when

2191

# using the standard Dataflow task runner. Users should ignore

2192

# this field.

2193

"alsologtostderr": True or False, # Whether to also send taskrunner log info to stderr.

2194

"taskGroup": "A String", # The UNIX group ID on the worker VM to use for tasks launched by

2195

# taskrunner; e.g. "wheel".

2196

"harnessCommand": "A String", # The command to launch the worker harness.

2197

"logDir": "A String", # The directory on the VM to store logs.

2198

"oauthScopes": [ # The OAuth2 scopes to be requested by the taskrunner in order to

2199

# access the Cloud Dataflow API.

2200

"A String",

2201

],

2202

"dataflowApiVersion": "A String", # The API version of endpoint, e.g. "v1b3"

2203

"logUploadLocation": "A String", # Indicates where to put logs. If this is not specified, the logs

2204

# will not be uploaded.

2205

#

2206

# The supported resource type is:

2207

#

2208

# Google Cloud Storage:

2209

# storage.googleapis.com/{bucket}/{object}

2210

# bucket.storage.googleapis.com/{object}

2211

"streamingWorkerMainClass": "A String", # The streaming worker main class name.

2212

"workflowFileName": "A String", # The file to store the workflow in.

2213

"baseTaskDir": "A String", # The location on the worker for task-specific subdirectories.

2214

"tempStoragePrefix": "A String", # The prefix of the resources the taskrunner should use for

2215

# temporary storage.

2216

#

2217

# The supported resource type is:

2218

#

2219

# Google Cloud Storage:

2220

# storage.googleapis.com/{bucket}/{object}

2221

# bucket.storage.googleapis.com/{object}

2222

"commandlinesFileName": "A String", # The file to store preprocessing commands in.

2223

"languageHint": "A String", # The suggested backend language.

2224

"baseUrl": "A String", # The base URL for the taskrunner to use when accessing Google Cloud APIs.

2225

#

2226

# When workers access Google Cloud APIs, they logically do so via

2227

# relative URLs. If this field is specified, it supplies the base

2228

# URL to use for resolving these relative URLs. The normative

2229

# algorithm used is defined by RFC 1808, "Relative Uniform Resource

2230

# Locators".

2231

#

2232

# If not specified, the default value is "http://www.googleapis.com/"

2233

"logToSerialconsole": True or False, # Whether to send taskrunner log info to Google Compute Engine VM serial

2234

# console.

2235

"continueOnException": True or False, # Whether to continue taskrunner if an exception is hit.

2236

"parallelWorkerSettings": { # Provides data to pass through to the worker harness. # The settings to pass to the parallel worker harness.

2237

"baseUrl": "A String", # The base URL for accessing Google Cloud APIs.

2238

#

2239

# When workers access Google Cloud APIs, they logically do so via

2240

# relative URLs. If this field is specified, it supplies the base

2241

# URL to use for resolving these relative URLs. The normative

2242

# algorithm used is defined by RFC 1808, "Relative Uniform Resource

2243

# Locators".

2244

#

2245

# If not specified, the default value is "http://www.googleapis.com/"

2246

"reportingEnabled": True or False, # Whether to send work progress updates to the service.

2247

"servicePath": "A String", # The Cloud Dataflow service path relative to the root URL, for example,

2248

# "dataflow/v1b3/projects".

2249

"shuffleServicePath": "A String", # The Shuffle service path relative to the root URL, for example,

2250

# "shuffle/v1beta1".

2251

"workerId": "A String", # The ID of the worker running this pipeline.

2252

"tempStoragePrefix": "A String", # The prefix of the resources the system should use for temporary

2253

# storage.

2254

#

2255

# The supported resource type is:

2256

#

2257

# Google Cloud Storage:

2258

#

2259

# storage.googleapis.com/{bucket}/{object}

2260

# bucket.storage.googleapis.com/{object}

2261

},

2262

"vmId": "A String", # The ID string of the VM.

2263

"taskUser": "A String", # The UNIX user ID on the worker VM to use for tasks launched by

2264

# taskrunner; e.g. "root".

2265

},

2266

"autoscalingSettings": { # Settings for WorkerPool autoscaling. # Settings for autoscaling of this WorkerPool.

2267

"maxNumWorkers": 42, # The maximum number of workers to cap scaling at.

2268

"algorithm": "A String", # The algorithm to use for autoscaling.

2269

},

2270

"metadata": { # Metadata to set on the Google Compute Engine VMs.

2271

"a_key": "A String",

2272

},


2273

},

2274

],


2275

"dataset": "A String", # The dataset for the current project where various workflow

2276

# related tables are stored.

2277

#

2278

# The supported resource type is:

2279

#

2280

# Google BigQuery:

2281

# bigquery.googleapis.com/{dataset}

2282

"internalExperiments": { # Experimental settings.

2283

"a_key": "", # Properties of the object. Contains field @type with type URL.

2284

},

2285

"workerRegion": "A String", # The Compute Engine region

2286

# (https://cloud.google.com/compute/docs/regions-zones/regions-zones) in

2287

# which worker processing should occur, e.g. "us-west1". Mutually exclusive

2288

# with worker_zone. If neither worker_region nor worker_zone is specified,

2289

# default to the control plane's region.

2290

"serviceKmsKeyName": "A String", # If set, contains the Cloud KMS key identifier used to encrypt data

2291

# at rest, AKA a Customer Managed Encryption Key (CMEK).

2292

#

2293

# Format:

2294

# projects/PROJECT_ID/locations/LOCATION/keyRings/KEY_RING/cryptoKeys/KEY

2295

"userAgent": { # A description of the process that generated the request.

2296

"a_key": "", # Properties of the object.

2297

},

2298

"workerZone": "A String", # The Compute Engine zone

2299

# (https://cloud.google.com/compute/docs/regions-zones/regions-zones) in

2300

# which worker processing should occur, e.g. "us-west1-a". Mutually exclusive

2301

# with worker_region. If neither worker_region nor worker_zone is specified,

2302

# a zone in the control plane's region is chosen based on available capacity.

2303

"clusterManagerApiService": "A String", # The type of cluster manager API to use. If unknown or


2304

# unspecified, the service will attempt to choose a reasonable

2305

# default. This should be in the form of the API service name,


2306

# e.g. "compute.googleapis.com".

2307

"tempStoragePrefix": "A String", # The prefix of the resources the system should use for temporary

2308

# storage. The system will append the suffix "/temp-{JOBNAME}" to


2309

# this resource prefix, where {JOBNAME} is the value of the

2310

# job_name field. The resulting bucket and object prefix is used

2311

# as the prefix of the resources used to store temporary data

2312

# needed during the job execution. NOTE: This will override the

2313

# value in taskrunner_settings.

2314

# The supported resource type is:


2315

#

2316

# Google Cloud Storage:

2317

#


2318

# storage.googleapis.com/{bucket}/{object}

2319

# bucket.storage.googleapis.com/{object}


2320

"experiments": [ # The list of experiments to enable.

2321

"A String",

2322

],

2323

"version": { # A structure describing which components and their versions of the service

2324

# are required in order to run the job.

2325

"a_key": "", # Properties of the object.

2326

},

2327

"serviceAccountEmail": "A String", # Identity to run virtual machines as. Defaults to the default account.


2328

},


2329

"stageStates": [ # This field may be mutated by the Cloud Dataflow service;

2330

# callers cannot mutate it.

2331

{ # A message describing the state of a particular execution stage.

2332

"executionStageName": "A String", # The name of the execution stage.

2333

"currentStateTime": "A String", # The time at which the stage transitioned to this state.

"executionStageState": "A String", # Execution stage states allow the same set of values as JobState.

2335

},

2336

],

"jobMetadata": { # Metadata available primarily for filtering jobs. Will be included in the

# ListJob response and Job SUMMARY view.

# This field is populated by the Dataflow service to support filtering jobs

# by the metadata values provided here. Populated for ListJobs and all GetJob

# views SUMMARY and higher.

2341

"bigTableDetails": [ # Identification of a BigTable source used in the Dataflow job.

2342

{ # Metadata for a BigTable connector used by the job.

2343

"tableId": "A String", # TableId accessed in the connection.

2344

"projectId": "A String", # ProjectId accessed in the connection.

2345

"instanceId": "A String", # InstanceId accessed in the connection.

2346

},

2347

],

2348

"spannerDetails": [ # Identification of a Spanner source used in the Dataflow job.

2349

{ # Metadata for a Spanner connector used by the job.

2350

"databaseId": "A String", # DatabaseId accessed in the connection.

2351

"instanceId": "A String", # InstanceId accessed in the connection.

2352

"projectId": "A String", # ProjectId accessed in the connection.

2353

},

2354

],

2355

"datastoreDetails": [ # Identification of a Datastore source used in the Dataflow job.

2356

{ # Metadata for a Datastore connector used by the job.

2357

"projectId": "A String", # ProjectId accessed in the connection.

2358

"namespace": "A String", # Namespace used in the connection.

2359

},

2360

],

2361

"sdkVersion": { # The version of the SDK used to run the job. # The SDK version used to run the job.

2362

"versionDisplayName": "A String", # A readable string describing the version of the SDK.

2363

"sdkSupportStatus": "A String", # The support status for this SDK version.

2364

"version": "A String", # The version of the SDK used to run the job.

2365

},

2366

"bigqueryDetails": [ # Identification of a BigQuery source used in the Dataflow job.

2367

{ # Metadata for a BigQuery connector used by the job.

2368

"table": "A String", # Table accessed in the connection.

2369

"dataset": "A String", # Dataset accessed in the connection.

2370

"projectId": "A String", # Project accessed in the connection.

2371

"query": "A String", # Query used to access data in the connection.

2372

},

2373

],

2374

"fileDetails": [ # Identification of a File source used in the Dataflow job.

2375

{ # Metadata for a File connector used by the job.

2376

"filePattern": "A String", # File Pattern used to access files by the connector.

2377

},

2378

],

2379

"pubsubDetails": [ # Identification of a PubSub source used in the Dataflow job.

2380

{ # Metadata for a PubSub connector used by the job.

2381

"subscription": "A String", # Subscription used in the connection.

2382

"topic": "A String", # Topic accessed in the connection.

},

],

},

"createdFromSnapshotId": "A String", # If this is specified, the job's initial state is populated from the given

2387

# snapshot.

2388

"projectId": "A String", # The ID of the Cloud Platform project that the job belongs to.

2389

"type": "A String", # The type of Cloud Dataflow job.

"pipelineDescription": { # A descriptive representation of submitted pipeline as well as the executed

# form. This data is provided by the Dataflow service for ease of visualizing

# the pipeline and interpreting Dataflow provided metrics.

# Preliminary field: The format of this data may change at any time.

# A description of the user pipeline and stages through which it is executed.

# Created by Cloud Dataflow service. Only retrieved with

# JOB_VIEW_DESCRIPTION or JOB_VIEW_ALL.

2396

"executionPipelineStage": [ # Description of each stage of execution of the pipeline.

2397

{ # Description of the composing transforms, names/ids, and input/outputs of a

2398

# stage of execution. Some composing transforms and sources may have been

2399

# generated by the Dataflow service during execution planning.

2400

"id": "A String", # Dataflow service generated id for this stage.

2401

"componentTransform": [ # Transforms that comprise this execution stage.

2402

{ # Description of a transform executed as part of an execution stage.

2403

"originalTransform": "A String", # User name for the original user transform with which this transform is

2404

# most closely associated.

2405

"name": "A String", # Dataflow service generated name for this source.

2406

"userName": "A String", # Human-readable name for this transform; may be user or system generated.

2407

},

2408

],

2409

"componentSource": [ # Collections produced and consumed by component transforms of this stage.

2410

{ # Description of an interstitial value between transforms in an execution

2411

# stage.

2412

"name": "A String", # Dataflow service generated name for this source.

2413

"userName": "A String", # Human-readable name for this transform; may be user or system generated.

2414

"originalTransformOrCollection": "A String", # User name for the original user transform or collection with which this

2415

# source is most closely associated.

2416

},

2417

],

"kind": "A String", # Type of transform this stage is executing.

2419

"outputSource": [ # Output sources for this stage.

2420

{ # Description of an input or output of an execution stage.

2421

"originalTransformOrCollection": "A String", # User name for the original user transform or collection with which this

2422

# source is most closely associated.

2423

"name": "A String", # Dataflow service generated name for this source.

2424

"sizeBytes": "A String", # Size of the source, if measurable.

2425

"userName": "A String", # Human-readable name for this source; may be user or system generated.

2426

},

2427

],

2428

"name": "A String", # Dataflow service generated name for this stage.

2429

"inputSource": [ # Input sources for this stage.

2430

{ # Description of an input or output of an execution stage.

2431

"originalTransformOrCollection": "A String", # User name for the original user transform or collection with which this

2432

# source is most closely associated.

2433

"name": "A String", # Dataflow service generated name for this source.

2434

"sizeBytes": "A String", # Size of the source, if measurable.

2435

"userName": "A String", # Human-readable name for this source; may be user or system generated.

},

],

},

],

"originalPipelineTransform": [ # Description of each transform in the pipeline and collections between them.

2441

{ # Description of the type, names/ids, and input/outputs for a transform.

2442

"kind": "A String", # Type of transform.

2443

"inputCollectionName": [ # User names for all collection inputs to this transform.

2444

"A String",

2445

],

2446

"name": "A String", # User provided name for this transform instance.

2447

"id": "A String", # SDK generated id of this transform instance.

2448

"displayData": [ # Transform-specific display data.

2449

{ # Data provided with a pipeline or transform to provide descriptive info.

2450

"timestampValue": "A String", # Contains value if the data is of timestamp type.

2451

"boolValue": True or False, # Contains value if the data is of a boolean type.

2452

"javaClassValue": "A String", # Contains value if the data is of java class type.

2453

"strValue": "A String", # Contains value if the data is of string type.

2454

"int64Value": "A String", # Contains value if the data is of int64 type.

2455

"durationValue": "A String", # Contains value if the data is of duration type.

2456

"namespace": "A String", # The namespace for the key. This is usually a class name or programming

2457

# language namespace (e.g. Python module) which defines the display data.

2458

# This allows a dax monitoring system to specially handle the data

2459

# and perform custom rendering.

2460

"floatValue": 3.14, # Contains value if the data is of float type.

2461

"key": "A String", # The key identifying the display data.

2462

# This is intended to be used as a label for the display data

2463

# when viewed in a dax monitoring system.

2464

"shortStrValue": "A String", # A possible additional shorter value to display.

2465

# For example a java_class_name_value of com.mypackage.MyDoFn

2466

# will be stored with MyDoFn as the short_str_value and

2467

# com.mypackage.MyDoFn as the java_class_name value.

2468

# short_str_value can be displayed and java_class_name_value

2469

# will be displayed as a tooltip.

2470

"url": "A String", # An optional full URL.

2471

"label": "A String", # An optional label to display in a dax UI for the element.

2472

},

2473

],

2474

"outputCollectionName": [ # User names for all collection outputs to this transform.

"A String",

],

},

],

"displayData": [ # Pipeline level display data.

2480

{ # Data provided with a pipeline or transform to provide descriptive info.

2481

"timestampValue": "A String", # Contains value if the data is of timestamp type.

2482

"boolValue": True or False, # Contains value if the data is of a boolean type.

2483

"javaClassValue": "A String", # Contains value if the data is of java class type.

2484

"strValue": "A String", # Contains value if the data is of string type.

2485

"int64Value": "A String", # Contains value if the data is of int64 type.

2486

"durationValue": "A String", # Contains value if the data is of duration type.

2487

"namespace": "A String", # The namespace for the key. This is usually a class name or programming

2488

# language namespace (e.g. Python module) which defines the display data.

2489

# This allows a dax monitoring system to specially handle the data

2490

# and perform custom rendering.

2491

"floatValue": 3.14, # Contains value if the data is of float type.

2492

"key": "A String", # The key identifying the display data.

2493

# This is intended to be used as a label for the display data

2494

# when viewed in a dax monitoring system.

2495

"shortStrValue": "A String", # A possible additional shorter value to display.

2496

# For example a java_class_name_value of com.mypackage.MyDoFn

2497

# will be stored with MyDoFn as the short_str_value and

2498

# com.mypackage.MyDoFn as the java_class_name value.

2499

# short_str_value can be displayed and java_class_name_value

2500

# will be displayed as a tooltip.

2501

"url": "A String", # An optional full URL.

2502

"label": "A String", # An optional label to display in a dax UI for the element.

},

],

},

"replaceJobId": "A String", # If this job is an update of an existing job, this field is the job ID
# of the job it replaced.
#
# When sending a `CreateJobRequest`, you can update a job by specifying it
# here. The job named here is stopped, and its intermediate state is
# transferred to this job.
"tempFiles": [ # A set of files the system should be aware of that are used
# for temporary storage. These temporary files will be
# removed on job completion.
# No duplicates are allowed.
# No file patterns are supported.
#
# The supported files are:
#
# Google Cloud Storage:
#
# storage.googleapis.com/{bucket}/{object}
# bucket.storage.googleapis.com/{object}
"A String",
],
"name": "A String", # The user-specified Cloud Dataflow job name.
#
# Only one Job with a given name may exist in a project at any
# given time. If a caller attempts to create a Job with the same
# name as an already-existing Job, the attempt returns the
# existing Job.
#
# The name must match the regular expression
# `[a-z]([-a-z0-9]{0,38}[a-z0-9])?`

"steps": [ # Exactly one of step or steps_location should be specified.
#
# The top-level steps that constitute the entire job.
{ # Defines a particular step within a Cloud Dataflow job.
#
# A job consists of multiple steps, each of which performs some
# specific operation as part of the overall job. Data is typically
# passed from one step to another as part of the job.
#
# Here's an example of a sequence of steps which together implement a
# Map-Reduce job:
#
# * Read a collection of data from some source, parsing the
# collection's elements.
#
# * Validate the elements.
#
# * Apply a user-defined function to map each element to some value
# and extract an element-specific key value.
#
# * Group elements with the same key into a single element with
# that key, transforming a multiply-keyed collection into a
# uniquely-keyed collection.
#
# * Write the elements out to some data sink.
#
# Note that the Cloud Dataflow service may be used to run many different
# types of jobs, not just Map-Reduce.
"name": "A String", # The name that identifies the step. This must be unique for each
# step with respect to all other steps in the Cloud Dataflow job.
"kind": "A String", # The kind of step in the Cloud Dataflow job.
"properties": { # Named properties associated with the step. Each kind of
# predefined step has its own required set of properties.
# Must be provided on Create. Only retrieved with JOB_VIEW_ALL.
"a_key": "", # Properties of the object.
},
},
],

"replacedByJobId": "A String", # If another job is an update of this job (and thus, this job is in
# `JOB_STATE_UPDATED`), this field contains the ID of that job.
"executionInfo": { # Deprecated. Additional information about how a Cloud Dataflow job will be
# executed that isn't contained in the submitted job.
"stages": { # A mapping from each stage to the information about that stage.
"a_key": { # Contains information about how a particular
# google.dataflow.v1beta3.Step will be executed.
"stepName": [ # The steps associated with the execution stage.
# Note that stages may have several steps, and that a given step
# might be run by more than one stage.
"A String",
],
},
},
},

"currentState": "A String", # The current state of the job.

#
# Jobs are created in the `JOB_STATE_STOPPED` state unless otherwise
# specified.
#
# A job in the `JOB_STATE_RUNNING` state may asynchronously enter a
# terminal state. After a job has reached a terminal state, no
# further state updates may be made.
#
# This field may be mutated by the Cloud Dataflow service;
# callers cannot mutate it.
"location": "A String", # The [regional endpoint]
# (https://cloud.google.com/dataflow/docs/concepts/regional-endpoints) that
# contains this job.
"startTime": "A String", # The timestamp when the job was started (transitioned to JOB_STATE_PENDING).
# Flexible resource scheduling jobs are started with some delay after job
# creation, so start_time is unset before start and is updated when the
# job is started by the Cloud Dataflow service. For other jobs, start_time
# always equals create_time and is immutable and set by the Cloud Dataflow
# service.
"stepsLocation": "A String", # The GCS location where the steps are stored.
"labels": { # User-defined labels for this job.
#
# The labels map can contain no more than 64 entries. Entries of the labels
# map are UTF8 strings that comply with the following restrictions:
#
# * Keys must conform to regexp: \p{Ll}\p{Lo}{0,62}
# * Values must conform to regexp: [\p{Ll}\p{Lo}\p{N}_-]{0,63}
# * Both keys and values are additionally constrained to be <= 128 bytes in
# size.
"a_key": "A String",
},
"createTime": "A String", # The timestamp when the job was initially created. Immutable and set by the
# Cloud Dataflow service.
"requestedState": "A String", # The job's requested state.
#
# `UpdateJob` may be used to switch between the `JOB_STATE_STOPPED` and
# `JOB_STATE_RUNNING` states, by setting requested_state. `UpdateJob` may
# also be used to directly set a job's requested state to
# `JOB_STATE_CANCELLED` or `JOB_STATE_DONE`, irrevocably terminating the
# job if it has not already reached a terminal state.
},

],

}</pre>
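<p>The <code>name</code> field above must match a documented regular expression. The following is a minimal sketch of a client-side check that could run before a create request is sent; the helper name <code>is_valid_job_name</code> is hypothetical and not part of the client library. The pattern is copied from the field description.</p>

```python
import re

# Pattern quoted from the "name" field description in the Job object.
JOB_NAME_PATTERN = re.compile(r"[a-z]([-a-z0-9]{0,38}[a-z0-9])?")


def is_valid_job_name(name: str) -> bool:
    """Return True if the whole string matches the documented job-name pattern.

    Hypothetical helper: the service performs its own validation, so this
    only catches obviously malformed names early.
    """
    return JOB_NAME_PATTERN.fullmatch(name) is not None
```

<p>Using <code>fullmatch</code> rather than <code>match</code> matters here, because the pattern as documented carries no anchors.</p>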

</div>
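<p>The <code>currentState</code> description above notes that a running job may asynchronously enter a terminal state. A sketch of polling <code>get(projectId, location, jobId)</code> until that happens is shown below. The set of terminal state names is an assumption based on the service's JobState enum and is not listed on this page; <code>jobs_resource</code> is assumed to be the object returned by <code>projects().locations().jobs()</code>, and the helper names are hypothetical.</p>

```python
import time

# Assumption: these JobState values are the terminal ones. Verify against the
# JobState enum in the REST reference before relying on this set.
TERMINAL_STATES = frozenset({
    "JOB_STATE_DONE",
    "JOB_STATE_FAILED",
    "JOB_STATE_CANCELLED",
    "JOB_STATE_UPDATED",
    "JOB_STATE_DRAINED",
})


def is_terminal(job: dict) -> bool:
    """Return True if the job's currentState is in the assumed terminal set."""
    return job.get("currentState") in TERMINAL_STATES


def wait_until_terminal(jobs_resource, project_id, location, job_id,
                        poll_seconds=30.0, sleep=time.sleep):
    """Poll jobs.get until the job reports a terminal state, then return it.

    `sleep` is injectable so the loop can be exercised without real delays.
    """
    while True:
        job = jobs_resource.get(
            projectId=project_id, location=location, jobId=job_id
        ).execute()
        if is_terminal(job):
            return job
        sleep(poll_seconds)
```

<p>A production version would likely add a timeout and handle transient HTTP errors; both are left out to keep the sketch short.</p>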

<code class="details" id="list_next">list_next(previous_request, previous_response)</code>

<pre>Retrieves the next page of results.

Args:
previous_request: The request for the previous page. (required)
previous_response: The response from the request for the previous page. (required)

Returns:
A request object that you can call 'execute()' on to request the next
page. Returns None if there are no more items in the collection.

</pre>
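<p>The contract above (a request object, or None when the collection is exhausted) lends itself to a simple loop. The sketch below assumes <code>jobs_resource</code> is the object returned by <code>projects().locations().jobs()</code> on a client built for the <code>dataflow</code> <code>v1b3</code> API, and that each page carries its results under a <code>jobs</code> key; the generator name <code>iter_jobs</code> is hypothetical.</p>

```python
def iter_jobs(jobs_resource, project_id, location):
    """Yield every job in a project and location, following pages via list_next.

    Assumes each list response is a dict whose "jobs" key (when present)
    holds the page of Job objects.
    """
    request = jobs_resource.list(projectId=project_id, location=location)
    while request is not None:
        response = request.execute()
        for job in response.get("jobs", []):
            yield job
        # list_next returns None once there are no more pages.
        request = jobs_resource.list_next(
            previous_request=request, previous_response=response
        )
```

<p>Because this is a generator, pages are fetched lazily: a caller that stops iterating early does not request the remaining pages.</p>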

</div>


<code class="details" id="snapshot">snapshot(projectId, location, jobId, body=None, x__xgafv=None)</code>

<pre>Snapshot the state of a streaming job.

Args:
projectId: string, The project which owns the job to be snapshotted. (required)
location: string, The location that contains this job. (required)
jobId: string, The job to be snapshotted. (required)
body: object, The request body.
The object takes the form of:

{ # Request to create a snapshot of a job.
"description": "A String", # User specified description of the snapshot. May be empty.
"snapshotSources": True or False, # If true, perform snapshots for sources which support this.
"ttl": "A String", # TTL for the snapshot.
"location": "A String", # The location that contains this job.
}

x__xgafv: string, V1 error format.
Allowed values
1 - v1 error format
2 - v2 error format

Returns:
An object of the form:

{ # Represents a snapshot of a job.
"state": "A String", # State of the snapshot.
"sourceJobId": "A String", # The job this snapshot was created from.
"projectId": "A String", # The project this snapshot belongs to.
"id": "A String", # The unique ID of this snapshot.
"ttl": "A String", # The time after which this snapshot will be automatically deleted.
"description": "A String", # User specified description of the snapshot. May be empty.
"diskSizeBytes": "A String", # The disk byte size of the snapshot. Only available for snapshots in READY
# state.
"pubsubMetadata": [ # PubSub snapshot metadata.
{ # Represents a Pubsub snapshot.
"expireTime": "A String", # The expire time of the Pubsub snapshot.
"snapshotName": "A String", # The name of the Pubsub snapshot.
"topicName": "A String", # The name of the Pubsub topic.
},
],
"creationTime": "A String", # The time this snapshot was created.

}</pre>
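<p>As a rough illustration of the request body described above, the sketch below assembles the four documented fields and calls <code>snapshot</code>. The helper name <code>snapshot_job</code> is hypothetical, <code>jobs_resource</code> is assumed to come from <code>projects().locations().jobs()</code>, and encoding <code>ttl</code> as a seconds string with an <code>s</code> suffix is an assumption (this page only says the field is a string).</p>

```python
def snapshot_job(jobs_resource, project_id, location, job_id,
                 description="", ttl_seconds=7 * 24 * 3600,
                 snapshot_sources=False):
    """Request a snapshot of a streaming job and return the response dict.

    Assumption: "ttl" is a duration string such as "604800s"; the reference
    above does not spell out the format.
    """
    body = {
        "description": description,
        "ttl": f"{ttl_seconds}s",
        "snapshotSources": snapshot_sources,
        "location": location,
    }
    return jobs_resource.snapshot(
        projectId=project_id, location=location, jobId=job_id, body=body
    ).execute()
```

<p>The returned dict should follow the "Represents a snapshot of a job" form shown above, so a caller could read <code>id</code> and <code>state</code> from it.</p>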

</div>
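<p>The <code>requestedState</code> field of the Job object states that a job's requested state can be set directly to <code>JOB_STATE_CANCELLED</code> through an update. A minimal sketch of doing so with <code>update(projectId, location, jobId, body)</code> follows; the helper name <code>cancel_job</code> is hypothetical and <code>jobs_resource</code> is assumed to come from <code>projects().locations().jobs()</code>.</p>

```python
def cancel_job(jobs_resource, project_id, location, job_id):
    """Ask the service to cancel a job by setting its requested state.

    Only requestedState is sent; per the field description, this
    irrevocably terminates a job that has not yet reached a terminal state.
    """
    body = {"requestedState": "JOB_STATE_CANCELLED"}
    return jobs_resource.update(
        projectId=project_id, location=location, jobId=job_id, body=body
    ).execute()
```

<p>Cancellation is asynchronous from the caller's point of view, so the returned job may not yet report a terminal <code>currentState</code>.</p>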


<code class="details" id="update">update(projectId, location, jobId, body=None, x__xgafv=None)</code>

<pre>Updates the state of an existing Cloud Dataflow job.

To update the state of an existing job, we recommend using
`projects.locations.jobs.update` with a [regional endpoint]
(https://cloud.google.com/dataflow/docs/concepts/regional-endpoints). Using
`projects.jobs.update` is not recommended, as you can only update the state
of jobs that are running in `us-central1`.

Args:
projectId: string, The ID of the Cloud Platform project that the job belongs to. (required)
location: string, The [regional endpoint]
(https://cloud.google.com/dataflow/docs/concepts/regional-endpoints) that
contains this job. (required)
jobId: string, The job ID. (required)
body: object, The request body.
The object takes the form of:

{ # Defines a job to be run by the Cloud Dataflow service.

"clientRequestId": "A String", # The client's unique identifier of the job, re-used across retried attempts.
# If this field is set, the service will ensure its uniqueness.
# The request to create a job will fail if the service has knowledge of a
# previously submitted job with the same client's ID and job name.
# The caller may use this field to ensure idempotence of job
# creation across retried attempts to create a job.
# By default, the field is empty and, in that case, the service ignores it.
"id": "A String", # The unique ID of this job.
#
# This field is set by the Cloud Dataflow service when the Job is
# created, and is immutable for the life of the job.
"currentStateTime": "A String", # The timestamp associated with the current state.
"transformNameMapping": { # The map of transform name prefixes of the job to be replaced to the
# corresponding name prefixes of the new job.
"a_key": "A String",
},

"environment": { # Describes the environment in which a Dataflow Job runs. # The environment for the job.

"sdkPipelineOptions": { # The Cloud Dataflow SDK pipeline options specified by the user. These
# options are passed through the service and are used to recreate the
# SDK pipeline options on the worker in a language agnostic and platform
# independent way.
"a_key": "", # Properties of the object.
},
"flexResourceSchedulingGoal": "A String", # Which Flexible Resource Scheduling mode to run in.
"workerPools": [ # The worker pools. At least one "harness" worker pool must be
# specified in order for the job to have workers.
{ # Describes one particular pool of Cloud Dataflow workers to be
# instantiated by the Cloud Dataflow service in order to perform the
# computations required by a job. Note that a workflow job may use
# multiple pools, in order to match the various computational
# requirements of the various stages of the job.

"defaultPackageSet": "A String", # The default package set to install. This allows the service to

# select a default set of packages which are useful to worker
# harnesses written in a particular language.
"network": "A String", # Network to which VMs will be assigned. If empty or unspecified,
# the service will use the network "default".
"zone": "A String", # Zone to run the worker pools in. If empty or unspecified, the service
# will attempt to choose a reasonable default.
"numWorkers": 42, # Number of Google Compute Engine workers in this pool needed to
# execute the job. If zero or unspecified, the service will
# attempt to choose a reasonable default.
"numThreadsPerWorker": 42, # The number of threads per worker harness. If empty or unspecified, the
# service will choose a number of threads (according to the number of cores
# on the selected machine type for batch, or 1 by convention for streaming).
"diskSourceImage": "A String", # Fully qualified source image for disks.
"packages": [ # Packages to be installed on workers.
{ # The packages that must be installed in order for a worker to run the
# steps of the Cloud Dataflow job that will be assigned to its worker
# pool.
#
# This is the mechanism by which the Cloud Dataflow SDK causes code to
# be loaded onto the workers. For example, the Cloud Dataflow Java SDK
# might use this to install jars containing the user's code and all of the
# various dependencies (libraries, data files, etc.) required in order
# for that code to run.
"location": "A String", # The resource to read the package from. The supported resource type is:
#
# Google Cloud Storage:
#
# storage.googleapis.com/{bucket}
# bucket.storage.googleapis.com/
"name": "A String", # The name of the package.
},
],

"teardownPolicy": "A String", # Sets the policy for determining when to tear down the worker pool.
# Allowed values are: `TEARDOWN_ALWAYS`, `TEARDOWN_ON_SUCCESS`, and
# `TEARDOWN_NEVER`.
# `TEARDOWN_ALWAYS` means workers are always torn down regardless of whether
# the job succeeds. `TEARDOWN_ON_SUCCESS` means workers are torn down
# if the job succeeds. `TEARDOWN_NEVER` means the workers are never torn
# down.
#
# If the workers are not torn down by the service, they will
# continue to run and use Google Compute Engine VM resources in the
# user's project until they are explicitly terminated by the user.
# Because of this, Google recommends using the `TEARDOWN_ALWAYS`
# policy except for small, manually supervised test jobs.
#
# If unknown or unspecified, the service will attempt to choose a reasonable
# default.
"onHostMaintenance": "A String", # The action to take on host maintenance, as defined by the Google
# Compute Engine API.
"poolArgs": { # Extra arguments for this worker pool.
"a_key": "", # Properties of the object. Contains field @type with type URL.
},
"diskSizeGb": 42, # Size of root disk for VMs, in GB. If zero or unspecified, the service will
# attempt to choose a reasonable default.

"workerHarnessContainerImage": "A String", # Required. Docker container image that executes the Cloud Dataflow worker
# harness, residing in Google Container Registry.
#
# Deprecated for the Fn API path. Use sdk_harness_container_images instead.
"diskType": "A String", # Type of root disk for VMs. If empty or unspecified, the service will
# attempt to choose a reasonable default.
"machineType": "A String", # Machine type (e.g. "n1-standard-1"). If empty or unspecified, the
# service will attempt to choose a reasonable default.
"kind": "A String", # The kind of the worker pool; currently only `harness` and `shuffle`
# are supported.
"dataDisks": [ # Data disks that are used by a VM in this workflow.
{ # Describes the data disk used by a workflow job.
"sizeGb": 42, # Size of disk in GB. If zero or unspecified, the service will
# attempt to choose a reasonable default.
"diskType": "A String", # Disk storage type, as defined by Google Compute Engine. This
# must be a disk type appropriate to the project and zone in which
# the workers will run. If unknown or unspecified, the service
# will attempt to choose a reasonable default.
#
# For example, the standard persistent disk type is a resource name
# typically ending in "pd-standard". If SSD persistent disks are
# available, the resource name typically ends with "pd-ssd". The
# actual valid values are defined by the Google Compute Engine API,
# not by the Cloud Dataflow API; consult the Google Compute Engine
# documentation for more information about determining the set of
# available disk types for a particular project and zone.
#
# Google Compute Engine Disk types are local to a particular
# project in a particular zone, and so the resource name will
# typically look something like this:
#
# compute.googleapis.com/projects/project-id/zones/zone/diskTypes/pd-standard
"mountPoint": "A String", # Directory in a VM where disk is mounted.
},
],

"sdkHarnessContainerImages": [ # Set of SDK harness containers needed to execute this pipeline. This will
# only be set in the Fn API path. For non-cross-language pipelines this
# should have only one entry. Cross-language pipelines will have two or more
# entries.
{ # Defines an SDK harness container for executing Dataflow pipelines.
"containerImage": "A String", # A docker container image that resides in Google Container Registry.
"useSingleCorePerContainer": True or False, # If true, recommends that the Dataflow service use only one core per SDK
# container instance with this image. If false (or unset), recommends using
# more than one core per SDK container instance with this image for
# efficiency. Note that the Dataflow service may choose to override this property
# if needed.
},
],
"subnetwork": "A String", # Subnetwork to which VMs will be assigned, if desired. Expected to be of
# the form "regions/REGION/subnetworks/SUBNETWORK".
"ipConfiguration": "A String", # Configuration for VM IPs.

"taskrunnerSettings": { # Taskrunner configuration settings. # Settings passed through to Google Compute Engine workers when
# using the standard Dataflow task runner. Users should ignore
# this field.
"alsologtostderr": True or False, # Whether to also send taskrunner log info to stderr.
"taskGroup": "A String", # The UNIX group ID on the worker VM to use for tasks launched by
# taskrunner; e.g. "wheel".
"harnessCommand": "A String", # The command to launch the worker harness.
"logDir": "A String", # The directory on the VM to store logs.
"oauthScopes": [ # The OAuth2 scopes to be requested by the taskrunner in order to
# access the Cloud Dataflow API.
"A String",
],
"dataflowApiVersion": "A String", # The API version of endpoint, e.g. "v1b3"
"logUploadLocation": "A String", # Indicates where to put logs. If this is not specified, the logs
# will not be uploaded.
#
# The supported resource type is:
#
# Google Cloud Storage:
# storage.googleapis.com/{bucket}/{object}
# bucket.storage.googleapis.com/{object}
"streamingWorkerMainClass": "A String", # The streaming worker main class name.
"workflowFileName": "A String", # The file to store the workflow in.
"baseTaskDir": "A String", # The location on the worker for task-specific subdirectories.
"tempStoragePrefix": "A String", # The prefix of the resources the taskrunner should use for
# temporary storage.
#
# The supported resource type is:
#
# Google Cloud Storage:
# storage.googleapis.com/{bucket}/{object}
# bucket.storage.googleapis.com/{object}
"commandlinesFileName": "A String", # The file to store preprocessing commands in.
"languageHint": "A String", # The suggested backend language.
"baseUrl": "A String", # The base URL for the taskrunner to use when accessing Google Cloud APIs.
#
# When workers access Google Cloud APIs, they logically do so via
# relative URLs. If this field is specified, it supplies the base
# URL to use for resolving these relative URLs. The normative
# algorithm used is defined by RFC 1808, "Relative Uniform Resource
# Locators".
#
# If not specified, the default value is "http://www.googleapis.com/"
"logToSerialconsole": True or False, # Whether to send taskrunner log info to Google Compute Engine VM serial
# console.
"continueOnException": True or False, # Whether to continue taskrunner if an exception is hit.

"parallelWorkerSettings": { # Provides data to pass through to the worker harness. # The settings to pass to the parallel worker harness.
"baseUrl": "A String", # The base URL for accessing Google Cloud APIs.
#
# When workers access Google Cloud APIs, they logically do so via
# relative URLs. If this field is specified, it supplies the base
# URL to use for resolving these relative URLs. The normative
# algorithm used is defined by RFC 1808, "Relative Uniform Resource
# Locators".
#
# If not specified, the default value is "http://www.googleapis.com/"
"reportingEnabled": True or False, # Whether to send work progress updates to the service.
"servicePath": "A String", # The Cloud Dataflow service path relative to the root URL, for example,
# "dataflow/v1b3/projects".
"shuffleServicePath": "A String", # The Shuffle service path relative to the root URL, for example,
# "shuffle/v1beta1".
"workerId": "A String", # The ID of the worker running this pipeline.
"tempStoragePrefix": "A String", # The prefix of the resources the system should use for temporary
# storage.
#
# The supported resource type is:
#
# Google Cloud Storage:
#
# storage.googleapis.com/{bucket}/{object}
# bucket.storage.googleapis.com/{object}
},
"vmId": "A String", # The ID string of the VM.
"taskUser": "A String", # The UNIX user ID on the worker VM to use for tasks launched by
# taskrunner; e.g. "root".
},
"autoscalingSettings": { # Settings for WorkerPool autoscaling. # Settings for autoscaling of this WorkerPool.
"maxNumWorkers": 42, # The maximum number of workers to cap scaling at.
"algorithm": "A String", # The algorithm to use for autoscaling.
},
"metadata": { # Metadata to set on the Google Compute Engine VMs.
"a_key": "A String",
},
},
],

"dataset": "A String", # The dataset for the current project where various workflow
# related tables are stored.
#
# The supported resource type is:
#
# Google BigQuery:
# bigquery.googleapis.com/{dataset}
"internalExperiments": { # Experimental settings.
"a_key": "", # Properties of the object. Contains field @type with type URL.
},
"workerRegion": "A String", # The Compute Engine region
# (https://cloud.google.com/compute/docs/regions-zones/regions-zones) in
# which worker processing should occur, e.g. "us-west1". Mutually exclusive
# with worker_zone. If neither worker_region nor worker_zone is specified,
# default to the control plane's region.
"serviceKmsKeyName": "A String", # If set, contains the Cloud KMS key identifier used to encrypt data
# at rest, AKA a Customer Managed Encryption Key (CMEK).
#
# Format:
# projects/PROJECT_ID/locations/LOCATION/keyRings/KEY_RING/cryptoKeys/KEY
"userAgent": { # A description of the process that generated the request.
"a_key": "", # Properties of the object.
},
"workerZone": "A String", # The Compute Engine zone
# (https://cloud.google.com/compute/docs/regions-zones/regions-zones) in
# which worker processing should occur, e.g. "us-west1-a". Mutually exclusive
# with worker_region. If neither worker_region nor worker_zone is specified,
# a zone in the control plane's region is chosen based on available capacity.
"clusterManagerApiService": "A String", # The type of cluster manager API to use. If unknown or
# unspecified, the service will attempt to choose a reasonable
# default. This should be in the form of the API service name,
# e.g. "compute.googleapis.com".
"tempStoragePrefix": "A String", # The prefix of the resources the system should use for temporary
# storage. The system will append the suffix "/temp-{JOBNAME}" to
# this resource prefix, where {JOBNAME} is the value of the
# job_name field. The resulting bucket and object prefix is used
# as the prefix of the resources used to store temporary data
# needed during the job execution. NOTE: This will override the
# value in taskrunner_settings.
# The supported resource type is:
#
# Google Cloud Storage:
#
# storage.googleapis.com/{bucket}/{object}
# bucket.storage.googleapis.com/{object}
"experiments": [ # The list of experiments to enable.
"A String",
],
"version": { # A structure describing which components and their versions of the service
# are required in order to run the job.
"a_key": "", # Properties of the object.
},
"serviceAccountEmail": "A String", # Identity to run virtual machines as. Defaults to the default account.
},

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame^]

2991

"stageStates": [ # This field may be mutated by the Cloud Dataflow service;

2992

# callers cannot mutate it.

2993

{ # A message describing the state of a particular execution stage.

2994

"executionStageName": "A String", # The name of the execution stage.

2995

"currentStateTime": "A String", # The time at which the stage transitioned to this state.

2996

"executionStageState": "A String", # Execution stage states allow the same set of values as JobState.

2997

},

2998

],

2999

"jobMetadata": { # Metadata available primarily for filtering jobs. Will be included in the

3000

# ListJobs response and Job SUMMARY view. This field is populated by the Dataflow

3001

# service to support filtering jobs by the metadata values provided here.

3002

# Populated for ListJobs and all GetJob views SUMMARY and higher.

3003

"bigTableDetails": [ # Identification of a BigTable source used in the Dataflow job.

3004

{ # Metadata for a BigTable connector used by the job.

3005

"tableId": "A String", # TableId accessed in the connection.

3006

"projectId": "A String", # ProjectId accessed in the connection.

3007

"instanceId": "A String", # InstanceId accessed in the connection.

3008

},

3009

],

3010

"spannerDetails": [ # Identification of a Spanner source used in the Dataflow job.

3011

{ # Metadata for a Spanner connector used by the job.

3012

"databaseId": "A String", # DatabaseId accessed in the connection.

3013

"instanceId": "A String", # InstanceId accessed in the connection.

3014

"projectId": "A String", # ProjectId accessed in the connection.

3015

},

3016

],

3017

"datastoreDetails": [ # Identification of a Datastore source used in the Dataflow job.

3018

{ # Metadata for a Datastore connector used by the job.

3019

"projectId": "A String", # ProjectId accessed in the connection.

3020

"namespace": "A String", # Namespace used in the connection.

3021

},

3022

],

3023

"sdkVersion": { # The version of the SDK used to run the job.

3024

"versionDisplayName": "A String", # A readable string describing the version of the SDK.

3025

"sdkSupportStatus": "A String", # The support status for this SDK version.

3026

"version": "A String", # The version of the SDK used to run the job.

3027

},

3028

"bigqueryDetails": [ # Identification of a BigQuery source used in the Dataflow job.

3029

{ # Metadata for a BigQuery connector used by the job.

3030

"table": "A String", # Table accessed in the connection.

3031

"dataset": "A String", # Dataset accessed in the connection.

3032

"projectId": "A String", # Project accessed in the connection.

3033

"query": "A String", # Query used to access data in the connection.

3034

},

3035

],

3036

"fileDetails": [ # Identification of a File source used in the Dataflow job.

3037

{ # Metadata for a File connector used by the job.

3038

"filePattern": "A String", # File Pattern used to access files by the connector.

3039

},

3040

],

3041

"pubsubDetails": [ # Identification of a PubSub source used in the Dataflow job.

3042

{ # Metadata for a PubSub connector used by the job.

3043

"subscription": "A String", # Subscription used in the connection.

3044

"topic": "A String", # Topic accessed in the connection.

},

],

},

"createdFromSnapshotId": "A String", # If this is specified, the job's initial state is populated from the given

3049

# snapshot.

3050

"projectId": "A String", # The ID of the Cloud Platform project that the job belongs to.

3051

"type": "A String", # The type of Cloud Dataflow job.

3052

"pipelineDescription": { # A descriptive representation of the submitted pipeline as well as the executed

3053

# form. This data is provided by the Dataflow service for ease of visualizing

3054

# the pipeline and interpreting Dataflow provided metrics.

3055

# Preliminary field: The format of this data may change at any time.

3056

# A description of the user pipeline and stages through which it is executed.

3057

# Created by Cloud Dataflow service. Only retrieved with JOB_VIEW_DESCRIPTION or JOB_VIEW_ALL.

3058

"executionPipelineStage": [ # Description of each stage of execution of the pipeline.

3059

{ # Description of the composing transforms, names/ids, and input/outputs of a

3060

# stage of execution. Some composing transforms and sources may have been

3061

# generated by the Dataflow service during execution planning.

3062

"id": "A String", # Dataflow service generated id for this stage.

3063

"componentTransform": [ # Transforms that comprise this execution stage.

3064

{ # Description of a transform executed as part of an execution stage.

3065

"originalTransform": "A String", # User name for the original user transform with which this transform is

3066

# most closely associated.

3067

"name": "A String", # Dataflow service generated name for this source.

3068

"userName": "A String", # Human-readable name for this transform; may be user or system generated.

3069

},

3070

],

3071

"componentSource": [ # Collections produced and consumed by component transforms of this stage.

3072

{ # Description of an interstitial value between transforms in an execution

3073

# stage.

3074

"name": "A String", # Dataflow service generated name for this source.

3075

"userName": "A String", # Human-readable name for this transform; may be user or system generated.

3076

"originalTransformOrCollection": "A String", # User name for the original user transform or collection with which this

3077

# source is most closely associated.

3078

},

3079

],

3080

"kind": "A String", # Type of transform this stage is executing.

3081

"outputSource": [ # Output sources for this stage.

3082

{ # Description of an input or output of an execution stage.

3083

"originalTransformOrCollection": "A String", # User name for the original user transform or collection with which this

3084

# source is most closely associated.

3085

"name": "A String", # Dataflow service generated name for this source.

3086

"sizeBytes": "A String", # Size of the source, if measurable.

3087

"userName": "A String", # Human-readable name for this source; may be user or system generated.

3088

},

3089

],

3090

"name": "A String", # Dataflow service generated name for this stage.

3091

"inputSource": [ # Input sources for this stage.

3092

{ # Description of an input or output of an execution stage.

3093

"originalTransformOrCollection": "A String", # User name for the original user transform or collection with which this

3094

# source is most closely associated.

3095

"name": "A String", # Dataflow service generated name for this source.

3096

"sizeBytes": "A String", # Size of the source, if measurable.

3097

"userName": "A String", # Human-readable name for this source; may be user or system generated.

},

],

},

],

"originalPipelineTransform": [ # Description of each transform in the pipeline and collections between them.

3103

{ # Description of the type, names/ids, and input/outputs for a transform.

3104

"kind": "A String", # Type of transform.

3105

"inputCollectionName": [ # User names for all collection inputs to this transform.

3106

"A String",

3107

],

3108

"name": "A String", # User provided name for this transform instance.

3109

"id": "A String", # SDK generated id of this transform instance.

3110

"displayData": [ # Transform-specific display data.

3111

{ # Data provided with a pipeline or transform to provide descriptive info.

3112

"timestampValue": "A String", # Contains value if the data is of timestamp type.

3113

"boolValue": True or False, # Contains value if the data is of a boolean type.

3114

"javaClassValue": "A String", # Contains value if the data is of java class type.

3115

"strValue": "A String", # Contains value if the data is of string type.

3116

"int64Value": "A String", # Contains value if the data is of int64 type.

3117

"durationValue": "A String", # Contains value if the data is of duration type.

3118

"namespace": "A String", # The namespace for the key. This is usually a class name or programming

3119

# language namespace (e.g. a Python module) which defines the display data.

3120

# This allows a dax monitoring system to specially handle the data

3121

# and perform custom rendering.

3122

"floatValue": 3.14, # Contains value if the data is of float type.

3123

"key": "A String", # The key identifying the display data.

3124

# This is intended to be used as a label for the display data

3125

# when viewed in a dax monitoring system.

3126

"shortStrValue": "A String", # A possible additional shorter value to display.

3127

# For example a java_class_name_value of com.mypackage.MyDoFn

3128

# will be stored with MyDoFn as the short_str_value and

3129

# com.mypackage.MyDoFn as the java_class_name_value.

3130

# short_str_value can be displayed and java_class_name_value

3131

# will be displayed as a tooltip.

3132

"url": "A String", # An optional full URL.

3133

"label": "A String", # An optional label to display in a dax UI for the element.

3134

},

3135

],

3136

"outputCollectionName": [ # User names for all collection outputs to this transform.

"A String",

],

},

],

"displayData": [ # Pipeline level display data.

3142

{ # Data provided with a pipeline or transform to provide descriptive info.

3143

"timestampValue": "A String", # Contains value if the data is of timestamp type.

3144

"boolValue": True or False, # Contains value if the data is of a boolean type.

3145

"javaClassValue": "A String", # Contains value if the data is of java class type.

3146

"strValue": "A String", # Contains value if the data is of string type.

3147

"int64Value": "A String", # Contains value if the data is of int64 type.

3148

"durationValue": "A String", # Contains value if the data is of duration type.

3149

"namespace": "A String", # The namespace for the key. This is usually a class name or programming

3150

# language namespace (e.g. a Python module) which defines the display data.

3151

# This allows a dax monitoring system to specially handle the data

3152

# and perform custom rendering.

3153

"floatValue": 3.14, # Contains value if the data is of float type.

3154

"key": "A String", # The key identifying the display data.

3155

# This is intended to be used as a label for the display data

3156

# when viewed in a dax monitoring system.

3157

"shortStrValue": "A String", # A possible additional shorter value to display.

3158

# For example a java_class_name_value of com.mypackage.MyDoFn

3159

# will be stored with MyDoFn as the short_str_value and

3160

# com.mypackage.MyDoFn as the java_class_name_value.

3161

# short_str_value can be displayed and java_class_name_value

3162

# will be displayed as a tooltip.

3163

"url": "A String", # An optional full URL.

3164

"label": "A String", # An optional label to display in a dax UI for the element.

},

],

},

"replaceJobId": "A String", # If this job is an update of an existing job, this field is the job ID

3169

# of the job it replaced.

3170

#

3171

# When sending a `CreateJobRequest`, you can update a job by specifying it

3172

# here. The job named here is stopped, and its intermediate state is

3173

# transferred to this job.

3174

"tempFiles": [ # A set of files the system should be aware of that are used

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

3175

# for temporary storage. These temporary files will be

3176

# removed on job completion.

3177

# No duplicates are allowed.

3178

# No file patterns are supported.

3179

#

3180

# The supported files are:

3181

#

3182

# Google Cloud Storage:

3183

#

3184

# storage.googleapis.com/{bucket}/{object}

3185

# bucket.storage.googleapis.com/{object}

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame^]

3186

"A String",

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

3187

],

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame^]

3188

"name": "A String", # The user-specified Cloud Dataflow job name.

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

3189

#

3190

# Only one Job with a given name may exist in a project at any

3191

# given time. If a caller attempts to create a Job with the same

3192

# name as an already-existing Job, the attempt returns the

3193

# existing Job.

3194

#

3195

# The name must match the regular expression

3196

# `[a-z]([-a-z0-9]{0,38}[a-z0-9])?`

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame^]

3197

"steps": [ # Exactly one of steps or steps_location should be specified.

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

3198

#

3199

# The top-level steps that constitute the entire job.

3200

{ # Defines a particular step within a Cloud Dataflow job.

3201

#

3202

# A job consists of multiple steps, each of which performs some

3203

# specific operation as part of the overall job. Data is typically

3204

# passed from one step to another as part of the job.

3205

#

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame^]

3206

# Here's an example of a sequence of steps which together implement a

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

3207

# Map-Reduce job:

3208

#

3209

# * Read a collection of data from some source, parsing the

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame^]

3210

# collection's elements.

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

3211

#

3212

# * Validate the elements.

3213

#

3214

# * Apply a user-defined function to map each element to some value

3215

# and extract an element-specific key value.

3216

#

3217

# * Group elements with the same key into a single element with

3218

# that key, transforming a multiply-keyed collection into a

3219

# uniquely-keyed collection.

3220

#

3221

# * Write the elements out to some data sink.

3222

#

3223

# Note that the Cloud Dataflow service may be used to run many different

3224

# types of jobs, not just Map-Reduce.

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame^]

3225

"name": "A String", # The name that identifies the step. This must be unique for each

Dan O'Meara

2020-05-01 07:42:23 -0700

[diff] [blame]

3226

# step with respect to all other steps in the Cloud Dataflow job.

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame^]

3227

"kind": "A String", # The kind of step in the Cloud Dataflow job.

3228

"properties": { # Named properties associated with the step. Each kind of

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

3229

# predefined step has its own required set of properties.

3230

# Must be provided on Create. Only retrieved with JOB_VIEW_ALL.

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame^]

3231

"a_key": "", # Properties of the object.

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

3232

},

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

3233

},

3234

],

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame^]

3235

"replacedByJobId": "A String", # If another job is an update of this job (and thus, this job is in

3236

# `JOB_STATE_UPDATED`), this field contains the ID of that job.

3237

"executionInfo": { # Deprecated. Additional information about how a Cloud Dataflow job will be executed that

3238

# isn't contained in the submitted job.

3239

"stages": { # A mapping from each stage to the information about that stage.

3240

"a_key": { # Contains information about how a particular

3241

# google.dataflow.v1beta3.Step will be executed.

3242

"stepName": [ # The steps associated with the execution stage.

3243

# Note that stages may have several steps, and that a given step

3244

# might be run by more than one stage.

"A String",

],

},

},

},

"currentState": "A String", # The current state of the job.

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

3251

#

3252

# Jobs are created in the `JOB_STATE_STOPPED` state unless otherwise

3253

# specified.

3254

#

3255

# A job in the `JOB_STATE_RUNNING` state may asynchronously enter a

3256

# terminal state. After a job has reached a terminal state, no

3257

# further state updates may be made.

3258

#

3259

# This field may be mutated by the Cloud Dataflow service;

3260

# callers cannot mutate it.

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame^]

3261

"location": "A String", # The [regional endpoint]

3262

# (https://cloud.google.com/dataflow/docs/concepts/regional-endpoints) that

3263

# contains this job.

3264

"startTime": "A String", # The timestamp when the job was started (transitioned to JOB_STATE_PENDING).

3265

# Flexible resource scheduling jobs are started with some delay after job

3266

# creation, so start_time is unset before start and is updated when the

3267

# job is started by the Cloud Dataflow service. For other jobs, start_time

3268

# always equals create_time and is immutable and set by the Cloud Dataflow

3269

# service.

3270

"stepsLocation": "A String", # The GCS location where the steps are stored.

3271

"labels": { # User-defined labels for this job.

3272

#

3273

# The labels map can contain no more than 64 entries. Entries of the labels

3274

# map are UTF8 strings that comply with the following restrictions:

3275

#

3276

# * Keys must conform to regexp: \p{Ll}\p{Lo}{0,62}

3277

# * Values must conform to regexp: [\p{Ll}\p{Lo}\p{N}_-]{0,63}

3278

# * Both keys and values are additionally constrained to be <= 128 bytes in

3279

# size.

3280

"a_key": "A String",

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

3281

},

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame^]

3282

"createTime": "A String", # The timestamp when the job was initially created. Immutable and set by the

3283

# Cloud Dataflow service.

3284

"requestedState": "A String", # The job's requested state.

3285

#

3286

# `UpdateJob` may be used to switch between the `JOB_STATE_STOPPED` and

3287

# `JOB_STATE_RUNNING` states, by setting requested_state. `UpdateJob` may

3288

# also be used to directly set a job's requested state to

3289

# `JOB_STATE_CANCELLED` or `JOB_STATE_DONE`, irrevocably terminating the

3290

# job if it has not already reached a terminal state.

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

3291

}
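
The `name` and `labels` constraints described above can be checked on the client before a body is sent. The following is a minimal sketch, not part of the generated reference: the helper names `is_valid_job_name` and `label_problems` are hypothetical, and the label check covers only the entry count and byte-size limits (the Unicode character-class rules are left out because the standard `re` module does not support them).

```python
import re

# Pattern quoted in the "name" field description.
JOB_NAME_RE = re.compile(r"[a-z]([-a-z0-9]{0,38}[a-z0-9])?")

MAX_LABELS = 64         # "no more than 64 entries"
MAX_LABEL_BYTES = 128   # keys and values "<= 128 bytes in size"


def is_valid_job_name(name):
    """Return True if the whole name matches the documented pattern."""
    return JOB_NAME_RE.fullmatch(name) is not None


def label_problems(labels):
    """Return a list of problems found in a labels dict (empty if none).

    Only the entry count and UTF-8 byte size are checked here; the
    character-class rules for keys and values are not.
    """
    problems = []
    if len(labels) > MAX_LABELS:
        problems.append("too many labels: %d > %d" % (len(labels), MAX_LABELS))
    for key, value in labels.items():
        for kind, text in (("key", key), ("value", value)):
            if len(text.encode("utf-8")) > MAX_LABEL_BYTES:
                problems.append("label %s exceeds %d bytes: %r"
                                % (kind, MAX_LABEL_BYTES, text))
    return problems
```

A caller might run both helpers before building the request body and refuse to submit if either reports a problem; the service remains the authority on what it accepts.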

3292

3293

x__xgafv: string, V1 error format.

Allowed values

1 - v1 error format

2 - v2 error format
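
As an illustration of the `requestedState` and `x__xgafv` parameters described above, here is a hedged sketch of asking the service to change a job's state. It assumes this parameter list belongs to the `update` method of this resource and that `service` was built elsewhere (for example with `googleapiclient.discovery.build("dataflow", "v1b3")`); the helper names `requested_state_body` and `request_state` are hypothetical.

```python
# States that the requestedState description says UpdateJob may set.
ALLOWED_REQUESTED_STATES = (
    "JOB_STATE_STOPPED",
    "JOB_STATE_RUNNING",
    "JOB_STATE_CANCELLED",  # terminal: cannot be undone
    "JOB_STATE_DONE",       # terminal: cannot be undone
)


def requested_state_body(state):
    """Build a minimal Job body that only asks for a state change."""
    if state not in ALLOWED_REQUESTED_STATES:
        raise ValueError("unsupported requested state: %r" % (state,))
    return {"requestedState": state}


def request_state(service, project_id, location, job_id, state,
                  error_format=None):
    """Send the state change and return the Job object from the response.

    `service` is assumed to be a Dataflow v1b3 client object. `error_format`
    maps to x__xgafv ("1" or "2") and is omitted when None.
    """
    kwargs = {
        "projectId": project_id,
        "location": location,
        "jobId": job_id,
        "body": requested_state_body(state),
    }
    if error_format is not None:
        kwargs["x__xgafv"] = error_format
    request = service.projects().locations().jobs().update(**kwargs)
    return request.execute()
```

Because `JOB_STATE_CANCELLED` and `JOB_STATE_DONE` terminate the job irrevocably, a caller would typically confirm the job ID before sending either of them.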

Returns:

An object of the form:

3300

3301

{ # Defines a job to be run by the Cloud Dataflow service.

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame^]

3302

"clientRequestId": "A String", # The client's unique identifier of the job, re-used across retried attempts.

3303

# If this field is set, the service will ensure its uniqueness.

3304

# The request to create a job will fail if the service has knowledge of a

3305

# previously submitted job with the same client's ID and job name.

3306

# The caller may use this field to ensure idempotence of job

3307

# creation across retried attempts to create a job.

3308

# By default, the field is empty and, in that case, the service ignores it.

3309

"id": "A String", # The unique ID of this job.

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

3310

#

3311

# This field is set by the Cloud Dataflow service when the Job is

3312

# created, and is immutable for the life of the job.

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame^]

3313

"currentStateTime": "A String", # The timestamp associated with the current state.

3314

"transformNameMapping": { # The map of transform name prefixes of the job to be replaced to the

Sai Cheemalapati

2017-03-13 12:12:03 -0400

[diff] [blame]

3315

# corresponding name prefixes of the new job.

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame^]

3316

"a_key": "A String",

Jon Wayne Parrott

2017-01-06 09:58:29 -0800

[diff] [blame]

3317

},

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame^]

3318

"environment": { # The environment for the job. Describes the environment in which a Dataflow Job runs.

3319

"sdkPipelineOptions": { # The Cloud Dataflow SDK pipeline options specified by the user. These

Sai Cheemalapati

2017-03-13 12:12:03 -0400

[diff] [blame]

3320

# options are passed through the service and are used to recreate the

3321

# SDK pipeline options on the worker in a language agnostic and platform

3322

# independent way.

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame^]

3323

"a_key": "", # Properties of the object.

Jon Wayne Parrott

2017-01-06 09:58:29 -0800

[diff] [blame]

3324

},

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame^]

3325

"flexResourceSchedulingGoal": "A String", # Which Flexible Resource Scheduling mode to run in.

3326

"workerPools": [ # The worker pools. At least one "harness" worker pool must be

Sai Cheemalapati

2017-03-13 12:12:03 -0400

[diff] [blame]

3327

# specified in order for the job to have workers.

3328

{ # Describes one particular pool of Cloud Dataflow workers to be

3329

# instantiated by the Cloud Dataflow service in order to perform the

3330

# computations required by a job. Note that a workflow job may use

3331

# multiple pools, in order to match the various computational

3332

# requirements of the various stages of the job.

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame^]

3333

"defaultPackageSet": "A String", # The default package set to install. This allows the service to

3334

# select a default set of packages which are useful to worker

3335

# harnesses written in a particular language.

3336

"network": "A String", # Network to which VMs will be assigned. If empty or unspecified,

3337

# the service will use the network "default".

3338

"zone": "A String", # Zone to run the worker pools in. If empty or unspecified, the service

Dan O'Meara

2020-05-01 07:42:23 -0700

[diff] [blame]

3339

# will attempt to choose a reasonable default.

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame^]

3340

"numWorkers": 42, # Number of Google Compute Engine workers in this pool needed to

3341

# execute the job. If zero or unspecified, the service will

3342

# attempt to choose a reasonable default.

3343

"numThreadsPerWorker": 42, # The number of threads per worker harness. If empty or unspecified, the

Dan O'Meara

2020-05-01 07:42:23 -0700

[diff] [blame]

3344

# service will choose a number of threads (according to the number of cores

3345

# on the selected machine type for batch, or 1 by convention for streaming).

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame^]

3346

"diskSourceImage": "A String", # Fully qualified source image for disks.

3347

"packages": [ # Packages to be installed on workers.

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

3348

{ # The packages that must be installed in order for a worker to run the

3349

# steps of the Cloud Dataflow job that will be assigned to its worker

3350

# pool.

3351

#

3352

# This is the mechanism by which the Cloud Dataflow SDK causes code to

3353

# be loaded onto the workers. For example, the Cloud Dataflow Java SDK

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame^]

3354

# might use this to install jars containing the user's code and all of the

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

3355

# various dependencies (libraries, data files, etc.) required in order

3356

# for that code to run.

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame^]

3357

"location": "A String", # The resource to read the package from. The supported resource type is:

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

3358

#

3359

# Google Cloud Storage:

3360

#

3361

# storage.googleapis.com/{bucket}

3362

# bucket.storage.googleapis.com/

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame^]

3363

"name": "A String", # The name of the package.

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

3364

},

3365

],

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame^]

3366

"teardownPolicy": "A String", # Sets the policy for determining when to tear down the worker pool.

Sai Cheemalapati

2017-06-06 18:46:08 -0400

[diff] [blame]

3367

# Allowed values are: `TEARDOWN_ALWAYS`, `TEARDOWN_ON_SUCCESS`, and

3368

# `TEARDOWN_NEVER`.

3369

# `TEARDOWN_ALWAYS` means workers are always torn down regardless of whether

3370

# the job succeeds. `TEARDOWN_ON_SUCCESS` means workers are torn down

3371

# if the job succeeds. `TEARDOWN_NEVER` means the workers are never torn

3372

# down.

3373

#

3374

# If the workers are not torn down by the service, they will

3375

# continue to run and use Google Compute Engine VM resources in the

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame^]

3376

# user's project until they are explicitly terminated by the user.

Sai Cheemalapati

2017-06-06 18:46:08 -0400

[diff] [blame]

3377

# Because of this, Google recommends using the `TEARDOWN_ALWAYS`

3378

# policy except for small, manually supervised test jobs.

3379

#

3380

# If unknown or unspecified, the service will attempt to choose a reasonable

3381

# default.

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame^]

3382

"onHostMaintenance": "A String", # The action to take on host maintenance, as defined by the Google

3383

# Compute Engine API.

3384

"poolArgs": { # Extra arguments for this worker pool.

3385

"a_key": "", # Properties of the object. Contains field @type with type URL.

3386

},

3387

"diskSizeGb": 42, # Size of root disk for VMs, in GB. If zero or unspecified, the service will

Dan O'Meara

2020-05-01 07:42:23 -0700

[diff] [blame]

3388

# attempt to choose a reasonable default.

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame^]

3389

"workerHarnessContainerImage": "A String", # Required. Docker container image that executes the Cloud Dataflow worker

3390

# harness, residing in Google Container Registry.

3391

#

3392

# Deprecated for the Fn API path. Use sdk_harness_container_images instead.

3393

"diskType": "A String", # Type of root disk for VMs. If empty or unspecified, the service will

Sai Cheemalapati

2017-03-13 12:12:03 -0400

[diff] [blame]

3394

# attempt to choose a reasonable default.

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame^]

3395

"machineType": "A String", # Machine type (e.g. "n1-standard-1"). If empty or unspecified, the

3396

# service will attempt to choose a reasonable default.

3397

"kind": "A String", # The kind of the worker pool; currently only `harness` and `shuffle`

3398

# are supported.

3399

"dataDisks": [ # Data disks that are used by a VM in this workflow.

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

3400

{ # Describes the data disk used by a workflow job.

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame^]

3401

"sizeGb": 42, # Size of disk in GB. If zero or unspecified, the service will

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

3402

# attempt to choose a reasonable default.

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame^]

3403

"diskType": "A String", # Disk storage type, as defined by Google Compute Engine. This

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

3404

# must be a disk type appropriate to the project and zone in which

3405

# the workers will run. If unknown or unspecified, the service

3406

# will attempt to choose a reasonable default.

Sai Cheemalapati

2017-06-06 18:46:08 -0400

[diff] [blame]

3407

#

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

3408

# For example, the standard persistent disk type is a resource name

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame^]

3409

# typically ending in "pd-standard". If SSD persistent disks are

3410

# available, the resource name typically ends with "pd-ssd". The

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

3411

# actual valid values are defined by the Google Compute Engine API,

3412

# not by the Cloud Dataflow API; consult the Google Compute Engine

3413

# documentation for more information about determining the set of

3414

# available disk types for a particular project and zone.

Sai Cheemalapati

2017-06-06 18:46:08 -0400

[diff] [blame]

3415

#

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

3416

# Google Compute Engine Disk types are local to a particular

3417

# project in a particular zone, and so the resource name will

3418

# typically look something like this:

3419

#

3420

# compute.googleapis.com/projects/project-id/zones/zone/diskTypes/pd-standard

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame^]

3421

"mountPoint": "A String", # Directory in a VM where disk is mounted.

Sai Cheemalapati

2017-06-06 18:46:08 -0400

[diff] [blame]

3422

},

3423

],

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame^]

3424

"sdkHarnessContainerImages": [ # Set of SDK harness containers needed to execute this pipeline. This will

Dan O'Meara

2020-05-01 07:42:23 -0700

[diff] [blame]

3425

# only be set in the Fn API path. For non-cross-language pipelines this

3426

# should have only one entry. Cross-language pipelines will have two or more

3427

# entries.

3428

{ # Defines an SDK harness container for executing Dataflow pipelines.

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame^]

3429

"containerImage": "A String", # A docker container image that resides in Google Container Registry.

3430

"useSingleCorePerContainer": True or False, # If true, recommends that the Dataflow service use only one core per SDK

Dan O'Meara

2020-05-01 07:42:23 -0700

[diff] [blame]

3431

# container instance with this image. If false (or unset) recommends using

3432

# more than one core per SDK container instance with this image for

3433

# efficiency. Note that Dataflow service may choose to override this property

3434

# if needed.

3435

},

3436

],

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame^]

3437

"subnetwork": "A String", # Subnetwork to which VMs will be assigned, if desired. Expected to be of

3438

# the form "regions/REGION/subnetworks/SUBNETWORK".

3439

"ipConfiguration": "A String", # Configuration for VM IPs.

3440

"taskrunnerSettings": { # Taskrunner configuration settings. Settings passed through to Google Compute Engine workers when

3441

# using the standard Dataflow task runner. Users should ignore

3442

# this field.

3443

"alsologtostderr": True or False, # Whether to also send taskrunner log info to stderr.

3444

"taskGroup": "A String", # The UNIX group ID on the worker VM to use for tasks launched by

3445

# taskrunner; e.g. "wheel".

3446

"harnessCommand": "A String", # The command to launch the worker harness.

3447

"logDir": "A String", # The directory on the VM to store logs.

3448

"oauthScopes": [ # The OAuth2 scopes to be requested by the taskrunner in order to

3449

# access the Cloud Dataflow API.

3450

"A String",

3451

],

3452

"dataflowApiVersion": "A String", # The API version of endpoint, e.g. "v1b3"

3453

"logUploadLocation": "A String", # Indicates where to put logs. If this is not specified, the logs

3454

# will not be uploaded.

3455

#

3456

# The supported resource type is:

3457

#

3458

# Google Cloud Storage:

3459

# storage.googleapis.com/{bucket}/{object}

3460

# bucket.storage.googleapis.com/{object}

3461

"streamingWorkerMainClass": "A String", # The streaming worker main class name.

3462

"workflowFileName": "A String", # The file to store the workflow in.

3463

"baseTaskDir": "A String", # The location on the worker for task-specific subdirectories.

3464

"tempStoragePrefix": "A String", # The prefix of the resources the taskrunner should use for

3465

# temporary storage.

3466

#

3467

# The supported resource type is:

3468

#

3469

# Google Cloud Storage:

3470

# storage.googleapis.com/{bucket}/{object}

3471

# bucket.storage.googleapis.com/{object}

3472

"commandlinesFileName": "A String", # The file to store preprocessing commands in.

3473

"languageHint": "A String", # The suggested backend language.

3474

"baseUrl": "A String", # The base URL for the taskrunner to use when accessing Google Cloud APIs.

3475

#

3476

# When workers access Google Cloud APIs, they logically do so via

3477

# relative URLs. If this field is specified, it supplies the base

3478

# URL to use for resolving these relative URLs. The normative

3479

# algorithm used is defined by RFC 1808, "Relative Uniform Resource

3480

# Locators".

3481

#

3482

# If not specified, the default value is "http://www.googleapis.com/"

3483

"logToSerialconsole": True or False, # Whether to send taskrunner log info to Google Compute Engine VM serial

3484

# console.

3485

"continueOnException": True or False, # Whether to continue taskrunner if an exception is hit.

3486

"parallelWorkerSettings": { # Provides data to pass through to the worker harness. # The settings to pass to the parallel worker harness.

3487

"baseUrl": "A String", # The base URL for accessing Google Cloud APIs.

3488

#

3489

# When workers access Google Cloud APIs, they logically do so via

3490

# relative URLs. If this field is specified, it supplies the base

3491

# URL to use for resolving these relative URLs. The normative

3492

# algorithm used is defined by RFC 1808, "Relative Uniform Resource

3493

# Locators".

3494

#

3495

# If not specified, the default value is "http://www.googleapis.com/"

3496

"reportingEnabled": True or False, # Whether to send work progress updates to the service.

3497

"servicePath": "A String", # The Cloud Dataflow service path relative to the root URL, for example,

3498

# "dataflow/v1b3/projects".

3499

"shuffleServicePath": "A String", # The Shuffle service path relative to the root URL, for example,

3500

# "shuffle/v1beta1".

3501

"workerId": "A String", # The ID of the worker running this pipeline.

3502

"tempStoragePrefix": "A String", # The prefix of the resources the system should use for temporary

3503

# storage.

3504

#

3505

# The supported resource type is:

3506

#

3507

# Google Cloud Storage:

3508

#

3509

# storage.googleapis.com/{bucket}/{object}

3510

# bucket.storage.googleapis.com/{object}

3511

},

3512

"vmId": "A String", # The ID string of the VM.

3513

"taskUser": "A String", # The UNIX user ID on the worker VM to use for tasks launched by

3514

# taskrunner; e.g. "root".

3515

},

3516

"autoscalingSettings": { # Settings for WorkerPool autoscaling. # Settings for autoscaling of this WorkerPool.

3517

"maxNumWorkers": 42, # The maximum number of workers to cap scaling at.

3518

"algorithm": "A String", # The algorithm to use for autoscaling.

3519

},

3520

"metadata": { # Metadata to set on the Google Compute Engine VMs.

3521

"a_key": "A String",

3522

},

Jon Wayne Parrott

2017-01-06 09:58:29 -0800

[diff] [blame]

3523

},

3524

],

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame^]

3525

"dataset": "A String", # The dataset for the current project where various workflow

3526

# related tables are stored.

3527

#

3528

# The supported resource type is:

3529

#

3530

# Google BigQuery:

3531

# bigquery.googleapis.com/{dataset}

3532

"internalExperiments": { # Experimental settings.

3533

"a_key": "", # Properties of the object. Contains field @type with type URL.

3534

},

3535

"workerRegion": "A String", # The Compute Engine region

3536

# (https://cloud.google.com/compute/docs/regions-zones/regions-zones) in

3537

# which worker processing should occur, e.g. "us-west1". Mutually exclusive

3538

# with worker_zone. If neither worker_region nor worker_zone is specified,

3539

# default to the control plane's region.

3540

"serviceKmsKeyName": "A String", # If set, contains the Cloud KMS key identifier used to encrypt data

3541

# at rest, AKA a Customer Managed Encryption Key (CMEK).

3542

#

3543

# Format:

3544

# projects/PROJECT_ID/locations/LOCATION/keyRings/KEY_RING/cryptoKeys/KEY

3545

"userAgent": { # A description of the process that generated the request.

3546

"a_key": "", # Properties of the object.

3547

},

3548

"workerZone": "A String", # The Compute Engine zone

3549

# (https://cloud.google.com/compute/docs/regions-zones/regions-zones) in

3550

# which worker processing should occur, e.g. "us-west1-a". Mutually exclusive

3551

# with worker_region. If neither worker_region nor worker_zone is specified,

3552

# a zone in the control plane's region is chosen based on available capacity.

3553

"clusterManagerApiService": "A String", # The type of cluster manager API to use. If unknown or

Dan O'Meara

2020-05-01 07:42:23 -0700

[diff] [blame]

3554

# unspecified, the service will attempt to choose a reasonable

3555

# default. This should be in the form of the API service name,

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame^]

3556

# e.g. "compute.googleapis.com".

3557

"tempStoragePrefix": "A String", # The prefix of the resources the system should use for temporary

3558

# storage. The system will append the suffix "/temp-{JOBNAME}" to

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

3559

# this resource prefix, where {JOBNAME} is the value of the

3560

# job_name field. The resulting bucket and object prefix is used

3561

# as the prefix of the resources used to store temporary data

3562

# needed during the job execution. NOTE: This will override the

3563

# value in taskrunner_settings.

3564

# The supported resource type is:

3565

#

3566

# Google Cloud Storage:

3567

#

3568

# storage.googleapis.com/{bucket}/{object}

3569

# bucket.storage.googleapis.com/{object}

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame^]

3570

"experiments": [ # The list of experiments to enable.

3571

"A String",

3572

],

3573

"version": { # A structure describing which components of the service, and which versions of them,

3574

# are required in order to run the job.

3575

"a_key": "", # Properties of the object.

3576

},

3577

"serviceAccountEmail": "A String", # Identity to run virtual machines as. Defaults to the default account.

Jon Wayne Parrott

2017-01-06 09:58:29 -0800

[diff] [blame]

3578

},

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame^]

3579

"stageStates": [ # This field may be mutated by the Cloud Dataflow service;

3580

# callers cannot mutate it.

3581

{ # A message describing the state of a particular execution stage.

3582

"executionStageName": "A String", # The name of the execution stage.

3583

"currentStateTime": "A String", # The time at which the stage transitioned to this state.

3584

"executionStageState": "A String", # Execution stage states allow the same set of values as JobState.

3585

},

3586

],

3587

"jobMetadata": { # Metadata available primarily for filtering jobs. Will be included in the

3588

# ListJob response and Job SUMMARY view.

3589

# This field is populated by the Dataflow service to support filtering jobs by the metadata values provided here.

3590

# Populated for ListJobs and all GetJob views SUMMARY and higher.

3591

"bigTableDetails": [ # Identification of a BigTable source used in the Dataflow job.

3592

{ # Metadata for a BigTable connector used by the job.

3593

"tableId": "A String", # TableId accessed in the connection.

3594

"projectId": "A String", # ProjectId accessed in the connection.

3595

"instanceId": "A String", # InstanceId accessed in the connection.

3596

},

3597

],

3598

"spannerDetails": [ # Identification of a Spanner source used in the Dataflow job.

3599

{ # Metadata for a Spanner connector used by the job.

3600

"databaseId": "A String", # DatabaseId accessed in the connection.

3601

"instanceId": "A String", # InstanceId accessed in the connection.

3602

"projectId": "A String", # ProjectId accessed in the connection.

3603

},

3604

],

3605

"datastoreDetails": [ # Identification of a Datastore source used in the Dataflow job.

3606

{ # Metadata for a Datastore connector used by the job.

3607

"projectId": "A String", # ProjectId accessed in the connection.

3608

"namespace": "A String", # Namespace used in the connection.

3609

},

3610

],

3611

"sdkVersion": { # The version of the SDK used to run the job. # The SDK version used to run the job.

3612

"versionDisplayName": "A String", # A readable string describing the version of the SDK.

3613

"sdkSupportStatus": "A String", # The support status for this SDK version.

3614

"version": "A String", # The version of the SDK used to run the job.

3615

},

3616

"bigqueryDetails": [ # Identification of a BigQuery source used in the Dataflow job.

3617

{ # Metadata for a BigQuery connector used by the job.

3618

"table": "A String", # Table accessed in the connection.

3619

"dataset": "A String", # Dataset accessed in the connection.

3620

"projectId": "A String", # Project accessed in the connection.

3621

"query": "A String", # Query used to access data in the connection.

3622

},

3623

],

3624

"fileDetails": [ # Identification of a File source used in the Dataflow job.

3625

{ # Metadata for a File connector used by the job.

3626

"filePattern": "A String", # File Pattern used to access files by the connector.

3627

},

3628

],

3629

"pubsubDetails": [ # Identification of a PubSub source used in the Dataflow job.

3630

{ # Metadata for a PubSub connector used by the job.

3631

"subscription": "A String", # Subscription used in the connection.

3632

"topic": "A String", # Topic accessed in the connection.

},

],

},

"createdFromSnapshotId": "A String", # If this is specified, the job's initial state is populated from the given

3637

# snapshot.

3638

"projectId": "A String", # The ID of the Cloud Platform project that the job belongs to.

3639

"type": "A String", # The type of Cloud Dataflow job.

3640

"pipelineDescription": { # A descriptive representation of the submitted pipeline as well as its executed

3641

# form. This data is provided by the Dataflow service for ease of visualizing

3642

# the pipeline and interpreting Dataflow provided metrics.

3643

# Preliminary field: The format of this data may change at any time.

3644

# A description of the user pipeline and stages through which it is executed.

3645

# Created by Cloud Dataflow service. Only retrieved with JOB_VIEW_DESCRIPTION or JOB_VIEW_ALL.

3646

"executionPipelineStage": [ # Description of each stage of execution of the pipeline.

3647

{ # Description of the composing transforms, names/ids, and input/outputs of a

3648

# stage of execution. Some composing transforms and sources may have been

3649

# generated by the Dataflow service during execution planning.

3650

"id": "A String", # Dataflow service generated id for this stage.

3651

"componentTransform": [ # Transforms that comprise this execution stage.

3652

{ # Description of a transform executed as part of an execution stage.

3653

"originalTransform": "A String", # User name for the original user transform with which this transform is

3654

# most closely associated.

3655

"name": "A String", # Dataflow service generated name for this source.

3656

"userName": "A String", # Human-readable name for this transform; may be user or system generated.

3657

},

3658

],

3659

"componentSource": [ # Collections produced and consumed by component transforms of this stage.

3660

{ # Description of an interstitial value between transforms in an execution

3661

# stage.

3662

"name": "A String", # Dataflow service generated name for this source.

3663

"userName": "A String", # Human-readable name for this transform; may be user or system generated.

3664

"originalTransformOrCollection": "A String", # User name for the original user transform or collection with which this

3665

# source is most closely associated.

3666

},

3667

],

3668

"kind": "A String", # Type of transform this stage is executing.

3669

"outputSource": [ # Output sources for this stage.

3670

{ # Description of an input or output of an execution stage.

3671

"originalTransformOrCollection": "A String", # User name for the original user transform or collection with which this

3672

# source is most closely associated.

3673

"name": "A String", # Dataflow service generated name for this source.

3674

"sizeBytes": "A String", # Size of the source, if measurable.

3675

"userName": "A String", # Human-readable name for this source; may be user or system generated.

3676

},

3677

],

3678

"name": "A String", # Dataflow service generated name for this stage.

3679

"inputSource": [ # Input sources for this stage.

3680

{ # Description of an input or output of an execution stage.

3681

"originalTransformOrCollection": "A String", # User name for the original user transform or collection with which this

3682

# source is most closely associated.

3683

"name": "A String", # Dataflow service generated name for this source.

3684

"sizeBytes": "A String", # Size of the source, if measurable.

3685

"userName": "A String", # Human-readable name for this source; may be user or system generated.

},

],

},

],

"originalPipelineTransform": [ # Description of each transform in the pipeline and collections between them.

3691

{ # Description of the type, names/ids, and input/outputs for a transform.

3692

"kind": "A String", # Type of transform.

3693

"inputCollectionName": [ # User names for all collection inputs to this transform.

3694

"A String",

3695

],

3696

"name": "A String", # User provided name for this transform instance.

3697

"id": "A String", # SDK generated id of this transform instance.

3698

"displayData": [ # Transform-specific display data.

3699

{ # Data provided with a pipeline or transform to provide descriptive info.

3700

"timestampValue": "A String", # Contains value if the data is of timestamp type.

3701

"boolValue": True or False, # Contains value if the data is of a boolean type.

3702

"javaClassValue": "A String", # Contains value if the data is of java class type.

3703

"strValue": "A String", # Contains value if the data is of string type.

3704

"int64Value": "A String", # Contains value if the data is of int64 type.

3705

"durationValue": "A String", # Contains value if the data is of duration type.

3706

"namespace": "A String", # The namespace for the key. This is usually a class name or programming

3707

# language namespace (e.g. a Python module) which defines the display data.

3708

# This allows a dax monitoring system to specially handle the data

3709

# and perform custom rendering.

3710

"floatValue": 3.14, # Contains value if the data is of float type.

3711

"key": "A String", # The key identifying the display data.

3712

# This is intended to be used as a label for the display data

3713

# when viewed in a dax monitoring system.

3714

"shortStrValue": "A String", # A possible additional shorter value to display.

3715

# For example a java_class_name_value of com.mypackage.MyDoFn

3716

# will be stored with MyDoFn as the short_str_value and

3717

# com.mypackage.MyDoFn as the java_class_name_value.

3718

# short_str_value can be displayed and java_class_name_value

3719

# will be displayed as a tooltip.

3720

"url": "A String", # An optional full URL.

3721

"label": "A String", # An optional label to display in a dax UI for the element.

3722

},

3723

],

3724

"outputCollectionName": [ # User names for all collection outputs to this transform.

"A String",

],

},

],

"displayData": [ # Pipeline level display data.

3730

{ # Data provided with a pipeline or transform to provide descriptive info.

3731

"timestampValue": "A String", # Contains value if the data is of timestamp type.

3732

"boolValue": True or False, # Contains value if the data is of a boolean type.

3733

"javaClassValue": "A String", # Contains value if the data is of java class type.

3734

"strValue": "A String", # Contains value if the data is of string type.

3735

"int64Value": "A String", # Contains value if the data is of int64 type.

3736

"durationValue": "A String", # Contains value if the data is of duration type.

3737

"namespace": "A String", # The namespace for the key. This is usually a class name or programming

3738

# language namespace (e.g. a Python module) which defines the display data.

3739

# This allows a dax monitoring system to specially handle the data

3740

# and perform custom rendering.

3741

"floatValue": 3.14, # Contains value if the data is of float type.

3742

"key": "A String", # The key identifying the display data.

3743

# This is intended to be used as a label for the display data

3744

# when viewed in a dax monitoring system.

3745

"shortStrValue": "A String", # A possible additional shorter value to display.

3746

# For example a java_class_name_value of com.mypackage.MyDoFn

3747

# will be stored with MyDoFn as the short_str_value and

3748

# com.mypackage.MyDoFn as the java_class_name_value.

3749

# short_str_value can be displayed and java_class_name_value

3750

# will be displayed as a tooltip.

3751

"url": "A String", # An optional full URL.

3752

"label": "A String", # An optional label to display in a dax UI for the element.

},

],

},

"replaceJobId": "A String", # If this job is an update of an existing job, this field is the job ID

3757

# of the job it replaced.

3758

#

3759

# When sending a `CreateJobRequest`, you can update a job by specifying it

3760

# here. The job named here is stopped, and its intermediate state is

3761

# transferred to this job.

3762

"tempFiles": [ # A set of files the system should be aware of that are used

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

3763

# for temporary storage. These temporary files will be

3764

# removed on job completion.

3765

# No duplicates are allowed.

3766

# No file patterns are supported.

3767

#

3768

# The supported files are:

3769

#

3770

# Google Cloud Storage:

3771

#

3772

# storage.googleapis.com/{bucket}/{object}

3773

# bucket.storage.googleapis.com/{object}

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame^]

3774

"A String",

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

3775

],

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame^]

3776

"name": "A String", # The user-specified Cloud Dataflow job name.

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

3777

#

3778

# Only one Job with a given name may exist in a project at any

3779

# given time. If a caller attempts to create a Job with the same

3780

# name as an already-existing Job, the attempt returns the

3781

# existing Job.

3782

#

3783

# The name must match the regular expression

3784

# `[a-z]([-a-z0-9]{0,38}[a-z0-9])?`

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame^]

3785

"steps": [ # Exactly one of steps or steps_location should be specified.

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

3786

#

3787

# The top-level steps that constitute the entire job.

Sai Cheemalapati

2017-03-13 12:12:03 -0400

[diff] [blame]

3788

{ # Defines a particular step within a Cloud Dataflow job.

3789

#

3790

# A job consists of multiple steps, each of which performs some

3791

# specific operation as part of the overall job. Data is typically

3792

# passed from one step to another as part of the job.

3793

#

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame^]

3794

# Here's an example of a sequence of steps which together implement a

Sai Cheemalapati

2017-03-13 12:12:03 -0400

[diff] [blame]

3795

# Map-Reduce job:

3796

#

3797

# * Read a collection of data from some source, parsing the

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame^]

3798

# collection's elements.

Sai Cheemalapati

2017-03-13 12:12:03 -0400

[diff] [blame]

3799

#

3800

# * Validate the elements.

3801

#

3802

# * Apply a user-defined function to map each element to some value

3803

# and extract an element-specific key value.

3804

#

3805

# * Group elements with the same key into a single element with

3806

# that key, transforming a multiply-keyed collection into a

3807

# uniquely-keyed collection.

3808

#

3809

# * Write the elements out to some data sink.

3810

#

3811

# Note that the Cloud Dataflow service may be used to run many different

3812

# types of jobs, not just Map-Reduce.

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame^]

3813

"name": "A String", # The name that identifies the step. This must be unique for each

Dan O'Meara

2020-05-01 07:42:23 -0700

[diff] [blame]

3814

# step with respect to all other steps in the Cloud Dataflow job.

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame^]

3815

"kind": "A String", # The kind of step in the Cloud Dataflow job.

3816

"properties": { # Named properties associated with the step. Each kind of

Sai Cheemalapati

2017-03-13 12:12:03 -0400

[diff] [blame]

3817

# predefined step has its own required set of properties.

3818

# Must be provided on Create. Only retrieved with JOB_VIEW_ALL.

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame^]

3819

"a_key": "", # Properties of the object.

Jon Wayne Parrott

2017-01-06 09:58:29 -0800

[diff] [blame]

3820

},

3821

},

3822

],

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame^]

3823

"replacedByJobId": "A String", # If another job is an update of this job (and thus, this job is in

3824

# `JOB_STATE_UPDATED`), this field contains the ID of that job.

3825

"executionInfo": { # Deprecated. Additional information about how a Cloud Dataflow job will be executed that

3826

# isn't contained in the submitted job.

3827

"stages": { # A mapping from each stage to the information about that stage.

3828

"a_key": { # Contains information about how a particular

3829

# google.dataflow.v1beta3.Step will be executed.

3830

"stepName": [ # The steps associated with the execution stage.

3831

# Note that stages may have several steps, and that a given step

3832

# might be run by more than one stage.

"A String",

],

},

},

},

"currentState": "A String", # The current state of the job.

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

3839

#

3840

# Jobs are created in the `JOB_STATE_STOPPED` state unless otherwise

3841

# specified.

3842

#

3843

# A job in the `JOB_STATE_RUNNING` state may asynchronously enter a

3844

# terminal state. After a job has reached a terminal state, no

3845

# further state updates may be made.

3846

#

3847

# This field may be mutated by the Cloud Dataflow service;

3848

# callers cannot mutate it.

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame^]

3849

"location": "A String", # The [regional endpoint]

3850

# (https://cloud.google.com/dataflow/docs/concepts/regional-endpoints) that

3851

# contains this job.

3852

"startTime": "A String", # The timestamp when the job was started (transitioned to JOB_STATE_PENDING).

3853

# Flexible resource scheduling jobs are started with some delay after job

3854

# creation, so start_time is unset before start and is updated when the

3855

# job is started by the Cloud Dataflow service. For other jobs, start_time

3856

# always equals create_time and is immutable and set by the Cloud Dataflow

3857

# service.

3858

"stepsLocation": "A String", # The GCS location where the steps are stored.

3859

"labels": { # User-defined labels for this job.

3860

#

3861

# The labels map can contain no more than 64 entries. Entries of the labels

3862

# map are UTF8 strings that comply with the following restrictions:

3863

#

3864

# * Keys must conform to regexp: \p{Ll}\p{Lo}{0,62}

3865

# * Values must conform to regexp: [\p{Ll}\p{Lo}\p{N}_-]{0,63}

3866

# * Both keys and values are additionally constrained to be <= 128 bytes in

3867

# size.

3868

"a_key": "A String",

Jon Wayne Parrott

2017-01-06 09:58:29 -0800

[diff] [blame]

3869

},

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame^]

3870

"createTime": "A String", # The timestamp when the job was initially created. Immutable and set by the

3871

# Cloud Dataflow service.

3872

"requestedState": "A String", # The job's requested state.

3873

#

3874

# `UpdateJob` may be used to switch between the `JOB_STATE_STOPPED` and

3875

# `JOB_STATE_RUNNING` states, by setting requested_state. `UpdateJob` may

3876

# also be used to directly set a job's requested state to

3877

# `JOB_STATE_CANCELLED` or `JOB_STATE_DONE`, irrevocably terminating the

3878

# job if it has not already reached a terminal state.

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

3879

}</pre>
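The constraints stated in the field comments above (the job name pattern, the 64-entry and 128-byte label limits, the mutual exclusivity of workerRegion and workerZone, and "exactly one of steps or steps_location") can be checked on the client before a request is sent. The following is a minimal sketch, not part of the library: check_job_body is a hypothetical helper, and it assumes the worker settings live under an "environment" key of the job body, which is not visible in this excerpt. The Unicode-class label regexes are not checked, because Python's standard re module does not support \p{...} classes.

```python
import re

# Pattern quoted in the "name" field description.
JOB_NAME_RE = re.compile(r"[a-z]([-a-z0-9]{0,38}[a-z0-9])?")


def check_job_body(body):
    """Return a list of problems found in a job body dict (hypothetical helper)."""
    problems = []

    # The whole name must match the documented regular expression.
    if not JOB_NAME_RE.fullmatch(body.get("name", "")):
        problems.append("name does not match the documented pattern")

    # Assumption: workerRegion/workerZone sit under an "environment" key.
    env = body.get("environment", {})
    if env.get("workerRegion") and env.get("workerZone"):
        problems.append("workerRegion and workerZone are mutually exclusive")

    # At most 64 labels; each key and value at most 128 bytes of UTF-8.
    labels = body.get("labels", {})
    if len(labels) > 64:
        problems.append("labels has more than 64 entries")
    for key, value in labels.items():
        if len(key.encode("utf-8")) > 128 or len(value.encode("utf-8")) > 128:
            problems.append("label %r exceeds 128 bytes" % key)

    # Exactly one of steps / stepsLocation should be present.
    if bool(body.get("steps")) == bool(body.get("stepsLocation")):
        problems.append("specify exactly one of steps or stepsLocation")

    return problems
```

An empty list means none of these particular checks failed; the service may still reject the request for other reasons.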

Jon Wayne Parrott