Blame - docs/dyn/dlp_v2.projects.dlpJobs.html - platform/external/python/google-api-python-client

<code><a href="#create">create(parent, body=None, x__xgafv=None)</a></code></p>
<p class="firstline">Creates a new job to inspect storage or calculate risk metrics.</p>
<code><a href="#delete">delete(name, x__xgafv=None)</a></code></p>
<p class="firstline">Deletes a long-running DlpJob. This method indicates that the client is</p>
<code><a href="#get">get(name, x__xgafv=None)</a></code></p>
<p class="firstline">Gets the latest state of a long-running DlpJob.</p>
<code><a href="#list">list(parent, orderBy=None, type=None, filter=None, pageToken=None, locationId=None, pageSize=None, x__xgafv=None)</a></code></p>
<p class="firstline">Lists DlpJobs that match the specified filter in the request.</p>
<code><a href="#list_next">list_next(previous_request, previous_response)</a></code></p>
<p class="firstline">Retrieves the next page of results.</p>
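<p>A minimal sketch of how list and list_next might be combined to walk every page of results. It assumes the dlpJobs() collection object was obtained from a client built elsewhere, and that each response carries its jobs under a "jobs" key; neither detail is stated in this reference, and iter_dlp_jobs is a hypothetical helper name.</p>

```python
def iter_dlp_jobs(jobs_resource, parent, **list_kwargs):
    """Yield each DlpJob under `parent`, following pages via list_next.

    `jobs_resource` is assumed to be the object returned by
    service.projects().dlpJobs(); it is passed in so the helper stays
    independent of how the client was built.
    """
    request = jobs_resource.list(parent=parent, **list_kwargs)
    while request is not None:
        response = request.execute()
        # Assumption: jobs are returned under the "jobs" key.
        for job in response.get("jobs", []):
            yield job
        # list_next is expected to return None once no page remains.
        request = jobs_resource.list_next(
            previous_request=request, previous_response=response
        )
```

<p>Extra keyword arguments such as filter or pageSize are forwarded unchanged to list, so the helper does not need to know which of them the caller uses.</p>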

<h3>Method Details</h3>

<code class="details" id="cancel">cancel(name, body=None, x__xgafv=None)</code>

<pre>Starts asynchronous cancellation on a long-running DlpJob. The server
makes a best effort to cancel the DlpJob, but success is not
guaranteed.
See https://cloud.google.com/dlp/docs/inspecting-storage and
https://cloud.google.com/dlp/docs/compute-risk-analysis to learn more.

Args:
name: string, Required. The name of the DlpJob resource to be cancelled. (required)
body: object, The request body.
The object takes the form of:

{ # The request message for canceling a DLP job.
}

x__xgafv: string, V1 error format.
Allowed values
1 - v1 error format
2 - v2 error format

Returns:
An object of the form:

{ # A generic empty message that you can re-use to avoid defining duplicated
# empty messages in your APIs. A typical example is to use it as the request
# or the response type of an API method. For instance:
#
# service Foo {
# rpc Bar(google.protobuf.Empty) returns (google.protobuf.Empty);
# }
#
# The JSON representation for `Empty` is empty JSON object `{}`.
}</pre>

</div>
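<p>A minimal sketch of calling cancel, assuming the dlpJobs() collection object has already been obtained from a client built elsewhere. The helper name cancel_dlp_job and the job name used in the usage note are hypothetical.</p>

```python
def cancel_dlp_job(jobs_resource, name):
    """Request best-effort cancellation of the DlpJob called `name`.

    `jobs_resource` is assumed to be the object returned by
    service.projects().dlpJobs().
    """
    if not name:
        raise ValueError("name is required")
    # The cancel request message has no fields, so an empty body is sent.
    return jobs_resource.cancel(name=name, body={}).execute()
```

<p>The empty object returned only acknowledges the request. Because cancellation is best effort, a later get(name) call is the way to see what state the job actually reached, for example after cancel_dlp_job(jobs, "projects/my-project/dlpJobs/my-job") with placeholder names.</p>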

<code class="details" id="create">create(parent, body=None, x__xgafv=None)</code>

<pre>Creates a new job to inspect storage or calculate risk metrics.

See https://cloud.google.com/dlp/docs/inspecting-storage and
https://cloud.google.com/dlp/docs/compute-risk-analysis to learn more.

When no InfoTypes or CustomInfoTypes are specified in inspect jobs, the
system will automatically choose what detectors to run. By default this may
be all types, but may change over time as detectors are updated.

Args:
parent: string, Required. Parent resource name.
- Format:projects/[PROJECT-ID]
- Format:projects/[PROJECT-ID]/locations/[LOCATION-ID] (required)
body: object, The request body.
The object takes the form of:

{ # Request message for CreateDlpJobRequest. Used to initiate long running
# jobs such as calculating risk metrics or inspecting Google Cloud
# Storage.
"jobId": "A String", # The job id can contain uppercase and lowercase letters,
# numbers, hyphens, and underscores; that is, it must match the regular
# expression: `[a-zA-Z\d-_]+`. The maximum length is 100
# characters. Can be empty to allow the system to generate one.
"locationId": "A String", # Deprecated. This field has no effect.
"riskJob": { # Set to choose what metric to calculate.
# Configuration for a risk analysis job. See
# https://cloud.google.com/dlp/docs/concepts-risk-analysis to learn more.
"privacyMetric": { # Privacy metric to compute for reidentification risk analysis. # Privacy metric to compute.
"categoricalStatsConfig": { # Categorical stats.
# Compute numerical stats over an individual column, including
# number of distinct values and value count distribution.
"field": { # General identifier of a data field in a storage service. # Field to compute categorical stats on. All column types are
# supported except for arrays and structs. However, it may be more
# informative to use NumericalStats when the field type is supported,
# depending on the data.
"name": "A String", # Name describing the field.
},
},
"lDiversityConfig": { # l-diversity metric, used for analysis of reidentification risk. # l-diversity
"sensitiveAttribute": { # General identifier of a data field in a storage service. # Sensitive field for computing the l-value.
"name": "A String", # Name describing the field.
},
"quasiIds": [ # Set of quasi-identifiers indicating how equivalence classes are
# defined for the l-diversity computation. When multiple fields are
# specified, they are considered a single composite key.
{ # General identifier of a data field in a storage service.
"name": "A String", # Name describing the field.
},
],
},
"kMapEstimationConfig": { # k-map
# Reidentifiability metric. This corresponds to a risk model similar to what
# is called "journalist risk" in the literature, except the attack dataset is
# statistically modeled instead of being perfectly known. This can be done
# using publicly available data (like the US Census), or using a custom
# statistical model (indicated as one or several BigQuery tables), or by
# extrapolating from the distribution of values in the input dataset.
"regionCode": "A String", # ISO 3166-1 alpha-2 region code to use in the statistical modeling.
# Set if no column is tagged with a region-specific InfoType (like
# US_ZIP_5) or a region code.
"quasiIds": [ # Required. Fields considered to be quasi-identifiers. No two columns can have the
# same tag.
{ # A column with a semantic tag attached.
"field": { # General identifier of a data field in a storage service. # Required. Identifies the column.
"name": "A String", # Name describing the field.
},
"customTag": "A String", # A column can be tagged with a custom tag. In this case, the user must
# indicate an auxiliary table that contains statistical information on
# the possible values of this column (below).
"infoType": { # Type of information detected by the API. # A column can be tagged with an InfoType to use the relevant public
# dataset as a statistical model of population, if available. We
# currently support US ZIP codes, region codes, ages and genders.
# To programmatically obtain the list of supported InfoTypes, use
# ListInfoTypes with the supported_by=RISK_ANALYSIS filter.
"name": "A String", # Name of the information type. Either a name of your choosing when
# creating a CustomInfoType, or one of the names listed
# at https://cloud.google.com/dlp/docs/infotypes-reference when specifying
# a built-in type. When sending Cloud DLP results to Data Catalog, infoType
# names should conform to the pattern `[A-Za-z0-9$-_]{1,64}`.
},
"inferred": { # If no semantic tag is indicated, we infer the statistical model from
# the distribution of values in the input data.
# A generic empty message that you can re-use to avoid defining duplicated
# empty messages in your APIs. A typical example is to use it as the request
# or the response type of an API method. For instance:
#
# service Foo {
# rpc Bar(google.protobuf.Empty) returns (google.protobuf.Empty);
# }
#
# The JSON representation for `Empty` is empty JSON object `{}`.
},
},
],
"auxiliaryTables": [ # Several auxiliary tables can be used in the analysis. Each custom_tag
# used to tag a quasi-identifiers column must appear in exactly one column
# of one auxiliary table.
{ # An auxiliary table contains statistical information on the relative
# frequency of different quasi-identifiers values. It has one or several
# quasi-identifiers columns, and one column that indicates the relative
# frequency of each quasi-identifier tuple.
# If a tuple is present in the data but not in the auxiliary table, the
# corresponding relative frequency is assumed to be zero (and thus, the
# tuple is highly reidentifiable).
"table": { # Required. Auxiliary table location.
# Message defining the location of a BigQuery table. A table is uniquely
# identified by its project_id, dataset_id, and table_name. Within a query
# a table is often referenced with a string in the format of:
# `<project_id>:<dataset_id>.<table_id>` or
# `<project_id>.<dataset_id>.<table_id>`.
"projectId": "A String", # The Google Cloud Platform project ID of the project containing the table.
# If omitted, project ID is inferred from the API call.
"datasetId": "A String", # Dataset ID of the table.
"tableId": "A String", # Name of the table.
},
"quasiIds": [ # Required. Quasi-identifier columns.
{ # A quasi-identifier column has a custom_tag, used to know which column
# in the data corresponds to which column in the statistical model.
"customTag": "A String", # An auxiliary field.
"field": { # General identifier of a data field in a storage service. # Identifies the column.
"name": "A String", # Name describing the field.
},
},
],
"relativeFrequency": { # General identifier of a data field in a storage service. # Required. The relative frequency column must contain a floating-point number
# between 0 and 1 (inclusive). Null values are assumed to be zero.
"name": "A String", # Name describing the field.
},
},
],
},
"deltaPresenceEstimationConfig": { # delta-presence
# δ-presence metric, used to estimate how likely it is for an attacker to
# figure out that one given individual appears in a de-identified dataset.
# Similarly to the k-map metric, we cannot compute δ-presence exactly without
# knowing the attack dataset, so we use a statistical model instead.
"quasiIds": [ # Required. Fields considered to be quasi-identifiers. No two fields can have the
# same tag.
{ # A column with a semantic tag attached.
"field": { # General identifier of a data field in a storage service. # Required. Identifies the column.
"name": "A String", # Name describing the field.
},
"infoType": { # Type of information detected by the API. # A column can be tagged with an InfoType to use the relevant public
# dataset as a statistical model of population, if available. We
# currently support US ZIP codes, region codes, ages and genders.
# To programmatically obtain the list of supported InfoTypes, use
# ListInfoTypes with the supported_by=RISK_ANALYSIS filter.
"name": "A String", # Name of the information type. Either a name of your choosing when
# creating a CustomInfoType, or one of the names listed
# at https://cloud.google.com/dlp/docs/infotypes-reference when specifying
# a built-in type. When sending Cloud DLP results to Data Catalog, infoType
# names should conform to the pattern `[A-Za-z0-9$-_]{1,64}`.
},
"customTag": "A String", # A column can be tagged with a custom tag. In this case, the user must
# indicate an auxiliary table that contains statistical information on
# the possible values of this column (below).
"inferred": { # If no semantic tag is indicated, we infer the statistical model from
# the distribution of values in the input data.
# A generic empty message that you can re-use to avoid defining duplicated
# empty messages in your APIs. A typical example is to use it as the request
# or the response type of an API method. For instance:
#
# service Foo {
# rpc Bar(google.protobuf.Empty) returns (google.protobuf.Empty);
# }
#
# The JSON representation for `Empty` is empty JSON object `{}`.
},
},
],
"auxiliaryTables": [ # Several auxiliary tables can be used in the analysis. Each custom_tag
# used to tag a quasi-identifiers field must appear in exactly one
# field of one auxiliary table.
{ # An auxiliary table containing statistical information on the relative
# frequency of different quasi-identifiers values. It has one or several
# quasi-identifiers columns, and one column that indicates the relative
# frequency of each quasi-identifier tuple.
# If a tuple is present in the data but not in the auxiliary table, the
# corresponding relative frequency is assumed to be zero (and thus, the
# tuple is highly reidentifiable).
"relativeFrequency": { # General identifier of a data field in a storage service. # Required. The relative frequency column must contain a floating-point number
# between 0 and 1 (inclusive). Null values are assumed to be zero.
"name": "A String", # Name describing the field.
},
"table": { # Message defining the location of a BigQuery table. A table is uniquely # Required. Auxiliary table location.
# identified by its project_id, dataset_id, and table_name. Within a query
# a table is often referenced with a string in the format of:
# `<project_id>:<dataset_id>.<table_id>` or
# `<project_id>.<dataset_id>.<table_id>`.
"projectId": "A String", # The Google Cloud Platform project ID of the project containing the table.
# If omitted, project ID is inferred from the API call.
"datasetId": "A String", # Dataset ID of the table.
"tableId": "A String", # Name of the table.
},
"quasiIds": [ # Required. Quasi-identifier columns.
{ # A quasi-identifier column has a custom_tag, used to know which column
# in the data corresponds to which column in the statistical model.
"field": { # General identifier of a data field in a storage service. # Identifies the column.
"name": "A String", # Name describing the field.
},
"customTag": "A String", # A column can be tagged with a custom tag. In this case, the user must
# indicate an auxiliary table that contains statistical information on
# the possible values of this column (below).
},
],
},
],
"regionCode": "A String", # ISO 3166-1 alpha-2 region code to use in the statistical modeling.
# Set if no column is tagged with a region-specific InfoType (like
# US_ZIP_5) or a region code.
},
"kAnonymityConfig": { # k-anonymity metric, used for analysis of reidentification risk. # K-anonymity
"entityId": { # Message indicating that multiple rows might be associated to a
# single individual. If the same entity_id is associated to multiple
# quasi-identifier tuples over distinct rows, we consider the entire
# collection of tuples as the composite quasi-identifier. This collection
# is a multiset: the order in which the different tuples appear in the
# dataset is ignored, but their frequency is taken into account.
#
# Important note: a maximum of 1000 rows can be associated to a single
# entity ID. If more rows are associated with the same entity ID, some
# might be ignored.
#
# An entity in a dataset is a field or set of fields that correspond to a
# single person. For example, in medical records the `EntityId` might be a
# patient identifier, or for financial records it might be an account
# identifier. This message is used when generalizations or analysis must take
# into account that multiple rows correspond to the same entity.
"field": { # General identifier of a data field in a storage service. # Composite key indicating which field contains the entity identifier.
"name": "A String", # Name describing the field.
},
},
"quasiIds": [ # Set of fields to compute k-anonymity over. When multiple fields are
# specified, they are considered a single composite key. Structs and
# repeated data types are not supported; however, nested fields are
# supported so long as they are not structs themselves or nested within
# a repeated field.
{ # General identifier of a data field in a storage service.
"name": "A String", # Name describing the field.
},
],
},
"numericalStatsConfig": { # Numerical stats.
# Compute numerical stats over an individual column, including
# min, max, and quantiles.
"field": { # General identifier of a data field in a storage service. # Field to compute numerical stats on. Supported types are
# integer, float, date, datetime, timestamp, time.
"name": "A String", # Name describing the field.
},
},
},
"actions": [ # Actions to execute at the completion of the job, in the order
# provided.
{ # A task to execute on the completion of a job.
# See https://cloud.google.com/dlp/docs/concepts-actions to learn more.
"publishToStackdriver": { # Enable Stackdriver metric dlp.googleapis.com/finding_count. This
# will publish a metric to Stackdriver on each infotype requested and
# how many findings were found for it. CustomDetectors will be bucketed
# as 'Custom' under the Stackdriver label 'info_type'.
},
"publishFindingsToCloudDataCatalog": { # Publish findings to Cloud Datahub.
# Publish findings of a DlpJob to Cloud Data Catalog. Labels summarizing the
# results of the DlpJob will be applied to the entry for the resource scanned
# in Cloud Data Catalog. Any labels previously written by another DlpJob will
# be deleted. InfoType naming patterns are strictly enforced when using this
# feature. Note that the findings will be persisted in Cloud Data Catalog
# storage and are governed by Data Catalog service-specific policy, see
# https://cloud.google.com/terms/service-terms
# Only a single instance of this action can be specified and only allowed if
# all resources being scanned are BigQuery tables.
# Compatible with: Inspect
},
"jobNotificationEmails": { # Enable email notification for project owners and editors on job's
# completion/failure.
},
"pubSub": { # Publish a notification to a pubsub topic.
# Publish a message into given Pub/Sub topic when DlpJob has completed. The
# message contains a single field, `DlpJobName`, which is equal to the
# finished job's
# [`DlpJob.name`](https://cloud.google.com/dlp/docs/reference/rest/v2/projects.dlpJobs#DlpJob).
# Compatible with: Inspect, Risk
"topic": "A String", # Cloud Pub/Sub topic to send notifications to. The topic must have given
# publishing access rights to the DLP API service account executing
# the long running DlpJob sending the notifications.
# Format is projects/{project}/topics/{topic}.
},
"saveFindings": { # Save resulting findings in a provided location.
# If set, the detailed findings will be persisted to the specified
# OutputStorageConfig. Only a single instance of this action can be
# specified.
# Compatible with: Inspect, Risk
"outputConfig": { # Cloud repository for storing output. # Location to store findings outside of DLP.
"outputSchema": "A String", # Schema used for writing the findings for Inspect jobs. This field is only
# used for Inspect and must be unspecified for Risk jobs. Columns are derived
# from the `Finding` object. If appending to an existing table, any columns
# from the predefined schema that are missing will be added. No columns in
# the existing table will be deleted.
#
# If unspecified, then all available columns will be used for a new table or
# an (existing) table with no schema, and no changes will be made to an
# existing table that has a schema.
# Only for use with external storage.
"table": { # Store findings in an existing table or a new table in an existing
# dataset. If table_id is not set a new one will be generated
# for you with the following format:
# dlp_googleapis_yyyy_mm_dd_[dlp_job_id]. Pacific timezone will be used for
# generating the date details.
#
# For Inspect, each column in an existing output table must have the same
# name, type, and mode of a field in the `Finding` object.
#
# For Risk, an existing output table should be the output of a previous
# Risk analysis job run on the same source table, with the same privacy
# metric and quasi-identifiers. Risk jobs that analyze the same table but
# compute a different privacy metric, or use different sets of
# quasi-identifiers, cannot store their results in the same table.
#
# Message defining the location of a BigQuery table. A table is uniquely
# identified by its project_id, dataset_id, and table_name. Within a query
# a table is often referenced with a string in the format of:
# `<project_id>:<dataset_id>.<table_id>` or
# `<project_id>.<dataset_id>.<table_id>`.
"projectId": "A String", # The Google Cloud Platform project ID of the project containing the table.
# If omitted, project ID is inferred from the API call.
"datasetId": "A String", # Dataset ID of the table.
"tableId": "A String", # Name of the table.
},
},
},
"publishSummaryToCscc": { # Publish summary to Cloud Security Command Center (Alpha).
# Publish the result summary of a DlpJob to the Cloud Security
# Command Center (CSCC Alpha).
# This action is only available for projects that are part of
# an organization and whitelisted for the alpha Cloud Security Command
# Center.
# The action will publish count of finding instances and their info types.
# The summary of findings will be persisted in CSCC and are governed by CSCC
# service-specific policy, see https://cloud.google.com/terms/service-terms
# Only a single instance of this action can be specified.
# Compatible with: Inspect
},
},
],
"sourceTable": { # Input dataset to compute metrics over.
# Message defining the location of a BigQuery table. A table is uniquely
# identified by its project_id, dataset_id, and table_name. Within a query
# a table is often referenced with a string in the format of:
# `<project_id>:<dataset_id>.<table_id>` or
# `<project_id>.<dataset_id>.<table_id>`.
"projectId": "A String", # The Google Cloud Platform project ID of the project containing the table.
# If omitted, project ID is inferred from the API call.
"datasetId": "A String", # Dataset ID of the table.
"tableId": "A String", # Name of the table.
},
},
"inspectJob": { # Controls what and how to inspect for findings. # Set to control what and how to inspect.
"inspectTemplateName": "A String", # If provided, will be used as the default for all values in InspectConfig.
# `inspect_config` will be merged into the values persisted as part of the
# template.
"actions": [ # Actions to execute at the completion of the job.
{ # A task to execute on the completion of a job.
# See https://cloud.google.com/dlp/docs/concepts-actions to learn more.
"publishToStackdriver": { # Enable Stackdriver metric dlp.googleapis.com/finding_count. This
# will publish a metric to Stackdriver on each infotype requested and
# how many findings were found for it. CustomDetectors will be bucketed
# as 'Custom' under the Stackdriver label 'info_type'.
},
"publishFindingsToCloudDataCatalog": { # Publish findings to Cloud Datahub.
# Publish findings of a DlpJob to Cloud Data Catalog. Labels summarizing the
# results of the DlpJob will be applied to the entry for the resource scanned
# in Cloud Data Catalog. Any labels previously written by another DlpJob will
# be deleted. InfoType naming patterns are strictly enforced when using this
# feature. Note that the findings will be persisted in Cloud Data Catalog
# storage and are governed by Data Catalog service-specific policy, see
# https://cloud.google.com/terms/service-terms
# Only a single instance of this action can be specified and only allowed if
# all resources being scanned are BigQuery tables.
# Compatible with: Inspect
},
"jobNotificationEmails": { # Enable email notification for project owners and editors on job's
# completion/failure.
},
"pubSub": { # Publish a notification to a pubsub topic.
# Publish a message into given Pub/Sub topic when DlpJob has completed. The
# message contains a single field, `DlpJobName`, which is equal to the
# finished job's
# [`DlpJob.name`](https://cloud.google.com/dlp/docs/reference/rest/v2/projects.dlpJobs#DlpJob).
# Compatible with: Inspect, Risk
"topic": "A String", # Cloud Pub/Sub topic to send notifications to. The topic must have given
# publishing access rights to the DLP API service account executing
# the long running DlpJob sending the notifications.
# Format is projects/{project}/topics/{topic}.
},
"saveFindings": { # Save resulting findings in a provided location.
# If set, the detailed findings will be persisted to the specified
# OutputStorageConfig. Only a single instance of this action can be
# specified.
# Compatible with: Inspect, Risk
"outputConfig": { # Cloud repository for storing output. # Location to store findings outside of DLP.
"outputSchema": "A String", # Schema used for writing the findings for Inspect jobs. This field is only
# used for Inspect and must be unspecified for Risk jobs. Columns are derived
# from the `Finding` object. If appending to an existing table, any columns
# from the predefined schema that are missing will be added. No columns in
# the existing table will be deleted.
#
# If unspecified, then all available columns will be used for a new table or
# an (existing) table with no schema, and no changes will be made to an
# existing table that has a schema.
# Only for use with external storage.
"table": { # Store findings in an existing table or a new table in an existing
# dataset. If table_id is not set a new one will be generated
# for you with the following format:
# dlp_googleapis_yyyy_mm_dd_[dlp_job_id]. Pacific timezone will be used for
# generating the date details.
#
# For Inspect, each column in an existing output table must have the same
# name, type, and mode of a field in the `Finding` object.
#
# For Risk, an existing output table should be the output of a previous
# Risk analysis job run on the same source table, with the same privacy
# metric and quasi-identifiers. Risk jobs that analyze the same table but
# compute a different privacy metric, or use different sets of
# quasi-identifiers, cannot store their results in the same table.
#
# Message defining the location of a BigQuery table. A table is uniquely
# identified by its project_id, dataset_id, and table_name. Within a query
# a table is often referenced with a string in the format of:
# `<project_id>:<dataset_id>.<table_id>` or
# `<project_id>.<dataset_id>.<table_id>`.
"projectId": "A String", # The Google Cloud Platform project ID of the project containing the table.
# If omitted, project ID is inferred from the API call.
"datasetId": "A String", # Dataset ID of the table.
"tableId": "A String", # Name of the table.
},
},
},
"publishSummaryToCscc": { # Publish summary to Cloud Security Command Center (Alpha).
# Publish the result summary of a DlpJob to the Cloud Security
# Command Center (CSCC Alpha).
# This action is only available for projects that are part of
# an organization and whitelisted for the alpha Cloud Security Command
# Center.
# The action will publish count of finding instances and their info types.
# The summary of findings will be persisted in CSCC and are governed by CSCC
# service-specific policy, see https://cloud.google.com/terms/service-terms
# Only a single instance of this action can be specified.
# Compatible with: Inspect
},
},
],
Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame]

562

"storageConfig": { # Shared message indicating Cloud storage type. # The data to scan.

Bu Sun Kim

2020-05-27 12:20:54 -0700

[diff] [blame]

563

"cloudStorageOptions": { # Google Cloud Storage options.

564

# Options defining a file or a set of files within a Google Cloud Storage bucket.

Bu Sun Kim

2020-07-22 17:02:09 -0700

[diff] [blame^]

565

"bytesLimitPerFilePercent": 42, # Max percentage of bytes to scan from a file. The rest are omitted. The

566

# number of bytes scanned is rounded down. Must be between 0 and 100,

567

# inclusive. Both 0 and 100 mean no limit. Defaults to 0. Only one

568

# of bytes_limit_per_file and bytes_limit_per_file_percent can be specified.

569

"fileTypes": [ # List of file type groups to include in the scan.

570

# If empty, all files are scanned and available data format processors

571

# are applied. In addition, the binary content of the selected files

572

# is always scanned as well.

573

# Images are scanned only as binary if the specified region

574

# does not support image inspection and no file_types were specified.

575

# Image inspection is restricted to 'global', 'us', 'asia', and 'europe'.

576

"A String",

577

],

Bu Sun Kim

2020-05-27 12:20:54 -0700

[diff] [blame]

578

"bytesLimitPerFile": "A String", # Max number of bytes to scan from a file. If a scanned file's size is bigger

579

# than this value then the rest of the bytes are omitted. Only one

580

# of bytes_limit_per_file and bytes_limit_per_file_percent can be specified.

Bu Sun Kim

2020-07-22 17:02:09 -0700

[diff] [blame^]

581

"filesLimitPercent": 42, # Limits the number of files to scan to this percentage of the input FileSet.

582

# Number of files scanned is rounded down. Must be between 0 and 100,

583

# inclusive. Both 0 and 100 mean no limit. Defaults to 0.

Bu Sun Kim

2020-05-27 12:20:54 -0700

[diff] [blame]

584

"fileSet": { # Set of files to scan. # The set of one or more files to scan.

585

"regexFileSet": { # The regex-filtered set of files to scan. Exactly one of `url` or

586

# `regex_file_set` must be set.

587

# Message representing a set of files in a Cloud Storage bucket. Regular expressions

588

# are used to allow fine-grained control over which files in the bucket to include.

589

#

590

# Included files are those that match at least one item in `include_regex` and

591

# do not match any items in `exclude_regex`. Note that a file that matches

592

# items from both lists will _not_ be included. For a match to occur, the

593

# entire file path (i.e., everything in the url after the bucket name) must

594

# match the regular expression.

595

#

596

# For example, given the input `{bucket_name: "mybucket", include_regex:

597

# ["directory1/.*"], exclude_regex:

598

# ["directory1/excluded.*"]}`:

599

#

600

# * `gs://mybucket/directory1/myfile` will be included

601

# * `gs://mybucket/directory1/directory2/myfile` will be included (`.*` matches

602

# across `/`)

603

# * `gs://mybucket/directory0/directory1/myfile` will _not_ be included (the

604

# full path doesn't match any items in `include_regex`)

605

# * `gs://mybucket/directory1/excludedfile` will _not_ be included (the path

606

# matches an item in `exclude_regex`)

607

#

608

# If `include_regex` is left empty, it will match all files by default

609

# (this is equivalent to setting `include_regex: [".*"]`).

610

#

611

# Some other common use cases:

612

#

613

# * `{bucket_name: "mybucket", exclude_regex: [".*\.pdf"]}` will include all

614

# files in `mybucket` except for .pdf files

615

# * `{bucket_name: "mybucket", include_regex: ["directory/[^/]+"]}` will

616

# include all files directly under `gs://mybucket/directory/`, without matching

617

# across `/`

618

"bucketName": "A String", # The name of a Cloud Storage bucket. Required.

Bu Sun Kim

2020-07-22 17:02:09 -0700

[diff] [blame^]

619

"excludeRegex": [ # A list of regular expressions matching file paths to exclude. All files in

Bu Sun Kim

2020-05-27 12:20:54 -0700

[diff] [blame]

620

# the bucket that match at least one of these regular expressions will be

Bu Sun Kim

2020-07-22 17:02:09 -0700

[diff] [blame^]

621

# excluded from the scan.

Bu Sun Kim

2020-05-27 12:20:54 -0700

[diff] [blame]

622

#

623

# Regular expressions use RE2

624

# [syntax](https://github.com/google/re2/wiki/Syntax); a guide can be found

625

# under the google/re2 repository on GitHub.

626

"A String",

627

],

Bu Sun Kim

2020-07-22 17:02:09 -0700

[diff] [blame^]

628

"includeRegex": [ # A list of regular expressions matching file paths to include. All files in

Bu Sun Kim

2020-05-27 12:20:54 -0700

[diff] [blame]

629

# the bucket that match at least one of these regular expressions will be

Bu Sun Kim

2020-07-22 17:02:09 -0700

[diff] [blame^]

630

# included in the set of files, except for those that also match an item in

631

# `exclude_regex`. Leaving this field empty will match all files by default

632

# (this is equivalent to including `.*` in the list).

Bu Sun Kim

2020-05-27 12:20:54 -0700

[diff] [blame]

633

#

634

# Regular expressions use RE2

635

# [syntax](https://github.com/google/re2/wiki/Syntax); a guide can be found

636

# under the google/re2 repository on GitHub.

"A String",

],

},
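# The include/exclude selection described for `regexFileSet` can be sketched as follows.
# This is a minimal illustration, not the service's implementation: it assumes Python's
# `re` module behaves like RE2 for these simple patterns, and the helper name
# `is_included` is hypothetical (it is not part of the DLP API).

```python
import re


def is_included(path, include_regex=(), exclude_regex=()):
    """Return True if a bucket-relative path would be selected by the rules above.

    `path` is everything in the URL after the bucket name.
    """
    # An empty include list is documented as equivalent to [".*"].
    includes = list(include_regex) or [".*"]
    # The entire path must match, so use fullmatch rather than search.
    if not any(re.fullmatch(pattern, path) for pattern in includes):
        return False
    # A path that matches both lists is not included.
    return not any(re.fullmatch(pattern, path) for pattern in exclude_regex)


include = ["directory1/.*"]
exclude = ["directory1/excluded.*"]
for candidate in (
    "directory1/myfile",
    "directory1/directory2/myfile",
    "directory0/directory1/myfile",
    "directory1/excludedfile",
):
    print(candidate, is_included(candidate, include, exclude))
```

# Using a full match mirrors the statement that the entire file path must match the
# regular expression; a substring search would wrongly include `directory0/directory1/myfile`.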

"url": "A String", # The Cloud Storage url of the file(s) to scan, in the format

641

# `gs://<bucket>/<path>`. Trailing wildcard in the path is allowed.

642

#

643

# If the url ends in a trailing slash, the bucket or directory represented

644

# by the url will be scanned non-recursively (content in sub-directories

645

# will not be scanned). This means that `gs://mybucket/` is equivalent to

646

# `gs://mybucket/*`, and `gs://mybucket/directory/` is equivalent to

647

# `gs://mybucket/directory/*`.

648

#

649

# Exactly one of `url` or `regex_file_set` must be set.

650

},

Bu Sun Kim

2020-05-27 12:20:54 -0700

[diff] [blame]

651

"sampleMethod": "A String",

Bu Sun Kim

2020-05-27 12:20:54 -0700

[diff] [blame]

652

},
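# The percentage limits in `cloudStorageOptions` can be illustrated with a small sketch.
# It only restates the documented rules (values between 0 and 100, 0 and 100 both mean
# no limit, results rounded down, at most one of the two byte limits). The function
# names are hypothetical, and treating a value of 0 as "not specified" for the byte
# limits is an assumption, not documented behavior.

```python
def files_to_scan(total_files, files_limit_percent=0):
    """Number of files scanned for a given `filesLimitPercent`, rounded down."""
    if not 0 <= files_limit_percent <= 100:
        raise ValueError("filesLimitPercent must be between 0 and 100, inclusive")
    if files_limit_percent in (0, 100):
        # Both 0 and 100 mean no limit.
        return total_files
    return total_files * files_limit_percent // 100


def check_byte_limits(options):
    """Reject options that set both byte limits (assumes 0 means unset)."""
    per_file = int(options.get("bytesLimitPerFile", 0))  # int64 values arrive as strings
    percent = options.get("bytesLimitPerFilePercent", 0)
    if per_file and percent:
        raise ValueError(
            "Only one of bytesLimitPerFile and bytesLimitPerFilePercent can be specified"
        )


print(files_to_scan(10, 25))
check_byte_limits({"bytesLimitPerFile": "1048576"})
```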

653

"bigQueryOptions": { # Options defining BigQuery table and row identifiers. # BigQuery options.

Bu Sun Kim

2020-07-22 17:02:09 -0700

[diff] [blame^]

654

"sampleMethod": "A String",

Bu Sun Kim

2020-05-27 12:20:54 -0700

[diff] [blame]

655

"tableReference": { # Complete BigQuery table reference.

656

# Message defining the location of a BigQuery table. A table is uniquely identified by its project_id, dataset_id, and table_name. Within a query

657

# a table is often referenced with a string in the format of:

658

# `<project_id>:<dataset_id>.<table_id>` or

659

# `<project_id>.<dataset_id>.<table_id>`.

660

"projectId": "A String", # The Google Cloud Platform project ID of the project containing the table.

661

# If omitted, project ID is inferred from the API call.

662

"datasetId": "A String", # Dataset ID of the table.

663

"tableId": "A String", # Name of the table.

664

},
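# The two string formats mentioned for a table reference can be produced from the
# `tableReference` fields as in the sketch below. The helper `table_path` and its
# `colon_separator` parameter are hypothetical names used only for illustration,
# and the project, dataset, and table IDs are made-up values.

```python
def table_path(table_reference, colon_separator=False):
    """Format a tableReference dict as `<project_id>.<dataset_id>.<table_id>`.

    With colon_separator=True the result is `<project_id>:<dataset_id>.<table_id>`.
    """
    separator = ":" if colon_separator else "."
    return "{}{}{}.{}".format(
        table_reference["projectId"],
        separator,
        table_reference["datasetId"],
        table_reference["tableId"],
    )


reference = {"projectId": "my-project", "datasetId": "my_dataset", "tableId": "my_table"}
print(table_path(reference))
print(table_path(reference, colon_separator=True))
```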

Bu Sun Kim

2020-05-27 12:20:54 -0700

[diff] [blame]

665

"rowsLimitPercent": 42, # Max percentage of rows to scan. The rest are omitted. The number of rows

666

# scanned is rounded down. Must be between 0 and 100, inclusive. Both 0 and

667

# 100 mean no limit. Defaults to 0. Only one of rows_limit and

668

# rows_limit_percent can be specified. Cannot be used in conjunction with

669

# TimespanConfig.

670

"rowsLimit": "A String", # Max number of rows to scan. If the table has more rows than this value, the

671

# rest of the rows are omitted. If not set, or if set to 0, all rows will be

672

# scanned. Only one of rows_limit and rows_limit_percent can be specified.

673

# Cannot be used in conjunction with TimespanConfig.

Bu Sun Kim

2020-07-22 17:02:09 -0700

[diff] [blame^]

674

"identifyingFields": [ # Table fields that may uniquely identify a row within the table. When

675

# `actions.saveFindings.outputConfig.table` is specified, the values of

676

# columns specified here are available in the output table under

677

# `location.content_locations.record_location.record_key.id_values`. Nested

678

# fields such as `person.birthdate.year` are allowed.

679

{ # General identifier of a data field in a storage service.

680

"name": "A String", # Name describing the field.

681

},

682

],

Bu Sun Kim

2020-05-27 12:20:54 -0700

[diff] [blame]

683

"excludedFields": [ # References to fields excluded from scanning. This allows you to skip

684

# inspection of entire columns which you know have no findings.

685

{ # General identifier of a data field in a storage service.

686

"name": "A String", # Name describing the field.

687

},

688

],

Bu Sun Kim

2020-07-22 17:02:09 -0700

[diff] [blame^]

689

},

690

"timespanConfig": { # Configuration of the timespan of the items to include in scanning.

691

# Currently only supported when inspecting Google Cloud Storage and BigQuery.

692

"timestampField": { # General identifier of a data field in a storage service. # Specification of the field containing the timestamp of scanned items.

693

# Used for data sources like Datastore and BigQuery.

694

#

695

# For BigQuery:

696

# Required to filter out rows based on the given start and

697

# end times. If not specified and the table was modified between the given

698

# start and end times, the entire table will be scanned.

699

# The valid data types of the timestamp field are: `INTEGER`, `DATE`,

700

# `TIMESTAMP`, or `DATETIME` BigQuery column.

701

#

702

# For Datastore:

703

# Valid data types of the timestamp field are: `TIMESTAMP`.

704

# Datastore entity will be scanned if the timestamp property does not

705

# exist or its value is empty or invalid.

706

"name": "A String", # Name describing the field.

707

},

708

"enableAutoPopulationOfTimespanConfig": True or False, # When the job is started by a JobTrigger we will automatically figure out

709

# a valid start_time to avoid scanning files that have not been modified

710

# since the last time the JobTrigger executed. This will be based on the

711

# time of the execution of the last run of the JobTrigger.

712

"startTime": "A String", # Exclude files or rows older than this value.

713

"endTime": "A String", # Exclude files or rows newer than this value.

714

# If set to zero, no upper time limit is applied.

715

},

716

"datastoreOptions": { # Options defining a data set within Google Cloud Datastore. # Google Cloud Datastore options.

717

"kind": { # A representation of a Datastore kind. # The kind to process.

718

"name": "A String", # The name of the kind.

719

},

720

"partitionId": { # Datastore partition ID. # A partition ID identifies a grouping of entities. The grouping is always

721

# by project and namespace, however the namespace ID may be empty.

724

#

725

# A partition ID contains several dimensions:

726

# project ID and namespace ID.

727

"namespaceId": "A String", # If not empty, the ID of the namespace to which the entities belong.

728

"projectId": "A String", # The ID of the project to which the entities belong.

729

},

730

},

731

"hybridOptions": { # Hybrid inspection options.

732

# Early access feature is in a pre-release state and might change or have

733

# limited support. For more information, see

734

# https://cloud.google.com/products#product-launch-stages.

735

# Configuration to control jobs where the content being inspected is outside of Google Cloud Platform.

736

"tableOptions": { # Instructions regarding the table content being inspected. # If the container is a table, additional information to make findings

737

# meaningful such as the columns that are primary keys.

738

"identifyingFields": [ # The columns that are the primary keys for table objects included in

739

# ContentItem. A copy of this cell's value will be stored alongside

740

# each finding so that the finding can be traced to the specific row it came

741

# from. No more than 3 may be provided.

742

{ # General identifier of a data field in a storage service.

743

"name": "A String", # Name describing the field.

},

],

},

"requiredFindingLabelKeys": [ # These are labels that each inspection request must include within their

748

# 'finding_labels' map. Request may contain others, but any missing one of

749

# these will be rejected.

750

#

751

# Label keys must be between 1 and 63 characters long and must conform

752

# to the following regular expression: `[a-z]([-a-z0-9]*[a-z0-9])?`.

753

#

754

# No more than 10 keys can be required.

755

"A String",

756

],

757

"labels": { # To organize findings, these labels will be added to each finding.

758

#

759

# Label keys must be between 1 and 63 characters long and must conform

760

# to the following regular expression: `[a-z]([-a-z0-9]*[a-z0-9])?`.

761

#

762

# Label values must be between 0 and 63 characters long and must conform

763

# to the regular expression `([a-z]([-a-z0-9]*[a-z0-9])?)?`.

764

#

765

# No more than 10 labels can be associated with a given finding.

766

#

767

# Examples:

768

# * `"environment" : "production"`

769

# * `"pipeline" : "etl"`

770

"a_key": "A String",

771

},
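# The label constraints above (key and value patterns, length limits, and the limit of
# 10 labels) can be checked client-side before sending a request. The sketch below is
# only an illustration of those documented rules; `validate_labels` is a hypothetical
# helper, not part of the API, and the service remains the authority on what it accepts.

```python
import re

_KEY_PATTERN = re.compile(r"[a-z]([-a-z0-9]*[a-z0-9])?")
_VALUE_PATTERN = re.compile(r"([a-z]([-a-z0-9]*[a-z0-9])?)?")


def validate_labels(labels):
    """Return a list of problems found in a `labels` dict (empty if none)."""
    problems = []
    if len(labels) > 10:
        problems.append("no more than 10 labels can be associated with a finding")
    for key, value in labels.items():
        # Keys: 1 to 63 characters, matching the key pattern in full.
        if not (1 <= len(key) <= 63 and _KEY_PATTERN.fullmatch(key)):
            problems.append("invalid label key: {!r}".format(key))
        # Values: 0 to 63 characters, matching the value pattern in full.
        if not (len(value) <= 63 and _VALUE_PATTERN.fullmatch(value)):
            problems.append("invalid label value for {!r}: {!r}".format(key, value))
    return problems


print(validate_labels({"environment": "production", "pipeline": "etl"}))
print(validate_labels({"Environment": "Production"}))
```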

772

"description": "A String", # A short description of where the data is coming from. Will be stored once

773

# in the job. 256 max length.

Bu Sun Kim

2020-05-27 12:20:54 -0700

[diff] [blame]

774

},

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

775

},

Bu Sun Kim

2020-07-22 17:02:09 -0700

[diff] [blame^]

776

"inspectConfig": { # Configuration description of the scanning process. # How and what to scan for.

777

# When used with redactContent only info_types and min_likelihood are currently

778

# used.

779

"customInfoTypes": [ # CustomInfoTypes provided by the user. See

780

# https://cloud.google.com/dlp/docs/creating-custom-infotypes to learn more.

781

{ # Custom information type provided by the user. Used to find domain-specific

782

# sensitive information configurable to the data in question.

783

"dictionary": { # A list of phrases to detect as a CustomInfoType.

784

# Custom information type based on a dictionary of words or phrases. This can be used to match sensitive information specific to the data, such as a list

785

# of employee IDs or job titles.

786

#

787

# Dictionary words are case-insensitive and all characters other than letters

788

# and digits in the unicode [Basic Multilingual

789

# Plane](https://en.wikipedia.org/wiki/Plane_%28Unicode%29#Basic_Multilingual_Plane)

790

# will be replaced with whitespace when scanning for matches, so the

791

# dictionary phrase "Sam Johnson" will match all three phrases "sam johnson",

792

# "Sam, Johnson", and "Sam (Johnson)". Additionally, the characters

793

# surrounding any match must be of a different type than the adjacent

794

# characters within the word, so letters must be next to non-letters and

795

# digits next to non-digits. For example, the dictionary word "jen" will

796

# match the first three letters of the text "jen123" but will return no

797

# matches for "jennifer".

798

#

799

# Dictionary words containing a large number of characters that are not

800

# letters or digits may result in unexpected findings because such characters

801

# are treated as whitespace. The

802

# [limits](https://cloud.google.com/dlp/limits) page contains details about

803

# the size limits of dictionaries. For dictionaries that do not fit within

804

# these constraints, consider using `LargeCustomDictionaryConfig` in the

805

# `StoredInfoType` API.

806

"cloudStoragePath": { # Message representing a single file or path in Cloud Storage. # Newline-delimited file of words in Cloud Storage. Only a single file

807

# is accepted.

808

"path": "A String", # A url representing a file or path (no wildcards) in Cloud Storage.

809

# Example: gs://[BUCKET_NAME]/dictionary.txt

810

},

811

"wordList": { # Message defining a list of words or phrases to search for in the data. # List of words or phrases to search for.

812

"words": [ # Words or phrases defining the dictionary. The dictionary must contain

813

# at least one phrase and every phrase must contain at least 2 characters

814

# that are letters or digits. [required]

"A String",

],

},

},
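# The dictionary matching rules described above (case-insensitive comparison, characters
# other than letters and digits treated as whitespace, and the requirement that the
# characters around a match be of a different type) can be approximated as below.
# This is a rough sketch for building intuition only: the function names are
# hypothetical, Python's `str.isalnum` is used as a stand-in for "letters and digits",
# and the real detector may differ in details.

```python
import re


def _normalize(text):
    # Lowercase, and replace anything that is not a letter or digit with a space.
    return "".join(ch.lower() if ch.isalnum() else " " for ch in text)


def _kind(ch):
    if ch.isalpha():
        return "letter"
    if ch.isdigit():
        return "digit"
    return "other"


def dictionary_matches(phrase, text):
    """Return True if `phrase` would match somewhere in `text` under the rules above."""
    words = _normalize(phrase).split()
    if not words:
        return False
    pattern = re.compile(r"\s+".join(re.escape(word) for word in words))
    haystack = _normalize(text)
    for match in pattern.finditer(haystack):
        before = haystack[match.start() - 1] if match.start() > 0 else " "
        after = haystack[match.end()] if match.end() < len(haystack) else " "
        # Letters must be next to non-letters and digits next to non-digits.
        starts_cleanly = _kind(before) != _kind(haystack[match.start()])
        ends_cleanly = _kind(after) != _kind(haystack[match.end() - 1])
        if starts_cleanly and ends_cleanly:
            return True
    return False


for sample in ("sam johnson", "Sam, Johnson", "Sam (Johnson)"):
    print(sample, dictionary_matches("Sam Johnson", sample))
print(dictionary_matches("jen", "jen123"), dictionary_matches("jen", "jennifer"))
```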

"infoType": { # Type of information detected by the API. # CustomInfoType can either be a new infoType, or an extension of built-in

820

# infoType, when the name matches one of existing infoTypes and that infoType

821

# is specified in `InspectContent.info_types` field. Specifying the latter

822

# adds findings to the ones detected by the system. If a built-in info type is

823

# not specified in `InspectContent.info_types` list then the name is treated

824

# as a custom info type.

825

"name": "A String", # Name of the information type. Either a name of your choosing when

826

# creating a CustomInfoType, or one of the names listed

827

# at https://cloud.google.com/dlp/docs/infotypes-reference when specifying

828

# a built-in type. When sending Cloud DLP results to Data Catalog, infoType

829

# names should conform to the pattern `[A-Za-z0-9$-_]{1,64}`.

830

},

831

"likelihood": "A String", # Likelihood to return for this CustomInfoType. This base value can be

832

# altered by a detection rule if the finding meets the criteria specified by

833

# the rule. Defaults to `VERY_LIKELY` if not specified.

834

"detectionRules": [ # Set of detection rules to apply to all findings of this CustomInfoType.

835

# Rules are applied in the order that they are specified. Not supported for the

836

# `surrogate_type` CustomInfoType.

837

{ # Deprecated; use `InspectionRuleSet` instead. Rule for modifying a

838

# `CustomInfoType` to alter behavior under certain circumstances, depending

839

# on the specific details of the rule. Not supported for the `surrogate_type`

840

# custom infoType.

841

"hotwordRule": { # Hotword-based detection rule.

842

# The rule that adjusts the likelihood of findings within a certain proximity of hotwords.

843

"proximity": { # Proximity of the finding within which the entire hotword must reside.

844

# The total length of the window cannot exceed 1000 characters. Note that

845

# the finding itself will be included in the window, so that hotwords may

846

# be used to match substrings of the finding itself. For example, the

847

# certainty of a phone number regex "\(\d{3}\) \d{3}-\d{4}" could be

848

# adjusted upwards if the area code is known to be the local area code of

849

# a company office using the hotword regex "\(xxx\)", where "xxx"

850

# is the area code in question.

851

# Message for specifying a window around a finding to apply a detection rule.

852

"windowAfter": 42, # Number of characters after the finding to consider.

853

"windowBefore": 42, # Number of characters before the finding to consider.

854

},

855

"likelihoodAdjustment": { # Likelihood adjustment to apply to all matching findings.

856

# Message for specifying an adjustment to the likelihood of a finding as part of a detection rule.

857

"fixedLikelihood": "A String", # Set the likelihood of a finding to a fixed value.

858

"relativeLikelihood": 42, # Increase or decrease the likelihood by the specified number of

859

# levels. For example, if a finding would be `POSSIBLE` without the

860

# detection rule and `relative_likelihood` is 1, then it is upgraded to

861

# `LIKELY`, while a value of -1 would downgrade it to `UNLIKELY`.

862

# Likelihood may never drop below `VERY_UNLIKELY` or exceed

863

# `VERY_LIKELY`, so applying an adjustment of 1 followed by an

864

# adjustment of -1 when base likelihood is `VERY_LIKELY` will result in

865

# a final likelihood of `LIKELY`.

866

},
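# The `relativeLikelihood` arithmetic described above amounts to moving along an ordered
# scale and clamping at both ends. The sketch below illustrates that reading; the names
# `LEVELS` and `adjust_likelihood` are hypothetical, and the unspecified enum value is
# deliberately left out.

```python
LEVELS = ["VERY_UNLIKELY", "UNLIKELY", "POSSIBLE", "LIKELY", "VERY_LIKELY"]


def adjust_likelihood(likelihood, relative_likelihood):
    """Shift `likelihood` by a number of levels, never leaving the scale."""
    index = LEVELS.index(likelihood) + relative_likelihood
    # Clamp so the result never drops below VERY_UNLIKELY or exceeds VERY_LIKELY.
    index = max(0, min(index, len(LEVELS) - 1))
    return LEVELS[index]


print(adjust_likelihood("POSSIBLE", 1))
print(adjust_likelihood("POSSIBLE", -1))
# An adjustment of 1 followed by -1 starting from VERY_LIKELY:
print(adjust_likelihood(adjust_likelihood("VERY_LIKELY", 1), -1))
```

# Because the first adjustment is clamped at `VERY_LIKELY`, the second one lands on
# `LIKELY`, which matches the example in the field description.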

867

"hotwordRegex": { # Message defining a custom regular expression. # Regular expression pattern defining what qualifies as a hotword.

868

"groupIndexes": [ # The index of the submatch to extract as findings. When not

869

# specified, the entire match is returned. No more than 3 may be included.

870

42,

871

],

872

"pattern": "A String", # Pattern defining the regular expression. Its syntax

873

# (https://github.com/google/re2/wiki/Syntax) can be found under the

874

# google/re2 repository on GitHub.

},

},

},

],

"surrogateType": { # Message for detecting output from deidentification transformations that

880

# support reversing.

881

# Message for detecting output from deidentification transformations such as

882

# [`CryptoReplaceFfxFpeConfig`](https://cloud.google.com/dlp/docs/reference/rest/v2/organizations.deidentifyTemplates#cryptoreplaceffxfpeconfig).

883

# These types of transformations are

884

# those that perform pseudonymization, thereby producing a "surrogate" as

885

# output. This should be used in conjunction with a field on the

886

# transformation such as `surrogate_info_type`. This CustomInfoType does

887

# not support the use of `detection_rules`.

888

},

889

"regex": { # Message defining a custom regular expression. # Regular expression based CustomInfoType.

890

"groupIndexes": [ # The index of the submatch to extract as findings. When not

891

# specified, the entire match is returned. No more than 3 may be included.

892

42,

893

],

894

"pattern": "A String", # Pattern defining the regular expression. Its syntax

895

# (https://github.com/google/re2/wiki/Syntax) can be found under the

896

# google/re2 repository on GitHub.

897

},

898

"storedType": { # A reference to a StoredInfoType to use with scanning. # Load an existing `StoredInfoType` resource for use in

899

# `InspectDataSource`. Not currently supported in `InspectContent`.

900

"name": "A String", # Resource name of the requested `StoredInfoType`, for example

901

# `organizations/433245324/storedInfoTypes/432452342` or

902

# `projects/project-id/storedInfoTypes/432452342`.

903

"createTime": "A String", # Timestamp indicating when the version of the `StoredInfoType` used for

904

# inspection was created. Output-only field, populated by the system.

905

},

906

"exclusionType": "A String", # If set to EXCLUSION_TYPE_EXCLUDE this infoType will not cause a finding

907

# to be returned. It can still be used for rule matching.

908

},

909

],

910

"minLikelihood": "A String", # Only returns findings equal to or above this threshold. The default is

911

# POSSIBLE.

912

# See https://cloud.google.com/dlp/docs/likelihood to learn more.

913

"limits": { # Configuration to control the number of findings returned.

914

"maxFindingsPerRequest": 42, # Max number of findings that will be returned per request/job.

915

# When set within `InspectContentRequest`, the maximum returned is 2000

916

# regardless of whether this is set higher.

917

"maxFindingsPerInfoType": [ # Configuration of findings limit given for specified infoTypes.

918

{ # Max findings configuration per infoType, per content item or long

919

# running DlpJob.

920

"infoType": { # Type of information detected by the API. # Type of information the findings limit applies to. Only one limit per

921

# info_type should be provided. If InfoTypeLimit does not have an

922

# info_type, the DLP API applies the limit against all info_types that

923

# are found but not specified in another InfoTypeLimit.

924

"name": "A String", # Name of the information type. Either a name of your choosing when

925

# creating a CustomInfoType, or one of the names listed

926

# at https://cloud.google.com/dlp/docs/infotypes-reference when specifying

927

# a built-in type. When sending Cloud DLP results to Data Catalog, infoType

928

# names should conform to the pattern `[A-Za-z0-9$-_]{1,64}`.

929

},

930

"maxFindings": 42, # Max findings limit for the given infoType.

931

},

932

],

933

"maxFindingsPerItem": 42, # Max number of findings that will be returned for each item scanned.

934

# When set within `InspectJobConfig`,

935

# the maximum returned is 2000 regardless of whether this is set higher.

936

# When set within `InspectContentRequest`, this field is ignored.

937

},

938

"excludeInfoTypes": True or False, # When true, excludes type information of the findings.

939

"includeQuote": True or False, # When true, a contextual quote from the data that triggered a finding is

940

# included in the response; see Finding.quote.

941

"ruleSet": [ # Set of rules to apply to the findings for this InspectConfig.

942

# Exclusion rules contained in the set are executed last; other

943

# rules are executed in the order they are specified for each info type.

944

{ # Rule set for modifying a set of infoTypes to alter behavior under certain

945

# circumstances, depending on the specific details of the rules within the set.

946

"infoTypes": [ # List of infoTypes this rule set is applied to.

947

{ # Type of information detected by the API.

948

"name": "A String", # Name of the information type. Either a name of your choosing when

949

# creating a CustomInfoType, or one of the names listed

950

# at https://cloud.google.com/dlp/docs/infotypes-reference when specifying

951

# a built-in type. When sending Cloud DLP results to Data Catalog, infoType

952

# names should conform to the pattern `[A-Za-z0-9$-_]{1,64}`.

953

},

954

],

955

"rules": [ # Set of rules to be applied to infoTypes. The rules are applied in order.

956

{ # A single inspection rule to be applied to infoTypes, specified in

957

# `InspectionRuleSet`.

958

"hotwordRule": { # Hotword-based detection rule.

959

# The rule that adjusts the likelihood of findings within a certain proximity of hotwords.

960

"proximity": { # Proximity of the finding within which the entire hotword must reside.

961

# The total length of the window cannot exceed 1000 characters. Note that

962

# the finding itself will be included in the window, so that hotwords may

963

# be used to match substrings of the finding itself. For example, the

964

# certainty of a phone number regex "\(\d{3}\) \d{3}-\d{4}" could be

965

# adjusted upwards if the area code is known to be the local area code of

966

# a company office using the hotword regex "\(xxx\)", where "xxx"

967

# is the area code in question.

968

# Message for specifying a window around a finding to apply a detection rule.

969

"windowAfter": 42, # Number of characters after the finding to consider.

970

"windowBefore": 42, # Number of characters before the finding to consider.

971

},

972

"likelihoodAdjustment": { # Likelihood adjustment to apply to all matching findings.

973

# Message for specifying an adjustment to the likelihood of a finding as part of a detection rule.

974

"fixedLikelihood": "A String", # Set the likelihood of a finding to a fixed value.

975

"relativeLikelihood": 42, # Increase or decrease the likelihood by the specified number of

976

# levels. For example, if a finding would be `POSSIBLE` without the

977

# detection rule and `relative_likelihood` is 1, then it is upgraded to

978

# `LIKELY`, while a value of -1 would downgrade it to `UNLIKELY`.

979

# Likelihood may never drop below `VERY_UNLIKELY` or exceed

980

# `VERY_LIKELY`, so applying an adjustment of 1 followed by an

981

# adjustment of -1 when base likelihood is `VERY_LIKELY` will result in

982

# a final likelihood of `LIKELY`.

983

},

984

"hotwordRegex": { # Message defining a custom regular expression. # Regular expression pattern defining what qualifies as a hotword.

985

"groupIndexes": [ # The index of the submatch to extract as findings. When not

986

# specified, the entire match is returned. No more than 3 may be included.

987

42,

988

],

989

"pattern": "A String", # Pattern defining the regular expression. Its syntax

990

# (https://github.com/google/re2/wiki/Syntax) can be found under the

991

# google/re2 repository on GitHub.

992

},

993

},

994

"exclusionRule": { # Exclusion rule.

995

# The rule that specifies conditions when findings of infoTypes specified in `InspectionRuleSet` are removed from results.

996

"matchingType": "A String", # How the rule is applied, see MatchingType documentation for details.

997

"dictionary": { # Dictionary which defines the rule.

998

# Custom information type based on a dictionary of words or phrases. This can be used to match sensitive information specific to the data, such as a list

999

# of employee IDs or job titles.

1000

#

1001

# Dictionary words are case-insensitive and all characters other than letters

1002

# and digits in the unicode [Basic Multilingual

1003

# Plane](https://en.wikipedia.org/wiki/Plane_%28Unicode%29#Basic_Multilingual_Plane)

1004

# will be replaced with whitespace when scanning for matches, so the

1005

# dictionary phrase "Sam Johnson" will match all three phrases "sam johnson",

1006

# "Sam, Johnson", and "Sam (Johnson)". Additionally, the characters

1007

# surrounding any match must be of a different type than the adjacent

1008

# characters within the word, so letters must be next to non-letters and

1009

# digits next to non-digits. For example, the dictionary word "jen" will

1010

# match the first three letters of the text "jen123" but will return no

1011

# matches for "jennifer".

1012

#

1013

# Dictionary words containing a large number of characters that are not

1014

# letters or digits may result in unexpected findings because such characters

1015

# are treated as whitespace. The

1016

# [limits](https://cloud.google.com/dlp/limits) page contains details about

1017

# the size limits of dictionaries. For dictionaries that do not fit within

1018

# these constraints, consider using `LargeCustomDictionaryConfig` in the

1019

# `StoredInfoType` API.

1020

"cloudStoragePath": { # Message representing a single file or path in Cloud Storage. # Newline-delimited file of words in Cloud Storage. Only a single file

1021

# is accepted.

1022

"path": "A String", # A url representing a file or path (no wildcards) in Cloud Storage.

1023

# Example: gs://[BUCKET_NAME]/dictionary.txt

1024

},

1025

"wordList": { # Message defining a list of words or phrases to search for in the data. # List of words or phrases to search for.

1026

"words": [ # Words or phrases defining the dictionary. The dictionary must contain

1027

# at least one phrase and every phrase must contain at least 2 characters

1028

# that are letters or digits. [required]

"A String",

],

},

},

"excludeInfoTypes": { # List of exclude infoTypes. # Set of infoTypes for which findings would affect this rule.

1034

"infoTypes": [ # An infoType list in an ExclusionRule drops a finding when it overlaps or

1035

# is contained within a finding of an infoType from this list. For

1036

# example, for `InspectionRuleSet.info_types` containing "PHONE_NUMBER" and

1037

# `exclusion_rule` containing `exclude_info_types.info_types` with

1038

# "EMAIL_ADDRESS" the phone number findings are dropped if they overlap

1039

# with EMAIL_ADDRESS finding.

1040

# That leads to "555-222-2222@example.org" to generate only a single

1041

# finding, namely email address.

1042

{ # Type of information detected by the API.

1043

"name": "A String", # Name of the information type. Either a name of your choosing when

1044

# creating a CustomInfoType, or one of the names listed

1045

# at https://cloud.google.com/dlp/docs/infotypes-reference when specifying

1046

# a built-in type. When sending Cloud DLP results to Data Catalog, infoType

1047

# names should conform to the pattern `[A-Za-z0-9$-_]{1,64}`.

},

],

},

"regex": { # Message defining a custom regular expression. # Regular expression which defines the rule.

1052

"groupIndexes": [ # The index of the submatch to extract as findings. When not

1053

# specified, the entire match is returned. No more than 3 may be included.

1054

42,

1055

],

1056

"pattern": "A String", # Pattern defining the regular expression. Its syntax

1057

# (https://github.com/google/re2/wiki/Syntax) can be found under the

1058

# google/re2 repository on GitHub.

},

},

},

],

},

],

"contentOptions": [ # List of options defining data content to scan.

1066

# If empty, text, images, and other content will be included.

1067

"A String",

1068

],

1069

"infoTypes": [ # Restricts what info_types to look for. The values must correspond to

1070

# InfoType values returned by ListInfoTypes or listed at

1071

# https://cloud.google.com/dlp/docs/infotypes-reference.

1072

#

1073

# When no InfoTypes or CustomInfoTypes are specified in a request, the

1074

# system may automatically choose what detectors to run. By default this may

1075

# be all types, but may change over time as detectors are updated.

1076

#

1077

# If you need precise control and predictability as to what detectors are

# run, explicitly specify the InfoTypes listed in the reference;

# otherwise a default list will be used, which may change over time.

1080

{ # Type of information detected by the API.

1081

"name": "A String", # Name of the information type. Either a name of your choosing when

1082

# creating a CustomInfoType, or one of the names listed

1083

# at https://cloud.google.com/dlp/docs/infotypes-reference when specifying

1084

# a built-in type. When sending Cloud DLP results to Data Catalog, infoType

1085

# names should conform to the pattern `[A-Za-z0-9$-_]{1,64}`.

1086

},

1087

],

1088

},

},

}
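The boundary rule for dictionary words described in the request body above ("jen" matches in "jen123" but not in "jennifer") can be approximated with regular-expression lookarounds. This is a minimal sketch, assuming ASCII letters and digits and case-insensitive matching; `find_dictionary_matches` is a hypothetical helper for illustration, not part of the client library and not the service's actual matcher.

```python
import re


def _guard(char, template):
    """Return a lookaround that forbids a neighbor of the same class as `char`."""
    if char.isalpha():
        return template.format("[A-Za-z]")
    if char.isdigit():
        return template.format("[0-9]")
    return ""  # other characters impose no boundary condition in this sketch


def find_dictionary_matches(word, text):
    """Return (start, end) spans where `word` occurs with letters next to
    non-letters and digits next to non-digits (illustrative approximation)."""
    if not word:
        raise ValueError("word must be non-empty")
    before = _guard(word[0], "(?<!{})")   # nothing of the same class just before
    after = _guard(word[-1], "(?!{})")    # nothing of the same class just after
    pattern = re.compile(before + re.escape(word) + after, re.IGNORECASE)
    return [m.span() for m in pattern.finditer(text)]
```

Calling `find_dictionary_matches("jen", "jen123")` yields one span covering the first three characters, while `find_dictionary_matches("jen", "jennifer")` yields an empty list.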

x__xgafv: string, V1 error format.

Allowed values

1 - v1 error format

2 - v2 error format
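The exclusion-rule example from the request body (dropping PHONE_NUMBER findings that overlap an EMAIL_ADDRESS finding) might be assembled as a plain dictionary before it is passed as part of `body`. This is a sketch under assumptions: `exclusion_rule_set` is a hypothetical helper, and the `ruleSet`, `rules`, `exclusionRule` and `matchingType` keys (with the value `MATCHING_TYPE_PARTIAL_MATCH`) are taken from the wider InspectConfig schema rather than from the excerpt above.

```python
def exclusion_rule_set(inspected, excluded):
    """Build one rule-set entry: findings of `inspected` infoTypes are dropped
    when they overlap findings of `excluded` infoTypes (hypothetical helper)."""
    return {
        "infoTypes": [{"name": name} for name in inspected],
        "rules": [
            {
                "exclusionRule": {
                    "excludeInfoTypes": {
                        "infoTypes": [{"name": name} for name in excluded],
                    },
                    # Assumed enum value; check the ExclusionRule schema.
                    "matchingType": "MATCHING_TYPE_PARTIAL_MATCH",
                }
            }
        ],
    }


inspect_config = {
    "infoTypes": [{"name": "PHONE_NUMBER"}, {"name": "EMAIL_ADDRESS"}],
    "ruleSet": [exclusion_rule_set(["PHONE_NUMBER"], ["EMAIL_ADDRESS"])],
}
```

With a configuration like this, the text "555-222-2222@example.org" would be expected to produce only the email address finding.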

Returns:

An object of the form:

{ # Combines all of the information about a DLP job.

"errors": [ # A stream of errors encountered running the job.

1102

{ # Detailed information about an error encountered during job execution or

1103

# the results of an unsuccessful activation of the JobTrigger.

1104

"timestamps": [ # The times the error occurred.

1105

"A String",

1106

],

1107

"details": { # The `Status` type defines a logical error model that is suitable for # Detailed error codes and messages.

1108

# different programming environments, including REST APIs and RPC APIs. It is

1109

# used by [gRPC](https://github.com/grpc). Each `Status` message contains

1110

# three pieces of data: error code, error message, and error details.

1111

#

1112

# You can find out more about this error model and how to work with it in the

1113

# [API Design Guide](https://cloud.google.com/apis/design/errors).

1114

"code": 42, # The status code, which should be an enum value of google.rpc.Code.

1115

"details": [ # A list of messages that carry the error details. There is a common set of

1116

# message types for APIs to use.

1117

{

1118

"a_key": "", # Properties of the object. Contains field @type with type URL.

},

1120

],

"message": "A String", # A developer-facing error message, which should be in English. Any

1122

# user-facing error message should be localized and sent in the

1123

# google.rpc.Status.details field, or localized by the client.

},

},

],

"createTime": "A String", # Time when the job was created.

1128

"state": "A String", # State of a job.

1129

"riskDetails": { # Result of a risk analysis operation request. # Results from analyzing risk of a data source.

1130

"kMapEstimationResult": { # Result of the reidentifiability analysis. Note that these results are an # K-map result

1131

# estimation, not exact values.

1132

"kMapEstimationHistogram": [ # The intervals [min_anonymity, max_anonymity] do not overlap. If a value

1133

# doesn't correspond to any such interval, the associated frequency is

1134

# zero. For example, the following records:

1135

# {min_anonymity: 1, max_anonymity: 1, frequency: 17}

1136

# {min_anonymity: 2, max_anonymity: 3, frequency: 42}

1137

# {min_anonymity: 5, max_anonymity: 10, frequency: 99}

1138

# mean that there are no records with an estimated anonymity of 4 or

# larger than 10.

1140

{ # A KMapEstimationHistogramBucket message with the following values:

# min_anonymity: 3

# max_anonymity: 5

# frequency: 42

# means that there are 42 records whose quasi-identifier values correspond

1145

# to 3, 4 or 5 people in the overlying population. An important particular

1146

# case is when min_anonymity = max_anonymity = 1: the frequency field then

1147

# corresponds to the number of uniquely identifiable records.

1148

"maxAnonymity": "A String", # Always greater than or equal to min_anonymity.

1149

"bucketSize": "A String", # Number of records within these anonymity bounds.

1150

"bucketValueCount": "A String", # Total number of distinct quasi-identifier tuple values in this bucket.

1151

"bucketValues": [ # Sample of quasi-identifier tuple values in this bucket. The total

1152

# number of classes returned per bucket is capped at 20.

1153

{ # A tuple of values for the quasi-identifier columns.

1154

"estimatedAnonymity": "A String", # The estimated anonymity for these quasi-identifier values.

1155

"quasiIdsValues": [ # The quasi-identifier values.

1156

{ # Set of primitive values supported by the system.

1157

# Note that for the purposes of inspection or transformation, the number

1158

# of bytes considered to comprise a 'Value' is based on its representation

1159

# as a UTF-8 encoded string. For example, if 'integer_value' is set to

1160

# 123456789, the number of bytes would be counted as 9, even though an

1161

# int64 only holds up to 8 bytes of data.

1162

"integerValue": "A String", # integer

1163

"timeValue": { # Represents a time of day. The date and time zone are either not significant # time of day

1164

# or are specified elsewhere. An API may choose to allow leap seconds. Related

1165

# types are google.type.Date and `google.protobuf.Timestamp`.

1166

"seconds": 42, # Seconds of minutes of the time. Must normally be from 0 to 59. An API may

1167

# allow the value 60 if it allows leap-seconds.

1168

"nanos": 42, # Fractions of seconds in nanoseconds. Must be from 0 to 999,999,999.

1169

"minutes": 42, # Minutes of hour of day. Must be from 0 to 59.

1170

"hours": 42, # Hours of day in 24 hour format. Should be from 0 to 23. An API may choose

1171

# to allow the value "24:00:00" for scenarios like business closing time.

1172

},

1173

"dayOfWeekValue": "A String", # day of week

1174

"floatValue": 3.14, # float

1175

"stringValue": "A String", # string

1176

"timestampValue": "A String", # timestamp

1177

"dateValue": { # Represents a whole or partial calendar date, e.g. a birthday. The time of day # date

1178

# and time zone are either specified elsewhere or are not significant. The date

1179

# is relative to the Proleptic Gregorian Calendar. This can represent:

1180

#

1181

# * A full date, with non-zero year, month and day values

1182

# * A month and day value, with a zero year, e.g. an anniversary

1183

# * A year on its own, with zero month and day values

1184

# * A year and month value, with a zero day, e.g. a credit card expiration date

1185

#

1186

# Related types are google.type.TimeOfDay and `google.protobuf.Timestamp`.

1187

"month": 42, # Month of year. Must be from 1 to 12, or 0 if specifying a year without a

1188

# month and day.

1189

"year": 42, # Year of date. Must be from 1 to 9999, or 0 if specifying a date without

1190

# a year.

1191

"day": 42, # Day of month. Must be from 1 to 31 and valid for the year and month, or 0

1192

# if specifying a year by itself or a year and month where the day is not

1193

# significant.

1194

},

1195

"booleanValue": True or False, # boolean

},

1197

],

1198

},

],

1200

"minAnonymity": "A String", # Always positive.

},

1202

],

},
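The k-map histogram above can be summarized by adding up bucket sizes. The sketch below assumes that the "frequency" in the comments corresponds to the `bucketSize` field, that the bounds are inclusive as described, and that the int64 fields arrive as decimal strings; `records_with_anonymity_at_most` is a hypothetical helper.

```python
def records_with_anonymity_at_most(buckets, k):
    """Count records whose estimated anonymity is at most k, using only
    buckets that lie entirely at or below k (illustrative helper)."""
    return sum(
        int(bucket["bucketSize"])
        for bucket in buckets
        if int(bucket["maxAnonymity"]) <= k
    )


# The three example records from the comments, in response form.
example_histogram = [
    {"minAnonymity": "1", "maxAnonymity": "1", "bucketSize": "17"},
    {"minAnonymity": "2", "maxAnonymity": "3", "bucketSize": "42"},
    {"minAnonymity": "5", "maxAnonymity": "10", "bucketSize": "99"},
]

# Buckets with min = max = 1 hold the uniquely identifiable records.
unique_records = records_with_anonymity_at_most(example_histogram, 1)
```

A bucket that straddles `k` is left out, so the result is a lower bound when bucket ranges are wide.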

"deltaPresenceEstimationResult": { # Result of the δ-presence computation. Note that these results are an # Delta-presence result

1205

# estimation, not exact values.

1206

"deltaPresenceEstimationHistogram": [ # The intervals [min_probability, max_probability) do not overlap. If a

1207

# value doesn't correspond to any such interval, the associated frequency

1208

# is zero. For example, the following records:

1209

# {min_probability: 0, max_probability: 0.1, frequency: 17}

1210

# {min_probability: 0.2, max_probability: 0.3, frequency: 42}

1211

# {min_probability: 0.3, max_probability: 0.4, frequency: 99}

1212

# mean that there are no records with an estimated probability in [0.1, 0.2)

# nor greater than or equal to 0.4.

1214

{ # A DeltaPresenceEstimationHistogramBucket message with the following

1215

# values:

1216

# min_probability: 0.1

1217

# max_probability: 0.2

1218

# frequency: 42

1219

# means that there are 42 records for which δ is in [0.1, 0.2). An

1220

# important particular case is when min_probability = max_probability = 1:

1221

# then, every individual who shares this quasi-identifier combination is in

1222

# the dataset.

1223

"maxProbability": 3.14, # Always greater than or equal to min_probability.

1224

"bucketValueCount": "A String", # Total number of distinct quasi-identifier tuple values in this bucket.

1225

"bucketValues": [ # Sample of quasi-identifier tuple values in this bucket. The total

1226

# number of classes returned per bucket is capped at 20.

1227

{ # A tuple of values for the quasi-identifier columns.

1228

"estimatedProbability": 3.14, # The estimated probability that a given individual sharing these

1229

# quasi-identifier values is in the dataset. This value, typically called

1230

# δ, is the ratio between the number of records in the dataset with these

1231

# quasi-identifier values, and the total number of individuals (inside

1232

# *and* outside the dataset) with these quasi-identifier values.

1233

# For example, if there are 15 individuals in the dataset who share the

1234

# same quasi-identifier values, and an estimated 100 people in the entire

1235

# population with these values, then δ is 0.15.

1236

"quasiIdsValues": [ # The quasi-identifier values.

1237

{ # Set of primitive values supported by the system.

1238

# Note that for the purposes of inspection or transformation, the number

1239

# of bytes considered to comprise a 'Value' is based on its representation

1240

# as a UTF-8 encoded string. For example, if 'integer_value' is set to

1241

# 123456789, the number of bytes would be counted as 9, even though an

1242

# int64 only holds up to 8 bytes of data.

1243

"integerValue": "A String", # integer

1244

"timeValue": { # Represents a time of day. The date and time zone are either not significant # time of day

1245

# or are specified elsewhere. An API may choose to allow leap seconds. Related

1246

# types are google.type.Date and `google.protobuf.Timestamp`.

1247

"seconds": 42, # Seconds of minutes of the time. Must normally be from 0 to 59. An API may

1248

# allow the value 60 if it allows leap-seconds.

1249

"nanos": 42, # Fractions of seconds in nanoseconds. Must be from 0 to 999,999,999.

1250

"minutes": 42, # Minutes of hour of day. Must be from 0 to 59.

1251

"hours": 42, # Hours of day in 24 hour format. Should be from 0 to 23. An API may choose

1252

# to allow the value "24:00:00" for scenarios like business closing time.

1253

},

1254

"dayOfWeekValue": "A String", # day of week

1255

"floatValue": 3.14, # float

1256

"stringValue": "A String", # string

1257

"timestampValue": "A String", # timestamp

1258

"dateValue": { # Represents a whole or partial calendar date, e.g. a birthday. The time of day # date

1259

# and time zone are either specified elsewhere or are not significant. The date

1260

# is relative to the Proleptic Gregorian Calendar. This can represent:

1261

#

1262

# * A full date, with non-zero year, month and day values

1263

# * A month and day value, with a zero year, e.g. an anniversary

1264

# * A year on its own, with zero month and day values

1265

# * A year and month value, with a zero day, e.g. a credit card expiration date

1266

#

1267

# Related types are google.type.TimeOfDay and `google.protobuf.Timestamp`.

1268

"month": 42, # Month of year. Must be from 1 to 12, or 0 if specifying a year without a

1269

# month and day.

1270

"year": 42, # Year of date. Must be from 1 to 9999, or 0 if specifying a date without

1271

# a year.

1272

"day": 42, # Day of month. Must be from 1 to 31 and valid for the year and month, or 0

1273

# if specifying a year by itself or a year and month where the day is not

1274

# significant.

1275

},

1276

"booleanValue": True or False, # boolean

},

],

},

],

"minProbability": 3.14, # Between 0 and 1.

1282

"bucketSize": "A String", # Number of records within these probability bounds.

},

],

},
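The δ ratio and the half-open bucket intervals described above can be worked through in a few lines. This is a sketch, not the service's estimator: `estimated_delta` and `bucket_size_for` are hypothetical helpers, the "frequency" in the comments is assumed to be the `bucketSize` field, and a bucket with equal bounds is treated as matching only that exact probability.

```python
def estimated_delta(records_in_dataset, estimated_population):
    """Ratio of matching records in the dataset to matching individuals overall."""
    if estimated_population <= 0:
        raise ValueError("estimated_population must be positive")
    return records_in_dataset / estimated_population


def bucket_size_for(buckets, probability):
    """Return the bucketSize of the bucket containing `probability`, or 0
    when no interval [minProbability, maxProbability) contains it."""
    for bucket in buckets:
        low, high = bucket["minProbability"], bucket["maxProbability"]
        if low <= probability < high or (low == high == probability):
            return int(bucket["bucketSize"])
    return 0


# The three example records from the comments, in response form.
example_histogram = [
    {"minProbability": 0.0, "maxProbability": 0.1, "bucketSize": "17"},
    {"minProbability": 0.2, "maxProbability": 0.3, "bucketSize": "42"},
    {"minProbability": 0.3, "maxProbability": 0.4, "bucketSize": "99"},
]
```

For the example in the comments, 15 matching records out of an estimated 100 matching individuals give δ = 0.15, which falls in the uncovered gap [0.1, 0.2) of the sample histogram, so its bucket size is 0.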

"categoricalStatsResult": { # Result of the categorical stats computation. # Categorical stats result

1287

"valueFrequencyHistogramBuckets": [ # Histogram of value frequencies in the column.

1288

{ # Histogram of value frequencies in the column.

1289

"valueFrequencyUpperBound": "A String", # Upper bound on the value frequency of the values in this bucket.

1290

"bucketValueCount": "A String", # Total number of distinct values in this bucket.

1291

"bucketSize": "A String", # Total number of values in this bucket.

1292

"valueFrequencyLowerBound": "A String", # Lower bound on the value frequency of the values in this bucket.

1293

"bucketValues": [ # Sample of value frequencies in this bucket. The total number of

1294

# values returned per bucket is capped at 20.

1295

{ # A value of a field, including its frequency.

1296

"count": "A String", # How many times the value is contained in the field.

1297

"value": { # Set of primitive values supported by the system. # A value contained in the field in question.

1298

# Note that for the purposes of inspection or transformation, the number

1299

# of bytes considered to comprise a 'Value' is based on its representation

1300

# as a UTF-8 encoded string. For example, if 'integer_value' is set to

1301

# 123456789, the number of bytes would be counted as 9, even though an

1302

# int64 only holds up to 8 bytes of data.

1303

"integerValue": "A String", # integer

1304

"timeValue": { # Represents a time of day. The date and time zone are either not significant # time of day

1305

# or are specified elsewhere. An API may choose to allow leap seconds. Related

1306

# types are google.type.Date and `google.protobuf.Timestamp`.

1307

"seconds": 42, # Seconds of minutes of the time. Must normally be from 0 to 59. An API may

1308

# allow the value 60 if it allows leap-seconds.

1309

"nanos": 42, # Fractions of seconds in nanoseconds. Must be from 0 to 999,999,999.

1310

"minutes": 42, # Minutes of hour of day. Must be from 0 to 59.

1311

"hours": 42, # Hours of day in 24 hour format. Should be from 0 to 23. An API may choose

1312

# to allow the value "24:00:00" for scenarios like business closing time.

1313

},

1314

"dayOfWeekValue": "A String", # day of week

1315

"floatValue": 3.14, # float

1316

"stringValue": "A String", # string

1317

"timestampValue": "A String", # timestamp

1318

"dateValue": { # Represents a whole or partial calendar date, e.g. a birthday. The time of day # date

1319

# and time zone are either specified elsewhere or are not significant. The date

1320

# is relative to the Proleptic Gregorian Calendar. This can represent:

1321

#

1322

# * A full date, with non-zero year, month and day values

1323

# * A month and day value, with a zero year, e.g. an anniversary

1324

# * A year on its own, with zero month and day values

1325

# * A year and month value, with a zero day, e.g. a credit card expiration date

1326

#

1327

# Related types are google.type.TimeOfDay and `google.protobuf.Timestamp`.

1328

"month": 42, # Month of year. Must be from 1 to 12, or 0 if specifying a year without a

1329

# month and day.

1330

"year": 42, # Year of date. Must be from 1 to 9999, or 0 if specifying a date without

1331

# a year.

1332

"day": 42, # Day of month. Must be from 1 to 31 and valid for the year and month, or 0

1333

# if specifying a year by itself or a year and month where the day is not

1334

# significant.

1335

},

1336

"booleanValue": True or False, # boolean

},

},

],

},

],

},
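The note on the byte size of a `Value` can be checked directly. The sketch below assumes that the decimal string form produced by `str()` is what gets UTF-8 encoded; `value_size_bytes` is a hypothetical helper, not a client-library function.

```python
def value_size_bytes(primitive):
    """Size of a primitive measured as the UTF-8 length of its string form."""
    return len(str(primitive).encode("utf-8"))


# The example from the comments: a nine-digit integer counts as 9 bytes,
# even though an int64 occupies at most 8 bytes in binary form.
integer_size = value_size_bytes(123456789)
```

Non-ASCII strings count more than one byte per character under the same rule; for example, `value_size_bytes("é")` is 2.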

"numericalStatsResult": { # Result of the numerical stats computation. # Numerical stats result

1344

"quantileValues": [ # List of 99 values that partition the set of field values into 100 equal

1345

# sized buckets.

1346

{ # Set of primitive values supported by the system.

1347

# Note that for the purposes of inspection or transformation, the number

1348

# of bytes considered to comprise a 'Value' is based on its representation

1349

# as a UTF-8 encoded string. For example, if 'integer_value' is set to

1350

# 123456789, the number of bytes would be counted as 9, even though an

1351

# int64 only holds up to 8 bytes of data.

1352

"integerValue": "A String", # integer

1353

"timeValue": { # Represents a time of day. The date and time zone are either not significant # time of day

1354

# or are specified elsewhere. An API may choose to allow leap seconds. Related

1355

# types are google.type.Date and `google.protobuf.Timestamp`.

1356

"seconds": 42, # Seconds of minutes of the time. Must normally be from 0 to 59. An API may

1357

# allow the value 60 if it allows leap-seconds.

1358

"nanos": 42, # Fractions of seconds in nanoseconds. Must be from 0 to 999,999,999.

1359

"minutes": 42, # Minutes of hour of day. Must be from 0 to 59.

1360

"hours": 42, # Hours of day in 24 hour format. Should be from 0 to 23. An API may choose

1361

# to allow the value "24:00:00" for scenarios like business closing time.

1362

},

1363

"dayOfWeekValue": "A String", # day of week

1364

"floatValue": 3.14, # float

1365

"stringValue": "A String", # string

1366

"timestampValue": "A String", # timestamp

1367

"dateValue": { # Represents a whole or partial calendar date, e.g. a birthday. The time of day # date

1368

# and time zone are either specified elsewhere or are not significant. The date

1369

# is relative to the Proleptic Gregorian Calendar. This can represent:

1370

#

1371

# * A full date, with non-zero year, month and day values

1372

# * A month and day value, with a zero year, e.g. an anniversary

1373

# * A year on its own, with zero month and day values

1374

# * A year and month value, with a zero day, e.g. a credit card expiration date

1375

#

1376

# Related types are google.type.TimeOfDay and `google.protobuf.Timestamp`.

1377

"month": 42, # Month of year. Must be from 1 to 12, or 0 if specifying a year without a

1378

# month and day.

1379

"year": 42, # Year of date. Must be from 1 to 9999, or 0 if specifying a date without

1380

# a year.

1381

"day": 42, # Day of month. Must be from 1 to 31 and valid for the year and month, or 0

1382

# if specifying a year by itself or a year and month where the day is not

1383

# significant.

1384

},

1385

"booleanValue": True or False, # boolean

1386

},

1387

],

1388

"minValue": { # Set of primitive values supported by the system. # Minimum value appearing in the column.

1389

# Note that for the purposes of inspection or transformation, the number

1390

# of bytes considered to comprise a 'Value' is based on its representation

1391

# as a UTF-8 encoded string. For example, if 'integer_value' is set to

1392

# 123456789, the number of bytes would be counted as 9, even though an

1393

# int64 only holds up to 8 bytes of data.

1394

"integerValue": "A String", # integer

1395

"timeValue": { # Represents a time of day. The date and time zone are either not significant # time of day

1396

# or are specified elsewhere. An API may choose to allow leap seconds. Related

1397

# types are google.type.Date and `google.protobuf.Timestamp`.

1398

"seconds": 42, # Seconds of minutes of the time. Must normally be from 0 to 59. An API may

1399

# allow the value 60 if it allows leap-seconds.

1400

"nanos": 42, # Fractions of seconds in nanoseconds. Must be from 0 to 999,999,999.

1401

"minutes": 42, # Minutes of hour of day. Must be from 0 to 59.

1402

"hours": 42, # Hours of day in 24 hour format. Should be from 0 to 23. An API may choose

1403

# to allow the value "24:00:00" for scenarios like business closing time.

1404

},

1405

"dayOfWeekValue": "A String", # day of week

1406

"floatValue": 3.14, # float

1407

"stringValue": "A String", # string

1408

"timestampValue": "A String", # timestamp

1409

"dateValue": { # Represents a whole or partial calendar date, e.g. a birthday. The time of day # date

1410

# and time zone are either specified elsewhere or are not significant. The date

1411

# is relative to the Proleptic Gregorian Calendar. This can represent:

1412

#

1413

# * A full date, with non-zero year, month and day values

1414

# * A month and day value, with a zero year, e.g. an anniversary

1415

# * A year on its own, with zero month and day values

1416

# * A year and month value, with a zero day, e.g. a credit card expiration date

1417

#

1418

# Related types are google.type.TimeOfDay and `google.protobuf.Timestamp`.

1419

"month": 42, # Month of year. Must be from 1 to 12, or 0 if specifying a year without a

1420

# month and day.

1421

"year": 42, # Year of date. Must be from 1 to 9999, or 0 if specifying a date without

1422

# a year.

1423

"day": 42, # Day of month. Must be from 1 to 31 and valid for the year and month, or 0

1424

# if specifying a year by itself or a year and month where the day is not

1425

# significant.

1426

},

1427

"booleanValue": True or False, # boolean

1428

},

1429

"maxValue": { # Set of primitive values supported by the system. # Maximum value appearing in the column.

1430

# Note that for the purposes of inspection or transformation, the number

1431

# of bytes considered to comprise a 'Value' is based on its representation

1432

# as a UTF-8 encoded string. For example, if 'integer_value' is set to

1433

# 123456789, the number of bytes would be counted as 9, even though an

1434

# int64 only holds up to 8 bytes of data.

1435

"integerValue": "A String", # integer

1436

"timeValue": { # Represents a time of day. The date and time zone are either not significant # time of day

1437

# or are specified elsewhere. An API may choose to allow leap seconds. Related

1438

# types are google.type.Date and `google.protobuf.Timestamp`.

1439

"seconds": 42, # Seconds of minutes of the time. Must normally be from 0 to 59. An API may

1440

# allow the value 60 if it allows leap-seconds.

1441

"nanos": 42, # Fractions of seconds in nanoseconds. Must be from 0 to 999,999,999.

1442

"minutes": 42, # Minutes of hour of day. Must be from 0 to 59.

1443

"hours": 42, # Hours of day in 24 hour format. Should be from 0 to 23. An API may choose

1444

# to allow the value "24:00:00" for scenarios like business closing time.

1445

},

1446

"dayOfWeekValue": "A String", # day of week

1447

"floatValue": 3.14, # float

1448

"stringValue": "A String", # string

1449

"timestampValue": "A String", # timestamp

1450

"dateValue": { # Represents a whole or partial calendar date, e.g. a birthday. The time of day # date

1451

# and time zone are either specified elsewhere or are not significant. The date

1452

# is relative to the Proleptic Gregorian Calendar. This can represent:

1453

#

1454

# * A full date, with non-zero year, month and day values

1455

# * A month and day value, with a zero year, e.g. an anniversary

1456

# * A year on its own, with zero month and day values

1457

# * A year and month value, with a zero day, e.g. a credit card expiration date

1458

#

1459

# Related types are google.type.TimeOfDay and `google.protobuf.Timestamp`.

1460

"month": 42, # Month of year. Must be from 1 to 12, or 0 if specifying a year without a

1461

# month and day.

1462

"year": 42, # Year of date. Must be from 1 to 9999, or 0 if specifying a date without

1463

# a year.

1464

"day": 42, # Day of month. Must be from 1 to 31 and valid for the year and month, or 0

1465

# if specifying a year by itself or a year and month where the day is not

1466

# significant.

1467

},

1468

"booleanValue": True or False, # boolean

1469

},

1470

},
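Because `quantileValues` holds 99 cut points for 100 equal-sized buckets, a percentile lookup reduces to an index. This is a sketch under assumptions: the list is taken to be in ascending order with entry `i` (0-based) marking the boundary of percentile `i + 1`, and `unwrap_value` and `percentile` are hypothetical helpers covering only some of the `Value` fields.

```python
def unwrap_value(value):
    """Extract a Python primitive from a Value dict (partial, illustrative)."""
    if "integerValue" in value:
        return int(value["integerValue"])  # int64 is serialized as a string
    for key in ("floatValue", "stringValue", "booleanValue", "timestampValue"):
        if key in value:
            return value[key]
    raise ValueError("unsupported Value: %r" % (value,))


def percentile(quantile_values, p):
    """Return the cut point for percentile p, where 1 <= p <= 99."""
    if not 1 <= p <= 99:
        raise ValueError("p must be between 1 and 99")
    return unwrap_value(quantile_values[p - 1])


# Synthetic data for illustration only: cut points 1..99.
sample_quantiles = [{"integerValue": str(i)} for i in range(1, 100)]
median = percentile(sample_quantiles, 50)
```

Under these assumptions the median is the 50th entry of the list, and `minValue` and `maxValue` supply the two ends that the 99 cut points do not cover.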

1471

"kAnonymityResult": { # Result of the k-anonymity computation. # K-anonymity result

1472

"equivalenceClassHistogramBuckets": [ # Histogram of k-anonymity equivalence classes.

1473

{ # Histogram of k-anonymity equivalence classes.

1474

"bucketValues": [ # Sample of equivalence classes in this bucket. The total number of

1475

# classes returned per bucket is capped at 20.

1476

{ # The set of columns' values that share the same ldiversity value

1477

"quasiIdsValues": [ # Set of values defining the equivalence class. One value per

1478

# quasi-identifier column in the original KAnonymity metric message.

1479

# The order is always the same as the original request.

1480

{ # Set of primitive values supported by the system.

1481

# Note that for the purposes of inspection or transformation, the number

1482

# of bytes considered to comprise a 'Value' is based on its representation

1483

# as a UTF-8 encoded string. For example, if 'integer_value' is set to

1484

# 123456789, the number of bytes would be counted as 9, even though an

1485

# int64 only holds up to 8 bytes of data.

1486

"integerValue": "A String", # integer

1487

"timeValue": { # Represents a time of day. The date and time zone are either not significant # time of day

1488

# or are specified elsewhere. An API may choose to allow leap seconds. Related

1489

# types are google.type.Date and `google.protobuf.Timestamp`.

1490

"seconds": 42, # Seconds of minutes of the time. Must normally be from 0 to 59. An API may

1491

# allow the value 60 if it allows leap-seconds.

1492

"nanos": 42, # Fractions of seconds in nanoseconds. Must be from 0 to 999,999,999.

1493

"minutes": 42, # Minutes of hour of day. Must be from 0 to 59.

1494

"hours": 42, # Hours of day in 24 hour format. Should be from 0 to 23. An API may choose

1495

# to allow the value "24:00:00" for scenarios like business closing time.

1496

},

1497

"dayOfWeekValue": "A String", # day of week

1498

"floatValue": 3.14, # float

1499

"stringValue": "A String", # string

1500

"timestampValue": "A String", # timestamp

1501

"dateValue": { # Represents a whole or partial calendar date, e.g. a birthday. The time of day # date

1502

# and time zone are either specified elsewhere or are not significant. The date

1503

# is relative to the Proleptic Gregorian Calendar. This can represent:

1504

#

1505

# * A full date, with non-zero year, month and day values

1506

# * A month and day value, with a zero year, e.g. an anniversary

1507

# * A year on its own, with zero month and day values

1508

# * A year and month value, with a zero day, e.g. a credit card expiration date

1509

#

1510

# Related types are google.type.TimeOfDay and `google.protobuf.Timestamp`.

1511

"month": 42, # Month of year. Must be from 1 to 12, or 0 if specifying a year without a

1512

# month and day.

1513

"year": 42, # Year of date. Must be from 1 to 9999, or 0 if specifying a date without

1514

# a year.

1515

"day": 42, # Day of month. Must be from 1 to 31 and valid for the year and month, or 0

1516

# if specifying a year by itself or a year and month where the day is not

1517

# significant.

1518

},

1519

"booleanValue": True or False, # boolean

1520

},

1521

],

1522

"equivalenceClassSize": "A String", # Size of the equivalence class, for example number of rows with the

1523

# above set of values.

1524

},

1525

],

1526

"equivalenceClassSizeLowerBound": "A String", # Lower bound on the size of the equivalence classes in this bucket.

1527

"equivalenceClassSizeUpperBound": "A String", # Upper bound on the size of the equivalence classes in this bucket.

1528

"bucketSize": "A String", # Total number of equivalence classes in this bucket.

1529

"bucketValueCount": "A String", # Total number of distinct equivalence classes in this bucket.

1530

},

1531

],

1532

},
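The k-anonymity histogram above gives a quick bound on the dataset's k value, since k is the size of the smallest equivalence class. The sketch below assumes the int64 fields arrive as decimal strings and that empty buckets can be skipped; `k_lower_bound` is a hypothetical helper.

```python
def k_lower_bound(buckets):
    """Smallest equivalenceClassSizeLowerBound among non-empty buckets,
    or None when the histogram has no non-empty bucket."""
    bounds = [
        int(bucket["equivalenceClassSizeLowerBound"])
        for bucket in buckets
        if int(bucket["bucketSize"]) > 0
    ]
    return min(bounds) if bounds else None


# Synthetic histogram for illustration only.
sample_buckets = [
    {"equivalenceClassSizeLowerBound": "3",
     "equivalenceClassSizeUpperBound": "5", "bucketSize": "12"},
    {"equivalenceClassSizeLowerBound": "6",
     "equivalenceClassSizeUpperBound": "20", "bucketSize": "4"},
]
dataset_k = k_lower_bound(sample_buckets)
```

The exact k may be larger than this bound when the smallest bucket spans a range of class sizes.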

"requestedPrivacyMetric": { # Privacy metric to compute for reidentification risk analysis. # Privacy metric to compute.

"categoricalStatsConfig": { # Compute numerical stats over an individual column, including # Categorical stats

1535

# number of distinct values and value count distribution.

1536

"field": { # General identifier of a data field in a storage service. # Field to compute categorical stats on. All column types are

1537

# supported except for arrays and structs. However, it may be more

1538

# informative to use NumericalStats when the field type is supported,

1539

# depending on the data.

1540

"name": "A String", # Name describing the field.

1541

},

1542

},

1543

"lDiversityConfig": { # l-diversity metric, used for analysis of reidentification risk. # l-diversity

1544

"sensitiveAttribute": { # General identifier of a data field in a storage service. # Sensitive field for computing the l-value.

1545

"name": "A String", # Name describing the field.

1546

},

1547

"quasiIds": [ # Set of quasi-identifiers indicating how equivalence classes are

1548

# defined for the l-diversity computation. When multiple fields are

1549

# specified, they are considered a single composite key.

1550

{ # General identifier of a data field in a storage service.

1551

"name": "A String", # Name describing the field.

},

],

},

"kMapEstimationConfig": { # Reidentifiability metric. This corresponds to a risk model similar to what # k-map

1556

# is called "journalist risk" in the literature, except the attack dataset is

1557

# statistically modeled instead of being perfectly known. This can be done

1558

# using publicly available data (like the US Census), or using a custom

1559

# statistical model (indicated as one or several BigQuery tables), or by

1560

# extrapolating from the distribution of values in the input dataset.

"regionCode": "A String", # ISO 3166-1 alpha-2 region code to use in the statistical modeling.

1562

# Set if no column is tagged with a region-specific InfoType (like

1563

# US_ZIP_5) or a region code.

Bu Sun Kim

2020-07-22 17:02:09 -0700

[diff] [blame^]

1564

"quasiIds": [ # Required. Fields considered to be quasi-identifiers. No two columns can have the

1565

# same tag.

1566

{ # A column with a semantic tag attached.

1567

"field": { # General identifier of a data field in a storage service. # Required. Identifies the column.

1568

"name": "A String", # Name describing the field.

1569

},

1570

"customTag": "A String", # A column can be tagged with a custom tag. In this case, the user must

1571

# indicate an auxiliary table that contains statistical information on

1572

# the possible values of this column (below).

1573

"infoType": { # Type of information detected by the API. # A column can be tagged with an InfoType to use the relevant public

1574

# dataset as a statistical model of population, if available. We

1575

# currently support US ZIP codes, region codes, ages and genders.

1576

# To programmatically obtain the list of supported InfoTypes, use

1577

# ListInfoTypes with the supported_by=RISK_ANALYSIS filter.

1578

"name": "A String", # Name of the information type. Either a name of your choosing when

1579

# creating a CustomInfoType, or one of the names listed

1580

# at https://cloud.google.com/dlp/docs/infotypes-reference when specifying

1581

# a built-in type. When sending Cloud DLP results to Data Catalog, infoType

1582

# names should conform to the pattern `[A-Za-z0-9$-_]{1,64}`.

1583

},

1584

"inferred": { # A generic empty message that you can re-use to avoid defining duplicated # If no semantic tag is indicated, we infer the statistical model from

1585

# the distribution of values in the input data

1586

# empty messages in your APIs. A typical example is to use it as the request

1587

# or the response type of an API method. For instance:

1588

#

1589

# service Foo {

1590

# rpc Bar(google.protobuf.Empty) returns (google.protobuf.Empty);

1591

# }

1592

#

1593

# The JSON representation for `Empty` is empty JSON object `{}`.

1594

},

1595

},

1596

],

Bu Sun Kim

2020-05-27 12:20:54 -0700

[diff] [blame]

1597

"auxiliaryTables": [ # Several auxiliary tables can be used in the analysis. Each custom_tag

Bu Sun Kim

2020-07-22 17:02:09 -0700

[diff] [blame^]

1598

# used to tag a quasi-identifiers column must appear in exactly one column

1599

# of one auxiliary table.

1600

{ # An auxiliary table contains statistical information on the relative

Bu Sun Kim

2020-05-27 12:20:54 -0700

[diff] [blame]

1601

# frequency of different quasi-identifiers values. It has one or several

1602

# quasi-identifiers columns, and one column that indicates the relative

1603

# frequency of each quasi-identifier tuple.

1604

# If a tuple is present in the data but not in the auxiliary table, the

1605

# corresponding relative frequency is assumed to be zero (and thus, the

1606

# tuple is highly reidentifiable).

Bu Sun Kim

2020-05-27 12:20:54 -0700

[diff] [blame]

1607

"table": { # Message defining the location of a BigQuery table. A table is uniquely # Required. Auxiliary table location.

1608

# identified by its project_id, dataset_id, and table_name. Within a query

1609

# a table is often referenced with a string in the format of:

1610

# `<project_id>:<dataset_id>.<table_id>` or

1611

# `<project_id>.<dataset_id>.<table_id>`.

1612

"projectId": "A String", # The Google Cloud Platform project ID of the project containing the table.

1613

# If omitted, project ID is inferred from the API call.

1614

"datasetId": "A String", # Dataset ID of the table.

1615

"tableId": "A String", # Name of the table.

1616

},

Bu Sun Kim

2020-07-22 17:02:09 -0700

[diff] [blame^]

1617

"quasiIds": [ # Required. Quasi-identifier columns.

1618

{ # A quasi-identifier column has a custom_tag, used to know which column

1619

# in the data corresponds to which column in the statistical model.

1620

"customTag": "A String", # An auxiliary field.

1621

"field": { # General identifier of a data field in a storage service. # Identifies the column.

1622

"name": "A String", # Name describing the field.

1623

},

1624

},

1625

],

Bu Sun Kim

2020-05-27 12:20:54 -0700

[diff] [blame]

1626

"relativeFrequency": { # General identifier of a data field in a storage service. # Required. The relative frequency column must contain a floating-point number

1627

# between 0 and 1 (inclusive). Null values are assumed to be zero.

1628

"name": "A String", # Name describing the field.

1629

},

1630

},

1631

],

Bu Sun Kim

2020-07-22 17:02:09 -0700

[diff] [blame^]

1632

},
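The custom-tag constraints described for `kMapEstimationConfig` can be checked client-side before a job is submitted. The helper below is a minimal sketch, not part of the client library: `check_custom_tags`, the table names, and the column names are all hypothetical, and only the two custom-tag rules stated in the comments above are verified.

```python
from collections import Counter


def check_custom_tags(k_map_config):
    """Return a list of problems found; an empty list means none were found."""
    problems = []
    # Custom tags attached to quasi-identifier columns of the input data.
    used = [q["customTag"] for q in k_map_config.get("quasiIds", [])
            if "customTag" in q]
    for tag, count in Counter(used).items():
        if count > 1:
            problems.append(f"tag {tag!r} is used by {count} columns")
    # Custom tags declared across all auxiliary tables.
    declared = Counter(
        q.get("customTag")
        for table in k_map_config.get("auxiliaryTables", [])
        for q in table.get("quasiIds", [])
    )
    for tag in sorted(set(used)):
        if declared[tag] != 1:
            problems.append(
                f"tag {tag!r} appears in {declared[tag]} auxiliary columns, expected 1")
    return problems


# Hypothetical configuration, shaped like the schema above.
k_map_config = {
    "regionCode": "US",
    "quasiIds": [
        {"field": {"name": "zip_code"}, "infoType": {"name": "US_ZIP_5"}},
        {"field": {"name": "job_title"}, "customTag": "job"},
    ],
    "auxiliaryTables": [{
        "table": {"projectId": "my-project", "datasetId": "stats", "tableId": "jobs"},
        "quasiIds": [{"field": {"name": "title"}, "customTag": "job"}],
        "relativeFrequency": {"name": "frequency"},
    }],
}
problems = check_custom_tags(k_map_config)
```

The server remains the authority on validity; a check like this only catches mismatched tags earlier.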

1633

"deltaPresenceEstimationConfig": { # δ-presence metric, used to estimate how likely it is for an attacker to # delta-presence

1634

# figure out that one given individual appears in a de-identified dataset.

1635

# Similarly to the k-map metric, we cannot compute δ-presence exactly without

1636

# knowing the attack dataset, so we use a statistical model instead.

Bu Sun Kim

2020-05-27 12:20:54 -0700

[diff] [blame]

1637

"quasiIds": [ # Required. Fields considered to be quasi-identifiers. No two fields can have the

1638

# same tag.

1639

{ # A column with a semantic tag attached.

1640

"field": { # General identifier of a data field in a storage service. # Required. Identifies the column.

1641

"name": "A String", # Name describing the field.

1642

},

1643

"infoType": { # Type of information detected by the API. # A column can be tagged with an InfoType to use the relevant public

1644

# dataset as a statistical model of population, if available. We

1645

# currently support US ZIP codes, region codes, ages and genders.

1646

# To programmatically obtain the list of supported InfoTypes, use

1647

# ListInfoTypes with the supported_by=RISK_ANALYSIS filter.

1648

"name": "A String", # Name of the information type. Either a name of your choosing when

1649

# creating a CustomInfoType, or one of the names listed

1650

# at https://cloud.google.com/dlp/docs/infotypes-reference when specifying

Bu Sun Kim

2020-07-22 17:02:09 -0700

[diff] [blame^]

1651

# a built-in type. When sending Cloud DLP results to Data Catalog, infoType

1652

# names should conform to the pattern `[A-Za-z0-9$-_]{1,64}`.

Bu Sun Kim

2020-05-27 12:20:54 -0700

[diff] [blame]

1653

},

1654

"customTag": "A String", # A column can be tagged with a custom tag. In this case, the user must

1655

# indicate an auxiliary table that contains statistical information on

1656

# the possible values of this column (below).

1657

"inferred": { # A generic empty message that you can re-use to avoid defining duplicated # If no semantic tag is indicated, we infer the statistical model from

1658

# the distribution of values in the input data

1659

# empty messages in your APIs. A typical example is to use it as the request

1660

# or the response type of an API method. For instance:

1661

#

1662

# service Foo {

1663

# rpc Bar(google.protobuf.Empty) returns (google.protobuf.Empty);

1664

# }

1665

#

1666

# The JSON representation for `Empty` is empty JSON object `{}`.

1667

},

1668

},

1669

],

Bu Sun Kim

2020-07-22 17:02:09 -0700

[diff] [blame^]

1670

"auxiliaryTables": [ # Several auxiliary tables can be used in the analysis. Each custom_tag

1671

# used to tag a quasi-identifiers field must appear in exactly one

1672

# field of one auxiliary table.

1673

{ # An auxiliary table containing statistical information on the relative

1674

# frequency of different quasi-identifiers values. It has one or several

1675

# quasi-identifiers columns, and one column that indicates the relative

1676

# frequency of each quasi-identifier tuple.

1677

# If a tuple is present in the data but not in the auxiliary table, the

1678

# corresponding relative frequency is assumed to be zero (and thus, the

1679

# tuple is highly reidentifiable).

1680

"relativeFrequency": { # General identifier of a data field in a storage service. # Required. The relative frequency column must contain a floating-point number

1681

# between 0 and 1 (inclusive). Null values are assumed to be zero.

1682

"name": "A String", # Name describing the field.

1683

},

1684

"table": { # Message defining the location of a BigQuery table. A table is uniquely # Required. Auxiliary table location.

1685

# identified by its project_id, dataset_id, and table_name. Within a query

1686

# a table is often referenced with a string in the format of:

1687

# `<project_id>:<dataset_id>.<table_id>` or

1688

# `<project_id>.<dataset_id>.<table_id>`.

1689

"projectId": "A String", # The Google Cloud Platform project ID of the project containing the table.

1690

# If omitted, project ID is inferred from the API call.

1691

"datasetId": "A String", # Dataset ID of the table.

1692

"tableId": "A String", # Name of the table.

1693

},

1694

"quasiIds": [ # Required. Quasi-identifier columns.

1695

{ # A quasi-identifier column has a custom_tag, used to know which column

1696

# in the data corresponds to which column in the statistical model.

1697

"field": { # General identifier of a data field in a storage service. # Identifies the column.

1698

"name": "A String", # Name describing the field.

1699

},

1700

"customTag": "A String", # A column can be tagged with a custom tag. In this case, the user must

1701

# indicate an auxiliary table that contains statistical information on

1702

# the possible values of this column (below).

},

],

},

],

"regionCode": "A String", # ISO 3166-1 alpha-2 region code to use in the statistical modeling.

1708

# Set if no column is tagged with a region-specific InfoType (like

1709

# US_ZIP_5) or a region code.

Bu Sun Kim

2020-05-27 12:20:54 -0700

[diff] [blame]

1710

},

1711

"kAnonymityConfig": { # k-anonymity metric, used for analysis of reidentification risk. # K-anonymity

1712

"entityId": { # An entity in a dataset is a field or set of fields that correspond to a # Message indicating that multiple rows might be associated with a

1713

# single individual. If the same entity_id is associated with multiple

1714

# quasi-identifier tuples over distinct rows, we consider the entire

1715

# collection of tuples as the composite quasi-identifier. This collection

1716

# is a multiset: the order in which the different tuples appear in the

1717

# dataset is ignored, but their frequency is taken into account.

1718

#

1719

# Important note: a maximum of 1000 rows can be associated with a single

1720

# entity ID. If more rows are associated with the same entity ID, some

1721

# might be ignored.

1722

# single person. For example, in medical records the `EntityId` might be a

1723

# patient identifier, or for financial records it might be an account

1724

# identifier. This message is used when generalizations or analysis must take

1725

# into account that multiple rows correspond to the same entity.

1726

"field": { # General identifier of a data field in a storage service. # Composite key indicating which field contains the entity identifier.

1727

"name": "A String", # Name describing the field.

1728

},

1729

},

1730

"quasiIds": [ # Set of fields to compute k-anonymity over. When multiple fields are

1731

# specified, they are considered a single composite key. Structs and

1732

# repeated data types are not supported; however, nested fields are

1733

# supported so long as they are not structs themselves or nested within

1734

# a repeated field.

1735

{ # General identifier of a data field in a storage service.

1736

"name": "A String", # Name describing the field.

},

],

},

"numericalStatsConfig": { # Compute numerical stats over an individual column, including # Numerical stats

1741

# min, max, and quantiles.

1742

"field": { # General identifier of a data field in a storage service. # Field to compute numerical stats on. Supported types are

1743

# integer, float, date, datetime, timestamp, time.

1744

"name": "A String", # Name describing the field.

1745

},

1746

},

Bu Sun Kim

2020-05-27 12:20:54 -0700

[diff] [blame]

1747

},

Bu Sun Kim

2020-07-22 17:02:09 -0700

[diff] [blame^]

1748

"lDiversityResult": { # Result of the l-diversity computation. # L-diversity result

1749

"sensitiveValueFrequencyHistogramBuckets": [ # Histogram of l-diversity equivalence class sensitive value frequencies.

1750

{ # Histogram of l-diversity equivalence class sensitive value frequencies.

1751

"bucketSize": "A String", # Total number of equivalence classes in this bucket.

Bu Sun Kim

2020-05-27 12:20:54 -0700

[diff] [blame]

1752

"bucketValues": [ # Sample of equivalence classes in this bucket. The total number of

1753

# classes returned per bucket is capped at 20.

Bu Sun Kim

2020-07-22 17:02:09 -0700

[diff] [blame^]

1754

{ # The set of columns' values that share the same l-diversity value.

1755

"quasiIdsValues": [ # Quasi-identifier values defining the k-anonymity equivalence

1756

# class. The order is always the same as the original request.

Bu Sun Kim

2020-05-27 12:20:54 -0700

[diff] [blame]

1757

{ # Set of primitive values supported by the system.

1758

# Note that for the purposes of inspection or transformation, the number

1759

# of bytes considered to comprise a 'Value' is based on its representation

1760

# as a UTF-8 encoded string. For example, if 'integer_value' is set to

1761

# 123456789, the number of bytes would be counted as 9, even though an

1762

# int64 only holds up to 8 bytes of data.

Bu Sun Kim

2020-07-22 17:02:09 -0700

[diff] [blame^]

1763

"integerValue": "A String", # integer

1764

"timeValue": { # Represents a time of day. The date and time zone are either not significant # time of day

1765

# or are specified elsewhere. An API may choose to allow leap seconds. Related

1766

# types are google.type.Date and `google.protobuf.Timestamp`.

1767

"seconds": 42, # Seconds of minutes of the time. Must normally be from 0 to 59. An API may

1768

# allow the value 60 if it allows leap-seconds.

1769

"nanos": 42, # Fractions of seconds in nanoseconds. Must be from 0 to 999,999,999.

1770

"minutes": 42, # Minutes of hour of day. Must be from 0 to 59.

1771

"hours": 42, # Hours of day in 24 hour format. Should be from 0 to 23. An API may choose

1772

# to allow the value "24:00:00" for scenarios like business closing time.

1773

},

1774

"dayOfWeekValue": "A String", # day of week

1775

"floatValue": 3.14, # float

1776

"stringValue": "A String", # string

Bu Sun Kim

2020-05-27 12:20:54 -0700

[diff] [blame]

1777

"timestampValue": "A String", # timestamp

1778

"dateValue": { # Represents a whole or partial calendar date, e.g. a birthday. The time of day # date

1779

# and time zone are either specified elsewhere or are not significant. The date

1780

# is relative to the Proleptic Gregorian Calendar. This can represent:

1781

#

1782

# * A full date, with non-zero year, month and day values

1783

# * A month and day value, with a zero year, e.g. an anniversary

1784

# * A year on its own, with zero month and day values

1785

# * A year and month value, with a zero day, e.g. a credit card expiration date

1786

#

1787

# Related types are google.type.TimeOfDay and `google.protobuf.Timestamp`.

Bu Sun Kim

2020-05-27 12:20:54 -0700

[diff] [blame]

1788

"month": 42, # Month of year. Must be from 1 to 12, or 0 if specifying a year without a

1789

# month and day.

Bu Sun Kim

2020-07-22 17:02:09 -0700

[diff] [blame^]

1790

"year": 42, # Year of date. Must be from 1 to 9999, or 0 if specifying a date without

1791

# a year.

Bu Sun Kim

2020-05-27 12:20:54 -0700

[diff] [blame]

1792

"day": 42, # Day of month. Must be from 1 to 31 and valid for the year and month, or 0

1793

# if specifying a year by itself or a year and month where the day is not

1794

# significant.

1795

},

Bu Sun Kim

2020-05-27 12:20:54 -0700

[diff] [blame]

1796

"booleanValue": True or False, # boolean

Bu Sun Kim

2020-05-27 12:20:54 -0700

[diff] [blame]

1797

},

1798

],
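The byte-counting rule in the `Value` description can be reproduced locally for the integer case it mentions. This is a minimal sketch: `value_byte_count` is a hypothetical helper, and using Python's `str()` as the string representation is an assumption that matches the documented integer example but is not confirmed for floats, booleans, or dates.

```python
def value_byte_count(value):
    """Count the bytes of a value's UTF-8 encoded string representation."""
    return len(str(value).encode("utf-8"))


# The documented example: nine digits count as 9 bytes, even though an
# int64 holds at most 8 bytes of data.
integer_bytes = value_byte_count(123456789)
# Non-ASCII characters take more than one byte each in UTF-8.
accented_bytes = value_byte_count("é")
```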

Bu Sun Kim

2020-07-22 17:02:09 -0700

[diff] [blame^]

1799

"topSensitiveValues": [ # Estimated frequencies of top sensitive values.

1800

{ # A value of a field, including its frequency.

1801

"count": "A String", # How many times the value is contained in the field.

1802

"value": { # Set of primitive values supported by the system. # A value contained in the field in question.

1803

# Note that for the purposes of inspection or transformation, the number

1804

# of bytes considered to comprise a 'Value' is based on its representation

1805

# as a UTF-8 encoded string. For example, if 'integer_value' is set to

1806

# 123456789, the number of bytes would be counted as 9, even though an

1807

# int64 only holds up to 8 bytes of data.

1808

"integerValue": "A String", # integer

1809

"timeValue": { # Represents a time of day. The date and time zone are either not significant # time of day

1810

# or are specified elsewhere. An API may choose to allow leap seconds. Related

1811

# types are google.type.Date and `google.protobuf.Timestamp`.

1812

"seconds": 42, # Seconds of minutes of the time. Must normally be from 0 to 59. An API may

1813

# allow the value 60 if it allows leap-seconds.

1814

"nanos": 42, # Fractions of seconds in nanoseconds. Must be from 0 to 999,999,999.

1815

"minutes": 42, # Minutes of hour of day. Must be from 0 to 59.

1816

"hours": 42, # Hours of day in 24 hour format. Should be from 0 to 23. An API may choose

1817

# to allow the value "24:00:00" for scenarios like business closing time.

1818

},

1819

"dayOfWeekValue": "A String", # day of week

1820

"floatValue": 3.14, # float

1821

"stringValue": "A String", # string

1822

"timestampValue": "A String", # timestamp

1823

"dateValue": { # Represents a whole or partial calendar date, e.g. a birthday. The time of day # date

1824

# and time zone are either specified elsewhere or are not significant. The date

1825

# is relative to the Proleptic Gregorian Calendar. This can represent:

1826

#

1827

# * A full date, with non-zero year, month and day values

1828

# * A month and day value, with a zero year, e.g. an anniversary

1829

# * A year on its own, with zero month and day values

1830

# * A year and month value, with a zero day, e.g. a credit card expiration date

1831

#

1832

# Related types are google.type.TimeOfDay and `google.protobuf.Timestamp`.

1833

"month": 42, # Month of year. Must be from 1 to 12, or 0 if specifying a year without a

1834

# month and day.

1835

"year": 42, # Year of date. Must be from 1 to 9999, or 0 if specifying a date without

1836

# a year.

1837

"day": 42, # Day of month. Must be from 1 to 31 and valid for the year and month, or 0

1838

# if specifying a year by itself or a year and month where the day is not

1839

# significant.

1840

},

1841

"booleanValue": True or False, # boolean

},

},

],

"equivalenceClassSize": "A String", # Size of the k-anonymity equivalence class.

1846

"numDistinctSensitiveValues": "A String", # Number of distinct sensitive values in this equivalence class.

Bu Sun Kim

2020-05-27 12:20:54 -0700

[diff] [blame]

1847

},

1848

],

Bu Sun Kim

2020-07-22 17:02:09 -0700

[diff] [blame^]

1849

"sensitiveValueFrequencyUpperBound": "A String", # Upper bound on the sensitive value frequencies of the equivalence

1850

# classes in this bucket.

Bu Sun Kim

2020-05-27 12:20:54 -0700

[diff] [blame]

1851

"bucketValueCount": "A String", # Total number of distinct equivalence classes in this bucket.

Bu Sun Kim

2020-07-22 17:02:09 -0700

[diff] [blame^]

1852

"sensitiveValueFrequencyLowerBound": "A String", # Lower bound on the sensitive value frequencies of the equivalence

1853

# classes in this bucket.

Bu Sun Kim

2020-05-27 12:20:54 -0700

[diff] [blame]

},

],

},

"requestedSourceTable": { # Message defining the location of a BigQuery table. A table is uniquely # Input dataset to compute metrics over.

1858

# identified by its project_id, dataset_id, and table_name. Within a query

1859

# a table is often referenced with a string in the format of:

1860

# `<project_id>:<dataset_id>.<table_id>` or

1861

# `<project_id>.<dataset_id>.<table_id>`.

1862

"projectId": "A String", # The Google Cloud Platform project ID of the project containing the table.

1863

# If omitted, project ID is inferred from the API call.

1864

"datasetId": "A String", # Dataset ID of the table.

1865

"tableId": "A String", # Name of the table.

1866

},

Bu Sun Kim

2020-07-22 17:02:09 -0700

[diff] [blame^]

1867

},

1868

"type": "A String", # The type of job.

1869

"endTime": "A String", # Time when the job finished.

1870

"startTime": "A String", # Time when the job started.

1871

"jobTriggerName": "A String", # If created by a job trigger, the resource name of the trigger that

1872

# instantiated the job.

1873

"inspectDetails": { # The results of an inspect DataSource job. # Results from inspecting a data source.

1874

"requestedOptions": { # Snapshot of the inspection configuration. # The configuration used for this job.

1875

"jobConfig": { # Controls what and how to inspect for findings. # Inspect config.

1876

"inspectTemplateName": "A String", # If provided, will be used as the default for all values in InspectConfig.

1877

# `inspect_config` will be merged into the values persisted as part of the

1878

# template.

1879

"actions": [ # Actions to execute at the completion of the job.

1880

{ # A task to execute on the completion of a job.

1881

# See https://cloud.google.com/dlp/docs/concepts-actions to learn more.

1882

"publishToStackdriver": { # Enable Stackdriver metric dlp.googleapis.com/finding_count. This # Enable Stackdriver metric dlp.googleapis.com/finding_count.

1883

# will publish a metric to Stackdriver on each infotype requested and

1884

# how many findings were found for it. CustomDetectors will be bucketed

1885

# as 'Custom' under the Stackdriver label 'info_type'.

1886

},

1887

"publishFindingsToCloudDataCatalog": { # Publish findings of a DlpJob to Cloud Data Catalog. Labels summarizing the # Publish findings to Cloud Datahub.

1888

# results of the DlpJob will be applied to the entry for the resource scanned

1889

# in Cloud Data Catalog. Any labels previously written by another DlpJob will

1890

# be deleted. InfoType naming patterns are strictly enforced when using this

1891

# feature. Note that the findings will be persisted in Cloud Data Catalog

1892

# storage and are governed by Data Catalog service-specific policy, see

1893

# https://cloud.google.com/terms/service-terms

1894

# Only a single instance of this action can be specified and only allowed if

1895

# all resources being scanned are BigQuery tables.

1896

# Compatible with: Inspect

1897

},

1898

"jobNotificationEmails": { # Enable email notification to project owners and editors on job's # Enable email notification for project owners and editors on job's

1899

# completion/failure.

1900

# completion/failure.

1901

},

1902

"pubSub": { # Publish a message into given Pub/Sub topic when DlpJob has completed. The # Publish a notification to a pubsub topic.

1903

# message contains a single field, `DlpJobName`, which is equal to the

1904

# finished job's

1905

# [`DlpJob.name`](https://cloud.google.com/dlp/docs/reference/rest/v2/projects.dlpJobs#DlpJob).

1906

# Compatible with: Inspect, Risk

1907

"topic": "A String", # Cloud Pub/Sub topic to send notifications to. The topic must have given

1908

# publishing access rights to the DLP API service account executing

1909

# the long running DlpJob sending the notifications.

1910

# Format is projects/{project}/topics/{topic}.

1911

},

1912

"saveFindings": { # If set, the detailed findings will be persisted to the specified # Save resulting findings in a provided location.

1913

# OutputStorageConfig. Only a single instance of this action can be

1914

# specified.

1915

# Compatible with: Inspect, Risk

1916

"outputConfig": { # Cloud repository for storing output. # Location to store findings outside of DLP.

1917

"outputSchema": "A String", # Schema used for writing the findings for Inspect jobs. This field is only

1918

# used for Inspect and must be unspecified for Risk jobs. Columns are derived

1919

# from the `Finding` object. If appending to an existing table, any columns

1920

# from the predefined schema that are missing will be added. No columns in

1921

# the existing table will be deleted.

1922

#

1923

# If unspecified, then all available columns will be used for a new table or

1924

# an (existing) table with no schema, and no changes will be made to an

1925

# existing table that has a schema.

1926

# Only for use with external storage.

1927

"table": { # Message defining the location of a BigQuery table. A table is uniquely # Store findings in an existing table or a new table in an existing

1928

# dataset. If table_id is not set a new one will be generated

1929

# for you with the following format:

1930

# dlp_googleapis_yyyy_mm_dd_[dlp_job_id]. Pacific timezone will be used for

1931

# generating the date details.

1932

#

1933

# For Inspect, each column in an existing output table must have the same

1934

# name, type, and mode of a field in the `Finding` object.

1935

#

1936

# For Risk, an existing output table should be the output of a previous

1937

# Risk analysis job run on the same source table, with the same privacy

1938

# metric and quasi-identifiers. Risk jobs that analyze the same table but

1939

# compute a different privacy metric, or use different sets of

1940

# quasi-identifiers, cannot store their results in the same table.

1941

# identified by its project_id, dataset_id, and table_name. Within a query

1942

# a table is often referenced with a string in the format of:

1943

# `<project_id>:<dataset_id>.<table_id>` or

1944

# `<project_id>.<dataset_id>.<table_id>`.

1945

"projectId": "A String", # The Google Cloud Platform project ID of the project containing the table.

1946

# If omitted, project ID is inferred from the API call.

1947

"datasetId": "A String", # Dataset ID of the table.

1948

"tableId": "A String", # Name of the table.

},

},

},

"publishSummaryToCscc": { # Publish the result summary of a DlpJob to the Cloud Security # Publish summary to Cloud Security Command Center (Alpha).

1953

# Command Center (CSCC Alpha).

1954

# This action is only available for projects that are part of

1955

# an organization and whitelisted for the alpha Cloud Security Command

1956

# Center.

1957

# The action will publish count of finding instances and their info types.

1958

# The summary of findings will be persisted in CSCC and are governed by CSCC

1959

# service-specific policy, see https://cloud.google.com/terms/service-terms

1960

# Only a single instance of this action can be specified.

1961

# Compatible with: Inspect

1962

},

Bu Sun Kim

2020-05-27 12:20:54 -0700

[diff] [blame]

1963

},

Bu Sun Kim

2020-07-22 17:02:09 -0700

[diff] [blame^]

1964

],

1965

"storageConfig": { # Shared message indicating Cloud storage type. # The data to scan.

1966

"cloudStorageOptions": { # Options defining a file or a set of files within a Google Cloud Storage # Google Cloud Storage options.

1967

# bucket.

1968

"bytesLimitPerFilePercent": 42, # Max percentage of bytes to scan from a file. The rest are omitted. The

1969

# number of bytes scanned is rounded down. Must be between 0 and 100,

1970

# inclusive. Both 0 and 100 mean no limit. Defaults to 0. Only one

1971

# of bytes_limit_per_file and bytes_limit_per_file_percent can be specified.

1972

"fileTypes": [ # List of file type groups to include in the scan.

1973

# If empty, all files are scanned and available data format processors

1974

# are applied. In addition, the binary content of the selected files

1975

# is always scanned as well.

1976

# Images are scanned only as binary if the specified region

1977

# does not support image inspection and no file_types were specified.

1978

# Image inspection is restricted to 'global', 'us', 'asia', and 'europe'.

1979

"A String",

1980

],

1981

"bytesLimitPerFile": "A String", # Max number of bytes to scan from a file. If a scanned file's size is bigger

1982

# than this value then the rest of the bytes are omitted. Only one

1983

# of bytes_limit_per_file and bytes_limit_per_file_percent can be specified.

1984

"filesLimitPercent": 42, # Limits the number of files to scan to this percentage of the input FileSet.

1985

# Number of files scanned is rounded down. Must be between 0 and 100,

1986

# inclusive. Both 0 and 100 mean no limit. Defaults to 0.

1987

"fileSet": { # Set of files to scan. # The set of one or more files to scan.

1988

"regexFileSet": { # Message representing a set of files in a Cloud Storage bucket. Regular # The regex-filtered set of files to scan. Exactly one of `url` or

1989

# `regex_file_set` must be set.

1990

# expressions are used to allow fine-grained control over which files in the

1991

# bucket to include.

1992

#

1993

# Included files are those that match at least one item in `include_regex` and

1994

# do not match any items in `exclude_regex`. Note that a file that matches

1995

# items from both lists will _not_ be included. For a match to occur, the

1996

# entire file path (i.e., everything in the url after the bucket name) must

1997

# match the regular expression.

1998

#

1999

# For example, given the input `{bucket_name: "mybucket", include_regex:

2000

# ["directory1/.*"], exclude_regex:

2001

# ["directory1/excluded.*"]}`:

2002

#

2003

# * `gs://mybucket/directory1/myfile` will be included

2004

# * `gs://mybucket/directory1/directory2/myfile` will be included (`.*` matches

2005

# across `/`)

2006

# * `gs://mybucket/directory0/directory1/myfile` will _not_ be included (the

2007

# full path doesn't match any items in `include_regex`)

2008

# * `gs://mybucket/directory1/excludedfile` will _not_ be included (the path

2009

# matches an item in `exclude_regex`)

2010

#

2011

# If `include_regex` is left empty, it will match all files by default

2012

# (this is equivalent to setting `include_regex: [".*"]`).

2013

#

2014

# Some other common use cases:

2015

#

2016

# * `{bucket_name: "mybucket", exclude_regex: [".*\.pdf"]}` will include all

2017

# files in `mybucket` except for .pdf files

2018

# * `{bucket_name: "mybucket", include_regex: ["directory/[^/]+"]}` will

2019

# include all files directly under `gs://mybucket/directory/`, without matching

2020

# across `/`

2021

"bucketName": "A String", # The name of a Cloud Storage bucket. Required.

2022

"excludeRegex": [ # A list of regular expressions matching file paths to exclude. All files in

2023

# the bucket that match at least one of these regular expressions will be

2024

# excluded from the scan.

2025

#

2026

# Regular expressions use RE2

2027

# [syntax](https://github.com/google/re2/wiki/Syntax); a guide can be found

2028

# under the google/re2 repository on GitHub.

2029

"A String",

2030

],

2031

"includeRegex": [ # A list of regular expressions matching file paths to include. All files in

2032

# the bucket that match at least one of these regular expressions will be

2033

# included in the set of files, except for those that also match an item in

2034

# `exclude_regex`. Leaving this field empty will match all files by default

2035

# (this is equivalent to including `.*` in the list).

2036

#

2037

# Regular expressions use RE2

2038

# [syntax](https://github.com/google/re2/wiki/Syntax); a guide can be found

2039

# under the google/re2 repository on GitHub.

"A String",

],

},
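# The include/exclude rules above can be sketched locally as follows. This is a
# minimal illustration, not the service's implementation: Python's `re` module
# stands in for RE2, the helper name `is_included` is hypothetical, and it
# assumes each pattern must match the full object path inside the bucket.

```python
import re


def is_included(path, include_regex=(), exclude_regex=()):
    """Return True if `path` (the object path inside the bucket) would be scanned."""
    # An empty include list behaves like [".*"], i.e. it matches every file.
    includes = list(include_regex) or [".*"]
    # Assumption: a pattern has to match the whole path, not just a prefix.
    included = any(re.fullmatch(p, path) for p in includes)
    excluded = any(re.fullmatch(p, path) for p in exclude_regex)
    # Exclusion wins over inclusion.
    return included and not excluded


# All files except .pdf files.
print(is_included("report.pdf", exclude_regex=[r".*\.pdf"]))
# Only files directly under directory/, without matching across "/".
print(is_included("directory/sub/a.txt", include_regex=["directory/[^/]+"]))
```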

"url": "A String", # The Cloud Storage url of the file(s) to scan, in the format

2044

# `gs://<bucket>/<path>`. Trailing wildcard in the path is allowed.

2045

#

2046

# If the url ends in a trailing slash, the bucket or directory represented

2047

# by the url will be scanned non-recursively (content in sub-directories

2048

# will not be scanned). This means that `gs://mybucket/` is equivalent to

2049

# `gs://mybucket/*`, and `gs://mybucket/directory/` is equivalent to

2050

# `gs://mybucket/directory/*`.

2051

#

2052

# Exactly one of `url` or `regex_file_set` must be set.

2053

},
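# A small sketch of the two `url` rules above: a trailing slash is equivalent to
# a trailing `/*`, and exactly one of `url` or `regex_file_set` must be set. The
# helper names are hypothetical, and the `regexFileSet` key is assumed to be the
# camelCase form of `regex_file_set` used in the request body.

```python
def normalize_scan_url(url):
    """Make the non-recursive meaning of a trailing slash explicit."""
    # gs://mybucket/ is equivalent to gs://mybucket/*
    return url + "*" if url.endswith("/") else url


def check_file_set(file_set):
    """Raise ValueError unless exactly one of url / regexFileSet is present."""
    has_url = bool(file_set.get("url"))
    has_regex = bool(file_set.get("regexFileSet"))
    if has_url == has_regex:
        raise ValueError("exactly one of url or regexFileSet must be set")
    return file_set


print(normalize_scan_url("gs://mybucket/directory/"))
```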

2054

"sampleMethod": "A String",

Bu Sun Kim

2020-05-27 12:20:54 -0700

[diff] [blame]

2055

},

Bu Sun Kim

2020-07-22 17:02:09 -0700

[diff] [blame^]

2056

"bigQueryOptions": { # Options defining BigQuery table and row identifiers. # BigQuery options.

2057

"sampleMethod": "A String",

2058

"tableReference": { # Message defining the location of a BigQuery table. A table is uniquely # Complete BigQuery table reference.

2059

# identified by its project_id, dataset_id, and table_name. Within a query

2060

# a table is often referenced with a string in the format of:

2061

# `<project_id>:<dataset_id>.<table_id>` or

2062

# `<project_id>.<dataset_id>.<table_id>`.

2063

"projectId": "A String", # The Google Cloud Platform project ID of the project containing the table.

2064

# If omitted, project ID is inferred from the API call.

2065

"datasetId": "A String", # Dataset ID of the table.

2066

"tableId": "A String", # Name of the table.

2067

},

2068

"rowsLimitPercent": 42, # Max percentage of rows to scan. The rest are omitted. The number of rows

2069

# scanned is rounded down. Must be between 0 and 100, inclusive. Both 0 and

2070

# 100 mean no limit. Defaults to 0. Only one of rows_limit and

2071

# rows_limit_percent can be specified. Cannot be used in conjunction with

2072

# TimespanConfig.

2073

"rowsLimit": "A String", # Max number of rows to scan. If the table has more rows than this value, the

2074

# rest of the rows are omitted. If not set, or if set to 0, all rows will be

2075

# scanned. Only one of rows_limit and rows_limit_percent can be specified.

2076

# Cannot be used in conjunction with TimespanConfig.

2077

"identifyingFields": [ # Table fields that may uniquely identify a row within the table. When

2078

# `actions.saveFindings.outputConfig.table` is specified, the values of

2079

# columns specified here are available in the output table under

2080

# `location.content_locations.record_location.record_key.id_values`. Nested

2081

# fields such as `person.birthdate.year` are allowed.

2082

{ # General identifier of a data field in a storage service.

2083

"name": "A String", # Name describing the field.

2084

},

2085

],

2086

"excludedFields": [ # References to fields excluded from scanning. This allows you to skip

2087

# inspection of entire columns which you know have no findings.

2088

{ # General identifier of a data field in a storage service.

2089

"name": "A String", # Name describing the field.

},

],

},
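# The interaction of `rowsLimit` and `rowsLimitPercent` described above can be
# sketched like this. `rows_to_scan` is a hypothetical helper that takes plain
# integers (the API carries `rowsLimit` as a string); it only mirrors the
# documented rules and is not how the service samples rows.

```python
def rows_to_scan(total_rows, rows_limit=0, rows_limit_percent=0):
    """Estimate how many rows a scan would cover under the documented limits."""
    if rows_limit and rows_limit_percent:
        raise ValueError("only one of rows_limit and rows_limit_percent can be specified")
    if not 0 <= rows_limit_percent <= 100:
        raise ValueError("rows_limit_percent must be between 0 and 100")
    if rows_limit:
        # Rows beyond the limit are omitted.
        return min(total_rows, rows_limit)
    if rows_limit_percent in (0, 100):
        # Both 0 and 100 mean no limit.
        return total_rows
    # The number of rows scanned is rounded down.
    return total_rows * rows_limit_percent // 100


print(rows_to_scan(999, rows_limit_percent=42))
```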

"timespanConfig": { # Configuration of the timespan of the items to include in scanning.

2094

# Currently only supported when inspecting Google Cloud Storage and BigQuery.

2095

"timestampField": { # General identifier of a data field in a storage service. # Specification of the field containing the timestamp of scanned items.

2096

# Used for data sources like Datastore and BigQuery.

2097

#

2098

# For BigQuery:

2099

# Required to filter out rows based on the given start and

2100

# end times. If not specified and the table was modified between the given

2101

# start and end times, the entire table will be scanned.

2102

# The valid data types of the timestamp field are: `INTEGER`, `DATE`,

2103

# `TIMESTAMP`, or `DATETIME` BigQuery column.

2104

#

2105

# For Datastore:

2106

# Valid data types of the timestamp field are: `TIMESTAMP`.

2107

# Datastore entity will be scanned if the timestamp property does not

2108

# exist or its value is empty or invalid.

2109

"name": "A String", # Name describing the field.

2110

},

2111

"enableAutoPopulationOfTimespanConfig": True or False, # When the job is started by a JobTrigger we will automatically figure out

2112

# a valid start_time to avoid scanning files that have not been modified

2113

# since the last time the JobTrigger executed. This will be based on the

2114

# time of the execution of the last run of the JobTrigger.

2115

"startTime": "A String", # Exclude files or rows older than this value.

2116

"endTime": "A String", # Exclude files or rows newer than this value.

2117

# If set to zero, no upper time limit is applied.

2118

},
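# As a rough illustration of `startTime` and `endTime`: items older than the
# start or newer than the end are excluded, and a missing bound is treated as
# open. `in_timespan` is a hypothetical helper working on timezone-aware
# `datetime` values; whether the bounds themselves are inclusive is an
# assumption here.

```python
from datetime import datetime, timezone


def in_timespan(modified, start_time=None, end_time=None):
    """Return True if an item modified at `modified` falls inside the timespan."""
    if start_time is not None and modified < start_time:
        return False  # older than startTime
    if end_time is not None and modified > end_time:
        return False  # newer than endTime
    return True


start = datetime(2020, 1, 1, tzinfo=timezone.utc)
end = datetime(2020, 7, 1, tzinfo=timezone.utc)
print(in_timespan(datetime(2020, 3, 15, tzinfo=timezone.utc), start, end))
```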

2119

"datastoreOptions": { # Options defining a data set within Google Cloud Datastore. # Google Cloud Datastore options.

2120

"kind": { # A representation of a Datastore kind. # The kind to process.

2121

"name": "A String", # The name of the kind.

2122

},

2123

"partitionId": { # Datastore partition ID. # A partition ID identifies a grouping of entities. The grouping is always

2124

# by project and namespace, however the namespace ID may be empty.

2127

#

2128

# A partition ID contains several dimensions:

2129

# project ID and namespace ID.

2130

"namespaceId": "A String", # If not empty, the ID of the namespace to which the entities belong.

2131

"projectId": "A String", # The ID of the project to which the entities belong.

2132

},

2133

},

2134

"hybridOptions": { # Hybrid inspection options. Configuration to control jobs where the content

2135

# being inspected is outside of Google Cloud Platform.

2136

# Early access feature is in a pre-release state and might change or have

2137

# limited support. For more information, see

2138

# https://cloud.google.com/products#product-launch-stages.

2139

"tableOptions": { # Instructions regarding the table content being inspected. # If the container is a table, additional information to make findings

2140

# meaningful such as the columns that are primary keys.

2141

"identifyingFields": [ # The columns that are the primary keys for table objects included in

2142

# ContentItem. A copy of this cell's value will be stored alongside

2143

# each finding so that the finding can be traced to the specific row it came

2144

# from. No more than 3 may be provided.

2145

{ # General identifier of a data field in a storage service.

2146

"name": "A String", # Name describing the field.

},

],

},

"requiredFindingLabelKeys": [ # These are labels that each inspection request must include within their

2151

# 'finding_labels' map. Requests may contain others, but any request missing one of

2152

# these will be rejected.

2153

#

2154

# Label keys must be between 1 and 63 characters long and must conform

2155

# to the following regular expression: `[a-z]([-a-z0-9]*[a-z0-9])?`.

2156

#

2157

# No more than 10 keys can be required.

2158

"A String",

2159

],

2160

"labels": { # To organize findings, these labels will be added to each finding.

2161

#

2162

# Label keys must be between 1 and 63 characters long and must conform

2163

# to the following regular expression: `[a-z]([-a-z0-9]*[a-z0-9])?`.

2164

#

2165

# Label values must be between 0 and 63 characters long and must conform

2166

# to the regular expression `([a-z]([-a-z0-9]*[a-z0-9])?)?`.

2167

#

2168

# No more than 10 labels can be associated with a given finding.

2169

#

2170

# Examples:

2171

# * `"environment" : "production"`

2172

# * `"pipeline" : "etl"`

2173

"a_key": "A String",

2174

},
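# The label constraints above are easy to check before sending a request. The
# sketch below uses the two regular expressions quoted in the comments; the
# helper name `validate_labels` and the error strings are illustrative only.

```python
import re

LABEL_KEY_RE = re.compile(r"[a-z]([-a-z0-9]*[a-z0-9])?")
LABEL_VALUE_RE = re.compile(r"([a-z]([-a-z0-9]*[a-z0-9])?)?")
MAX_LABELS = 10


def validate_labels(labels):
    """Return a list of problems found in a labels dict (empty if it looks valid)."""
    errors = []
    if len(labels) > MAX_LABELS:
        errors.append(f"too many labels: {len(labels)} > {MAX_LABELS}")
    for key, value in labels.items():
        # Keys: 1 to 63 characters, matching the key pattern in full.
        if not (1 <= len(key) <= 63 and LABEL_KEY_RE.fullmatch(key)):
            errors.append(f"invalid key: {key!r}")
        # Values: 0 to 63 characters, matching the value pattern in full.
        if not (len(value) <= 63 and LABEL_VALUE_RE.fullmatch(value)):
            errors.append(f"invalid value for {key!r}: {value!r}")
    return errors


print(validate_labels({"environment": "production", "pipeline": "etl"}))
```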

2175

"description": "A String", # A short description of where the data is coming from. Will be stored once

2176

# in the job. 256 max length.

2177

},

Bu Sun Kim

2020-05-27 12:20:54 -0700

[diff] [blame]

2178

},

Bu Sun Kim

2020-07-22 17:02:09 -0700

[diff] [blame^]

2179

"inspectConfig": { # Configuration description of the scanning process. # How and what to scan for.

2180

# When used with redactContent only info_types and min_likelihood are currently

2181

# used.

2182

"customInfoTypes": [ # CustomInfoTypes provided by the user. See

2183

# https://cloud.google.com/dlp/docs/creating-custom-infotypes to learn more.

2184

{ # Custom information type provided by the user. Used to find domain-specific

2185

# sensitive information configurable to the data in question.

2186

"dictionary": { # Custom information type based on a dictionary of words or phrases. This can # A list of phrases to detect as a CustomInfoType.

2187

# be used to match sensitive information specific to the data, such as a list

2188

# of employee IDs or job titles.

2189

#

2190

# Dictionary words are case-insensitive and all characters other than letters

2191

# and digits in the unicode [Basic Multilingual

2192

# Plane](https://en.wikipedia.org/wiki/Plane_%28Unicode%29#Basic_Multilingual_Plane)

2193

# will be replaced with whitespace when scanning for matches, so the

2194

# dictionary phrase "Sam Johnson" will match all three phrases "sam johnson",

2195

# "Sam, Johnson", and "Sam (Johnson)". Additionally, the characters

2196

# surrounding any match must be of a different type than the adjacent

2197

# characters within the word, so letters must be next to non-letters and

2198

# digits next to non-digits. For example, the dictionary word "jen" will

2199

# match the first three letters of the text "jen123" but will return no

2200

# matches for "jennifer".

2201

#

2202

# Dictionary words containing a large number of characters that are not

2203

# letters or digits may result in unexpected findings because such characters

2204

# are treated as whitespace. The

2205

# [limits](https://cloud.google.com/dlp/limits) page contains details about

2206

# the size limits of dictionaries. For dictionaries that do not fit within

2207

# these constraints, consider using `LargeCustomDictionaryConfig` in the

2208

# `StoredInfoType` API.

2209

"cloudStoragePath": { # Message representing a single file or path in Cloud Storage. # Newline-delimited file of words in Cloud Storage. Only a single file

2210

# is accepted.

2211

"path": "A String", # A url representing a file or path (no wildcards) in Cloud Storage.

2212

# Example: gs://[BUCKET_NAME]/dictionary.txt

2213

},

2214

"wordList": { # Message defining a list of words or phrases to search for in the data. # List of words or phrases to search for.

2215

"words": [ # Words or phrases defining the dictionary. The dictionary must contain

2216

# at least one phrase and every phrase must contain at least 2 characters

2217

# that are letters or digits. [required]

"A String",

],

},

},
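# The dictionary matching rules above (case-insensitive, non-alphanumeric
# characters treated as whitespace, and a type boundary around each match) can
# be approximated as follows. This is a simplified sketch with hypothetical
# helper names; it collapses runs of whitespace, which is an assumption, and it
# does not restrict itself to the Basic Multilingual Plane.

```python
def _kind(ch):
    """Classify a character as letter, digit, or other."""
    if ch.isalpha():
        return "letter"
    if ch.isdigit():
        return "digit"
    return "other"


def _normalize(s):
    # Lowercase, replace anything that is not a letter or digit with a space,
    # then collapse runs of whitespace.
    cleaned = "".join(c.lower() if (c.isalpha() or c.isdigit()) else " " for c in s)
    return " ".join(cleaned.split())


def dictionary_matches(phrase, text):
    """Return True if `phrase` is found in `text` under the rules sketched above."""
    p, t = _normalize(phrase), _normalize(text)
    if not p:
        return False
    start = t.find(p)
    while start != -1:
        end = start + len(p)
        # Neighbouring characters must differ in type from the adjacent
        # character inside the match (letters next to non-letters, etc.).
        before_ok = start == 0 or _kind(t[start - 1]) != _kind(p[0])
        after_ok = end == len(t) or _kind(t[end]) != _kind(p[-1])
        if before_ok and after_ok:
            return True
        start = t.find(p, start + 1)
    return False


print(dictionary_matches("jen", "jen123"), dictionary_matches("jen", "jennifer"))
```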

"infoType": { # Type of information detected by the API. # CustomInfoType can either be a new infoType, or an extension of built-in

2223

# infoType, when the name matches one of the existing infoTypes and that infoType

2224

# is specified in `InspectContent.info_types` field. Specifying the latter

2225

# adds findings to the ones detected by the system. If a built-in info type is

2226

# not specified in `InspectContent.info_types` list then the name is treated

2227

# as a custom info type.

2228

"name": "A String", # Name of the information type. Either a name of your choosing when

2229

# creating a CustomInfoType, or one of the names listed

2230

# at https://cloud.google.com/dlp/docs/infotypes-reference when specifying

2231

# a built-in type. When sending Cloud DLP results to Data Catalog, infoType

2232

# names should conform to the pattern `[A-Za-z0-9$-_]{1,64}`.

2233

},
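# A quick way to check an infoType name against the Data Catalog pattern quoted
# above. Note that inside the character class `$-_` is a range running from `$`
# to `_`, so characters such as `.` are accepted while a space is not. The
# helper name is hypothetical.

```python
import re

INFO_TYPE_NAME_RE = re.compile(r"[A-Za-z0-9$-_]{1,64}")


def is_valid_info_type_name(name):
    """Return True if `name` matches the documented pattern in full."""
    return INFO_TYPE_NAME_RE.fullmatch(name) is not None


print(is_valid_info_type_name("PHONE_NUMBER"))
```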

2234

"likelihood": "A String", # Likelihood to return for this CustomInfoType. This base value can be

2235

# altered by a detection rule if the finding meets the criteria specified by

2236

# the rule. Defaults to `VERY_LIKELY` if not specified.

2237

"detectionRules": [ # Set of detection rules to apply to all findings of this CustomInfoType.

2238

# Rules are applied in the order in which they are specified. Not supported for the

2239

# `surrogate_type` CustomInfoType.

2240

{ # Deprecated; use `InspectionRuleSet` instead. Rule for modifying a

2241

# `CustomInfoType` to alter behavior under certain circumstances, depending

2242

# on the specific details of the rule. Not supported for the `surrogate_type`

2243

# custom infoType.

2244

"hotwordRule": { # The rule that adjusts the likelihood of findings within a certain # Hotword-based detection rule.

2245

# proximity of hotwords.

2246

"proximity": { # Message for specifying a window around a finding to apply a detection # Proximity of the finding within which the entire hotword must reside.

2247

# The total length of the window cannot exceed 1000 characters. Note that

2248

# the finding itself will be included in the window, so that hotwords may

2249

# be used to match substrings of the finding itself. For example, the

2250

# certainty of a phone number regex "\(\d{3}\) \d{3}-\d{4}" could be

2251

# adjusted upwards if the area code is known to be the local area code of

2252

# a company office using the hotword regex "\(xxx\)", where "xxx"

2253

# is the area code in question.

2254

# rule.

2255

"windowAfter": 42, # Number of characters after the finding to consider.

2256

"windowBefore": 42, # Number of characters before the finding to consider.

Bu Sun Kim

2020-05-27 12:20:54 -0700

[diff] [blame]

2257

},

Bu Sun Kim

2020-07-22 17:02:09 -0700

[diff] [blame^]

2258

"likelihoodAdjustment": { # Message for specifying an adjustment to the likelihood of a finding as # Likelihood adjustment to apply to all matching findings.

2259

# part of a detection rule.

2260

"fixedLikelihood": "A String", # Set the likelihood of a finding to a fixed value.

2261

"relativeLikelihood": 42, # Increase or decrease the likelihood by the specified number of

2262

# levels. For example, if a finding would be `POSSIBLE` without the

2263

# detection rule and `relative_likelihood` is 1, then it is upgraded to

2264

# `LIKELY`, while a value of -1 would downgrade it to `UNLIKELY`.

2265

# Likelihood may never drop below `VERY_UNLIKELY` or exceed

2266

# `VERY_LIKELY`, so applying an adjustment of 1 followed by an

2267

# adjustment of -1 when base likelihood is `VERY_LIKELY` will result in

2268

# a final likelihood of `LIKELY`.

Bu Sun Kim

2020-05-27 12:20:54 -0700

[diff] [blame]

2269

},
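# The clamping behaviour of `relativeLikelihood` described above, as a small
# sketch. The level names are the ones used in the comments; the helper
# `adjust_likelihood` is hypothetical.

```python
LEVELS = ["VERY_UNLIKELY", "UNLIKELY", "POSSIBLE", "LIKELY", "VERY_LIKELY"]


def adjust_likelihood(likelihood, relative):
    """Shift a likelihood by `relative` levels, clamped to the valid range."""
    idx = LEVELS.index(likelihood) + relative
    return LEVELS[max(0, min(idx, len(LEVELS) - 1))]


# +1 on VERY_LIKELY is clamped, so a following -1 ends at LIKELY.
print(adjust_likelihood(adjust_likelihood("VERY_LIKELY", 1), -1))
```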

Bu Sun Kim

2020-07-22 17:02:09 -0700

[diff] [blame^]

2270

"hotwordRegex": { # Message defining a custom regular expression. # Regular expression pattern defining what qualifies as a hotword.

2271

"groupIndexes": [ # The index of the submatch to extract as findings. When not

2272

# specified, the entire match is returned. No more than 3 may be included.

2273

42,

2274

],

2275

"pattern": "A String", # Pattern defining the regular expression. Its syntax

2276

# (https://github.com/google/re2/wiki/Syntax) can be found under the

2277

# google/re2 repository on GitHub.

2278

},

Bu Sun Kim

2020-05-27 12:20:54 -0700

[diff] [blame]

2279

},

2280

},

2281

],

Bu Sun Kim

2020-07-22 17:02:09 -0700

[diff] [blame^]

2282

"surrogateType": { # Message for detecting output from deidentification transformations

2283

# that support reversing,

2284

# such as

2285

# [`CryptoReplaceFfxFpeConfig`](https://cloud.google.com/dlp/docs/reference/rest/v2/organizations.deidentifyTemplates#cryptoreplaceffxfpeconfig).

2286

# These types of transformations are

2287

# those that perform pseudonymization, thereby producing a "surrogate" as

2288

# output. This should be used in conjunction with a field on the

2289

# transformation such as `surrogate_info_type`. This CustomInfoType does

2290

# not support the use of `detection_rules`.

2291

},

2292

"regex": { # Message defining a custom regular expression. # Regular expression based CustomInfoType.

2293

"groupIndexes": [ # The index of the submatch to extract as findings. When not

2294

# specified, the entire match is returned. No more than 3 may be included.

2295

42,

2296

],

2297

"pattern": "A String", # Pattern defining the regular expression. Its syntax

2298

# (https://github.com/google/re2/wiki/Syntax) can be found under the

2299

# google/re2 repository on GitHub.

2300

},
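# To illustrate `groupIndexes`: when no index is given the entire match becomes
# the finding, otherwise only the listed submatches do. The sketch uses Python's
# `re` module as a stand-in for RE2, and `extract_findings` is a hypothetical
# helper, not part of the client library.

```python
import re


def extract_findings(pattern, text, group_indexes=()):
    """Return matched strings, honouring an optional list of submatch indexes."""
    if len(group_indexes) > 3:
        raise ValueError("no more than 3 group indexes may be included")
    findings = []
    for match in re.finditer(pattern, text):
        if group_indexes:
            # Only the requested submatches are reported.
            findings.extend(match.group(i) for i in group_indexes)
        else:
            # No indexes given: the entire match is reported.
            findings.append(match.group(0))
    return findings


print(extract_findings(r"(\d{3})-(\d{4})", "call 555-1234", [2]))
```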

2301

"storedType": { # A reference to a StoredInfoType to use with scanning. # Load an existing `StoredInfoType` resource for use in

2302

# `InspectDataSource`. Not currently supported in `InspectContent`.

2303

"name": "A String", # Resource name of the requested `StoredInfoType`, for example

2304

# `organizations/433245324/storedInfoTypes/432452342` or

2305

# `projects/project-id/storedInfoTypes/432452342`.

2306

"createTime": "A String", # Timestamp indicating when the version of the `StoredInfoType` used for

2307

# inspection was created. Output-only field, populated by the system.

2308

},

2309

"exclusionType": "A String", # If set to EXCLUSION_TYPE_EXCLUDE this infoType will not cause a finding

2310

# to be returned. It still can be used for rules matching.

Bu Sun Kim

2020-05-27 12:20:54 -0700

[diff] [blame]

2311

},

2312

],

Bu Sun Kim

2020-07-22 17:02:09 -0700

[diff] [blame^]

2313

"minLikelihood": "A String", # Only returns findings equal to or above this threshold. The default is

2314

# POSSIBLE.

2315

# See https://cloud.google.com/dlp/docs/likelihood to learn more.

2316

"limits": { # Configuration to control the number of findings returned. # Configuration to control the number of findings returned.

2317

"maxFindingsPerRequest": 42, # Max number of findings that will be returned per request/job.

2318

# When set within `InspectContentRequest`, the maximum returned is 2000

2319

# even if this is set higher.

2320

"maxFindingsPerInfoType": [ # Configuration of findings limit given for specified infoTypes.

2321

{ # Max findings configuration per infoType, per content item or long

2322

# running DlpJob.

2323

"infoType": { # Type of information detected by the API. # Type of information the findings limit applies to. Only one limit per

2324

# info_type should be provided. If InfoTypeLimit does not have an

2325

# info_type, the DLP API applies the limit against all info_types that

2326

# are found but not specified in another InfoTypeLimit.

2327

"name": "A String", # Name of the information type. Either a name of your choosing when

2328

# creating a CustomInfoType, or one of the names listed

2329

# at https://cloud.google.com/dlp/docs/infotypes-reference when specifying

2330

# a built-in type. When sending Cloud DLP results to Data Catalog, infoType

2331

# names should conform to the pattern `[A-Za-z0-9$-_]{1,64}`.

2332

},

2333

"maxFindings": 42, # Max findings limit for the given infoType.

2334

},

2335

],

2336

"maxFindingsPerItem": 42, # Max number of findings that will be returned for each item scanned.

2337

# When set within `InspectJobConfig`,

2338

# the maximum returned is 2000 even if this is set higher.

2339

# When set within `InspectContentRequest`, this field is ignored.

2340

},
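# One way to read the `maxFindingsPerInfoType` rule above: an entry without an
# `infoType` acts as the limit for every infoType that has no entry of its own.
# The helper below is a hypothetical client-side sketch of that lookup, not the
# service's logic.

```python
def findings_limit_for(info_type_name, per_info_type_limits):
    """Return the maxFindings that applies to `info_type_name`, or None."""
    default = None
    for entry in per_info_type_limits:
        name = entry.get("infoType", {}).get("name")
        if name == info_type_name:
            return entry["maxFindings"]
        if name is None:
            # Entry without an infoType: applies to all otherwise unlisted types.
            default = entry["maxFindings"]
    return default


limits = [
    {"infoType": {"name": "EMAIL_ADDRESS"}, "maxFindings": 5},
    {"maxFindings": 20},
]
print(findings_limit_for("PHONE_NUMBER", limits))
```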

2341

"excludeInfoTypes": True or False, # When true, excludes type information of the findings.

2342

"includeQuote": True or False, # When true, a contextual quote from the data that triggered a finding is

2343

# included in the response; see Finding.quote.

2344

"ruleSet": [ # Set of rules to apply to the findings for this InspectConfig.

2345

# Exclusion rules contained in the set are executed last; other

2346

# rules are executed in the order they are specified for each info type.

2347

{ # Rule set for modifying a set of infoTypes to alter behavior under certain

2348

# circumstances, depending on the specific details of the rules within the set.

2349

"infoTypes": [ # List of infoTypes this rule set is applied to.

2350

{ # Type of information detected by the API.

2351

"name": "A String", # Name of the information type. Either a name of your choosing when

2352

# creating a CustomInfoType, or one of the names listed

2353

# at https://cloud.google.com/dlp/docs/infotypes-reference when specifying

2354

# a built-in type. When sending Cloud DLP results to Data Catalog, infoType

2355

# names should conform to the pattern `[A-Za-z0-9$-_]{1,64}`.

2356

},

2357

],

2358

"rules": [ # Set of rules to be applied to infoTypes. The rules are applied in order.

2359

{ # A single inspection rule to be applied to infoTypes, specified in

2360

# `InspectionRuleSet`.

2361

"hotwordRule": { # The rule that adjusts the likelihood of findings within a certain # Hotword-based detection rule.

2362

# proximity of hotwords.

2363

"proximity": { # Message for specifying a window around a finding to apply a detection # Proximity of the finding within which the entire hotword must reside.

2364

# The total length of the window cannot exceed 1000 characters. Note that

2365

# the finding itself will be included in the window, so that hotwords may

2366

# be used to match substrings of the finding itself. For example, the

2367

# certainty of a phone number regex "\(\d{3}\) \d{3}-\d{4}" could be

2368

# adjusted upwards if the area code is known to be the local area code of

2369

# a company office using the hotword regex "\(xxx\)", where "xxx"

2370

# is the area code in question.

2371

# rule.

2372

"windowAfter": 42, # Number of characters after the finding to consider.

2373

"windowBefore": 42, # Number of characters before the finding to consider.

2374

},

2375

"likelihoodAdjustment": { # Message for specifying an adjustment to the likelihood of a finding as # Likelihood adjustment to apply to all matching findings.

2376

# part of a detection rule.

2377

"fixedLikelihood": "A String", # Set the likelihood of a finding to a fixed value.

2378

"relativeLikelihood": 42, # Increase or decrease the likelihood by the specified number of

2379

# levels. For example, if a finding would be `POSSIBLE` without the

2380

# detection rule and `relative_likelihood` is 1, then it is upgraded to

2381

# `LIKELY`, while a value of -1 would downgrade it to `UNLIKELY`.

2382

# Likelihood may never drop below `VERY_UNLIKELY` or exceed

2383

# `VERY_LIKELY`, so applying an adjustment of 1 followed by an

2384

# adjustment of -1 when base likelihood is `VERY_LIKELY` will result in

2385

# a final likelihood of `LIKELY`.

2386

},

2387

"hotwordRegex": { # Message defining a custom regular expression. # Regular expression pattern defining what qualifies as a hotword.

2388

"groupIndexes": [ # The index of the submatch to extract as findings. When not

2389

# specified, the entire match is returned. No more than 3 may be included.

2390

42,

2391

],

2392

"pattern": "A String", # Pattern defining the regular expression. Its syntax

2393

# (https://github.com/google/re2/wiki/Syntax) can be found under the

2394

# google/re2 repository on GitHub.

2395

},

2396

},

2397

"exclusionRule": { # The rule that specifies conditions when findings of infoTypes specified in # Exclusion rule.

2398

# `InspectionRuleSet` are removed from results.

2399

"matchingType": "A String", # How the rule is applied, see MatchingType documentation for details.

2400

"dictionary": { # Custom information type based on a dictionary of words or phrases. This can # Dictionary which defines the rule.

2401

# be used to match sensitive information specific to the data, such as a list

2402

# of employee IDs or job titles.

2403

#

2404

# Dictionary words are case-insensitive and all characters other than letters

2405

# and digits in the unicode [Basic Multilingual

2406

# Plane](https://en.wikipedia.org/wiki/Plane_%28Unicode%29#Basic_Multilingual_Plane)

2407

# will be replaced with whitespace when scanning for matches, so the

2408

# dictionary phrase "Sam Johnson" will match all three phrases "sam johnson",

2409

# "Sam, Johnson", and "Sam (Johnson)". Additionally, the characters

2410

# surrounding any match must be of a different type than the adjacent

2411

# characters within the word, so letters must be next to non-letters and

2412

# digits next to non-digits. For example, the dictionary word "jen" will

2413

# match the first three letters of the text "jen123" but will return no

2414

# matches for "jennifer".

2415

#

2416

# Dictionary words containing a large number of characters that are not

2417

# letters or digits may result in unexpected findings because such characters

2418

# are treated as whitespace. The

2419

# [limits](https://cloud.google.com/dlp/limits) page contains details about

2420

# the size limits of dictionaries. For dictionaries that do not fit within

2421

# these constraints, consider using `LargeCustomDictionaryConfig` in the

2422

# `StoredInfoType` API.

2423

"cloudStoragePath": { # Message representing a single file or path in Cloud Storage. # Newline-delimited file of words in Cloud Storage. Only a single file

2424

# is accepted.

2425

"path": "A String", # A url representing a file or path (no wildcards) in Cloud Storage.

2426

# Example: gs://[BUCKET_NAME]/dictionary.txt

2427

},

2428

"wordList": { # Message defining a list of words or phrases to search for in the data. # List of words or phrases to search for.

2429

"words": [ # Words or phrases defining the dictionary. The dictionary must contain

2430

# at least one phrase and every phrase must contain at least 2 characters

2431

# that are letters or digits. [required]

"A String",

],

},

},

"excludeInfoTypes": { # List of exclude infoTypes. # Set of infoTypes for which findings would affect this rule.

2437

"infoTypes": [ # InfoType list in ExclusionRule rule drops a finding when it overlaps or

2438

# is contained within a finding of an infoType from this list. For

2439

# example, for `InspectionRuleSet.info_types` containing "PHONE_NUMBER" and

2440

# `exclusion_rule` containing `exclude_info_types.info_types` with

2441

# "EMAIL_ADDRESS" the phone number findings are dropped if they overlap

2442

# with EMAIL_ADDRESS finding.

2443

# As a result, "555-222-2222@example.org" generates only a single

2444

# finding, namely email address.

2445

{ # Type of information detected by the API.

2446

"name": "A String", # Name of the information type. Either a name of your choosing when

2447

# creating a CustomInfoType, or one of the names listed

2448

# at https://cloud.google.com/dlp/docs/infotypes-reference when specifying

2449

# a built-in type. When sending Cloud DLP results to Data Catalog, infoType

2450

# names should conform to the pattern `[A-Za-z0-9$-_]{1,64}`.

},

],

},
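# The overlap rule above can be sketched with findings represented as character
# spans. The dict keys `infoType`, `start`, and `end` are hypothetical
# simplifications of a real finding, and `apply_exclude_info_types` is an
# illustrative helper rather than an API call.

```python
def apply_exclude_info_types(findings, target_types, exclude_types):
    """Drop findings of `target_types` that overlap a finding of `exclude_types`."""
    excluders = [f for f in findings if f["infoType"] in exclude_types]
    kept = []
    for f in findings:
        overlaps = any(
            f["start"] < e["end"] and e["start"] < f["end"] for e in excluders
        )
        if f["infoType"] in target_types and overlaps:
            continue  # dropped by the exclusion rule
        kept.append(f)
    return kept


# "555-222-2222@example.org": the phone number sits inside the email address.
findings = [
    {"infoType": "PHONE_NUMBER", "start": 0, "end": 12},
    {"infoType": "EMAIL_ADDRESS", "start": 0, "end": 24},
]
print(apply_exclude_info_types(findings, {"PHONE_NUMBER"}, {"EMAIL_ADDRESS"}))
```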

"regex": { # Message defining a custom regular expression. # Regular expression which defines the rule.

2455

"groupIndexes": [ # The index of the submatch to extract as findings. When not

2456

# specified, the entire match is returned. No more than 3 may be included.

2457

42,

2458

],

2459

"pattern": "A String", # Pattern defining the regular expression. Its syntax

2460

# (https://github.com/google/re2/wiki/Syntax) can be found under the

2461

# google/re2 repository on GitHub.

},

},

},

],

},

],

"contentOptions": [ # List of options defining data content to scan.

2469

# If empty, text, images, and other content will be included.

2470

"A String",

2471

],

2472

"infoTypes": [ # Restricts what info_types to look for. The values must correspond to

2473

# InfoType values returned by ListInfoTypes or listed at

2474

# https://cloud.google.com/dlp/docs/infotypes-reference.

2475

#

2476

# When no InfoTypes or CustomInfoTypes are specified in a request, the

2477

# system may automatically choose what detectors to run. By default this may

2478

# be all types, but may change over time as detectors are updated.

2479

#

2480

# If you need precise control and predictability as to what detectors are

2481

# run, specify the InfoTypes listed in the reference;

2482

# otherwise a default list will be used, which may change over time.

2483

{ # Type of information detected by the API.

2484

"name": "A String", # Name of the information type. Either a name of your choosing when

2485

# creating a CustomInfoType, or one of the names listed

2486

# at https://cloud.google.com/dlp/docs/infotypes-reference when specifying

2487

# a built-in type. When sending Cloud DLP results to Data Catalog, infoType

2488

# names should conform to the pattern `[A-Za-z0-9$-_]{1,64}`.

},

],

},

},
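# Putting several of the fields above together, a request body for `create`
# might be assembled as in the sketch below. The helper name and the chosen
# values are illustrative; the `inspectJob`, `storageConfig`, and
# `cloudStorageOptions` keys come from parts of the schema outside this
# section and should be checked against it. The API call itself is left as a
# comment because it needs credentials and a real project.

```python
def build_inspect_job_body(bucket_url, info_type_names, min_likelihood="POSSIBLE"):
    """Assemble a request body dict for an inspection job over Cloud Storage."""
    return {
        "inspectJob": {
            "storageConfig": {
                # Exactly one of url or regexFileSet is set in the file set.
                "cloudStorageOptions": {"fileSet": {"url": bucket_url}},
            },
            "inspectConfig": {
                "infoTypes": [{"name": name} for name in info_type_names],
                "minLikelihood": min_likelihood,
                "includeQuote": True,
                "limits": {"maxFindingsPerRequest": 100},
            },
        }
    }


body = build_inspect_job_body("gs://mybucket/directory/", ["EMAIL_ADDRESS"])
print(body["inspectJob"]["inspectConfig"]["infoTypes"])

# With an authorized client (not shown), the body would be passed as:
#   service.projects().dlpJobs().create(parent="projects/PROJECT_ID", body=body).execute()
```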

"snapshotInspectTemplate": { # If run with an InspectTemplate, a snapshot of its state at the time of this run.

2494

# The inspectTemplate contains a configuration (set of types of sensitive data

2495

# to be detected) to be used anywhere you otherwise would normally specify

2496

# InspectConfig. See https://cloud.google.com/dlp/docs/concepts-templates

2497

# to learn more.

2498

"description": "A String", # Short description (max 256 chars).

2499

"displayName": "A String", # Display name (max 256 chars).

2500

"createTime": "A String", # Output only. The creation timestamp of an inspectTemplate.

2501

"updateTime": "A String", # Output only. The last update timestamp of an inspectTemplate.

2502

"name": "A String", # Output only. The template name.

2503

#

2504

# The template will have one of the following formats:

2505

# `projects/PROJECT_ID/inspectTemplates/TEMPLATE_ID` OR

2506

# `organizations/ORGANIZATION_ID/inspectTemplates/TEMPLATE_ID`;

2507

"inspectConfig": { # Configuration description of the scanning process. # The core content of the template. Configuration of the scanning process.

2508

# When used with redactContent only info_types and min_likelihood are currently

2509

# used.

2510

"customInfoTypes": [ # CustomInfoTypes provided by the user. See

2511

# https://cloud.google.com/dlp/docs/creating-custom-infotypes to learn more.

2512

{ # Custom information type provided by the user. Used to find domain-specific

2513

# sensitive information configurable to the data in question.

2514

"dictionary": { # Custom information type based on a dictionary of words or phrases. This can # A list of phrases to detect as a CustomInfoType.

2515

# be used to match sensitive information specific to the data, such as a list

2516

# of employee IDs or job titles.

2517

#

2518

# Dictionary words are case-insensitive and all characters other than letters

2519

# and digits in the unicode [Basic Multilingual

2520

# Plane](https://en.wikipedia.org/wiki/Plane_%28Unicode%29#Basic_Multilingual_Plane)

2521

# will be replaced with whitespace when scanning for matches, so the

2522

# dictionary phrase "Sam Johnson" will match all three phrases "sam johnson",

2523

# "Sam, Johnson", and "Sam (Johnson)". Additionally, the characters

2524

# surrounding any match must be of a different type than the adjacent

2525

# characters within the word, so letters must be next to non-letters and

2526

# digits next to non-digits. For example, the dictionary word "jen" will

2527

# match the first three letters of the text "jen123" but will return no

2528

# matches for "jennifer".

2529

#

2530

# Dictionary words containing a large number of characters that are not

2531

# letters or digits may result in unexpected findings because such characters

2532

# are treated as whitespace. The

2533

# [limits](https://cloud.google.com/dlp/limits) page contains details about

2534

# the size limits of dictionaries. For dictionaries that do not fit within

2535

# these constraints, consider using `LargeCustomDictionaryConfig` in the

2536

# `StoredInfoType` API.

2537

"cloudStoragePath": { # Message representing a single file or path in Cloud Storage. # Newline-delimited file of words in Cloud Storage. Only a single file

2538

# is accepted.

"path": "A String", # A URL representing a file or path (no wildcards) in Cloud Storage.

2540

# Example: gs://[BUCKET_NAME]/dictionary.txt

2541

},

2542

"wordList": { # Message defining a list of words or phrases to search for in the data. # List of words or phrases to search for.

2543

"words": [ # Words or phrases defining the dictionary. The dictionary must contain

2544

# at least one phrase and every phrase must contain at least 2 characters

2545

# that are letters or digits. [required]

"A String",

],

},

},

"infoType": { # Type of information detected by the API. # CustomInfoType can either be a new infoType, or an extension of built-in

# infoType, when the name matches one of the existing infoTypes and that infoType

# is specified in the `InspectContent.info_types` field. Specifying the latter

# adds findings to those detected by the system. If the built-in info type is

# not specified in the `InspectContent.info_types` list, then the name is treated

# as a custom info type.

2556

"name": "A String", # Name of the information type. Either a name of your choosing when

2557

# creating a CustomInfoType, or one of the names listed

2558

# at https://cloud.google.com/dlp/docs/infotypes-reference when specifying

2559

# a built-in type. When sending Cloud DLP results to Data Catalog, infoType

2560

# names should conform to the pattern `[A-Za-z0-9$-_]{1,64}`.

2561

},

2562

"likelihood": "A String", # Likelihood to return for this CustomInfoType. This base value can be

2563

# altered by a detection rule if the finding meets the criteria specified by

2564

# the rule. Defaults to `VERY_LIKELY` if not specified.

2565

"detectionRules": [ # Set of detection rules to apply to all findings of this CustomInfoType.

2566

# Rules are applied in the order in which they are specified. Not supported for the

2567

# `surrogate_type` CustomInfoType.

2568

{ # Deprecated; use `InspectionRuleSet` instead. Rule for modifying a

2569

# `CustomInfoType` to alter behavior under certain circumstances, depending

2570

# on the specific details of the rule. Not supported for the `surrogate_type`

2571

# custom infoType.

2572

"hotwordRule": { # The rule that adjusts the likelihood of findings within a certain # Hotword-based detection rule.

2573

# proximity of hotwords.

"proximity": { # Message for specifying a window around a finding to apply a detection rule. # Proximity of the finding within which the entire hotword must reside.

# The total length of the window cannot exceed 1000 characters. Note that

# the finding itself will be included in the window, so that hotwords may

# be used to match substrings of the finding itself. For example, the

# certainty of a phone number regex "\(\d{3}\) \d{3}-\d{4}" could be

# adjusted upwards if the area code is known to be the local area code of

# a company office using the hotword regex "\(xxx\)", where "xxx"

# is the area code in question.

2583

"windowAfter": 42, # Number of characters after the finding to consider.

2584

"windowBefore": 42, # Number of characters before the finding to consider.

2585

},

2586

"likelihoodAdjustment": { # Message for specifying an adjustment to the likelihood of a finding as # Likelihood adjustment to apply to all matching findings.

2587

# part of a detection rule.

2588

"fixedLikelihood": "A String", # Set the likelihood of a finding to a fixed value.

2589

"relativeLikelihood": 42, # Increase or decrease the likelihood by the specified number of

2590

# levels. For example, if a finding would be `POSSIBLE` without the

2591

# detection rule and `relative_likelihood` is 1, then it is upgraded to

2592

# `LIKELY`, while a value of -1 would downgrade it to `UNLIKELY`.

2593

# Likelihood may never drop below `VERY_UNLIKELY` or exceed

2594

# `VERY_LIKELY`, so applying an adjustment of 1 followed by an

2595

# adjustment of -1 when base likelihood is `VERY_LIKELY` will result in

2596

# a final likelihood of `LIKELY`.

2597

},

2598

"hotwordRegex": { # Message defining a custom regular expression. # Regular expression pattern defining what qualifies as a hotword.

2599

"groupIndexes": [ # The index of the submatch to extract as findings. When not

2600

# specified, the entire match is returned. No more than 3 may be included.

2601

42,

2602

],

2603

"pattern": "A String", # Pattern defining the regular expression. Its syntax

2604

# (https://github.com/google/re2/wiki/Syntax) can be found under the

2605

# google/re2 repository on GitHub.

},

},

},

],

"surrogateType": { # Message for detecting output from deidentification transformations that support reversing.

# Message for detecting output from deidentification transformations such as

# [`CryptoReplaceFfxFpeConfig`](https://cloud.google.com/dlp/docs/reference/rest/v2/organizations.deidentifyTemplates#cryptoreplaceffxfpeconfig).

# These types of transformations are

# those that perform pseudonymization, thereby producing a "surrogate" as

# output. This should be used in conjunction with a field on the

# transformation such as `surrogate_info_type`. This CustomInfoType does

# not support the use of `detection_rules`.

2619

},

2620

"regex": { # Message defining a custom regular expression. # Regular expression based CustomInfoType.

2621

"groupIndexes": [ # The index of the submatch to extract as findings. When not

2622

# specified, the entire match is returned. No more than 3 may be included.

2623

42,

2624

],

2625

"pattern": "A String", # Pattern defining the regular expression. Its syntax

2626

# (https://github.com/google/re2/wiki/Syntax) can be found under the

2627

# google/re2 repository on GitHub.

2628

},

2629

"storedType": { # A reference to a StoredInfoType to use with scanning. # Load an existing `StoredInfoType` resource for use in

2630

# `InspectDataSource`. Not currently supported in `InspectContent`.

2631

"name": "A String", # Resource name of the requested `StoredInfoType`, for example

2632

# `organizations/433245324/storedInfoTypes/432452342` or

2633

# `projects/project-id/storedInfoTypes/432452342`.

2634

"createTime": "A String", # Timestamp indicating when the version of the `StoredInfoType` used for

2635

# inspection was created. Output-only field, populated by the system.

2636

},

2637

"exclusionType": "A String", # If set to EXCLUSION_TYPE_EXCLUDE this infoType will not cause a finding

2638

# to be returned. It still can be used for rules matching.

2639

},

2640

],

"minLikelihood": "A String", # Only returns findings equal to or above this threshold. The default is

2642

# POSSIBLE.

2643

# See https://cloud.google.com/dlp/docs/likelihood to learn more.

2644

"limits": { # Configuration to control the number of findings returned. # Configuration to control the number of findings returned.

2645

"maxFindingsPerRequest": 42, # Max number of findings that will be returned per request/job.

2646

# When set within `InspectContentRequest`, the maximum returned is 2000

2647

# regardless of whether this is set higher.

2648

"maxFindingsPerInfoType": [ # Configuration of findings limit given for specified infoTypes.

2649

{ # Max findings configuration per infoType, per content item or long

2650

# running DlpJob.

2651

"infoType": { # Type of information detected by the API. # Type of information the findings limit applies to. Only one limit per

2652

# info_type should be provided. If InfoTypeLimit does not have an

2653

# info_type, the DLP API applies the limit against all info_types that

2654

# are found but not specified in another InfoTypeLimit.

2655

"name": "A String", # Name of the information type. Either a name of your choosing when

2656

# creating a CustomInfoType, or one of the names listed

2657

# at https://cloud.google.com/dlp/docs/infotypes-reference when specifying

2658

# a built-in type. When sending Cloud DLP results to Data Catalog, infoType

2659

# names should conform to the pattern `[A-Za-z0-9$-_]{1,64}`.

2660

},

2661

"maxFindings": 42, # Max findings limit for the given infoType.

2662

},

2663

],

2664

"maxFindingsPerItem": 42, # Max number of findings that will be returned for each item scanned.

2665

# When set within `InspectJobConfig`,

2666

# the maximum returned is 2000 regardless of whether this is set higher.

2667

# When set within `InspectContentRequest`, this field is ignored.

2668

},

2669

"excludeInfoTypes": True or False, # When true, excludes type information of the findings.

2670

"includeQuote": True or False, # When true, a contextual quote from the data that triggered a finding is

2671

# included in the response; see Finding.quote.

2672

"ruleSet": [ # Set of rules to apply to the findings for this InspectConfig.

2673

# Exclusion rules contained in the set are executed last; other

# rules are executed in the order they are specified for each info type.

2675

{ # Rule set for modifying a set of infoTypes to alter behavior under certain

2676

# circumstances, depending on the specific details of the rules within the set.

2677

"infoTypes": [ # List of infoTypes this rule set is applied to.

2678

{ # Type of information detected by the API.

2679

"name": "A String", # Name of the information type. Either a name of your choosing when

2680

# creating a CustomInfoType, or one of the names listed

2681

# at https://cloud.google.com/dlp/docs/infotypes-reference when specifying

2682

# a built-in type. When sending Cloud DLP results to Data Catalog, infoType

2683

# names should conform to the pattern `[A-Za-z0-9$-_]{1,64}`.

2684

},

2685

],

2686

"rules": [ # Set of rules to be applied to infoTypes. The rules are applied in order.

2687

{ # A single inspection rule to be applied to infoTypes, specified in

2688

# `InspectionRuleSet`.

2689

"hotwordRule": { # The rule that adjusts the likelihood of findings within a certain # Hotword-based detection rule.

2690

# proximity of hotwords.

"proximity": { # Message for specifying a window around a finding to apply a detection rule. # Proximity of the finding within which the entire hotword must reside.

# The total length of the window cannot exceed 1000 characters. Note that

# the finding itself will be included in the window, so that hotwords may

# be used to match substrings of the finding itself. For example, the

# certainty of a phone number regex "\(\d{3}\) \d{3}-\d{4}" could be

# adjusted upwards if the area code is known to be the local area code of

# a company office using the hotword regex "\(xxx\)", where "xxx"

# is the area code in question.

2700

"windowAfter": 42, # Number of characters after the finding to consider.

2701

"windowBefore": 42, # Number of characters before the finding to consider.

2702

},

2703

"likelihoodAdjustment": { # Message for specifying an adjustment to the likelihood of a finding as # Likelihood adjustment to apply to all matching findings.

2704

# part of a detection rule.

2705

"fixedLikelihood": "A String", # Set the likelihood of a finding to a fixed value.

2706

"relativeLikelihood": 42, # Increase or decrease the likelihood by the specified number of

2707

# levels. For example, if a finding would be `POSSIBLE` without the

2708

# detection rule and `relative_likelihood` is 1, then it is upgraded to

2709

# `LIKELY`, while a value of -1 would downgrade it to `UNLIKELY`.

2710

# Likelihood may never drop below `VERY_UNLIKELY` or exceed

2711

# `VERY_LIKELY`, so applying an adjustment of 1 followed by an

2712

# adjustment of -1 when base likelihood is `VERY_LIKELY` will result in

2713

# a final likelihood of `LIKELY`.

2714

},

2715

"hotwordRegex": { # Message defining a custom regular expression. # Regular expression pattern defining what qualifies as a hotword.

2716

"groupIndexes": [ # The index of the submatch to extract as findings. When not

2717

# specified, the entire match is returned. No more than 3 may be included.

2718

42,

2719

],

2720

"pattern": "A String", # Pattern defining the regular expression. Its syntax

2721

# (https://github.com/google/re2/wiki/Syntax) can be found under the

2722

# google/re2 repository on GitHub.

2723

},

2724

},

2725

"exclusionRule": { # The rule that specifies conditions when findings of infoTypes specified in # Exclusion rule.

2726

# `InspectionRuleSet` are removed from results.

2727

"matchingType": "A String", # How the rule is applied, see MatchingType documentation for details.

2728

"dictionary": { # Custom information type based on a dictionary of words or phrases. This can # Dictionary which defines the rule.

2729

# be used to match sensitive information specific to the data, such as a list

2730

# of employee IDs or job titles.

2731

#

2732

# Dictionary words are case-insensitive and all characters other than letters

2733

# and digits in the Unicode [Basic Multilingual

2734

# Plane](https://en.wikipedia.org/wiki/Plane_%28Unicode%29#Basic_Multilingual_Plane)

2735

# will be replaced with whitespace when scanning for matches, so the

2736

# dictionary phrase "Sam Johnson" will match all three phrases "sam johnson",

2737

# "Sam, Johnson", and "Sam (Johnson)". Additionally, the characters

2738

# surrounding any match must be of a different type than the adjacent

2739

# characters within the word, so letters must be next to non-letters and

2740

# digits next to non-digits. For example, the dictionary word "jen" will

2741

# match the first three letters of the text "jen123" but will return no

2742

# matches for "jennifer".

2743

#

2744

# Dictionary words containing a large number of characters that are not

2745

# letters or digits may result in unexpected findings because such characters

2746

# are treated as whitespace. The

2747

# [limits](https://cloud.google.com/dlp/limits) page contains details about

2748

# the size limits of dictionaries. For dictionaries that do not fit within

2749

# these constraints, consider using `LargeCustomDictionaryConfig` in the

2750

# `StoredInfoType` API.

2751

"cloudStoragePath": { # Message representing a single file or path in Cloud Storage. # Newline-delimited file of words in Cloud Storage. Only a single file

2752

# is accepted.

"path": "A String", # A URL representing a file or path (no wildcards) in Cloud Storage.

2754

# Example: gs://[BUCKET_NAME]/dictionary.txt

2755

},

2756

"wordList": { # Message defining a list of words or phrases to search for in the data. # List of words or phrases to search for.

2757

"words": [ # Words or phrases defining the dictionary. The dictionary must contain

2758

# at least one phrase and every phrase must contain at least 2 characters

2759

# that are letters or digits. [required]

"A String",

],

},

},

"excludeInfoTypes": { # List of exclude infoTypes. # Set of infoTypes for which findings would affect this rule.

"infoTypes": [ # An InfoType list in an ExclusionRule drops a finding when it overlaps with or

# is contained within a finding of an infoType from this list. For

# example, for `InspectionRuleSet.info_types` containing "PHONE_NUMBER" and

# `exclusion_rule` containing `exclude_info_types.info_types` with

# "EMAIL_ADDRESS", the phone number findings are dropped if they overlap

# with an EMAIL_ADDRESS finding.

# As a result, "555-222-2222@example.org" generates only a single

# finding, namely the email address.

2773

{ # Type of information detected by the API.

2774

"name": "A String", # Name of the information type. Either a name of your choosing when

2775

# creating a CustomInfoType, or one of the names listed

2776

# at https://cloud.google.com/dlp/docs/infotypes-reference when specifying

2777

# a built-in type. When sending Cloud DLP results to Data Catalog, infoType

2778

# names should conform to the pattern `[A-Za-z0-9$-_]{1,64}`.

},

],

},

"regex": { # Message defining a custom regular expression. # Regular expression which defines the rule.

2783

"groupIndexes": [ # The index of the submatch to extract as findings. When not

2784

# specified, the entire match is returned. No more than 3 may be included.

2785

42,

2786

],

2787

"pattern": "A String", # Pattern defining the regular expression. Its syntax

2788

# (https://github.com/google/re2/wiki/Syntax) can be found under the

2789

# google/re2 repository on GitHub.

},

},

},

],

},

],

"contentOptions": [ # List of options defining data content to scan.

2797

# If empty, text, images, and other content will be included.

2798

"A String",

2799

],

2800

"infoTypes": [ # Restricts what info_types to look for. The values must correspond to

2801

# InfoType values returned by ListInfoTypes or listed at

2802

# https://cloud.google.com/dlp/docs/infotypes-reference.

2803

#

2804

# When no InfoTypes or CustomInfoTypes are specified in a request, the

2805

# system may automatically choose what detectors to run. By default this may

2806

# be all types, but may change over time as detectors are updated.

2807

#

2808

# If you need precise control and predictability as to what detectors are

# run, you should specify specific InfoTypes listed in the reference;

# otherwise a default list will be used, which may change over time.

2811

{ # Type of information detected by the API.

2812

"name": "A String", # Name of the information type. Either a name of your choosing when

2813

# creating a CustomInfoType, or one of the names listed

2814

# at https://cloud.google.com/dlp/docs/infotypes-reference when specifying

2815

# a built-in type. When sending Cloud DLP results to Data Catalog, infoType

2816

# names should conform to the pattern `[A-Za-z0-9$-_]{1,64}`.

},

],

},

},

},

"result": { # All result fields mentioned below are updated while the job is processing. # A summary of the outcome of this inspect job.

2823

"hybridStats": { # Statistics related to processing hybrid inspect requests. # Statistics related to the processing of hybrid inspect.

2824

# Early access feature is in a pre-release state and might change or have

2825

# limited support. For more information, see

2826

# https://cloud.google.com/products#product-launch-stages.

2827

"processedCount": "A String", # The number of hybrid inspection requests processed within this job.

2828

"abortedCount": "A String", # The number of hybrid inspection requests aborted because the job ran

2829

# out of quota or was ended before they could be processed.

2830

"pendingCount": "A String", # The number of hybrid requests currently being processed. Only populated

2831

# when called via method `getDlpJob`.

2832

# A burst of traffic may cause hybrid inspect requests to be enqueued.

2833

# Processing will take place as quickly as possible, but resource limitations

2834

# may impact how long a request is enqueued for.

2835

},

2836

"totalEstimatedBytes": "A String", # Estimate of the number of bytes to process.

2837

"infoTypeStats": [ # Statistics of how many instances of each info type were found during

2838

# inspect job.

2839

{ # Statistics regarding a specific InfoType.

2840

"infoType": { # Type of information detected by the API. # The type of finding this stat is for.

2841

"name": "A String", # Name of the information type. Either a name of your choosing when

2842

# creating a CustomInfoType, or one of the names listed

2843

# at https://cloud.google.com/dlp/docs/infotypes-reference when specifying

2844

# a built-in type. When sending Cloud DLP results to Data Catalog, infoType

2845

# names should conform to the pattern `[A-Za-z0-9$-_]{1,64}`.

2846

},

2847

"count": "A String", # Number of findings for this infoType.

},

],

"processedBytes": "A String", # Total size in bytes that were processed.

},

},

"name": "A String", # The server-assigned name.

}</pre>
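The dictionary matching rules described for `customInfoTypes[].dictionary` above can be approximated locally. The following is only a rough sketch for reasoning about which phrases are likely to match; the helper names (`_normalize`, `_kind`, `dictionary_matches`) are hypothetical and not part of the client library, the sketch does not restrict itself to the Basic Multilingual Plane, and the service's actual matcher may differ in details.

```python
import re


def _normalize(text):
    """Lowercase and collapse each run of non-letter, non-digit characters to one space."""
    return re.sub(r"[\W_]+", " ", text.lower()).strip()


def _kind(ch):
    """Classify a character as a letter, a digit, or something else."""
    if ch.isalpha():
        return "letter"
    if ch.isdigit():
        return "digit"
    return "other"


def dictionary_matches(phrase, text):
    """Return True if `phrase` would plausibly match somewhere in `text`.

    Assumption: a match is valid only when the character just outside each
    end of the match is of a different kind than the adjacent character
    inside the match (letters next to non-letters, digits next to non-digits).
    """
    needle = _normalize(phrase)
    haystack = _normalize(text)
    if not needle:
        return False
    start = haystack.find(needle)
    while start != -1:
        end = start + len(needle)
        before_ok = start == 0 or _kind(haystack[start - 1]) != _kind(needle[0])
        after_ok = end == len(haystack) or _kind(haystack[end]) != _kind(needle[-1])
        if before_ok and after_ok:
            return True
        start = haystack.find(needle, start + 1)
    return False
```

With this sketch, `dictionary_matches("Sam Johnson", "Sam, Johnson")` and `dictionary_matches("jen", "jen123")` are true, while `dictionary_matches("jen", "jennifer")` is false, mirroring the examples in the field description.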

</div>
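The clamping behaviour documented for `relativeLikelihood` can be illustrated with a few lines of Python. This is a minimal sketch, assuming the five ordered likelihood levels named in the field description and ignoring any unspecified value; `LIKELIHOOD_LEVELS` and `apply_relative_likelihood` are hypothetical names, not part of the API.

```python
# Ordered from least to most likely, as named in the relativeLikelihood description.
LIKELIHOOD_LEVELS = [
    "VERY_UNLIKELY",
    "UNLIKELY",
    "POSSIBLE",
    "LIKELY",
    "VERY_LIKELY",
]


def apply_relative_likelihood(base, adjustment):
    """Shift `base` by `adjustment` levels, clamped to the valid range."""
    index = LIKELIHOOD_LEVELS.index(base) + adjustment
    index = max(0, min(index, len(LIKELIHOOD_LEVELS) - 1))
    return LIKELIHOOD_LEVELS[index]


# An adjustment of 1 followed by -1 on VERY_LIKELY: the first step is
# clamped at VERY_LIKELY, so the second step ends at LIKELY.
step_one = apply_relative_likelihood("VERY_LIKELY", 1)
step_two = apply_relative_likelihood(step_one, -1)
```

Because each adjustment is clamped as it is applied, the order of rules matters: the two adjustments above do not cancel out.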

<code class="details" id="delete">delete(name, x__xgafv=None)</code>

<pre>Deletes a long-running DlpJob. This method indicates that the client is

no longer interested in the DlpJob result. The job will be cancelled if

possible.

See https://cloud.google.com/dlp/docs/inspecting-storage and

https://cloud.google.com/dlp/docs/compute-risk-analysis to learn more.

Args:

name: string, Required. The name of the DlpJob resource to be deleted. (required)

x__xgafv: string, V1 error format.

Allowed values

1 - v1 error format

2 - v2 error format

Returns:

An object of the form:

{ # A generic empty message that you can re-use to avoid defining duplicated

# empty messages in your APIs. A typical example is to use it as the request

# or the response type of an API method. For instance:

#

# service Foo {

# rpc Bar(google.protobuf.Empty) returns (google.protobuf.Empty);

# }

#

# The JSON representation for `Empty` is empty JSON object `{}`.

}</pre>
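A call to `delete` might look like the sketch below. It assumes a `service` object built elsewhere with `googleapiclient.discovery.build("dlp", "v2")` and valid credentials; the wrapper name `delete_dlp_job` and the job name shown in the comment are placeholders, not values from this reference.

```python
def delete_dlp_job(service, job_name):
    """Delete a DlpJob and return the response body.

    `service` is assumed to be a DLP v2 client, for example one created with
    googleapiclient.discovery.build("dlp", "v2").
    `job_name` is the full resource name of the job, for example
    "projects/my-project/dlpJobs/i-1234567890" (placeholder values).
    """
    # delete(name, x__xgafv=None) builds the request; execute() sends it.
    request = service.projects().dlpJobs().delete(name=job_name)
    # On success the API returns the `Empty` message, i.e. an empty JSON object.
    return request.execute()
```

Keeping the service object as a parameter makes the helper easy to exercise with a stub in tests, without network access.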

</div>

<code class="details" id="get">get(name, x__xgafv=None)</code>

<pre>Gets the latest state of a long-running DlpJob.

See https://cloud.google.com/dlp/docs/inspecting-storage and

https://cloud.google.com/dlp/docs/compute-risk-analysis to learn more.

Args:

name: string, Required. The name of the DlpJob resource. (required)

x__xgafv: string, V1 error format.

Allowed values

1 - v1 error format

2 - v2 error format

Returns:

An object of the form:

{ # Combines all of the information about a DLP job.

"errors": [ # A stream of errors encountered running the job.

{ # Detailed information about an error encountered during job execution or

# the results of an unsuccessful activation of the JobTrigger.

2907

"timestamps": [ # The times the error occurred.

2908

"A String",

2909

],

2910

"details": { # The `Status` type defines a logical error model that is suitable for # Detailed error codes and messages.

2911

# different programming environments, including REST APIs and RPC APIs. It is

2912

# used by [gRPC](https://github.com/grpc). Each `Status` message contains

2913

# three pieces of data: error code, error message, and error details.

2914

#

2915

# You can find out more about this error model and how to work with it in the

2916

# [API Design Guide](https://cloud.google.com/apis/design/errors).

2917

"code": 42, # The status code, which should be an enum value of google.rpc.Code.

2918

"details": [ # A list of messages that carry the error details. There is a common set of

2919

# message types for APIs to use.

2920

{

2921

"a_key": "", # Properties of the object. Contains field @type with type URL.

},

],

"message": "A String", # A developer-facing error message, which should be in English. Any

2925

# user-facing error message should be localized and sent in the

2926

# google.rpc.Status.details field, or localized by the client.

},

},

],

"createTime": "A String", # Time when the job was created.

2931

"state": "A String", # State of a job.

2932

"riskDetails": { # Result of a risk analysis operation request. # Results from analyzing risk of a data source.

2933

"kMapEstimationResult": { # Result of the reidentifiability analysis. Note that these results are an # K-map result

2934

# estimation, not exact values.

2935

"kMapEstimationHistogram": [ # The intervals [min_anonymity, max_anonymity] do not overlap. If a value

2936

# doesn't correspond to any such interval, the associated frequency is

2937

# zero. For example, the following records:

2938

# {min_anonymity: 1, max_anonymity: 1, frequency: 17}

2939

# {min_anonymity: 2, max_anonymity: 3, frequency: 42}

2940

# {min_anonymity: 5, max_anonymity: 10, frequency: 99}

2941

# mean that there are no records with an estimated anonymity of 4 or

# larger than 10.

2943

{ # A KMapEstimationHistogramBucket message with the following values:

# min_anonymity: 3

# max_anonymity: 5

# frequency: 42

# means that there are 42 records whose quasi-identifier values correspond

2948

# to 3, 4 or 5 people in the overlying population. An important particular

2949

# case is when min_anonymity = max_anonymity = 1: the frequency field then

2950

# corresponds to the number of uniquely identifiable records.

2951

"maxAnonymity": "A String", # Always greater than or equal to min_anonymity.

2952

"bucketSize": "A String", # Number of records within these anonymity bounds.

2953

"bucketValueCount": "A String", # Total number of distinct quasi-identifier tuple values in this bucket.

2954

"bucketValues": [ # Sample of quasi-identifier tuple values in this bucket. The total

2955

# number of classes returned per bucket is capped at 20.

2956

{ # A tuple of values for the quasi-identifier columns.

2957

"estimatedAnonymity": "A String", # The estimated anonymity for these quasi-identifier values.

2958

"quasiIdsValues": [ # The quasi-identifier values.

2959

{ # Set of primitive values supported by the system.

2960

# Note that for the purposes of inspection or transformation, the number

2961

# of bytes considered to comprise a 'Value' is based on its representation

2962

# as a UTF-8 encoded string. For example, if 'integer_value' is set to

2963

# 123456789, the number of bytes would be counted as 9, even though an

2964

# int64 only holds up to 8 bytes of data.

2965

"integerValue": "A String", # integer

2966

"timeValue": { # Represents a time of day. The date and time zone are either not significant # time of day

2967

# or are specified elsewhere. An API may choose to allow leap seconds. Related

2968

# types are google.type.Date and `google.protobuf.Timestamp`.

2969

"seconds": 42, # Seconds of minutes of the time. Must normally be from 0 to 59. An API may

2970

# allow the value 60 if it allows leap-seconds.

2971

"nanos": 42, # Fractions of seconds in nanoseconds. Must be from 0 to 999,999,999.

2972

"minutes": 42, # Minutes of hour of day. Must be from 0 to 59.

2973

"hours": 42, # Hours of day in 24 hour format. Should be from 0 to 23. An API may choose

2974

# to allow the value "24:00:00" for scenarios like business closing time.

2975

},

2976

"dayOfWeekValue": "A String", # day of week

2977

"floatValue": 3.14, # float

2978

"stringValue": "A String", # string

2979

"timestampValue": "A String", # timestamp

2980

"dateValue": { # Represents a whole or partial calendar date, e.g. a birthday. The time of day # date

2981

# and time zone are either specified elsewhere or are not significant. The date

2982

# is relative to the Proleptic Gregorian Calendar. This can represent:

2983

#

2984

# * A full date, with non-zero year, month and day values

2985

# * A month and day value, with a zero year, e.g. an anniversary

2986

# * A year on its own, with zero month and day values

2987

# * A year and month value, with a zero day, e.g. a credit card expiration date

2988

#

2989

# Related types are google.type.TimeOfDay and `google.protobuf.Timestamp`.

2990

"month": 42, # Month of year. Must be from 1 to 12, or 0 if specifying a year without a

2991

# month and day.

2992

"year": 42, # Year of date. Must be from 1 to 9999, or 0 if specifying a date without

2993

# a year.

2994

"day": 42, # Day of month. Must be from 1 to 31 and valid for the year and month, or 0

2995

# if specifying a year by itself or a year and month where the day is not

2996

# significant.

2997

},

2998

"booleanValue": True or False, # boolean

},

],

},

],

"minAnonymity": "A String", # Always positive.

},

],

},

"deltaPresenceEstimationResult": { # Result of the δ-presence computation. Note that these results are an # Delta-presence result

3008

# estimation, not exact values.

3009

"deltaPresenceEstimationHistogram": [ # The intervals [min_probability, max_probability) do not overlap. If a

3010

# value doesn't correspond to any such interval, the associated frequency

3011

# is zero. For example, the following records:

3012

# {min_probability: 0, max_probability: 0.1, frequency: 17}

3013

# {min_probability: 0.2, max_probability: 0.3, frequency: 42}

3014

# {min_probability: 0.3, max_probability: 0.4, frequency: 99}

3015

# mean that there are no records with an estimated probability in [0.1, 0.2)

# or greater than or equal to 0.4.

3017

{ # A DeltaPresenceEstimationHistogramBucket message with the following

3018

# values:

3019

# min_probability: 0.1

3020

# max_probability: 0.2

3021

# frequency: 42

3022

# means that there are 42 records for which δ is in [0.1, 0.2). An

3023

# important particular case is when min_probability = max_probability = 1:

3024

# then, every individual who shares this quasi-identifier combination is in

3025

# the dataset.

3026

"maxProbability": 3.14, # Always greater than or equal to min_probability.

3027

"bucketValueCount": "A String", # Total number of distinct quasi-identifier tuple values in this bucket.

3028

"bucketValues": [ # Sample of quasi-identifier tuple values in this bucket. The total

3029

# number of classes returned per bucket is capped at 20.

3030

{ # A tuple of values for the quasi-identifier columns.

3031

"estimatedProbability": 3.14, # The estimated probability that a given individual sharing these

3032

# quasi-identifier values is in the dataset. This value, typically called

3033

# δ, is the ratio between the number of records in the dataset with these

3034

# quasi-identifier values, and the total number of individuals (inside

3035

# *and* outside the dataset) with these quasi-identifier values.

3036

# For example, if there are 15 individuals in the dataset who share the

3037

# same quasi-identifier values, and an estimated 100 people in the entire

3038

# population with these values, then δ is 0.15.

3039

"quasiIdsValues": [ # The quasi-identifier values.

3040

{ # Set of primitive values supported by the system.

3041

# Note that for the purposes of inspection or transformation, the number

3042

# of bytes considered to comprise a 'Value' is based on its representation

3043

# as a UTF-8 encoded string. For example, if 'integer_value' is set to

3044

# 123456789, the number of bytes would be counted as 9, even though an

3045

# int64 only holds up to 8 bytes of data.

3046

"integerValue": "A String", # integer

3047

"timeValue": { # Represents a time of day. The date and time zone are either not significant # time of day

3048

# or are specified elsewhere. An API may choose to allow leap seconds. Related

3049

# types are google.type.Date and `google.protobuf.Timestamp`.

3050

"seconds": 42, # Seconds of minutes of the time. Must normally be from 0 to 59. An API may

3051

# allow the value 60 if it allows leap-seconds.

3052

"nanos": 42, # Fractions of seconds in nanoseconds. Must be from 0 to 999,999,999.

3053

"minutes": 42, # Minutes of hour of day. Must be from 0 to 59.

3054

"hours": 42, # Hours of day in 24 hour format. Should be from 0 to 23. An API may choose

3055

# to allow the value "24:00:00" for scenarios like business closing time.

3056

},

3057

"dayOfWeekValue": "A String", # day of week

3058

"floatValue": 3.14, # float

3059

"stringValue": "A String", # string

3060

"timestampValue": "A String", # timestamp

3061

"dateValue": { # Represents a whole or partial calendar date, e.g. a birthday. The time of day # date

3062

# and time zone are either specified elsewhere or are not significant. The date

3063

# is relative to the Proleptic Gregorian Calendar. This can represent:

3064

#

3065

# * A full date, with non-zero year, month and day values

3066

# * A month and day value, with a zero year, e.g. an anniversary

3067

# * A year on its own, with zero month and day values

3068

# * A year and month value, with a zero day, e.g. a credit card expiration date

3069

#

3070

# Related types are google.type.TimeOfDay and `google.protobuf.Timestamp`.

3071

"month": 42, # Month of year. Must be from 1 to 12, or 0 if specifying a year without a

3072

# month and day.

3073

"year": 42, # Year of date. Must be from 1 to 9999, or 0 if specifying a date without

3074

# a year.

3075

"day": 42, # Day of month. Must be from 1 to 31 and valid for the year and month, or 0

3076

# if specifying a year by itself or a year and month where the day is not

3077

# significant.

3078

},

3079

"booleanValue": True or False, # boolean

},

],

},

],

"minProbability": 3.14, # Between 0 and 1.

3085

"bucketSize": "A String", # Number of records within these probability bounds.

},

],

},
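The δ arithmetic described for `estimatedProbability` above can be sketched in a few lines of Python. This is only an illustration of the ratio; the helper name `estimated_probability` is hypothetical and is not part of the client library.

```python
def estimated_probability(records_in_dataset, population_estimate):
    """Return delta: records in the dataset divided by the estimated population."""
    if population_estimate <= 0:
        raise ValueError("population_estimate must be positive")
    return records_in_dataset / population_estimate


# The example from the field description: 15 records, about 100 people overall.
delta = estimated_probability(15, 100)
print(delta)  # 0.15
```

When every individual sharing the quasi-identifier values is in the dataset, the ratio is 1, which matches the `min_probability = max_probability = 1` case mentioned earlier.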

"categoricalStatsResult": { # Result of the categorical stats computation. # Categorical stats result

3090

"valueFrequencyHistogramBuckets": [ # Histogram of value frequencies in the column.

3091

{ # Histogram of value frequencies in the column.

3092

"valueFrequencyUpperBound": "A String", # Upper bound on the value frequency of the values in this bucket.

3093

"bucketValueCount": "A String", # Total number of distinct values in this bucket.

3094

"bucketSize": "A String", # Total number of values in this bucket.

3095

"valueFrequencyLowerBound": "A String", # Lower bound on the value frequency of the values in this bucket.

3096

"bucketValues": [ # Sample of value frequencies in this bucket. The total number of

3097

# values returned per bucket is capped at 20.

3098

{ # A value of a field, including its frequency.

3099

"count": "A String", # How many times the value is contained in the field.

3100

"value": { # Set of primitive values supported by the system. # A value contained in the field in question.

3101

# Note that for the purposes of inspection or transformation, the number

3102

# of bytes considered to comprise a 'Value' is based on its representation

3103

# as a UTF-8 encoded string. For example, if 'integer_value' is set to

3104

# 123456789, the number of bytes would be counted as 9, even though an

3105

# int64 only holds up to 8 bytes of data.

3106

"integerValue": "A String", # integer

3107

"timeValue": { # Represents a time of day. The date and time zone are either not significant # time of day

3108

# or are specified elsewhere. An API may choose to allow leap seconds. Related

3109

# types are google.type.Date and `google.protobuf.Timestamp`.

3110

"seconds": 42, # Seconds of minutes of the time. Must normally be from 0 to 59. An API may

3111

# allow the value 60 if it allows leap-seconds.

3112

"nanos": 42, # Fractions of seconds in nanoseconds. Must be from 0 to 999,999,999.

3113

"minutes": 42, # Minutes of hour of day. Must be from 0 to 59.

3114

"hours": 42, # Hours of day in 24 hour format. Should be from 0 to 23. An API may choose

3115

# to allow the value "24:00:00" for scenarios like business closing time.

3116

},

3117

"dayOfWeekValue": "A String", # day of week

3118

"floatValue": 3.14, # float

3119

"stringValue": "A String", # string

3120

"timestampValue": "A String", # timestamp

3121

"dateValue": { # Represents a whole or partial calendar date, e.g. a birthday. The time of day # date

3122

# and time zone are either specified elsewhere or are not significant. The date

3123

# is relative to the Proleptic Gregorian Calendar. This can represent:

3124

#

3125

# * A full date, with non-zero year, month and day values

3126

# * A month and day value, with a zero year, e.g. an anniversary

3127

# * A year on its own, with zero month and day values

3128

# * A year and month value, with a zero day, e.g. a credit card expiration date

3129

#

3130

# Related types are google.type.TimeOfDay and `google.protobuf.Timestamp`.

3131

"month": 42, # Month of year. Must be from 1 to 12, or 0 if specifying a year without a

3132

# month and day.

3133

"year": 42, # Year of date. Must be from 1 to 9999, or 0 if specifying a date without

3134

# a year.

3135

"day": 42, # Day of month. Must be from 1 to 31 and valid for the year and month, or 0

3136

# if specifying a year by itself or a year and month where the day is not

3137

# significant.

3138

},

3139

"booleanValue": True or False, # boolean

},

},

],

},

],

},
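The count fields in `valueFrequencyHistogramBuckets` are documented as strings, so they need converting before any arithmetic. A minimal sketch of that conversion follows; the helper name and the sample data are hypothetical, and it assumes the strings hold decimal integers.

```python
def summarize_value_frequency_buckets(categorical_stats_result):
    """Convert each histogram bucket's string-encoded counts to Python ints."""
    rows = []
    for bucket in categorical_stats_result.get("valueFrequencyHistogramBuckets", []):
        rows.append({
            "lower_bound": int(bucket.get("valueFrequencyLowerBound", "0")),
            "upper_bound": int(bucket.get("valueFrequencyUpperBound", "0")),
            "total_values": int(bucket.get("bucketSize", "0")),
            "distinct_values": int(bucket.get("bucketValueCount", "0")),
        })
    return rows


# Hypothetical response fragment, for illustration only.
sample_result = {
    "valueFrequencyHistogramBuckets": [
        {"valueFrequencyLowerBound": "1", "valueFrequencyUpperBound": "1",
         "bucketSize": "40", "bucketValueCount": "40"},
        {"valueFrequencyLowerBound": "2", "valueFrequencyUpperBound": "9",
         "bucketSize": "120", "bucketValueCount": "30"},
    ]
}

for row in summarize_value_frequency_buckets(sample_result):
    print(row)
```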

"numericalStatsResult": { # Result of the numerical stats computation. # Numerical stats result

3147

"quantileValues": [ # List of 99 values that partition the set of field values into 100 equal

3148

# sized buckets.

3149

{ # Set of primitive values supported by the system.

3150

# Note that for the purposes of inspection or transformation, the number

3151

# of bytes considered to comprise a 'Value' is based on its representation

3152

# as a UTF-8 encoded string. For example, if 'integer_value' is set to

3153

# 123456789, the number of bytes would be counted as 9, even though an

3154

# int64 only holds up to 8 bytes of data.

3155

"integerValue": "A String", # integer

3156

"timeValue": { # Represents a time of day. The date and time zone are either not significant # time of day

3157

# or are specified elsewhere. An API may choose to allow leap seconds. Related

3158

# types are google.type.Date and `google.protobuf.Timestamp`.

3159

"seconds": 42, # Seconds of minutes of the time. Must normally be from 0 to 59. An API may

3160

# allow the value 60 if it allows leap-seconds.

3161

"nanos": 42, # Fractions of seconds in nanoseconds. Must be from 0 to 999,999,999.

3162

"minutes": 42, # Minutes of hour of day. Must be from 0 to 59.

3163

"hours": 42, # Hours of day in 24 hour format. Should be from 0 to 23. An API may choose

3164

# to allow the value "24:00:00" for scenarios like business closing time.

3165

},

3166

"dayOfWeekValue": "A String", # day of week

3167

"floatValue": 3.14, # float

3168

"stringValue": "A String", # string

3169

"timestampValue": "A String", # timestamp

3170

"dateValue": { # Represents a whole or partial calendar date, e.g. a birthday. The time of day # date

3171

# and time zone are either specified elsewhere or are not significant. The date

3172

# is relative to the Proleptic Gregorian Calendar. This can represent:

3173

#

3174

# * A full date, with non-zero year, month and day values

3175

# * A month and day value, with a zero year, e.g. an anniversary

3176

# * A year on its own, with zero month and day values

3177

# * A year and month value, with a zero day, e.g. a credit card expiration date

3178

#

3179

# Related types are google.type.TimeOfDay and `google.protobuf.Timestamp`.

3180

"month": 42, # Month of year. Must be from 1 to 12, or 0 if specifying a year without a

3181

# month and day.

3182

"year": 42, # Year of date. Must be from 1 to 9999, or 0 if specifying a date without

3183

# a year.

3184

"day": 42, # Day of month. Must be from 1 to 31 and valid for the year and month, or 0

3185

# if specifying a year by itself or a year and month where the day is not

3186

# significant.

3187

},

3188

"booleanValue": True or False, # boolean

3189

},

3190

],

3191

"minValue": { # Set of primitive values supported by the system. # Minimum value appearing in the column.

3192

# Note that for the purposes of inspection or transformation, the number

3193

# of bytes considered to comprise a 'Value' is based on its representation

3194

# as a UTF-8 encoded string. For example, if 'integer_value' is set to

3195

# 123456789, the number of bytes would be counted as 9, even though an

3196

# int64 only holds up to 8 bytes of data.

3197

"integerValue": "A String", # integer

3198

"timeValue": { # Represents a time of day. The date and time zone are either not significant # time of day

3199

# or are specified elsewhere. An API may choose to allow leap seconds. Related

3200

# types are google.type.Date and `google.protobuf.Timestamp`.

3201

"seconds": 42, # Seconds of minutes of the time. Must normally be from 0 to 59. An API may

3202

# allow the value 60 if it allows leap-seconds.

3203

"nanos": 42, # Fractions of seconds in nanoseconds. Must be from 0 to 999,999,999.

3204

"minutes": 42, # Minutes of hour of day. Must be from 0 to 59.

3205

"hours": 42, # Hours of day in 24 hour format. Should be from 0 to 23. An API may choose

3206

# to allow the value "24:00:00" for scenarios like business closing time.

3207

},

3208

"dayOfWeekValue": "A String", # day of week

3209

"floatValue": 3.14, # float

3210

"stringValue": "A String", # string

3211

"timestampValue": "A String", # timestamp

3212

"dateValue": { # Represents a whole or partial calendar date, e.g. a birthday. The time of day # date

3213

# and time zone are either specified elsewhere or are not significant. The date

3214

# is relative to the Proleptic Gregorian Calendar. This can represent:

3215

#

3216

# * A full date, with non-zero year, month and day values

3217

# * A month and day value, with a zero year, e.g. an anniversary

3218

# * A year on its own, with zero month and day values

3219

# * A year and month value, with a zero day, e.g. a credit card expiration date

3220

#

3221

# Related types are google.type.TimeOfDay and `google.protobuf.Timestamp`.

3222

"month": 42, # Month of year. Must be from 1 to 12, or 0 if specifying a year without a

3223

# month and day.

3224

"year": 42, # Year of date. Must be from 1 to 9999, or 0 if specifying a date without

3225

# a year.

3226

"day": 42, # Day of month. Must be from 1 to 31 and valid for the year and month, or 0

3227

# if specifying a year by itself or a year and month where the day is not

3228

# significant.

3229

},

3230

"booleanValue": True or False, # boolean

3231

},

3232

"maxValue": { # Set of primitive values supported by the system. # Maximum value appearing in the column.

3233

# Note that for the purposes of inspection or transformation, the number

3234

# of bytes considered to comprise a 'Value' is based on its representation

3235

# as a UTF-8 encoded string. For example, if 'integer_value' is set to

3236

# 123456789, the number of bytes would be counted as 9, even though an

3237

# int64 only holds up to 8 bytes of data.

3238

"integerValue": "A String", # integer

3239

"timeValue": { # Represents a time of day. The date and time zone are either not significant # time of day

3240

# or are specified elsewhere. An API may choose to allow leap seconds. Related

3241

# types are google.type.Date and `google.protobuf.Timestamp`.

3242

"seconds": 42, # Seconds of minutes of the time. Must normally be from 0 to 59. An API may

3243

# allow the value 60 if it allows leap-seconds.

3244

"nanos": 42, # Fractions of seconds in nanoseconds. Must be from 0 to 999,999,999.

3245

"minutes": 42, # Minutes of hour of day. Must be from 0 to 59.

3246

"hours": 42, # Hours of day in 24 hour format. Should be from 0 to 23. An API may choose

3247

# to allow the value "24:00:00" for scenarios like business closing time.

3248

},

3249

"dayOfWeekValue": "A String", # day of week

3250

"floatValue": 3.14, # float

3251

"stringValue": "A String", # string

3252

"timestampValue": "A String", # timestamp

3253

"dateValue": { # Represents a whole or partial calendar date, e.g. a birthday. The time of day # date

3254

# and time zone are either specified elsewhere or are not significant. The date

3255

# is relative to the Proleptic Gregorian Calendar. This can represent:

3256

#

3257

# * A full date, with non-zero year, month and day values

3258

# * A month and day value, with a zero year, e.g. an anniversary

3259

# * A year on its own, with zero month and day values

3260

# * A year and month value, with a zero day, e.g. a credit card expiration date

3261

#

3262

# Related types are google.type.TimeOfDay and `google.protobuf.Timestamp`.

3263

"month": 42, # Month of year. Must be from 1 to 12, or 0 if specifying a year without a

3264

# month and day.

3265

"year": 42, # Year of date. Must be from 1 to 9999, or 0 if specifying a date without

3266

# a year.

3267

"day": 42, # Day of month. Must be from 1 to 31 and valid for the year and month, or 0

3268

# if specifying a year by itself or a year and month where the day is not

3269

# significant.

3270

},

3271

"booleanValue": True or False, # boolean

3272

},

3273

},
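Since `quantileValues` holds 99 cut points that split the field values into 100 equal-sized buckets, the 50th cut point (index 49) is the median. The sketch below reads it out of a result dictionary. The helper names are hypothetical, and it assumes each `Value` has only one of its fields populated.

```python
def primitive(value):
    """Return the populated field of a Value dict as a plain Python object."""
    if "integerValue" in value:
        return int(value["integerValue"])  # documented as a string
    for key in ("floatValue", "stringValue", "booleanValue",
                "timestampValue", "dayOfWeekValue"):
        if key in value:
            return value[key]
    if "dateValue" in value:
        d = value["dateValue"]
        return (d.get("year", 0), d.get("month", 0), d.get("day", 0))
    if "timeValue" in value:
        t = value["timeValue"]
        return (t.get("hours", 0), t.get("minutes", 0),
                t.get("seconds", 0), t.get("nanos", 0))
    return None


def median_from_quantiles(numerical_stats_result):
    """Return the 50th of the 99 quantile cut points."""
    quantiles = numerical_stats_result.get("quantileValues", [])
    if len(quantiles) != 99:
        raise ValueError("expected 99 quantile values, got %d" % len(quantiles))
    return primitive(quantiles[49])


# Hypothetical result whose cut points are the integers 1 through 99.
sample_result = {"quantileValues": [{"integerValue": str(i)} for i in range(1, 100)]}
print(median_from_quantiles(sample_result))  # 50
```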

3274

"kAnonymityResult": { # Result of the k-anonymity computation. # K-anonymity result

3275

"equivalenceClassHistogramBuckets": [ # Histogram of k-anonymity equivalence classes.

3276

{ # Histogram of k-anonymity equivalence classes.

3277

"bucketValues": [ # Sample of equivalence classes in this bucket. The total number of

3278

# classes returned per bucket is capped at 20.

3279

{ # The set of columns' values that share the same ldiversity value

3280

"quasiIdsValues": [ # Set of values defining the equivalence class. One value per

3281

# quasi-identifier column in the original KAnonymity metric message.

3282

# The order is always the same as the original request.

3283

{ # Set of primitive values supported by the system.

3284

# Note that for the purposes of inspection or transformation, the number

3285

# of bytes considered to comprise a 'Value' is based on its representation

3286

# as a UTF-8 encoded string. For example, if 'integer_value' is set to

3287

# 123456789, the number of bytes would be counted as 9, even though an

3288

# int64 only holds up to 8 bytes of data.

3289

"integerValue": "A String", # integer

3290

"timeValue": { # Represents a time of day. The date and time zone are either not significant # time of day

3291

# or are specified elsewhere. An API may choose to allow leap seconds. Related

3292

# types are google.type.Date and `google.protobuf.Timestamp`.

3293

"seconds": 42, # Seconds of minutes of the time. Must normally be from 0 to 59. An API may

3294

# allow the value 60 if it allows leap-seconds.

3295

"nanos": 42, # Fractions of seconds in nanoseconds. Must be from 0 to 999,999,999.

3296

"minutes": 42, # Minutes of hour of day. Must be from 0 to 59.

3297

"hours": 42, # Hours of day in 24 hour format. Should be from 0 to 23. An API may choose

3298

# to allow the value "24:00:00" for scenarios like business closing time.

3299

},

3300

"dayOfWeekValue": "A String", # day of week

3301

"floatValue": 3.14, # float

3302

"stringValue": "A String", # string

3303

"timestampValue": "A String", # timestamp

3304

"dateValue": { # Represents a whole or partial calendar date, e.g. a birthday. The time of day # date

3305

# and time zone are either specified elsewhere or are not significant. The date

3306

# is relative to the Proleptic Gregorian Calendar. This can represent:

3307

#

3308

# * A full date, with non-zero year, month and day values

3309

# * A month and day value, with a zero year, e.g. an anniversary

3310

# * A year on its own, with zero month and day values

3311

# * A year and month value, with a zero day, e.g. a credit card expiration date

3312

#

3313

# Related types are google.type.TimeOfDay and `google.protobuf.Timestamp`.

3314

"month": 42, # Month of year. Must be from 1 to 12, or 0 if specifying a year without a

3315

# month and day.

3316

"year": 42, # Year of date. Must be from 1 to 9999, or 0 if specifying a date without

3317

# a year.

3318

"day": 42, # Day of month. Must be from 1 to 31 and valid for the year and month, or 0

3319

# if specifying a year by itself or a year and month where the day is not

3320

# significant.

3321

},

3322

"booleanValue": True or False, # boolean

3323

},

3324

],

3325

"equivalenceClassSize": "A String", # Size of the equivalence class, for example number of rows with the

3326

# above set of values.

3327

},

3328

],

3329

"equivalenceClassSizeLowerBound": "A String", # Lower bound on the size of the equivalence classes in this bucket.

3330

"equivalenceClassSizeUpperBound": "A String", # Upper bound on the size of the equivalence classes in this bucket.

3331

"bucketSize": "A String", # Total number of equivalence classes in this bucket.

3332

"bucketValueCount": "A String", # Total number of distinct equivalence classes in this bucket.

3333

},

3334

],

3335

},
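One way to read a `kAnonymityResult` is to look for the smallest equivalence class size bound among non-empty buckets, which gives a lower bound on k for the analyzed table. The sketch below is a hedged illustration: the helper name and sample data are hypothetical, and it assumes the size fields are decimal strings as documented above.

```python
def smallest_equivalence_class_bound(k_anonymity_result):
    """Return the lowest size bound over non-empty buckets, or None if there are none."""
    bounds = [
        int(bucket["equivalenceClassSizeLowerBound"])
        for bucket in k_anonymity_result.get("equivalenceClassHistogramBuckets", [])
        if int(bucket.get("bucketSize", "0")) > 0
    ]
    return min(bounds) if bounds else None


# Hypothetical response fragment, for illustration only.
sample_result = {
    "equivalenceClassHistogramBuckets": [
        {"equivalenceClassSizeLowerBound": "5", "equivalenceClassSizeUpperBound": "10",
         "bucketSize": "12", "bucketValueCount": "12"},
        {"equivalenceClassSizeLowerBound": "2", "equivalenceClassSizeUpperBound": "4",
         "bucketSize": "3", "bucketValueCount": "3"},
    ]
}

print(smallest_equivalence_class_bound(sample_result))  # 2
```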

"requestedPrivacyMetric": { # Privacy metric to compute for reidentification risk analysis. # Privacy metric to compute.

"categoricalStatsConfig": { # Compute categorical stats over an individual column, including # Categorical stats

3338

# number of distinct values and value count distribution.

3339

"field": { # General identifier of a data field in a storage service. # Field to compute categorical stats on. All column types are

3340

# supported except for arrays and structs. However, it may be more

3341

# informative to use NumericalStats when the field type is supported,

3342

# depending on the data.

3343

"name": "A String", # Name describing the field.

3344

},

3345

},

3346

"lDiversityConfig": { # l-diversity metric, used for analysis of reidentification risk. # l-diversity

3347

"sensitiveAttribute": { # General identifier of a data field in a storage service. # Sensitive field for computing the l-value.

3348

"name": "A String", # Name describing the field.

3349

},

3350

"quasiIds": [ # Set of quasi-identifiers indicating how equivalence classes are

3351

# defined for the l-diversity computation. When multiple fields are

3352

# specified, they are considered a single composite key.

3353

{ # General identifier of a data field in a storage service.

3354

"name": "A String", # Name describing the field.

},

],

},

"kMapEstimationConfig": { # Reidentifiability metric. This corresponds to a risk model similar to what # k-map

3359

# is called "journalist risk" in the literature, except the attack dataset is

3360

# statistically modeled instead of being perfectly known. This can be done

3361

# using publicly available data (like the US Census), or using a custom

3362

# statistical model (indicated as one or several BigQuery tables), or by

3363

# extrapolating from the distribution of values in the input dataset.

"regionCode": "A String", # ISO 3166-1 alpha-2 region code to use in the statistical modeling.

3365

# Set if no column is tagged with a region-specific InfoType (like

3366

# US_ZIP_5) or a region code.

"quasiIds": [ # Required. Fields considered to be quasi-identifiers. No two columns can have the

3368

# same tag.

3369

{ # A column with a semantic tag attached.

3370

"field": { # General identifier of a data field in a storage service. # Required. Identifies the column.

3371

"name": "A String", # Name describing the field.

3372

},

3373

"customTag": "A String", # A column can be tagged with a custom tag. In this case, the user must

3374

# indicate an auxiliary table that contains statistical information on

3375

# the possible values of this column (below).

"infoType": { # Type of information detected by the API. # A column can be tagged with an InfoType to use the relevant public

3377

# dataset as a statistical model of population, if available. We

3378

# currently support US ZIP codes, region codes, ages and genders.

3379

# To programmatically obtain the list of supported InfoTypes, use

3380

# ListInfoTypes with the supported_by=RISK_ANALYSIS filter.

3381

"name": "A String", # Name of the information type. Either a name of your choosing when

3382

# creating a CustomInfoType, or one of the names listed

3383

# at https://cloud.google.com/dlp/docs/infotypes-reference when specifying

3384

# a built-in type. When sending Cloud DLP results to Data Catalog, infoType

3385

# names should conform to the pattern `[A-Za-z0-9$-_]{1,64}`.

3386

},

"inferred": { # If no semantic tag is indicated, we infer the statistical model from
# the distribution of values in the input data.
#
# A generic empty message that you can re-use to avoid defining duplicated
# empty messages in your APIs. A typical example is to use it as the request
# or the response type of an API method. For instance:
#
# service Foo {
# rpc Bar(google.protobuf.Empty) returns (google.protobuf.Empty);
# }
#
# The JSON representation for `Empty` is empty JSON object `{}`.

3397

},

3398

},

3399

],

"auxiliaryTables": [ # Several auxiliary tables can be used in the analysis. Each custom_tag

# used to tag a quasi-identifiers column must appear in exactly one column

3402

# of one auxiliary table.

3403

{ # An auxiliary table contains statistical information on the relative

# frequency of different quasi-identifiers values. It has one or several

3405

# quasi-identifiers columns, and one column that indicates the relative

3406

# frequency of each quasi-identifier tuple.

3407

# If a tuple is present in the data but not in the auxiliary table, the

3408

# corresponding relative frequency is assumed to be zero (and thus, the

3409

# tuple is highly reidentifiable).

"table": { # Message defining the location of a BigQuery table. A table is uniquely # Required. Auxiliary table location.

3411

# identified by its project_id, dataset_id, and table_name. Within a query

3412

# a table is often referenced with a string in the format of:

3413

# `<project_id>:<dataset_id>.<table_id>` or

3414

# `<project_id>.<dataset_id>.<table_id>`.

3415

"projectId": "A String", # The Google Cloud Platform project ID of the project containing the table.

3416

# If omitted, project ID is inferred from the API call.

3417

"datasetId": "A String", # Dataset ID of the table.

3418

"tableId": "A String", # Name of the table.

3419

},

"quasiIds": [ # Required. Quasi-identifier columns.

3421

{ # A quasi-identifier column has a custom_tag, used to know which column

3422

# in the data corresponds to which column in the statistical model.

"customTag": "A String", # An auxiliary field.

3424

"field": { # General identifier of a data field in a storage service. # Identifies the column.

3425

"name": "A String", # Name describing the field.

3426

},

3427

},

3428

],

"relativeFrequency": { # General identifier of a data field in a storage service. # Required. The relative frequency column must contain a floating-point number

3430

# between 0 and 1 (inclusive). Null values are assumed to be zero.

3431

"name": "A String", # Name describing the field.

3432

},

3433

},

3434

],

},

3436

"deltaPresenceEstimationConfig": { # δ-presence metric, used to estimate how likely it is for an attacker to # delta-presence

3437

# figure out that one given individual appears in a de-identified dataset.

3438

# Similarly to the k-map metric, we cannot compute δ-presence exactly without

3439

# knowing the attack dataset, so we use a statistical model instead.

"quasiIds": [ # Required. Fields considered to be quasi-identifiers. No two fields can have the

3441

# same tag.

3442

{ # A column with a semantic tag attached.

3443

"field": { # General identifier of a data field in a storage service. # Required. Identifies the column.

3444

"name": "A String", # Name describing the field.

3445

},

"infoType": { # Type of information detected by the API. # A column can be tagged with an InfoType to use the relevant public

3447

# dataset as a statistical model of population, if available. We

3448

# currently support US ZIP codes, region codes, ages and genders.

3449

# To programmatically obtain the list of supported InfoTypes, use

3450

# ListInfoTypes with the supported_by=RISK_ANALYSIS filter.

3451

"name": "A String", # Name of the information type. Either a name of your choosing when

3452

# creating a CustomInfoType, or one of the names listed

3453

# at https://cloud.google.com/dlp/docs/infotypes-reference when specifying

# a built-in type. When sending Cloud DLP results to Data Catalog, infoType

3455

# names should conform to the pattern `[A-Za-z0-9$-_]{1,64}`.

},

3457

"customTag": "A String", # A column can be tagged with a custom tag. In this case, the user must

3458

# indicate an auxiliary table that contains statistical information on

3459

# the possible values of this column (below).

"inferred": { # If no semantic tag is indicated, we infer the statistical model from
# the distribution of values in the input data.
#
# A generic empty message that you can re-use to avoid defining duplicated
# empty messages in your APIs. A typical example is to use it as the request
# or the response type of an API method. For instance:
#
# service Foo {
# rpc Bar(google.protobuf.Empty) returns (google.protobuf.Empty);
# }
#
# The JSON representation for `Empty` is empty JSON object `{}`.

3470

},

3471

},

3472

],

"auxiliaryTables": [ # Several auxiliary tables can be used in the analysis. Each custom_tag

3474

# used to tag a quasi-identifiers field must appear in exactly one

3475

# field of one auxiliary table.

3476

{ # An auxiliary table containing statistical information on the relative

3477

# frequency of different quasi-identifiers values. It has one or several

3478

# quasi-identifiers columns, and one column that indicates the relative

3479

# frequency of each quasi-identifier tuple.

3480

# If a tuple is present in the data but not in the auxiliary table, the

3481

# corresponding relative frequency is assumed to be zero (and thus, the

3482

# tuple is highly reidentifiable).

3483

"relativeFrequency": { # General identifier of a data field in a storage service. # Required. The relative frequency column must contain a floating-point number

3484

# between 0 and 1 (inclusive). Null values are assumed to be zero.

3485

"name": "A String", # Name describing the field.

3486

},

3487

"table": { # Message defining the location of a BigQuery table. A table is uniquely # Required. Auxiliary table location.

3488

# identified by its project_id, dataset_id, and table_name. Within a query

3489

# a table is often referenced with a string in the format of:

3490

# `<project_id>:<dataset_id>.<table_id>` or

3491

# `<project_id>.<dataset_id>.<table_id>`.

3492

"projectId": "A String", # The Google Cloud Platform project ID of the project containing the table.

3493

# If omitted, project ID is inferred from the API call.

3494

"datasetId": "A String", # Dataset ID of the table.

3495

"tableId": "A String", # Name of the table.

3496

},

3497

"quasiIds": [ # Required. Quasi-identifier columns.

3498

{ # A quasi-identifier column has a custom_tag, used to know which column

3499

# in the data corresponds to which column in the statistical model.

3500

"field": { # General identifier of a data field in a storage service. # Identifies the column.

3501

"name": "A String", # Name describing the field.

3502

},

3503

"customTag": "A String", # A column can be tagged with a custom tag. In this case, the user must

3504

# indicate an auxiliary table that contains statistical information on

3505

# the possible values of this column (below).

},

],

},

],

"regionCode": "A String", # ISO 3166-1 alpha-2 region code to use in the statistical modeling.

3511

# Set if no column is tagged with a region-specific InfoType (like

3512

# US_ZIP_5) or a region code.

},

3514

"kAnonymityConfig": { # k-anonymity metric, used for analysis of reidentification risk. # K-anonymity

"entityId": { # Message indicating that multiple rows might be associated to a
# single individual. If the same entity_id is associated to multiple
# quasi-identifier tuples over distinct rows, we consider the entire
# collection of tuples as the composite quasi-identifier. This collection
# is a multiset: the order in which the different tuples appear in the
# dataset is ignored, but their frequency is taken into account.
#
# Important note: a maximum of 1000 rows can be associated to a single
# entity ID. If more rows are associated with the same entity ID, some
# might be ignored.
#
# An entity in a dataset is a field or set of fields that correspond to a
# single person. For example, in medical records the `EntityId` might be a
# patient identifier, or for financial records it might be an account
# identifier. This message is used when generalizations or analysis must take
# into account that multiple rows correspond to the same entity.

3529

"field": { # General identifier of a data field in a storage service. # Composite key indicating which field contains the entity identifier.

3530

"name": "A String", # Name describing the field.

3531

},

3532

},

"quasiIds": [ # Set of fields to compute k-anonymity over. When multiple fields are

3534

# specified, they are considered a single composite key. Structs and

3535

# repeated data types are not supported; however, nested fields are

3536

# supported so long as they are not structs themselves or nested within

3537

# a repeated field.

3538

{ # General identifier of a data field in a storage service.

3539

"name": "A String", # Name describing the field.

3540

},

3541

],

},

3543

"numericalStatsConfig": { # Compute numerical stats over an individual column, including # Numerical stats

3544

# min, max, and quantiles.

3545

"field": { # General identifier of a data field in a storage service. # Field to compute numerical stats on. Supported types are

3546

# integer, float, date, datetime, timestamp, time.

3547

"name": "A String", # Name describing the field.

3548

},

3549

},

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame]

3550

},

Bu Sun Kim

2020-07-22 17:02:09 -0700

[diff] [blame^]

3551

"lDiversityResult": { # Result of the l-diversity computation. # L-diversity result

3552

"sensitiveValueFrequencyHistogramBuckets": [ # Histogram of l-diversity equivalence class sensitive value frequencies.

3553

{ # Histogram of l-diversity equivalence class sensitive value frequencies.

3554

"bucketSize": "A String", # Total number of equivalence classes in this bucket.

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame]

3555

"bucketValues": [ # Sample of equivalence classes in this bucket. The total number of

3556

# classes returned per bucket is capped at 20.

Bu Sun Kim

2020-07-22 17:02:09 -0700

[diff] [blame^]

3557

{ # The set of columns' values that share the same l-diversity value.

3558

"quasiIdsValues": [ # Quasi-identifier values defining the k-anonymity equivalence

3559

# class. The order is always the same as the original request.

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame]

3560

{ # Set of primitive values supported by the system.

3561

# Note that for the purposes of inspection or transformation, the number

3562

# of bytes considered to comprise a 'Value' is based on its representation

3563

# as a UTF-8 encoded string. For example, if 'integer_value' is set to

3564

# 123456789, the number of bytes would be counted as 9, even though an

3565

# int64 only holds up to 8 bytes of data.

Bu Sun Kim

2020-07-22 17:02:09 -0700

[diff] [blame^]

3566

"integerValue": "A String", # integer

3567

"timeValue": { # Represents a time of day. The date and time zone are either not significant # time of day

3568

# or are specified elsewhere. An API may choose to allow leap seconds. Related

3569

# types are google.type.Date and `google.protobuf.Timestamp`.

3570

"seconds": 42, # Seconds of the minute. Must normally be from 0 to 59. An API may

3571

# allow the value 60 if it allows leap-seconds.

3572

"nanos": 42, # Fractions of seconds in nanoseconds. Must be from 0 to 999,999,999.

3573

"minutes": 42, # Minutes of hour of day. Must be from 0 to 59.

3574

"hours": 42, # Hours of day in 24 hour format. Should be from 0 to 23. An API may choose

3575

# to allow the value "24:00:00" for scenarios like business closing time.

3576

},

3577

"dayOfWeekValue": "A String", # day of week

3578

"floatValue": 3.14, # float

3579

"stringValue": "A String", # string

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame]

3580

"timestampValue": "A String", # timestamp

3581

"dateValue": { # Represents a whole or partial calendar date, e.g. a birthday. The time of day # date

3582

# and time zone are either specified elsewhere or are not significant. The date

3583

# is relative to the Proleptic Gregorian Calendar. This can represent:

3584

#

3585

# * A full date, with non-zero year, month and day values

3586

# * A month and day value, with a zero year, e.g. an anniversary

3587

# * A year on its own, with zero month and day values

3588

# * A year and month value, with a zero day, e.g. a credit card expiration date

3589

#

3590

# Related types are google.type.TimeOfDay and `google.protobuf.Timestamp`.

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame]

3591

"month": 42, # Month of year. Must be from 1 to 12, or 0 if specifying a year without a

3592

# month and day.

Bu Sun Kim

2020-07-22 17:02:09 -0700

[diff] [blame^]

3593

"year": 42, # Year of date. Must be from 1 to 9999, or 0 if specifying a date without

3594

# a year.

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame]

3595

"day": 42, # Day of month. Must be from 1 to 31 and valid for the year and month, or 0

3596

# if specifying a year by itself or a year and month where the day is not

3597

# significant.

3598

},

Bu Sun Kim

2020-05-27 12:20:54 -0700

[diff] [blame]

3599

"booleanValue": True or False, # boolean

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame]

3600

},

3601

],
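The byte-counting note for `Value` above can be illustrated with a short sketch. It assumes the string representation is the plain decimal form produced by Python's `str()`; the helper name `value_size_bytes` is hypothetical and not part of the client library.

```python
def value_size_bytes(value):
    """Return the length in bytes of a value's UTF-8 string representation."""
    return len(str(value).encode("utf-8"))


# 123456789 has nine decimal digits, so it is counted as 9 bytes,
# even though an int64 occupies only 8 bytes in binary form.
print(value_size_bytes(123456789))
```

Non-ASCII strings count more bytes than characters under this rule, because UTF-8 uses multiple bytes for those characters.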

Bu Sun Kim

2020-07-22 17:02:09 -0700

[diff] [blame^]

3602

"topSensitiveValues": [ # Estimated frequencies of top sensitive values.

3603

{ # A value of a field, including its frequency.

3604

"count": "A String", # How many times the value is contained in the field.

3605

"value": { # Set of primitive values supported by the system. # A value contained in the field in question.

3606

# Note that for the purposes of inspection or transformation, the number

3607

# of bytes considered to comprise a 'Value' is based on its representation

3608

# as a UTF-8 encoded string. For example, if 'integer_value' is set to

3609

# 123456789, the number of bytes would be counted as 9, even though an

3610

# int64 only holds up to 8 bytes of data.

3611

"integerValue": "A String", # integer

3612

"timeValue": { # Represents a time of day. The date and time zone are either not significant # time of day

3613

# or are specified elsewhere. An API may choose to allow leap seconds. Related

3614

# types are google.type.Date and `google.protobuf.Timestamp`.

3615

"seconds": 42, # Seconds of the minute. Must normally be from 0 to 59. An API may

3616

# allow the value 60 if it allows leap-seconds.

3617

"nanos": 42, # Fractions of seconds in nanoseconds. Must be from 0 to 999,999,999.

3618

"minutes": 42, # Minutes of hour of day. Must be from 0 to 59.

3619

"hours": 42, # Hours of day in 24 hour format. Should be from 0 to 23. An API may choose

3620

# to allow the value "24:00:00" for scenarios like business closing time.

3621

},

3622

"dayOfWeekValue": "A String", # day of week

3623

"floatValue": 3.14, # float

3624

"stringValue": "A String", # string

3625

"timestampValue": "A String", # timestamp

3626

"dateValue": { # Represents a whole or partial calendar date, e.g. a birthday. The time of day # date

3627

# and time zone are either specified elsewhere or are not significant. The date

3628

# is relative to the Proleptic Gregorian Calendar. This can represent:

3629

#

3630

# * A full date, with non-zero year, month and day values

3631

# * A month and day value, with a zero year, e.g. an anniversary

3632

# * A year on its own, with zero month and day values

3633

# * A year and month value, with a zero day, e.g. a credit card expiration date

3634

#

3635

# Related types are google.type.TimeOfDay and `google.protobuf.Timestamp`.

3636

"month": 42, # Month of year. Must be from 1 to 12, or 0 if specifying a year without a

3637

# month and day.

3638

"year": 42, # Year of date. Must be from 1 to 9999, or 0 if specifying a date without

3639

# a year.

3640

"day": 42, # Day of month. Must be from 1 to 31 and valid for the year and month, or 0

3641

# if specifying a year by itself or a year and month where the day is not

3642

# significant.

3643

},

3644

"booleanValue": True or False, # boolean

},

},

],

"equivalenceClassSize": "A String", # Size of the k-anonymity equivalence class.

3649

"numDistinctSensitiveValues": "A String", # Number of distinct sensitive values in this equivalence class.

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame]

3650

},

3651

],

Bu Sun Kim

2020-07-22 17:02:09 -0700

[diff] [blame^]

3652

"sensitiveValueFrequencyUpperBound": "A String", # Upper bound on the sensitive value frequencies of the equivalence

3653

# classes in this bucket.

Bu Sun Kim

2020-05-27 12:20:54 -0700

[diff] [blame]

3654

"bucketValueCount": "A String", # Total number of distinct equivalence classes in this bucket.

Bu Sun Kim

2020-07-22 17:02:09 -0700

[diff] [blame^]

3655

"sensitiveValueFrequencyLowerBound": "A String", # Lower bound on the sensitive value frequencies of the equivalence

3656

# classes in this bucket.

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame]

},

],

},

"requestedSourceTable": { # Message defining the location of a BigQuery table. A table is uniquely # Input dataset to compute metrics over.

3661

# identified by its project_id, dataset_id, and table_name. Within a query

3662

# a table is often referenced with a string in the format of:

3663

# `<project_id>:<dataset_id>.<table_id>` or

3664

# `<project_id>.<dataset_id>.<table_id>`.

3665

"projectId": "A String", # The Google Cloud Platform project ID of the project containing the table.

3666

# If omitted, project ID is inferred from the API call.

3667

"datasetId": "A String", # Dataset ID of the table.

3668

"tableId": "A String", # Name of the table.

3669

},
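The two string formats mentioned for a BigQuery table reference map onto the `projectId`, `datasetId`, and `tableId` fields. A minimal sketch of that mapping follows; `parse_table_reference` is a hypothetical helper, and it assumes none of the three IDs themselves contain `:` or `.`.

```python
def parse_table_reference(ref):
    """Split '<project_id>:<dataset_id>.<table_id>' or
    '<project_id>.<dataset_id>.<table_id>' into a BigQueryTable-shaped dict."""
    if ":" in ref:
        project_id, _, rest = ref.partition(":")
        parts = [project_id] + rest.split(".")
    else:
        parts = ref.split(".")
    # Exactly three non-empty components are expected.
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"not a valid table reference: {ref!r}")
    project_id, dataset_id, table_id = parts
    return {"projectId": project_id, "datasetId": dataset_id, "tableId": table_id}
```

The returned dict has the same keys as the `requestedSourceTable` object shown above, so it could be placed directly into a request body.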

Bu Sun Kim

2020-07-22 17:02:09 -0700

[diff] [blame^]

3670

},

3671

"type": "A String", # The type of job.

3672

"endTime": "A String", # Time when the job finished.

3673

"startTime": "A String", # Time when the job started.

3674

"jobTriggerName": "A String", # If created by a job trigger, the resource name of the trigger that

3675

# instantiated the job.

3676

"inspectDetails": { # The results of an inspect DataSource job. # Results from inspecting a data source.

3677

"requestedOptions": { # Snapshot of the inspection configuration. # The configuration used for this job.

3678

"jobConfig": { # Controls what and how to inspect for findings. # Inspect config.

3679

"inspectTemplateName": "A String", # If provided, will be used as the default for all values in InspectConfig.

3680

# `inspect_config` will be merged into the values persisted as part of the

3681

# template.

3682

"actions": [ # Actions to execute at the completion of the job.

3683

{ # A task to execute on the completion of a job.

3684

# See https://cloud.google.com/dlp/docs/concepts-actions to learn more.

3685

"publishToStackdriver": { # Enable Stackdriver metric dlp.googleapis.com/finding_count. This # Enable Stackdriver metric dlp.googleapis.com/finding_count.

3686

# will publish a metric to Stackdriver for each infoType requested and

3687

# how many findings were found for it. CustomDetectors will be bucketed

3688

# as 'Custom' under the Stackdriver label 'info_type'.

3689

},

3690

"publishFindingsToCloudDataCatalog": { # Publish findings of a DlpJob to Cloud Data Catalog. Labels summarizing the # Publish findings to Cloud Datahub.

3691

# results of the DlpJob will be applied to the entry for the resource scanned

3692

# in Cloud Data Catalog. Any labels previously written by another DlpJob will

3693

# be deleted. InfoType naming patterns are strictly enforced when using this

3694

# feature. Note that the findings will be persisted in Cloud Data Catalog

3695

# storage and are governed by Data Catalog service-specific policy, see

3696

# https://cloud.google.com/terms/service-terms

3697

# Only a single instance of this action can be specified and only allowed if

3698

# all resources being scanned are BigQuery tables.

3699

# Compatible with: Inspect

3700

},

3701

"jobNotificationEmails": { # Enable email notification to project owners and editors on job's # Enable email notification for project owners and editors on job's

3702

# completion/failure.

3703

# completion/failure.

3704

},

3705

"pubSub": { # Publish a message into given Pub/Sub topic when DlpJob has completed. The # Publish a notification to a pubsub topic.

3706

# message contains a single field, `DlpJobName`, which is equal to the

3707

# finished job's

3708

# [`DlpJob.name`](https://cloud.google.com/dlp/docs/reference/rest/v2/projects.dlpJobs#DlpJob).

3709

# Compatible with: Inspect, Risk

3710

"topic": "A String", # Cloud Pub/Sub topic to send notifications to. The topic must have given

3711

# publishing access rights to the DLP API service account executing

3712

# the long running DlpJob sending the notifications.

3713

# Format is projects/{project}/topics/{topic}.

3714

},

3715

"saveFindings": { # If set, the detailed findings will be persisted to the specified # Save resulting findings in a provided location.

3716

# OutputStorageConfig. Only a single instance of this action can be

3717

# specified.

3718

# Compatible with: Inspect, Risk

3719

"outputConfig": { # Cloud repository for storing output. # Location to store findings outside of DLP.

3720

"outputSchema": "A String", # Schema used for writing the findings for Inspect jobs. This field is only

3721

# used for Inspect and must be unspecified for Risk jobs. Columns are derived

3722

# from the `Finding` object. If appending to an existing table, any columns

3723

# from the predefined schema that are missing will be added. No columns in

3724

# the existing table will be deleted.

3725

#

3726

# If unspecified, then all available columns will be used for a new table or

3727

# an (existing) table with no schema, and no changes will be made to an

3728

# existing table that has a schema.

3729

# Only for use with external storage.

3730

"table": { # Message defining the location of a BigQuery table. A table is uniquely # Store findings in an existing table or a new table in an existing

3731

# dataset. If table_id is not set a new one will be generated

3732

# for you with the following format:

3733

# dlp_googleapis_yyyy_mm_dd_[dlp_job_id]. Pacific timezone will be used for

3734

# generating the date details.

3735

#

3736

# For Inspect, each column in an existing output table must have the same

3737

# name, type, and mode of a field in the `Finding` object.

3738

#

3739

# For Risk, an existing output table should be the output of a previous

3740

# Risk analysis job run on the same source table, with the same privacy

3741

# metric and quasi-identifiers. Risk jobs that analyze the same table but

3742

# compute a different privacy metric, or use different sets of

3743

# quasi-identifiers, cannot store their results in the same table.

3744

# identified by its project_id, dataset_id, and table_name. Within a query

3745

# a table is often referenced with a string in the format of:

3746

# `<project_id>:<dataset_id>.<table_id>` or

3747

# `<project_id>.<dataset_id>.<table_id>`.

3748

"projectId": "A String", # The Google Cloud Platform project ID of the project containing the table.

3749

# If omitted, project ID is inferred from the API call.

3750

"datasetId": "A String", # Dataset ID of the table.

3751

"tableId": "A String", # Name of the table.

},

},

},

"publishSummaryToCscc": { # Publish the result summary of a DlpJob to the Cloud Security # Publish summary to Cloud Security Command Center (Alpha).

3756

# Command Center (CSCC Alpha).

3757

# This action is only available for projects which are part of

3758

# an organization and whitelisted for the alpha Cloud Security Command

3759

# Center.

3760

# The action will publish count of finding instances and their info types.

3761

# The summary of findings will be persisted in CSCC and are governed by CSCC

3762

# service-specific policy, see https://cloud.google.com/terms/service-terms

3763

# Only a single instance of this action can be specified.

3764

# Compatible with: Inspect

3765

},

Bu Sun Kim

2020-05-27 12:20:54 -0700

[diff] [blame]

3766

},

Bu Sun Kim

2020-07-22 17:02:09 -0700

[diff] [blame^]

3767

],

3768

"storageConfig": { # Shared message indicating Cloud storage type. # The data to scan.

3769

"cloudStorageOptions": { # Options defining a file or a set of files within a Google Cloud Storage # Google Cloud Storage options.

3770

# bucket.

3771

"bytesLimitPerFilePercent": 42, # Max percentage of bytes to scan from a file. The rest are omitted. The

3772

# number of bytes scanned is rounded down. Must be between 0 and 100,

3773

# inclusive. Both 0 and 100 mean no limit. Defaults to 0. Only one

3774

# of bytes_limit_per_file and bytes_limit_per_file_percent can be specified.

3775

"fileTypes": [ # List of file type groups to include in the scan.

3776

# If empty, all files are scanned and available data format processors

3777

# are applied. In addition, the binary content of the selected files

3778

# is always scanned as well.

3779

# Images are scanned only as binary if the specified region

3780

# does not support image inspection and no file_types were specified.

3781

# Image inspection is restricted to 'global', 'us', 'asia', and 'europe'.

3782

"A String",

3783

],

3784

"bytesLimitPerFile": "A String", # Max number of bytes to scan from a file. If a scanned file's size is bigger

3785

# than this value then the rest of the bytes are omitted. Only one

3786

# of bytes_limit_per_file and bytes_limit_per_file_percent can be specified.

3787

"filesLimitPercent": 42, # Limits the number of files to scan to this percentage of the input FileSet.

3788

# Number of files scanned is rounded down. Must be between 0 and 100,

3789

# inclusive. Both 0 and 100 mean no limit. Defaults to 0.

3790

"fileSet": { # Set of files to scan. # The set of one or more files to scan.

3791

"regexFileSet": { # Message representing a set of files in a Cloud Storage bucket. Regular # The regex-filtered set of files to scan. Exactly one of `url` or

3792

# `regex_file_set` must be set.

3793

# expressions are used to allow fine-grained control over which files in the

3794

# bucket to include.

3795

#

3796

# Included files are those that match at least one item in `include_regex` and

3797

# do not match any items in `exclude_regex`. Note that a file that matches

3798

# items from both lists will _not_ be included. For a match to occur, the

3799

# entire file path (i.e., everything in the url after the bucket name) must

3800

# match the regular expression.

3801

#

3802

# For example, given the input `{bucket_name: "mybucket", include_regex:

3803

# ["directory1/.*"], exclude_regex:

3804

# ["directory1/excluded.*"]}`:

3805

#

3806

# * `gs://mybucket/directory1/myfile` will be included

3807

# * `gs://mybucket/directory1/directory2/myfile` will be included (`.*` matches

3808

# across `/`)

3809

# * `gs://mybucket/directory0/directory1/myfile` will _not_ be included (the

3810

# full path doesn't match any items in `include_regex`)

3811

# * `gs://mybucket/directory1/excludedfile` will _not_ be included (the path

3812

# matches an item in `exclude_regex`)

3813

#

3814

# If `include_regex` is left empty, it will match all files by default

3815

# (this is equivalent to setting `include_regex: [".*"]`).

3816

#

3817

# Some other common use cases:

3818

#

3819

# * `{bucket_name: "mybucket", exclude_regex: [".*\.pdf"]}` will include all

3820

# files in `mybucket` except for .pdf files

3821

# * `{bucket_name: "mybucket", include_regex: ["directory/[^/]+"]}` will

3822

# include all files directly under `gs://mybucket/directory/`, without matching

3823

# across `/`

3824

"bucketName": "A String", # The name of a Cloud Storage bucket. Required.

3825

"excludeRegex": [ # A list of regular expressions matching file paths to exclude. All files in

3826

# the bucket that match at least one of these regular expressions will be

3827

# excluded from the scan.

3828

#

3829

# Regular expressions use RE2

3830

# [syntax](https://github.com/google/re2/wiki/Syntax); a guide can be found

3831

# under the google/re2 repository on GitHub.

3832

"A String",

3833

],

3834

"includeRegex": [ # A list of regular expressions matching file paths to include. All files in

3835

# the bucket that match at least one of these regular expressions will be

3836

# included in the set of files, except for those that also match an item in

3837

# `exclude_regex`. Leaving this field empty will match all files by default

3838

# (this is equivalent to including `.*` in the list).

3839

#

3840

# Regular expressions use RE2

3841

# [syntax](https://github.com/google/re2/wiki/Syntax); a guide can be found

3842

# under the google/re2 repository on GitHub.

"A String",

],

},
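The include/exclude rules described for `regexFileSet` can be sketched in a few lines. This is only an illustration of the documented semantics, not the service's implementation: it uses Python's `re` module rather than RE2 (the two agree on the simple patterns used here), and `is_included` is a hypothetical helper name.

```python
import re


def is_included(url, bucket_name, include_regex=(), exclude_regex=()):
    """Return True if a gs:// URL would be selected by the given regex file set."""
    prefix = f"gs://{bucket_name}/"
    if not url.startswith(prefix):
        return False
    # Only the path after the bucket name is matched, and it must match entirely.
    path = url[len(prefix):]
    # An empty include list behaves like [".*"].
    includes = list(include_regex) or [".*"]
    if not any(re.fullmatch(pattern, path) for pattern in includes):
        return False
    # A file matching both lists is not included.
    return not any(re.fullmatch(pattern, path) for pattern in exclude_regex)
```

With `include_regex=["directory1/.*"]` and `exclude_regex=["directory1/excluded.*"]`, this reproduces the four outcomes listed in the description above.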

"url": "A String", # The Cloud Storage url of the file(s) to scan, in the format

3847

# `gs://<bucket>/<path>`. Trailing wildcard in the path is allowed.

3848

#

3849

# If the url ends in a trailing slash, the bucket or directory represented

3850

# by the url will be scanned non-recursively (content in sub-directories

3851

# will not be scanned). This means that `gs://mybucket/` is equivalent to

3852

# `gs://mybucket/*`, and `gs://mybucket/directory/` is equivalent to

3853

# `gs://mybucket/directory/*`.

3854

#

3855

# Exactly one of `url` or `regex_file_set` must be set.

3856

},

3857

"sampleMethod": "A String",

Bu Sun Kim

2020-05-27 12:20:54 -0700

[diff] [blame]

3858

},
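The percentage limits in `cloudStorageOptions` (round down; 0 and 100 both mean no limit) can be expressed as a small calculation. This is a sketch of the documented rule only; how the service applies it internally is not stated here, and `bytes_to_scan` is a hypothetical helper.

```python
def bytes_to_scan(file_size, bytes_limit_per_file_percent=0):
    """Return how many bytes of a file fall within the percentage limit."""
    percent = bytes_limit_per_file_percent
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100, inclusive")
    # Both 0 (the default) and 100 mean no limit.
    if percent in (0, 100):
        return file_size
    # Integer division rounds the result down.
    return file_size * percent // 100
```

The same arithmetic applies to `filesLimitPercent`, with a file count in place of a byte count.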

Bu Sun Kim

2020-07-22 17:02:09 -0700

[diff] [blame^]

3859

"bigQueryOptions": { # Options defining BigQuery table and row identifiers. # BigQuery options.

3860

"sampleMethod": "A String",

3861

"tableReference": { # Message defining the location of a BigQuery table. A table is uniquely # Complete BigQuery table reference.

3862

# identified by its project_id, dataset_id, and table_name. Within a query

3863

# a table is often referenced with a string in the format of:

3864

# `<project_id>:<dataset_id>.<table_id>` or

3865

# `<project_id>.<dataset_id>.<table_id>`.

3866

"projectId": "A String", # The Google Cloud Platform project ID of the project containing the table.

3867

# If omitted, project ID is inferred from the API call.

3868

"datasetId": "A String", # Dataset ID of the table.

3869

"tableId": "A String", # Name of the table.

3870

},

3871

"rowsLimitPercent": 42, # Max percentage of rows to scan. The rest are omitted. The number of rows

3872

# scanned is rounded down. Must be between 0 and 100, inclusive. Both 0 and

3873

# 100 mean no limit. Defaults to 0. Only one of rows_limit and

3874

# rows_limit_percent can be specified. Cannot be used in conjunction with

3875

# TimespanConfig.

3876

"rowsLimit": "A String", # Max number of rows to scan. If the table has more rows than this value, the

3877

# rest of the rows are omitted. If not set, or if set to 0, all rows will be

3878

# scanned. Only one of rows_limit and rows_limit_percent can be specified.

3879

# Cannot be used in conjunction with TimespanConfig.

3880

"identifyingFields": [ # Table fields that may uniquely identify a row within the table. When

3881

# `actions.saveFindings.outputConfig.table` is specified, the values of

3882

# columns specified here are available in the output table under

3883

# `location.content_locations.record_location.record_key.id_values`. Nested

3884

# fields such as `person.birthdate.year` are allowed.

3885

{ # General identifier of a data field in a storage service.

3886

"name": "A String", # Name describing the field.

3887

},

3888

],

3889

"excludedFields": [ # References to fields excluded from scanning. This allows you to skip

3890

# inspection of entire columns which you know have no findings.

3891

{ # General identifier of a data field in a storage service.

3892

"name": "A String", # Name describing the field.

},

],

},

"timespanConfig": { # Configuration of the timespan of the items to include in scanning.

3897

# Currently only supported when inspecting Google Cloud Storage and BigQuery.

3898

"timestampField": { # General identifier of a data field in a storage service. # Specification of the field containing the timestamp of scanned items.

3899

# Used for data sources like Datastore and BigQuery.

3900

#

3901

# For BigQuery:

3902

# Required to filter out rows based on the given start and

3903

# end times. If not specified and the table was modified between the given

3904

# start and end times, the entire table will be scanned.

3905

# The valid data types of the timestamp field are: `INTEGER`, `DATE`,

3906

# `TIMESTAMP`, or `DATETIME` BigQuery column.

3907

#

3908

# For Datastore:

3909

# Valid data types of the timestamp field are: `TIMESTAMP`.

3910

# Datastore entity will be scanned if the timestamp property does not

3911

# exist or its value is empty or invalid.

3912

"name": "A String", # Name describing the field.

3913

},

3914

"enableAutoPopulationOfTimespanConfig": True or False, # When the job is started by a JobTrigger we will automatically figure out

3915

# a valid start_time to avoid scanning files that have not been modified

3916

# since the last time the JobTrigger executed. This will be based on the

3917

# time of the execution of the last run of the JobTrigger.

3918

"startTime": "A String", # Exclude files or rows older than this value.

3919

"endTime": "A String", # Exclude files or rows newer than this value.

3920

# If set to zero, no upper time limit is applied.

3921

},

3922

"datastoreOptions": { # Options defining a data set within Google Cloud Datastore. # Google Cloud Datastore options.

3923

"kind": { # A representation of a Datastore kind. # The kind to process.

3924

"name": "A String", # The name of the kind.

3925

},

3926

"partitionId": { # Datastore partition ID. # A partition ID identifies a grouping of entities. The grouping is always

3927

# by project and namespace, however the namespace ID may be empty.

3928

# A partition ID identifies a grouping of entities. The grouping is always

3929

# by project and namespace, however the namespace ID may be empty.

3930

#

3931

# A partition ID contains several dimensions:

3932

# project ID and namespace ID.

3933

"namespaceId": "A String", # If not empty, the ID of the namespace to which the entities belong.

3934

"projectId": "A String", # The ID of the project to which the entities belong.

3935

},

3936

},

3937

"hybridOptions": { # Configuration to control jobs where the content being inspected is outside # Hybrid inspection options.

3938

# Early access feature is in a pre-release state and might change or have

3939

# limited support. For more information, see

3940

# https://cloud.google.com/products#product-launch-stages.

3941

# of Google Cloud Platform.

3942

"tableOptions": { # Instructions regarding the table content being inspected. # If the container is a table, additional information to make findings

3943

# meaningful such as the columns that are primary keys.

3944

"identifyingFields": [ # The columns that are the primary keys for table objects included in

3945

# ContentItem. A copy of this cell's value will be stored alongside

3946

# each finding so that the finding can be traced to the specific row it came

3947

# from. No more than 3 may be provided.

3948

{ # General identifier of a data field in a storage service.

3949

"name": "A String", # Name describing the field.

},

],

},

"requiredFindingLabelKeys": [ # These are labels that each inspection request must include within their

3954

# 'finding_labels' map. Requests may contain others, but any request missing one of

3955

# these will be rejected.

3956

#

3957

# Label keys must be between 1 and 63 characters long and must conform

3958

# to the following regular expression: `[a-z]([-a-z0-9]*[a-z0-9])?`.

3959

#

3960

# No more than 10 keys can be required.

3961

"A String",

3962

],

3963

"labels": { # To organize findings, these labels will be added to each finding.

3964

#

3965

# Label keys must be between 1 and 63 characters long and must conform

3966

# to the following regular expression: `[a-z]([-a-z0-9]*[a-z0-9])?`.

3967

#

3968

# Label values must be between 0 and 63 characters long and must conform

3969

# to the regular expression `([a-z]([-a-z0-9]*[a-z0-9])?)?`.

3970

#

3971

# No more than 10 labels can be associated with a given finding.

3972

#

3973

# Examples:

3974

# * `"environment" : "production"`

3975

# * `"pipeline" : "etl"`

3976

"a_key": "A String",

3977

},
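The label constraints above (key and value patterns, length limits, at most 10 labels) can be checked on the client side before sending a request. The following is a minimal sketch under those stated rules; `label_problems` is a hypothetical helper and the server remains the authority on what is accepted.

```python
import re

# Patterns copied from the field description above.
_KEY_RE = re.compile(r"[a-z]([-a-z0-9]*[a-z0-9])?")
_VALUE_RE = re.compile(r"([a-z]([-a-z0-9]*[a-z0-9])?)?")
MAX_LABELS = 10


def label_problems(labels):
    """Return a list of human-readable problems found in a labels dict."""
    problems = []
    if len(labels) > MAX_LABELS:
        problems.append(f"too many labels: {len(labels)} > {MAX_LABELS}")
    for key, value in labels.items():
        # Keys: 1 to 63 characters, matching the key pattern in full.
        if not (1 <= len(key) <= 63 and _KEY_RE.fullmatch(key)):
            problems.append(f"invalid key: {key!r}")
        # Values: 0 to 63 characters, matching the value pattern in full.
        if not (len(value) <= 63 and _VALUE_RE.fullmatch(value)):
            problems.append(f"invalid value for {key!r}: {value!r}")
    return problems
```

An empty list means the labels satisfy every rule listed in the description; the two documented examples, `"environment": "production"` and `"pipeline": "etl"`, both pass.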

3978

"description": "A String", # A short description of where the data is coming from. Will be stored once

3979

# in the job. 256 max length.

3980

},

Bu Sun Kim

2020-05-27 12:20:54 -0700

[diff] [blame]

3981

},

Bu Sun Kim

2020-07-22 17:02:09 -0700

[diff] [blame^]

3982

"inspectConfig": { # Configuration description of the scanning process. # How and what to scan for.

3983

# When used with redactContent only info_types and min_likelihood are currently

3984

# used.

3985

"customInfoTypes": [ # CustomInfoTypes provided by the user. See

3986

# https://cloud.google.com/dlp/docs/creating-custom-infotypes to learn more.

3987

{ # Custom information type provided by the user. Used to find domain-specific

3988

# sensitive information configurable to the data in question.

3989

"dictionary": { # Custom information type based on a dictionary of words or phrases. This can # A list of phrases to detect as a CustomInfoType.

3990

# be used to match sensitive information specific to the data, such as a list

3991

# of employee IDs or job titles.

3992

#

3993

# Dictionary words are case-insensitive and all characters other than letters

3994

# and digits in the unicode [Basic Multilingual

3995

# Plane](https://en.wikipedia.org/wiki/Plane_%28Unicode%29#Basic_Multilingual_Plane)

3996

# will be replaced with whitespace when scanning for matches, so the

3997

# dictionary phrase "Sam Johnson" will match all three phrases "sam johnson",

3998

# "Sam, Johnson", and "Sam (Johnson)". Additionally, the characters

3999

# surrounding any match must be of a different type than the adjacent

4000

# characters within the word, so letters must be next to non-letters and

4001

# digits next to non-digits. For example, the dictionary word "jen" will

4002

# match the first three letters of the text "jen123" but will return no

4003

# matches for "jennifer".

4004

#

4005

# Dictionary words containing a large number of characters that are not

4006

# letters or digits may result in unexpected findings because such characters

4007

# are treated as whitespace. The

4008

# [limits](https://cloud.google.com/dlp/limits) page contains details about

4009

# the size limits of dictionaries. For dictionaries that do not fit within

4010

# these constraints, consider using `LargeCustomDictionaryConfig` in the

4011

# `StoredInfoType` API.

4012

"cloudStoragePath": { # Message representing a single file or path in Cloud Storage. # Newline-delimited file of words in Cloud Storage. Only a single file

4013

# is accepted.

4014

"path": "A String", # A url representing a file or path (no wildcards) in Cloud Storage.

4015

# Example: gs://[BUCKET_NAME]/dictionary.txt

4016

},

4017

"wordList": { # Message defining a list of words or phrases to search for in the data. # List of words or phrases to search for.

4018

"words": [ # Words or phrases defining the dictionary. The dictionary must contain

4019

# at least one phrase and every phrase must contain at least 2 characters

4020

# that are letters or digits. [required]

"A String",

],

},

},
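The dictionary matching rules described above (case-insensitive comparison, non-alphanumeric characters treated as whitespace, and the boundary requirement on surrounding characters) can be approximated as follows. This is a rough sketch of the documented behavior, not the service's algorithm; `dictionary_matches` and its helpers are hypothetical names, and the sketch assumes lowercasing does not change string length (true for ASCII input).

```python
import re


def _kind(ch):
    """Classify a character as 'letter', 'digit', or 'other'."""
    if ch.isalpha():
        return "letter"
    if ch.isdigit():
        return "digit"
    return "other"


def _normalize(text):
    """Lowercase letters and digits; replace everything else with a space."""
    return "".join(ch.lower() if _kind(ch) != "other" else " " for ch in text)


def dictionary_matches(phrase, text):
    """Return (start, end) spans in the normalized text where phrase matches."""
    words = _normalize(phrase).split()
    if not words:
        return []
    # Words of the phrase may be separated by one or more normalized spaces.
    pattern = re.compile(" +".join(re.escape(word) for word in words))
    norm = _normalize(text)
    spans = []
    for match in pattern.finditer(norm):
        start, end = match.span()
        # The character on each side of the match must differ in kind from
        # the adjacent character inside the match.
        before_ok = start == 0 or _kind(norm[start - 1]) != _kind(norm[start])
        after_ok = end == len(norm) or _kind(norm[end]) != _kind(norm[end - 1])
        if before_ok and after_ok:
            spans.append((start, end))
    return spans
```

Under these assumptions the phrase "Sam Johnson" matches "sam johnson", "Sam, Johnson", and "Sam (Johnson)", while "jen" matches the start of "jen123" but nothing in "jennifer".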

"infoType": { # Type of information detected by the API. # CustomInfoType can either be a new infoType, or an extension of built-in

4026

# infoType, when the name matches one of existing infoTypes and that infoType

4027

# is specified in `InspectContent.info_types` field. Specifying the latter

4028

# adds findings to the one detected by the system. If built-in info type is

4029

# not specified in `InspectContent.info_types` list then the name is treated

4030

# as a custom info type.

4031

"name": "A String", # Name of the information type. Either a name of your choosing when

4032

# creating a CustomInfoType, or one of the names listed

4033

# at https://cloud.google.com/dlp/docs/infotypes-reference when specifying

4034

# a built-in type. When sending Cloud DLP results to Data Catalog, infoType

4035

# names should conform to the pattern `[A-Za-z0-9$-_]{1,64}`.

4036

},

4037

"likelihood": "A String", # Likelihood to return for this CustomInfoType. This base value can be

4038

# altered by a detection rule if the finding meets the criteria specified by

4039

# the rule. Defaults to `VERY_LIKELY` if not specified.

4040

"detectionRules": [ # Set of detection rules to apply to all findings of this CustomInfoType.

4041

# Rules are applied in the order in which they are specified. Not supported for the

4042

# `surrogate_type` CustomInfoType.

4043

{ # Deprecated; use `InspectionRuleSet` instead. Rule for modifying a

4044

# `CustomInfoType` to alter behavior under certain circumstances, depending

4045

# on the specific details of the rule. Not supported for the `surrogate_type`

4046

# custom infoType.

4047

"hotwordRule": { # The rule that adjusts the likelihood of findings within a certain # Hotword-based detection rule.

4048

# proximity of hotwords.

4049

"proximity": { # Message for specifying a window around a finding to apply a detection # Proximity of the finding within which the entire hotword must reside.

4050

# The total length of the window cannot exceed 1000 characters. Note that

4051

# the finding itself will be included in the window, so that hotwords may

4052

# be used to match substrings of the finding itself. For example, the

4053

# certainty of a phone number regex "\(\d{3}\) \d{3}-\d{4}" could be

# adjusted upwards if the area code is known to be the local area code of

# a company office using the hotword regex "\(xxx\)", where "xxx"

4056

# is the area code in question.

4057

# rule.

4058

"windowAfter": 42, # Number of characters after the finding to consider.

4059

"windowBefore": 42, # Number of characters before the finding to consider.

},

"likelihoodAdjustment": { # Message for specifying an adjustment to the likelihood of a finding as # Likelihood adjustment to apply to all matching findings.

4062

# part of a detection rule.

4063

"fixedLikelihood": "A String", # Set the likelihood of a finding to a fixed value.

4064

"relativeLikelihood": 42, # Increase or decrease the likelihood by the specified number of

4065

# levels. For example, if a finding would be `POSSIBLE` without the

4066

# detection rule and `relative_likelihood` is 1, then it is upgraded to

4067

# `LIKELY`, while a value of -1 would downgrade it to `UNLIKELY`.

4068

# Likelihood may never drop below `VERY_UNLIKELY` or exceed

4069

# `VERY_LIKELY`, so applying an adjustment of 1 followed by an

4070

# adjustment of -1 when base likelihood is `VERY_LIKELY` will result in

4071

# a final likelihood of `LIKELY`.

},
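The clamping behaviour described for `relativeLikelihood` can be sketched locally as follows. This is only an illustration of the arithmetic, not the service's implementation; the `LEVELS` list and the `adjust_likelihood` helper are hypothetical names introduced here, and the level ordering is taken from the likelihood names used in the comments above.

```python
# Likelihood levels ordered from least to most likely.
# LIKELIHOOD_UNSPECIFIED is deliberately left out of this sketch.
LEVELS = ["VERY_UNLIKELY", "UNLIKELY", "POSSIBLE", "LIKELY", "VERY_LIKELY"]


def adjust_likelihood(base, relative_likelihood):
    """Shift `base` by `relative_likelihood` levels, clamped to the valid range.

    Hypothetical helper: it mirrors the documented rule that likelihood never
    drops below VERY_UNLIKELY or exceeds VERY_LIKELY.
    """
    index = LEVELS.index(base) + relative_likelihood
    index = max(0, min(index, len(LEVELS) - 1))
    return LEVELS[index]


if __name__ == "__main__":
    print(adjust_likelihood("POSSIBLE", 1))   # upgraded by one level
    print(adjust_likelihood("POSSIBLE", -1))  # downgraded by one level
    # +1 is clamped at VERY_LIKELY, so the following -1 lands on LIKELY.
    print(adjust_likelihood(adjust_likelihood("VERY_LIKELY", 1), -1))
```

Because each adjustment is clamped as it is applied, the order of adjustments matters once a finding reaches either end of the scale.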

"hotwordRegex": { # Message defining a custom regular expression. # Regular expression pattern defining what qualifies as a hotword.

4074

"groupIndexes": [ # The index of the submatch to extract as findings. When not

4075

# specified, the entire match is returned. No more than 3 may be included.

4076

42,

4077

],

4078

"pattern": "A String", # Pattern defining the regular expression. Its syntax

4079

# (https://github.com/google/re2/wiki/Syntax) can be found under the

4080

# google/re2 repository on GitHub.

4081

},

},

4083

},

4084

],

"surrogateType": { # Message for detecting output from deidentification transformations # Message for detecting output from deidentification transformations that

4086

# support reversing.

4087

# such as

4088

# [`CryptoReplaceFfxFpeConfig`](https://cloud.google.com/dlp/docs/reference/rest/v2/organizations.deidentifyTemplates#cryptoreplaceffxfpeconfig).

4089

# These types of transformations are

4090

# those that perform pseudonymization, thereby producing a "surrogate" as

4091

# output. This should be used in conjunction with a field on the

4092

# transformation such as `surrogate_info_type`. This CustomInfoType does

4093

# not support the use of `detection_rules`.

4094

},

4095

"regex": { # Message defining a custom regular expression. # Regular expression based CustomInfoType.

4096

"groupIndexes": [ # The index of the submatch to extract as findings. When not

4097

# specified, the entire match is returned. No more than 3 may be included.

4098

42,

4099

],

4100

"pattern": "A String", # Pattern defining the regular expression. Its syntax

4101

# (https://github.com/google/re2/wiki/Syntax) can be found under the

4102

# google/re2 repository on GitHub.

4103

},

4104

"storedType": { # A reference to a StoredInfoType to use with scanning. # Load an existing `StoredInfoType` resource for use in

4105

# `InspectDataSource`. Not currently supported in `InspectContent`.

4106

"name": "A String", # Resource name of the requested `StoredInfoType`, for example

4107

# `organizations/433245324/storedInfoTypes/432452342` or

4108

# `projects/project-id/storedInfoTypes/432452342`.

4109

"createTime": "A String", # Timestamp indicating when the version of the `StoredInfoType` used for

4110

# inspection was created. Output-only field, populated by the system.

4111

},

4112

"exclusionType": "A String", # If set to EXCLUSION_TYPE_EXCLUDE this infoType will not cause a finding

4113

# to be returned. It still can be used for rules matching.

},

4115

],

"minLikelihood": "A String", # Only returns findings equal to or above this threshold. The default is

4117

# POSSIBLE.

4118

# See https://cloud.google.com/dlp/docs/likelihood to learn more.

4119

"limits": { # Configuration to control the number of findings returned. # Configuration to control the number of findings returned.

4120

"maxFindingsPerRequest": 42, # Max number of findings that will be returned per request/job.

4121

# When set within `InspectContentRequest`, the maximum returned is 2000

4122

# regardless of whether this is set higher.

4123

"maxFindingsPerInfoType": [ # Configuration of findings limit given for specified infoTypes.

4124

{ # Max findings configuration per infoType, per content item or long

4125

# running DlpJob.

4126

"infoType": { # Type of information detected by the API. # Type of information the findings limit applies to. Only one limit per

4127

# info_type should be provided. If InfoTypeLimit does not have an

4128

# info_type, the DLP API applies the limit against all info_types that

4129

# are found but not specified in another InfoTypeLimit.

4130

"name": "A String", # Name of the information type. Either a name of your choosing when

4131

# creating a CustomInfoType, or one of the names listed

4132

# at https://cloud.google.com/dlp/docs/infotypes-reference when specifying

4133

# a built-in type. When sending Cloud DLP results to Data Catalog, infoType

4134

# names should conform to the pattern `[A-Za-z0-9$-_]{1,64}`.

4135

},

4136

"maxFindings": 42, # Max findings limit for the given infoType.

4137

},

4138

],

4139

"maxFindingsPerItem": 42, # Max number of findings that will be returned for each item scanned.

4140

# When set within `InspectJobConfig`,

4141

# the maximum returned is 2000 regardless of whether this is set higher.

4142

# When set within `InspectContentRequest`, this field is ignored.

4143

},

4144

"excludeInfoTypes": True or False, # When true, excludes type information of the findings.

4145

"includeQuote": True or False, # When true, a contextual quote from the data that triggered a finding is

4146

# included in the response; see Finding.quote.

4147

"ruleSet": [ # Set of rules to apply to the findings for this InspectConfig.

4148

# Exclusion rules contained in the set are executed last; other

# rules are executed in the order they are specified for each info type.

4150

{ # Rule set for modifying a set of infoTypes to alter behavior under certain

4151

# circumstances, depending on the specific details of the rules within the set.

4152

"infoTypes": [ # List of infoTypes this rule set is applied to.

4153

{ # Type of information detected by the API.

4154

"name": "A String", # Name of the information type. Either a name of your choosing when

4155

# creating a CustomInfoType, or one of the names listed

4156

# at https://cloud.google.com/dlp/docs/infotypes-reference when specifying

4157

# a built-in type. When sending Cloud DLP results to Data Catalog, infoType

4158

# names should conform to the pattern `[A-Za-z0-9$-_]{1,64}`.

4159

},

4160

],

4161

"rules": [ # Set of rules to be applied to infoTypes. The rules are applied in order.

4162

{ # A single inspection rule to be applied to infoTypes, specified in

4163

# `InspectionRuleSet`.

4164

"hotwordRule": { # The rule that adjusts the likelihood of findings within a certain # Hotword-based detection rule.

4165

# proximity of hotwords.

4166

"proximity": { # Message for specifying a window around a finding to apply a detection # Proximity of the finding within which the entire hotword must reside.

4167

# The total length of the window cannot exceed 1000 characters. Note that

4168

# the finding itself will be included in the window, so that hotwords may

4169

# be used to match substrings of the finding itself. For example, the

4170

# certainty of a phone number regex "\(\d{3}\) \d{3}-\d{4}" could be

# adjusted upwards if the area code is known to be the local area code of

# a company office using the hotword regex "\(xxx\)", where "xxx"

4173

# is the area code in question.

4174

# rule.

4175

"windowAfter": 42, # Number of characters after the finding to consider.

4176

"windowBefore": 42, # Number of characters before the finding to consider.

4177

},

4178

"likelihoodAdjustment": { # Message for specifying an adjustment to the likelihood of a finding as # Likelihood adjustment to apply to all matching findings.

4179

# part of a detection rule.

4180

"fixedLikelihood": "A String", # Set the likelihood of a finding to a fixed value.

4181

"relativeLikelihood": 42, # Increase or decrease the likelihood by the specified number of

4182

# levels. For example, if a finding would be `POSSIBLE` without the

4183

# detection rule and `relative_likelihood` is 1, then it is upgraded to

4184

# `LIKELY`, while a value of -1 would downgrade it to `UNLIKELY`.

4185

# Likelihood may never drop below `VERY_UNLIKELY` or exceed

4186

# `VERY_LIKELY`, so applying an adjustment of 1 followed by an

4187

# adjustment of -1 when base likelihood is `VERY_LIKELY` will result in

4188

# a final likelihood of `LIKELY`.

4189

},

4190

"hotwordRegex": { # Message defining a custom regular expression. # Regular expression pattern defining what qualifies as a hotword.

4191

"groupIndexes": [ # The index of the submatch to extract as findings. When not

4192

# specified, the entire match is returned. No more than 3 may be included.

4193

42,

4194

],

4195

"pattern": "A String", # Pattern defining the regular expression. Its syntax

4196

# (https://github.com/google/re2/wiki/Syntax) can be found under the

4197

# google/re2 repository on GitHub.

4198

},

4199

},
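As a rough illustration of the `hotwordRule` shape described above, the sketch below builds one `ruleSet` entry as a plain Python dict. The helper name `build_hotword_rule_set`, the area code `415`, and the window sizes are assumptions made for the example; only the field names come from the schema, and `PHONE_NUMBER` is used as the built-in infoType the rule applies to.

```python
import json
import re


def build_hotword_rule_set(info_type_name, area_code, window_before=10, window_after=0):
    """Return one `ruleSet` entry that raises likelihood near a local area code.

    Hypothetical helper; the window values are illustrative, not recommended
    defaults. The hotword regex follows the "\\(xxx\\)" form from the comments.
    """
    hotword_pattern = r"\(" + re.escape(area_code) + r"\)"
    return {
        "infoTypes": [{"name": info_type_name}],
        "rules": [
            {
                "hotwordRule": {
                    "hotwordRegex": {"pattern": hotword_pattern},
                    "proximity": {
                        "windowBefore": window_before,
                        "windowAfter": window_after,
                    },
                    # Raise matching findings by one likelihood level.
                    "likelihoodAdjustment": {"relativeLikelihood": 1},
                }
            }
        ],
    }


if __name__ == "__main__":
    rule_set = build_hotword_rule_set("PHONE_NUMBER", "415")
    print(json.dumps(rule_set, indent=2))
```

The resulting dict would sit in the `ruleSet` list of an `inspectConfig`; whether a particular window size is accepted is up to the service, so treat the numbers here as placeholders.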

4200

"exclusionRule": { # The rule that specifies conditions when findings of infoTypes specified in # Exclusion rule.

4201

# `InspectionRuleSet` are removed from results.

4202

"matchingType": "A String", # How the rule is applied, see MatchingType documentation for details.

4203

"dictionary": { # Custom information type based on a dictionary of words or phrases. This can # Dictionary which defines the rule.

4204

# be used to match sensitive information specific to the data, such as a list

4205

# of employee IDs or job titles.

4206

#

4207

# Dictionary words are case-insensitive and all characters other than letters

4208

# and digits in the unicode [Basic Multilingual

4209

# Plane](https://en.wikipedia.org/wiki/Plane_%28Unicode%29#Basic_Multilingual_Plane)

4210

# will be replaced with whitespace when scanning for matches, so the

4211

# dictionary phrase "Sam Johnson" will match all three phrases "sam johnson",

4212

# "Sam, Johnson", and "Sam (Johnson)". Additionally, the characters

4213

# surrounding any match must be of a different type than the adjacent

4214

# characters within the word, so letters must be next to non-letters and

4215

# digits next to non-digits. For example, the dictionary word "jen" will

4216

# match the first three letters of the text "jen123" but will return no

4217

# matches for "jennifer".

4218

#

4219

# Dictionary words containing a large number of characters that are not

4220

# letters or digits may result in unexpected findings because such characters

4221

# are treated as whitespace. The

4222

# [limits](https://cloud.google.com/dlp/limits) page contains details about

4223

# the size limits of dictionaries. For dictionaries that do not fit within

4224

# these constraints, consider using `LargeCustomDictionaryConfig` in the

4225

# `StoredInfoType` API.

4226

"cloudStoragePath": { # Message representing a single file or path in Cloud Storage. # Newline-delimited file of words in Cloud Storage. Only a single file

4227

# is accepted.

4228

"path": "A String", # A url representing a file or path (no wildcards) in Cloud Storage.

4229

# Example: gs://[BUCKET_NAME]/dictionary.txt

4230

},

4231

"wordList": { # Message defining a list of words or phrases to search for in the data. # List of words or phrases to search for.

4232

"words": [ # Words or phrases defining the dictionary. The dictionary must contain

4233

# at least one phrase and every phrase must contain at least 2 characters

4234

# that are letters or digits. [required]

"A String",

],

},

},
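The dictionary matching rules described above (case-insensitive comparison, non-alphanumeric characters treated as whitespace, and a change of character type at the match boundaries) can be approximated locally with the sketch below. It is a simplified model for reasoning about which phrases would match, not the service's matcher; `find_dictionary_matches` and its helpers are hypothetical, and the restriction to the Basic Multilingual Plane is ignored.

```python
import re


def _kind(ch):
    """Classify a character as a letter, a digit, or anything else."""
    if ch.isalpha():
        return "letter"
    if ch.isdigit():
        return "digit"
    return "other"


def _normalize(text):
    """Replace every character that is not a letter or digit with a space."""
    return "".join(ch if ch.isalnum() else " " for ch in text)


def find_dictionary_matches(phrase, text):
    """Return (start, end) spans where `phrase` matches `text` under the sketch rules."""
    words = _normalize(phrase).split()
    if not words:
        return []
    # Words may be separated by any run of (normalized) whitespace.
    pattern = re.compile(r"\s+".join(re.escape(w) for w in words), re.IGNORECASE)
    norm = _normalize(text)
    spans = []
    for m in pattern.finditer(norm):
        start, end = m.span()
        # Characters just outside the match must differ in type from the
        # adjacent characters inside it (letter vs. non-letter, digit vs. non-digit).
        before_ok = start == 0 or _kind(norm[start - 1]) != _kind(norm[start])
        after_ok = end == len(norm) or _kind(norm[end]) != _kind(norm[end - 1])
        if before_ok and after_ok:
            spans.append((start, end))
    return spans


if __name__ == "__main__":
    for sample in ["sam johnson", "Sam, Johnson", "Sam (Johnson)"]:
        print(sample, find_dictionary_matches("Sam Johnson", sample))
    print(find_dictionary_matches("jen", "jen123"))
    print(find_dictionary_matches("jen", "jennifer"))
```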

"excludeInfoTypes": { # List of exclude infoTypes. # Set of infoTypes for which findings would affect this rule.

"infoTypes": [ # An infoType list in an ExclusionRule drops a finding when it overlaps or

# is contained within a finding of an infoType from this list. For

# example, for `InspectionRuleSet.info_types` containing "PHONE_NUMBER" and

# `exclusion_rule` containing `exclude_info_types.info_types` with

# "EMAIL_ADDRESS", the phone number findings are dropped if they overlap

# with an EMAIL_ADDRESS finding.

# As a result, "555-222-2222@example.org" generates only a single

# finding, namely the email address.

4248

{ # Type of information detected by the API.

4249

"name": "A String", # Name of the information type. Either a name of your choosing when

4250

# creating a CustomInfoType, or one of the names listed

4251

# at https://cloud.google.com/dlp/docs/infotypes-reference when specifying

4252

# a built-in type. When sending Cloud DLP results to Data Catalog, infoType

4253

# names should conform to the pattern `[A-Za-z0-9$-_]{1,64}`.

},

],

},
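The phone number and email address example above could be expressed as a `ruleSet` entry along these lines. This is a sketch built from the field names in this schema; the helper name `build_exclusion_rule_set` is hypothetical, and the choice of `MATCHING_TYPE_PARTIAL_MATCH` as the default is an assumption made for illustration (see the MatchingType documentation for the semantics of each value).

```python
import json


def build_exclusion_rule_set(target_info_type, excluded_info_type,
                             matching_type="MATCHING_TYPE_PARTIAL_MATCH"):
    """Return one `ruleSet` entry that drops `target_info_type` findings
    overlapping findings of `excluded_info_type`.

    Hypothetical helper; the default matching type is an illustrative choice.
    """
    return {
        "infoTypes": [{"name": target_info_type}],
        "rules": [
            {
                "exclusionRule": {
                    "matchingType": matching_type,
                    "excludeInfoTypes": {
                        "infoTypes": [{"name": excluded_info_type}],
                    },
                }
            }
        ],
    }


if __name__ == "__main__":
    # Drop PHONE_NUMBER findings that overlap an EMAIL_ADDRESS finding,
    # as in the "555-222-2222@example.org" example.
    print(json.dumps(build_exclusion_rule_set("PHONE_NUMBER", "EMAIL_ADDRESS"), indent=2))
```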

"regex": { # Message defining a custom regular expression. # Regular expression which defines the rule.

4258

"groupIndexes": [ # The index of the submatch to extract as findings. When not

4259

# specified, the entire match is returned. No more than 3 may be included.

4260

42,

4261

],

4262

"pattern": "A String", # Pattern defining the regular expression. Its syntax

4263

# (https://github.com/google/re2/wiki/Syntax) can be found under the

4264

# google/re2 repository on GitHub.

},

},

},

],

},

],

"contentOptions": [ # List of options defining data content to scan.

4272

# If empty, text, images, and other content will be included.

4273

"A String",

4274

],

4275

"infoTypes": [ # Restricts what info_types to look for. The values must correspond to

4276

# InfoType values returned by ListInfoTypes or listed at

4277

# https://cloud.google.com/dlp/docs/infotypes-reference.

4278

#

4279

# When no InfoTypes or CustomInfoTypes are specified in a request, the

4280

# system may automatically choose what detectors to run. By default this may

4281

# be all types, but may change over time as detectors are updated.

4282

#

4283

# If you need precise control and predictability as to what detectors are

4284

# run you should specify specific InfoTypes listed in the reference,

4285

# otherwise a default list will be used, which may change over time.

4286

{ # Type of information detected by the API.

4287

"name": "A String", # Name of the information type. Either a name of your choosing when

4288

# creating a CustomInfoType, or one of the names listed

4289

# at https://cloud.google.com/dlp/docs/infotypes-reference when specifying

4290

# a built-in type. When sending Cloud DLP results to Data Catalog, infoType

4291

# names should conform to the pattern `[A-Za-z0-9$-_]{1,64}`.

},

],

},

},

"snapshotInspectTemplate": { # The inspectTemplate contains a configuration (set of types of sensitive data # If run with an InspectTemplate, a snapshot of its state at the time of

4297

# this run.

4298

# to be detected) to be used anywhere you otherwise would normally specify

4299

# InspectConfig. See https://cloud.google.com/dlp/docs/concepts-templates

4300

# to learn more.

4301

"description": "A String", # Short description (max 256 chars).

4302

"displayName": "A String", # Display name (max 256 chars).

4303

"createTime": "A String", # Output only. The creation timestamp of an inspectTemplate.

4304

"updateTime": "A String", # Output only. The last update timestamp of an inspectTemplate.

4305

"name": "A String", # Output only. The template name.

4306

#

4307

# The template will have one of the following formats:

4308

# `projects/PROJECT_ID/inspectTemplates/TEMPLATE_ID` OR

4309

# `organizations/ORGANIZATION_ID/inspectTemplates/TEMPLATE_ID`;

4310

"inspectConfig": { # Configuration description of the scanning process. # The core content of the template. Configuration of the scanning process.

4311

# When used with redactContent only info_types and min_likelihood are currently

4312

# used.

4313

"customInfoTypes": [ # CustomInfoTypes provided by the user. See

4314

# https://cloud.google.com/dlp/docs/creating-custom-infotypes to learn more.

4315

{ # Custom information type provided by the user. Used to find domain-specific

4316

# sensitive information configurable to the data in question.

4317

"dictionary": { # Custom information type based on a dictionary of words or phrases. This can # A list of phrases to detect as a CustomInfoType.

4318

# be used to match sensitive information specific to the data, such as a list

4319

# of employee IDs or job titles.

4320

#

4321

# Dictionary words are case-insensitive and all characters other than letters

4322

# and digits in the unicode [Basic Multilingual

4323

# Plane](https://en.wikipedia.org/wiki/Plane_%28Unicode%29#Basic_Multilingual_Plane)

4324

# will be replaced with whitespace when scanning for matches, so the

4325

# dictionary phrase "Sam Johnson" will match all three phrases "sam johnson",

4326

# "Sam, Johnson", and "Sam (Johnson)". Additionally, the characters

4327

# surrounding any match must be of a different type than the adjacent

4328

# characters within the word, so letters must be next to non-letters and

4329

# digits next to non-digits. For example, the dictionary word "jen" will

4330

# match the first three letters of the text "jen123" but will return no

4331

# matches for "jennifer".

4332

#

4333

# Dictionary words containing a large number of characters that are not

4334

# letters or digits may result in unexpected findings because such characters

4335

# are treated as whitespace. The

4336

# [limits](https://cloud.google.com/dlp/limits) page contains details about

4337

# the size limits of dictionaries. For dictionaries that do not fit within

4338

# these constraints, consider using `LargeCustomDictionaryConfig` in the

4339

# `StoredInfoType` API.

4340

"cloudStoragePath": { # Message representing a single file or path in Cloud Storage. # Newline-delimited file of words in Cloud Storage. Only a single file

4341

# is accepted.

4342

"path": "A String", # A url representing a file or path (no wildcards) in Cloud Storage.

4343

# Example: gs://[BUCKET_NAME]/dictionary.txt

4344

},

4345

"wordList": { # Message defining a list of words or phrases to search for in the data. # List of words or phrases to search for.

4346

"words": [ # Words or phrases defining the dictionary. The dictionary must contain

4347

# at least one phrase and every phrase must contain at least 2 characters

4348

# that are letters or digits. [required]

"A String",

],

},

},

"infoType": { # Type of information detected by the API. # CustomInfoType can either be a new infoType, or an extension of built-in

4354

# infoType, when the name matches one of existing infoTypes and that infoType

4355

# is specified in `InspectContent.info_types` field. Specifying the latter

4356

# adds findings to the one detected by the system. If built-in info type is

4357

# not specified in `InspectContent.info_types` list then the name is treated

4358

# as a custom info type.

4359

"name": "A String", # Name of the information type. Either a name of your choosing when

4360

# creating a CustomInfoType, or one of the names listed

4361

# at https://cloud.google.com/dlp/docs/infotypes-reference when specifying

4362

# a built-in type. When sending Cloud DLP results to Data Catalog, infoType

4363

# names should conform to the pattern `[A-Za-z0-9$-_]{1,64}`.

4364

},

4365

"likelihood": "A String", # Likelihood to return for this CustomInfoType. This base value can be

4366

# altered by a detection rule if the finding meets the criteria specified by

4367

# the rule. Defaults to `VERY_LIKELY` if not specified.

4368

"detectionRules": [ # Set of detection rules to apply to all findings of this CustomInfoType.

4369

# Rules are applied in the order in which they are specified. Not supported for the

4370

# `surrogate_type` CustomInfoType.

4371

{ # Deprecated; use `InspectionRuleSet` instead. Rule for modifying a

4372

# `CustomInfoType` to alter behavior under certain circumstances, depending

4373

# on the specific details of the rule. Not supported for the `surrogate_type`

4374

# custom infoType.

4375

"hotwordRule": { # The rule that adjusts the likelihood of findings within a certain # Hotword-based detection rule.

4376

# proximity of hotwords.

4377

"proximity": { # Message for specifying a window around a finding to apply a detection # Proximity of the finding within which the entire hotword must reside.

4378

# The total length of the window cannot exceed 1000 characters. Note that

4379

# the finding itself will be included in the window, so that hotwords may

4380

# be used to match substrings of the finding itself. For example, the

4381

# certainty of a phone number regex "\(\d{3}\) \d{3}-\d{4}" could be

# adjusted upwards if the area code is known to be the local area code of

# a company office using the hotword regex "\(xxx\)", where "xxx"

4384

# is the area code in question.

4385

# rule.

4386

"windowAfter": 42, # Number of characters after the finding to consider.

4387

"windowBefore": 42, # Number of characters before the finding to consider.

4388

},

4389

"likelihoodAdjustment": { # Message for specifying an adjustment to the likelihood of a finding as # Likelihood adjustment to apply to all matching findings.

4390

# part of a detection rule.

4391

"fixedLikelihood": "A String", # Set the likelihood of a finding to a fixed value.

4392

"relativeLikelihood": 42, # Increase or decrease the likelihood by the specified number of

4393

# levels. For example, if a finding would be `POSSIBLE` without the

4394

# detection rule and `relative_likelihood` is 1, then it is upgraded to

4395

# `LIKELY`, while a value of -1 would downgrade it to `UNLIKELY`.

4396

# Likelihood may never drop below `VERY_UNLIKELY` or exceed

4397

# `VERY_LIKELY`, so applying an adjustment of 1 followed by an

4398

# adjustment of -1 when base likelihood is `VERY_LIKELY` will result in

4399

# a final likelihood of `LIKELY`.

4400

},

4401

"hotwordRegex": { # Message defining a custom regular expression. # Regular expression pattern defining what qualifies as a hotword.

4402

"groupIndexes": [ # The index of the submatch to extract as findings. When not

4403

# specified, the entire match is returned. No more than 3 may be included.

4404

42,

4405

],

4406

"pattern": "A String", # Pattern defining the regular expression. Its syntax

4407

# (https://github.com/google/re2/wiki/Syntax) can be found under the

4408

# google/re2 repository on GitHub.

},

},

},

],

"surrogateType": { # Message for detecting output from deidentification transformations # Message for detecting output from deidentification transformations that

4414

# support reversing.

4415

# such as

4416

# [`CryptoReplaceFfxFpeConfig`](https://cloud.google.com/dlp/docs/reference/rest/v2/organizations.deidentifyTemplates#cryptoreplaceffxfpeconfig).

4417

# These types of transformations are

4418

# those that perform pseudonymization, thereby producing a "surrogate" as

4419

# output. This should be used in conjunction with a field on the

4420

# transformation such as `surrogate_info_type`. This CustomInfoType does

4421

# not support the use of `detection_rules`.

4422

},

4423

"regex": { # Message defining a custom regular expression. # Regular expression based CustomInfoType.

4424

"groupIndexes": [ # The index of the submatch to extract as findings. When not

4425

# specified, the entire match is returned. No more than 3 may be included.

4426

42,

4427

],

4428

"pattern": "A String", # Pattern defining the regular expression. Its syntax

4429

# (https://github.com/google/re2/wiki/Syntax) can be found under the

4430

# google/re2 repository on GitHub.

4431

},

4432

"storedType": { # A reference to a StoredInfoType to use with scanning. # Load an existing `StoredInfoType` resource for use in

4433

# `InspectDataSource`. Not currently supported in `InspectContent`.

4434

"name": "A String", # Resource name of the requested `StoredInfoType`, for example

4435

# `organizations/433245324/storedInfoTypes/432452342` or

4436

# `projects/project-id/storedInfoTypes/432452342`.

4437

"createTime": "A String", # Timestamp indicating when the version of the `StoredInfoType` used for

4438

# inspection was created. Output-only field, populated by the system.

4439

},

4440

"exclusionType": "A String", # If set to EXCLUSION_TYPE_EXCLUDE this infoType will not cause a finding

4441

# to be returned. It still can be used for rules matching.

4442

},

4443

],

"minLikelihood": "A String", # Only returns findings equal to or above this threshold. The default is

4445

# POSSIBLE.

4446

# See https://cloud.google.com/dlp/docs/likelihood to learn more.

4447

"limits": { # Configuration to control the number of findings returned. # Configuration to control the number of findings returned.

4448

"maxFindingsPerRequest": 42, # Max number of findings that will be returned per request/job.

4449

# When set within `InspectContentRequest`, the maximum returned is 2000

4450

# regardless of whether this is set higher.

4451

"maxFindingsPerInfoType": [ # Configuration of findings limit given for specified infoTypes.

4452

{ # Max findings configuration per infoType, per content item or long

4453

# running DlpJob.

4454

"infoType": { # Type of information detected by the API. # Type of information the findings limit applies to. Only one limit per

4455

# info_type should be provided. If InfoTypeLimit does not have an

4456

# info_type, the DLP API applies the limit against all info_types that

4457

# are found but not specified in another InfoTypeLimit.

4458

"name": "A String", # Name of the information type. Either a name of your choosing when

4459

# creating a CustomInfoType, or one of the names listed

4460

# at https://cloud.google.com/dlp/docs/infotypes-reference when specifying

4461

# a built-in type. When sending Cloud DLP results to Data Catalog, infoType

4462

# names should conform to the pattern `[A-Za-z0-9$-_]{1,64}`.

4463

},

4464

"maxFindings": 42, # Max findings limit for the given infoType.

4465

},

4466

],

4467

"maxFindingsPerItem": 42, # Max number of findings that will be returned for each item scanned.

4468

# When set within `InspectJobConfig`,

4469

# the maximum returned is 2000 regardless of whether this is set higher.

4470

# When set within `InspectContentRequest`, this field is ignored.

4471

},

4472

"excludeInfoTypes": True or False, # When true, excludes type information of the findings.

4473

"includeQuote": True or False, # When true, a contextual quote from the data that triggered a finding is

4474

# included in the response; see Finding.quote.

4475

"ruleSet": [ # Set of rules to apply to the findings for this InspectConfig.

4476

# Exclusion rules contained in the set are executed last; other

# rules are executed in the order they are specified for each info type.

4478

{ # Rule set for modifying a set of infoTypes to alter behavior under certain

4479

# circumstances, depending on the specific details of the rules within the set.

4480

"infoTypes": [ # List of infoTypes this rule set is applied to.

4481

{ # Type of information detected by the API.

4482

"name": "A String", # Name of the information type. Either a name of your choosing when

4483

# creating a CustomInfoType, or one of the names listed

4484

# at https://cloud.google.com/dlp/docs/infotypes-reference when specifying

4485

# a built-in type. When sending Cloud DLP results to Data Catalog, infoType

4486

# names should conform to the pattern `[A-Za-z0-9$-_]{1,64}`.

4487

},

4488

],

4489

"rules": [ # Set of rules to be applied to infoTypes. The rules are applied in order.

4490

{ # A single inspection rule to be applied to infoTypes, specified in

4491

# `InspectionRuleSet`.

4492

"hotwordRule": { # The rule that adjusts the likelihood of findings within a certain # Hotword-based detection rule.

4493

# proximity of hotwords.

4494

"proximity": { # Message for specifying a window around a finding to apply a detection # Proximity of the finding within which the entire hotword must reside.

4495

# The total length of the window cannot exceed 1000 characters. Note that

4496

# the finding itself will be included in the window, so that hotwords may

4497

# be used to match substrings of the finding itself. For example, the

4498

# certainty of a phone number regex "\(\d{3}\) \d{3}-\d{4}" could be

# adjusted upwards if the area code is known to be the local area code of

# a company office using the hotword regex "\(xxx\)", where "xxx"

4501

# is the area code in question.

4502

# rule.

4503

"windowAfter": 42, # Number of characters after the finding to consider.

4504

"windowBefore": 42, # Number of characters before the finding to consider.

4505

},

4506

"likelihoodAdjustment": { # Message for specifying an adjustment to the likelihood of a finding as # Likelihood adjustment to apply to all matching findings.

4507

# part of a detection rule.

4508

"fixedLikelihood": "A String", # Set the likelihood of a finding to a fixed value.

4509

"relativeLikelihood": 42, # Increase or decrease the likelihood by the specified number of

4510

# levels. For example, if a finding would be `POSSIBLE` without the

4511

# detection rule and `relative_likelihood` is 1, then it is upgraded to

4512

# `LIKELY`, while a value of -1 would downgrade it to `UNLIKELY`.

4513

# Likelihood may never drop below `VERY_UNLIKELY` or exceed

4514

# `VERY_LIKELY`, so applying an adjustment of 1 followed by an

4515

# adjustment of -1 when base likelihood is `VERY_LIKELY` will result in

4516

# a final likelihood of `LIKELY`.

4517

},

4518

"hotwordRegex": { # Message defining a custom regular expression. # Regular expression pattern defining what qualifies as a hotword.

4519

"groupIndexes": [ # The index of the submatch to extract as findings. When not

4520

# specified, the entire match is returned. No more than 3 may be included.

4521

42,

4522

],

4523

"pattern": "A String", # Pattern defining the regular expression. Its syntax

4524

# (https://github.com/google/re2/wiki/Syntax) can be found under the

4525

# google/re2 repository on GitHub.

4526

},

4527

},

4528

"exclusionRule": { # The rule that specifies conditions when findings of infoTypes specified in # Exclusion rule.

4529

# `InspectionRuleSet` are removed from results.

4530

"matchingType": "A String", # How the rule is applied, see MatchingType documentation for details.

4531

"dictionary": { # Custom information type based on a dictionary of words or phrases. This can # Dictionary which defines the rule.

4532

# be used to match sensitive information specific to the data, such as a list

4533

# of employee IDs or job titles.

4534

#

4535

# Dictionary words are case-insensitive and all characters other than letters

4536

# and digits in the unicode [Basic Multilingual

4537

# Plane](https://en.wikipedia.org/wiki/Plane_%28Unicode%29#Basic_Multilingual_Plane)

4538

# will be replaced with whitespace when scanning for matches, so the

4539

# dictionary phrase "Sam Johnson" will match all three phrases "sam johnson",

4540

# "Sam, Johnson", and "Sam (Johnson)". Additionally, the characters

4541

# surrounding any match must be of a different type than the adjacent

4542

# characters within the word, so letters must be next to non-letters and

4543

# digits next to non-digits. For example, the dictionary word "jen" will

4544

# match the first three letters of the text "jen123" but will return no

4545

# matches for "jennifer".

4546

#

4547

# Dictionary words containing a large number of characters that are not

4548

# letters or digits may result in unexpected findings because such characters

4549

# are treated as whitespace. The

4550

# [limits](https://cloud.google.com/dlp/limits) page contains details about

4551

# the size limits of dictionaries. For dictionaries that do not fit within

4552

# these constraints, consider using `LargeCustomDictionaryConfig` in the

4553

# `StoredInfoType` API.

4554

"cloudStoragePath": { # Message representing a single file or path in Cloud Storage. # Newline-delimited file of words in Cloud Storage. Only a single file

4555

# is accepted.

4556

"path": "A String", # A url representing a file or path (no wildcards) in Cloud Storage.

4557

# Example: gs://[BUCKET_NAME]/dictionary.txt

4558

},

4559

"wordList": { # Message defining a list of words or phrases to search for in the data. # List of words or phrases to search for.

4560

"words": [ # Words or phrases defining the dictionary. The dictionary must contain

4561

# at least one phrase and every phrase must contain at least 2 characters

4562

# that are letters or digits. [required]

"A String",

],

},

},

"excludeInfoTypes": { # List of exclude infoTypes. # Set of infoTypes for which findings would affect this rule.

"infoTypes": [ # An infoType list in an ExclusionRule drops a finding when it overlaps with or
# is contained within a finding of an infoType from this list. For
# example, for `InspectionRuleSet.info_types` containing "PHONE_NUMBER" and
# `exclusion_rule` containing `exclude_info_types.info_types` with
# "EMAIL_ADDRESS", the phone number findings are dropped if they overlap
# with an EMAIL_ADDRESS finding.
# As a result, "555-222-2222@example.org" generates only a single
# finding, namely the email address.

{ # Type of information detected by the API.

4577

"name": "A String", # Name of the information type. Either a name of your choosing when

4578

# creating a CustomInfoType, or one of the names listed

4579

# at https://cloud.google.com/dlp/docs/infotypes-reference when specifying

4580

# a built-in type. When sending Cloud DLP results to Data Catalog, infoType

4581

# names should conform to the pattern `[A-Za-z0-9$-_]{1,64}`.

},

],

},

"regex": { # Message defining a custom regular expression. # Regular expression which defines the rule.

4586

"groupIndexes": [ # The index of the submatch to extract as findings. When not

4587

# specified, the entire match is returned. No more than 3 may be included.

4588

42,

4589

],

4590

"pattern": "A String", # Pattern defining the regular expression. Its syntax

4591

# (https://github.com/google/re2/wiki/Syntax) can be found under the

4592

# google/re2 repository on GitHub.

},

},

},

],

},

],

"contentOptions": [ # List of options defining data content to scan.

4600

# If empty, text, images, and other content will be included.

4601

"A String",

4602

],

4603

"infoTypes": [ # Restricts what info_types to look for. The values must correspond to

4604

# InfoType values returned by ListInfoTypes or listed at

4605

# https://cloud.google.com/dlp/docs/infotypes-reference.

4606

#

4607

# When no InfoTypes or CustomInfoTypes are specified in a request, the

4608

# system may automatically choose what detectors to run. By default this may

4609

# be all types, but may change over time as detectors are updated.

4610

#

4611

# If you need precise control and predictability as to what detectors are

4612

# run you should specify specific InfoTypes listed in the reference,

4613

# otherwise a default list will be used, which may change over time.

4614

{ # Type of information detected by the API.

4615

"name": "A String", # Name of the information type. Either a name of your choosing when

4616

# creating a CustomInfoType, or one of the names listed

4617

# at https://cloud.google.com/dlp/docs/infotypes-reference when specifying

4618

# a built-in type. When sending Cloud DLP results to Data Catalog, infoType

4619

# names should conform to the pattern `[A-Za-z0-9$-_]{1,64}`.

},

],

},

},

},

"result": { # All result fields mentioned below are updated while the job is processing. # A summary of the outcome of this inspect job.

4626

"hybridStats": { # Statistics related to processing hybrid inspect requests. # Statistics related to the processing of hybrid inspect.

4627

# Early access feature is in a pre-release state and might change or have

4628

# limited support. For more information, see

4629

# https://cloud.google.com/products#product-launch-stages.

4630

"processedCount": "A String", # The number of hybrid inspection requests processed within this job.

4631

"abortedCount": "A String", # The number of hybrid inspection requests aborted because the job ran

4632

# out of quota or was ended before they could be processed.

4633

"pendingCount": "A String", # The number of hybrid requests currently being processed. Only populated

4634

# when called via method `getDlpJob`.

4635

# A burst of traffic may cause hybrid inspect requests to be enqueued.

4636

# Processing will take place as quickly as possible, but resource limitations

4637

# may impact how long a request is enqueued for.

4638

},

4639

"totalEstimatedBytes": "A String", # Estimate of the number of bytes to process.

4640

"infoTypeStats": [ # Statistics of how many instances of each info type were found during

4641

# inspect job.

4642

{ # Statistics regarding a specific InfoType.

4643

"infoType": { # Type of information detected by the API. # The type of finding this stat is for.

4644

"name": "A String", # Name of the information type. Either a name of your choosing when

4645

# creating a CustomInfoType, or one of the names listed

4646

# at https://cloud.google.com/dlp/docs/infotypes-reference when specifying

4647

# a built-in type. When sending Cloud DLP results to Data Catalog, infoType

4648

# names should conform to the pattern `[A-Za-z0-9$-_]{1,64}`.

4649

},

4650

"count": "A String", # Number of findings for this infoType.

},
],
"processedBytes": "A String", # Total size in bytes that were processed.
},
},
"name": "A String", # The server-assigned name.
}</pre>

</div>
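The `relativeLikelihood` behaviour described in the schema above (shift by a number of levels, never below `VERY_UNLIKELY` or above `VERY_LIKELY`) can be illustrated with a small local sketch. This is not part of the client library: `LIKELIHOOD_LEVELS` and `apply_relative_likelihood` are hypothetical names introduced here, and only the five levels named in the text are modelled.

```python
# Illustrative only: models the five likelihood levels named in the schema text.
LIKELIHOOD_LEVELS = [
    "VERY_UNLIKELY",
    "UNLIKELY",
    "POSSIBLE",
    "LIKELY",
    "VERY_LIKELY",
]


def apply_relative_likelihood(base, relative_likelihood):
    """Hypothetical helper: shift `base` by `relative_likelihood` levels.

    The result is clamped so it never drops below VERY_UNLIKELY
    or exceeds VERY_LIKELY.
    """
    index = LIKELIHOOD_LEVELS.index(base) + relative_likelihood
    index = max(0, min(index, len(LIKELIHOOD_LEVELS) - 1))
    return LIKELIHOOD_LEVELS[index]


# POSSIBLE with +1 is upgraded, with -1 is downgraded.
upgraded = apply_relative_likelihood("POSSIBLE", 1)
downgraded = apply_relative_likelihood("POSSIBLE", -1)

# +1 on VERY_LIKELY is clamped, so the following -1 lands on LIKELY.
clamped = apply_relative_likelihood(
    apply_relative_likelihood("VERY_LIKELY", 1), -1
)
```

Because the clamp is applied after each adjustment, a +1 followed by a -1 is not a no-op at the top of the scale, which matches the `VERY_LIKELY` example in the field description.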

<code class="details" id="list">list(parent, orderBy=None, type=None, filter=None, pageToken=None, locationId=None, pageSize=None, x__xgafv=None)</code>
<pre>Lists DlpJobs that match the specified filter in the request.
See https://cloud.google.com/dlp/docs/inspecting-storage and
https://cloud.google.com/dlp/docs/compute-risk-analysis to learn more.


Args:

parent: string, Required. Parent resource name.
- Format: projects/[PROJECT-ID]
- Format: projects/[PROJECT-ID]/locations/[LOCATION-ID] (required)
orderBy: string, Comma-separated list of fields to order by,
followed by `asc` or `desc` postfix. This list is case-insensitive,
default sorting order is ascending, redundant space characters are
insignificant.

Example: `name asc, end_time asc, create_time desc`

Supported fields are:

- `create_time`: corresponds to the time the job was created.
- `end_time`: corresponds to the time the job ended.
- `name`: corresponds to the job's name.
- `state`: corresponds to `state`
type: string, The type of job. Defaults to `DlpJobType.INSPECT`

filter: string, Allows filtering.

Supported syntax:

* Filter expressions are made up of one or more restrictions.
* Restrictions can be combined by `AND` or `OR` logical operators. A
  sequence of restrictions implicitly uses `AND`.
* A restriction has the form of `{field} {operator} {value}`.
* Supported fields/values for inspect jobs:
  - `state` - PENDING|RUNNING|CANCELED|FINISHED|FAILED
  - `inspected_storage` - DATASTORE|CLOUD_STORAGE|BIGQUERY
  - `trigger_name` - The resource name of the trigger that created the job.
  - `end_time` - Corresponds to the time the job finished.
  - `start_time` - Corresponds to the time the job started.
* Supported fields for risk analysis jobs:
  - `state` - RUNNING|CANCELED|FINISHED|FAILED
  - `end_time` - Corresponds to the time the job finished.
  - `start_time` - Corresponds to the time the job started.
* The operator must be `=` or `!=`.

Examples:

* inspected_storage = cloud_storage AND state = done
* inspected_storage = cloud_storage OR inspected_storage = bigquery
* inspected_storage = cloud_storage AND (state = done OR state = canceled)
* end_time > "2017-12-12T00:00:00+00:00"

The length of this field should be no more than 500 characters.
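
As a rough illustration of the filter syntax above, the sketch below joins `{field} {operator} {value}` restrictions with a single logical operator and enforces the 500-character limit. `build_filter` and `MAX_FILTER_LENGTH` are hypothetical names introduced for this example, not part of the client library, and the helper does not cover parenthesised sub-expressions.

```python
MAX_FILTER_LENGTH = 500  # limit stated in the `filter` argument description


def build_filter(restrictions, joiner="AND"):
    """Hypothetical helper: build a filter string from restrictions.

    `restrictions` is an iterable of (field, operator, value) tuples;
    they are combined with one logical operator, `AND` or `OR`.
    """
    if joiner not in ("AND", "OR"):
        raise ValueError("joiner must be 'AND' or 'OR'")
    parts = [
        "{} {} {}".format(field, operator, value)
        for field, operator, value in restrictions
    ]
    expression = " {} ".format(joiner).join(parts)
    if len(expression) > MAX_FILTER_LENGTH:
        raise ValueError("filter must be no more than 500 characters")
    return expression


# Reproduces the first documented example.
job_filter = build_filter(
    [("inspected_storage", "=", "cloud_storage"), ("state", "=", "done")]
)
```

The resulting string could then be passed as the `filter` argument, for example `list(parent="projects/[PROJECT-ID]", filter=job_filter)`.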

pageToken: string, The standard list page token.
locationId: string, Deprecated. This field has no effect.
pageSize: integer, The standard list page size.
x__xgafv: string, V1 error format.
Allowed values
1 - v1 error format
2 - v2 error format

Returns:

An object of the form:

{ # The response message for listing DLP jobs.
"jobs": [ # A list of DlpJobs that match the specified filter in the request.
{ # Combines all of the information about a DLP job.
"errors": [ # A stream of errors encountered running the job.
{ # Detailed information about an error encountered during job execution or
# the results of an unsuccessful activation of the JobTrigger.

4729

"timestamps": [ # The times the error occurred.

4730

"A String",

4731

],

4732

"details": { # The `Status` type defines a logical error model that is suitable for # Detailed error codes and messages.

4733

# different programming environments, including REST APIs and RPC APIs. It is

4734

# used by [gRPC](https://github.com/grpc). Each `Status` message contains

4735

# three pieces of data: error code, error message, and error details.

4736

#

4737

# You can find out more about this error model and how to work with it in the

4738

# [API Design Guide](https://cloud.google.com/apis/design/errors).

4739

"code": 42, # The status code, which should be an enum value of google.rpc.Code.

4740

"details": [ # A list of messages that carry the error details. There is a common set of

4741

# message types for APIs to use.

4742

{

4743

"a_key": "", # Properties of the object. Contains field @type with type URL.

},

4745

],

"message": "A String", # A developer-facing error message, which should be in English. Any

4747

# user-facing error message should be localized and sent in the

4748

# google.rpc.Status.details field, or localized by the client.

},

},

],

"createTime": "A String", # Time when the job was created.

4753

"state": "A String", # State of a job.

4754

"riskDetails": { # Result of a risk analysis operation request. # Results from analyzing risk of a data source.

4755

"kMapEstimationResult": { # Result of the reidentifiability analysis. Note that these results are an # K-map result

4756

# estimation, not exact values.

4757

"kMapEstimationHistogram": [ # The intervals [min_anonymity, max_anonymity] do not overlap. If a value

# doesn't correspond to any such interval, the associated frequency is
# zero. For example, the following records:
# {min_anonymity: 1, max_anonymity: 1, frequency: 17}
# {min_anonymity: 2, max_anonymity: 3, frequency: 42}
# {min_anonymity: 5, max_anonymity: 10, frequency: 99}
# mean that there are no records with an estimated anonymity of 4 or
# larger than 10.

4765

{ # A KMapEstimationHistogramBucket message with the following values:

# min_anonymity: 3

# max_anonymity: 5

# frequency: 42

# means that there are 42 records whose quasi-identifier values correspond

4770

# to 3, 4 or 5 people in the overlying population. An important particular

4771

# case is when min_anonymity = max_anonymity = 1: the frequency field then

4772

# corresponds to the number of uniquely identifiable records.

4773

"maxAnonymity": "A String", # Always greater than or equal to min_anonymity.

4774

"bucketSize": "A String", # Number of records within these anonymity bounds.

4775

"bucketValueCount": "A String", # Total number of distinct quasi-identifier tuple values in this bucket.

4776

"bucketValues": [ # Sample of quasi-identifier tuple values in this bucket. The total

4777

# number of classes returned per bucket is capped at 20.

4778

{ # A tuple of values for the quasi-identifier columns.

4779

"estimatedAnonymity": "A String", # The estimated anonymity for these quasi-identifier values.

4780

"quasiIdsValues": [ # The quasi-identifier values.

4781

{ # Set of primitive values supported by the system.

4782

# Note that for the purposes of inspection or transformation, the number

4783

# of bytes considered to comprise a 'Value' is based on its representation

4784

# as a UTF-8 encoded string. For example, if 'integer_value' is set to

4785

# 123456789, the number of bytes would be counted as 9, even though an

4786

# int64 only holds up to 8 bytes of data.

4787

"integerValue": "A String", # integer

4788

"timeValue": { # Represents a time of day. The date and time zone are either not significant # time of day

4789

# or are specified elsewhere. An API may choose to allow leap seconds. Related

4790

# types are google.type.Date and `google.protobuf.Timestamp`.

4791

"seconds": 42, # Seconds of minutes of the time. Must normally be from 0 to 59. An API may

4792

# allow the value 60 if it allows leap-seconds.

4793

"nanos": 42, # Fractions of seconds in nanoseconds. Must be from 0 to 999,999,999.

4794

"minutes": 42, # Minutes of hour of day. Must be from 0 to 59.

4795

"hours": 42, # Hours of day in 24 hour format. Should be from 0 to 23. An API may choose

4796

# to allow the value "24:00:00" for scenarios like business closing time.

4797

},

4798

"dayOfWeekValue": "A String", # day of week

4799

"floatValue": 3.14, # float

4800

"stringValue": "A String", # string

4801

"timestampValue": "A String", # timestamp

4802

"dateValue": { # Represents a whole or partial calendar date, e.g. a birthday. The time of day # date

4803

# and time zone are either specified elsewhere or are not significant. The date

4804

# is relative to the Proleptic Gregorian Calendar. This can represent:

4805

#

4806

# * A full date, with non-zero year, month and day values

4807

# * A month and day value, with a zero year, e.g. an anniversary

4808

# * A year on its own, with zero month and day values

4809

# * A year and month value, with a zero day, e.g. a credit card expiration date

4810

#

4811

# Related types are google.type.TimeOfDay and `google.protobuf.Timestamp`.

4812

"month": 42, # Month of year. Must be from 1 to 12, or 0 if specifying a year without a

4813

# month and day.

4814

"year": 42, # Year of date. Must be from 1 to 9999, or 0 if specifying a date without

4815

# a year.

4816

"day": 42, # Day of month. Must be from 1 to 31 and valid for the year and month, or 0

4817

# if specifying a year by itself or a year and month where the day is not

4818

# significant.

4819

},

4820

"booleanValue": True or False, # boolean

},

4822

],

4823

},

],

4825

"minAnonymity": "A String", # Always positive.

},

4827

],

4828

},

"deltaPresenceEstimationResult": { # Result of the δ-presence computation. Note that these results are an # Delta-presence result

4830

# estimation, not exact values.

4831

"deltaPresenceEstimationHistogram": [ # The intervals [min_probability, max_probability) do not overlap. If a

# value doesn't correspond to any such interval, the associated frequency
# is zero. For example, the following records:
# {min_probability: 0, max_probability: 0.1, frequency: 17}
# {min_probability: 0.2, max_probability: 0.3, frequency: 42}
# {min_probability: 0.3, max_probability: 0.4, frequency: 99}
# mean that there are no records with an estimated probability in [0.1, 0.2)
# or greater than or equal to 0.4.

4839

{ # A DeltaPresenceEstimationHistogramBucket message with the following

4840

# values:

4841

# min_probability: 0.1

4842

# max_probability: 0.2

4843

# frequency: 42

4844

# means that there are 42 records for which δ is in [0.1, 0.2). An

4845

# important particular case is when min_probability = max_probability = 1:

4846

# then, every individual who shares this quasi-identifier combination is in

4847

# the dataset.

4848

"maxProbability": 3.14, # Always greater than or equal to min_probability.

4849

"bucketValueCount": "A String", # Total number of distinct quasi-identifier tuple values in this bucket.

4850

"bucketValues": [ # Sample of quasi-identifier tuple values in this bucket. The total

4851

# number of classes returned per bucket is capped at 20.

4852

{ # A tuple of values for the quasi-identifier columns.

4853

"estimatedProbability": 3.14, # The estimated probability that a given individual sharing these

4854

# quasi-identifier values is in the dataset. This value, typically called

4855

# δ, is the ratio between the number of records in the dataset with these

4856

# quasi-identifier values, and the total number of individuals (inside

4857

# *and* outside the dataset) with these quasi-identifier values.

4858

# For example, if there are 15 individuals in the dataset who share the

4859

# same quasi-identifier values, and an estimated 100 people in the entire

4860

# population with these values, then δ is 0.15.

4861

"quasiIdsValues": [ # The quasi-identifier values.

4862

{ # Set of primitive values supported by the system.

4863

# Note that for the purposes of inspection or transformation, the number

4864

# of bytes considered to comprise a 'Value' is based on its representation

4865

# as a UTF-8 encoded string. For example, if 'integer_value' is set to

4866

# 123456789, the number of bytes would be counted as 9, even though an

4867

# int64 only holds up to 8 bytes of data.

4868

"integerValue": "A String", # integer

4869

"timeValue": { # Represents a time of day. The date and time zone are either not significant # time of day

4870

# or are specified elsewhere. An API may choose to allow leap seconds. Related

4871

# types are google.type.Date and `google.protobuf.Timestamp`.

4872

"seconds": 42, # Seconds of minutes of the time. Must normally be from 0 to 59. An API may

4873

# allow the value 60 if it allows leap-seconds.

4874

"nanos": 42, # Fractions of seconds in nanoseconds. Must be from 0 to 999,999,999.

4875

"minutes": 42, # Minutes of hour of day. Must be from 0 to 59.

4876

"hours": 42, # Hours of day in 24 hour format. Should be from 0 to 23. An API may choose

4877

# to allow the value "24:00:00" for scenarios like business closing time.

4878

},

4879

"dayOfWeekValue": "A String", # day of week

4880

"floatValue": 3.14, # float

4881

"stringValue": "A String", # string

4882

"timestampValue": "A String", # timestamp

4883

"dateValue": { # Represents a whole or partial calendar date, e.g. a birthday. The time of day # date

4884

# and time zone are either specified elsewhere or are not significant. The date

4885

# is relative to the Proleptic Gregorian Calendar. This can represent:

4886

#

4887

# * A full date, with non-zero year, month and day values

4888

# * A month and day value, with a zero year, e.g. an anniversary

4889

# * A year on its own, with zero month and day values

4890

# * A year and month value, with a zero day, e.g. a credit card expiration date

4891

#

4892

# Related types are google.type.TimeOfDay and `google.protobuf.Timestamp`.

4893

"month": 42, # Month of year. Must be from 1 to 12, or 0 if specifying a year without a

4894

# month and day.

4895

"year": 42, # Year of date. Must be from 1 to 9999, or 0 if specifying a date without

4896

# a year.

4897

"day": 42, # Day of month. Must be from 1 to 31 and valid for the year and month, or 0

4898

# if specifying a year by itself or a year and month where the day is not

4899

# significant.

4900

},

4901

"booleanValue": True or False, # boolean

},

],

},

],

"minProbability": 3.14, # Between 0 and 1.

4907

"bucketSize": "A String", # Number of records within these probability bounds.

},

],

},

"categoricalStatsResult": { # Result of the categorical stats computation. # Categorical stats result

4912

"valueFrequencyHistogramBuckets": [ # Histogram of value frequencies in the column.

4913

{ # Histogram of value frequencies in the column.

4914

"valueFrequencyUpperBound": "A String", # Upper bound on the value frequency of the values in this bucket.

4915

"bucketValueCount": "A String", # Total number of distinct values in this bucket.

4916

"bucketSize": "A String", # Total number of values in this bucket.

4917

"valueFrequencyLowerBound": "A String", # Lower bound on the value frequency of the values in this bucket.

4918

"bucketValues": [ # Sample of value frequencies in this bucket. The total number of

4919

# values returned per bucket is capped at 20.

4920

{ # A value of a field, including its frequency.

4921

"count": "A String", # How many times the value is contained in the field.

4922

"value": { # Set of primitive values supported by the system. # A value contained in the field in question.

4923

# Note that for the purposes of inspection or transformation, the number

4924

# of bytes considered to comprise a 'Value' is based on its representation

4925

# as a UTF-8 encoded string. For example, if 'integer_value' is set to

4926

# 123456789, the number of bytes would be counted as 9, even though an

4927

# int64 only holds up to 8 bytes of data.

4928

"integerValue": "A String", # integer

4929

"timeValue": { # Represents a time of day. The date and time zone are either not significant # time of day

4930

# or are specified elsewhere. An API may choose to allow leap seconds. Related

4931

# types are google.type.Date and `google.protobuf.Timestamp`.

4932

"seconds": 42, # Seconds of minutes of the time. Must normally be from 0 to 59. An API may

4933

# allow the value 60 if it allows leap-seconds.

4934

"nanos": 42, # Fractions of seconds in nanoseconds. Must be from 0 to 999,999,999.

4935

"minutes": 42, # Minutes of hour of day. Must be from 0 to 59.

4936

"hours": 42, # Hours of day in 24 hour format. Should be from 0 to 23. An API may choose

4937

# to allow the value "24:00:00" for scenarios like business closing time.

4938

},

4939

"dayOfWeekValue": "A String", # day of week

4940

"floatValue": 3.14, # float

4941

"stringValue": "A String", # string

4942

"timestampValue": "A String", # timestamp

4943

"dateValue": { # Represents a whole or partial calendar date, e.g. a birthday. The time of day # date

4944

# and time zone are either specified elsewhere or are not significant. The date

4945

# is relative to the Proleptic Gregorian Calendar. This can represent:

4946

#

4947

# * A full date, with non-zero year, month and day values

4948

# * A month and day value, with a zero year, e.g. an anniversary

4949

# * A year on its own, with zero month and day values

4950

# * A year and month value, with a zero day, e.g. a credit card expiration date

4951

#

4952

# Related types are google.type.TimeOfDay and `google.protobuf.Timestamp`.

4953

"month": 42, # Month of year. Must be from 1 to 12, or 0 if specifying a year without a

4954

# month and day.

4955

"year": 42, # Year of date. Must be from 1 to 9999, or 0 if specifying a date without

4956

# a year.

4957

"day": 42, # Day of month. Must be from 1 to 31 and valid for the year and month, or 0

4958

# if specifying a year by itself or a year and month where the day is not

4959

# significant.

4960

},

4961

"booleanValue": True or False, # boolean

},

},

],

},

],

},

"numericalStatsResult": { # Result of the numerical stats computation. # Numerical stats result

4969

"quantileValues": [ # List of 99 values that partition the set of field values into 100 equal

4970

# sized buckets.

4971

{ # Set of primitive values supported by the system.

4972

# Note that for the purposes of inspection or transformation, the number

4973

# of bytes considered to comprise a 'Value' is based on its representation

4974

# as a UTF-8 encoded string. For example, if 'integer_value' is set to

4975

# 123456789, the number of bytes would be counted as 9, even though an

4976

# int64 only holds up to 8 bytes of data.

4977

"integerValue": "A String", # integer

4978

"timeValue": { # Represents a time of day. The date and time zone are either not significant # time of day

4979

# or are specified elsewhere. An API may choose to allow leap seconds. Related

4980

# types are google.type.Date and `google.protobuf.Timestamp`.

4981

"seconds": 42, # Seconds of minutes of the time. Must normally be from 0 to 59. An API may

4982

# allow the value 60 if it allows leap-seconds.

4983

"nanos": 42, # Fractions of seconds in nanoseconds. Must be from 0 to 999,999,999.

4984

"minutes": 42, # Minutes of hour of day. Must be from 0 to 59.

4985

"hours": 42, # Hours of day in 24 hour format. Should be from 0 to 23. An API may choose

4986

# to allow the value "24:00:00" for scenarios like business closing time.

4987

},

4988

"dayOfWeekValue": "A String", # day of week

4989

"floatValue": 3.14, # float

4990

"stringValue": "A String", # string

4991

"timestampValue": "A String", # timestamp

4992

"dateValue": { # Represents a whole or partial calendar date, e.g. a birthday. The time of day # date

4993

# and time zone are either specified elsewhere or are not significant. The date

4994

# is relative to the Proleptic Gregorian Calendar. This can represent:

4995

#

4996

# * A full date, with non-zero year, month and day values

4997

# * A month and day value, with a zero year, e.g. an anniversary

4998

# * A year on its own, with zero month and day values

4999

# * A year and month value, with a zero day, e.g. a credit card expiration date

5000

#

5001

# Related types are google.type.TimeOfDay and `google.protobuf.Timestamp`.

5002

"month": 42, # Month of year. Must be from 1 to 12, or 0 if specifying a year without a

5003

# month and day.

5004

"year": 42, # Year of date. Must be from 1 to 9999, or 0 if specifying a date without

5005

# a year.

5006

"day": 42, # Day of month. Must be from 1 to 31 and valid for the year and month, or 0

5007

# if specifying a year by itself or a year and month where the day is not

5008

# significant.

5009

},

5010

"booleanValue": True or False, # boolean

5011

},

5012

],

5013

"minValue": { # Set of primitive values supported by the system. # Minimum value appearing in the column.

5014

# Note that for the purposes of inspection or transformation, the number

5015

# of bytes considered to comprise a 'Value' is based on its representation

5016

# as a UTF-8 encoded string. For example, if 'integer_value' is set to

5017

# 123456789, the number of bytes would be counted as 9, even though an

5018

# int64 only holds up to 8 bytes of data.

5019

"integerValue": "A String", # integer

5020

"timeValue": { # Represents a time of day. The date and time zone are either not significant # time of day

5021

# or are specified elsewhere. An API may choose to allow leap seconds. Related

5022

# types are google.type.Date and `google.protobuf.Timestamp`.

5023

"seconds": 42, # Seconds of minutes of the time. Must normally be from 0 to 59. An API may

5024

# allow the value 60 if it allows leap-seconds.

5025

"nanos": 42, # Fractions of seconds in nanoseconds. Must be from 0 to 999,999,999.

5026

"minutes": 42, # Minutes of hour of day. Must be from 0 to 59.

5027

"hours": 42, # Hours of day in 24 hour format. Should be from 0 to 23. An API may choose

5028

# to allow the value "24:00:00" for scenarios like business closing time.

5029

},

5030

"dayOfWeekValue": "A String", # day of week

5031

"floatValue": 3.14, # float

5032

"stringValue": "A String", # string

5033

"timestampValue": "A String", # timestamp

5034

"dateValue": { # Represents a whole or partial calendar date, e.g. a birthday. The time of day # date

5035

# and time zone are either specified elsewhere or are not significant. The date

5036

# is relative to the Proleptic Gregorian Calendar. This can represent:

5037

#

5038

# * A full date, with non-zero year, month and day values

5039

# * A month and day value, with a zero year, e.g. an anniversary

5040

# * A year on its own, with zero month and day values

5041

# * A year and month value, with a zero day, e.g. a credit card expiration date

5042

#

5043

# Related types are google.type.TimeOfDay and `google.protobuf.Timestamp`.

5044

"month": 42, # Month of year. Must be from 1 to 12, or 0 if specifying a year without a

5045

# month and day.

5046

"year": 42, # Year of date. Must be from 1 to 9999, or 0 if specifying a date without

5047

# a year.

5048

"day": 42, # Day of month. Must be from 1 to 31 and valid for the year and month, or 0

5049

# if specifying a year by itself or a year and month where the day is not

5050

# significant.

5051

},

5052

"booleanValue": True or False, # boolean

5053

},
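The byte-counting rule in the `Value` note above can be checked directly. This is a minimal sketch: `value_byte_size` is a hypothetical helper, not part of the client library, and it assumes the value is a dict with either `integerValue` or `stringValue` set.

```python
def value_byte_size(value):
    """Return the UTF-8 byte length of the string form of a primitive Value."""
    for key in ("integerValue", "stringValue"):
        if key in value:
            # The size is taken from the text representation, not the int64 width.
            return len(str(value[key]).encode("utf-8"))
    raise ValueError("expected integerValue or stringValue")


print(value_byte_size({"integerValue": "123456789"}))  # 9
```

Nine digits give nine bytes, which matches the example in the note even though an int64 occupies at most 8 bytes.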

5054

"maxValue": { # Set of primitive values supported by the system. # Maximum value appearing in the column.

5055

# Note that for the purposes of inspection or transformation, the number

5056

# of bytes considered to comprise a 'Value' is based on its representation

5057

# as a UTF-8 encoded string. For example, if 'integer_value' is set to

5058

# 123456789, the number of bytes would be counted as 9, even though an

5059

# int64 only holds up to 8 bytes of data.

5060

"integerValue": "A String", # integer

5061

"timeValue": { # Represents a time of day. The date and time zone are either not significant # time of day

5062

# or are specified elsewhere. An API may choose to allow leap seconds. Related

5063

# types are google.type.Date and `google.protobuf.Timestamp`.

5064

"seconds": 42, # Seconds of minutes of the time. Must normally be from 0 to 59. An API may

5065

# allow the value 60 if it allows leap-seconds.

5066

"nanos": 42, # Fractions of seconds in nanoseconds. Must be from 0 to 999,999,999.

5067

"minutes": 42, # Minutes of hour of day. Must be from 0 to 59.

5068

"hours": 42, # Hours of day in 24 hour format. Should be from 0 to 23. An API may choose

5069

# to allow the value "24:00:00" for scenarios like business closing time.

5070

},

5071

"dayOfWeekValue": "A String", # day of week

5072

"floatValue": 3.14, # float

5073

"stringValue": "A String", # string

5074

"timestampValue": "A String", # timestamp

5075

"dateValue": { # Represents a whole or partial calendar date, e.g. a birthday. The time of day # date

5076

# and time zone are either specified elsewhere or are not significant. The date

5077

# is relative to the Proleptic Gregorian Calendar. This can represent:

5078

#

5079

# * A full date, with non-zero year, month and day values

5080

# * A month and day value, with a zero year, e.g. an anniversary

5081

# * A year on its own, with zero month and day values

5082

# * A year and month value, with a zero day, e.g. a credit card expiration date

5083

#

5084

# Related types are google.type.TimeOfDay and `google.protobuf.Timestamp`.

5085

"month": 42, # Month of year. Must be from 1 to 12, or 0 if specifying a year without a

5086

# month and day.

5087

"year": 42, # Year of date. Must be from 1 to 9999, or 0 if specifying a date without

5088

# a year.

5089

"day": 42, # Day of month. Must be from 1 to 31 and valid for the year and month, or 0

5090

# if specifying a year by itself or a year and month where the day is not

5091

# significant.

5092

},

5093

"booleanValue": True or False, # boolean

5094

},
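The four permitted shapes of a `dateValue` (full date, month and day, year alone, year and month) can be expressed as a small validator. This is a sketch under stated assumptions: `is_valid_partial_date` is a hypothetical helper, and for a month-and-day value with a zero year it assumes February 29 is acceptable, which the text above does not state.

```python
import calendar


def is_valid_partial_date(date):
    """Check a dict shaped like dateValue against the documented ranges."""
    year = date.get("year", 0)
    month = date.get("month", 0)
    day = date.get("day", 0)
    if not (0 <= year <= 9999 and 0 <= month <= 12 and 0 <= day <= 31):
        return False
    if year and month and day:
        # Full date: the day must exist in that month of that year.
        return day <= calendar.monthrange(year, month)[1]
    if not year and month and day:
        # Month and day only (assumption: validate against a leap year).
        return day <= calendar.monthrange(2000, month)[1]
    if year and not day:
        # Year alone, or year and month with a zero day.
        return True
    return False


print(is_valid_partial_date({"year": 2020, "month": 2, "day": 29}))  # True
print(is_valid_partial_date({"year": 2019, "month": 2, "day": 29}))  # False
```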

5095

},

5096

"kAnonymityResult": { # Result of the k-anonymity computation. # K-anonymity result

5097

"equivalenceClassHistogramBuckets": [ # Histogram of k-anonymity equivalence classes.

5098

{ # Histogram of k-anonymity equivalence classes.

5099

"bucketValues": [ # Sample of equivalence classes in this bucket. The total number of

5100

# classes returned per bucket is capped at 20.

5101

{ # The set of columns' values that share the same l-diversity value.

5102

"quasiIdsValues": [ # Set of values defining the equivalence class. One value per

5103

# quasi-identifier column in the original KAnonymity metric message.

5104

# The order is always the same as the original request.

5105

{ # Set of primitive values supported by the system.

5106

# Note that for the purposes of inspection or transformation, the number

5107

# of bytes considered to comprise a 'Value' is based on its representation

5108

# as a UTF-8 encoded string. For example, if 'integer_value' is set to

5109

# 123456789, the number of bytes would be counted as 9, even though an

5110

# int64 only holds up to 8 bytes of data.

5111

"integerValue": "A String", # integer

5112

"timeValue": { # Represents a time of day. The date and time zone are either not significant # time of day

5113

# or are specified elsewhere. An API may choose to allow leap seconds. Related

5114

# types are google.type.Date and `google.protobuf.Timestamp`.

5115

"seconds": 42, # Seconds of minutes of the time. Must normally be from 0 to 59. An API may

5116

# allow the value 60 if it allows leap-seconds.

5117

"nanos": 42, # Fractions of seconds in nanoseconds. Must be from 0 to 999,999,999.

5118

"minutes": 42, # Minutes of hour of day. Must be from 0 to 59.

5119

"hours": 42, # Hours of day in 24 hour format. Should be from 0 to 23. An API may choose

5120

# to allow the value "24:00:00" for scenarios like business closing time.

5121

},

5122

"dayOfWeekValue": "A String", # day of week

5123

"floatValue": 3.14, # float

5124

"stringValue": "A String", # string

5125

"timestampValue": "A String", # timestamp

5126

"dateValue": { # Represents a whole or partial calendar date, e.g. a birthday. The time of day # date

5127

# and time zone are either specified elsewhere or are not significant. The date

5128

# is relative to the Proleptic Gregorian Calendar. This can represent:

5129

#

5130

# * A full date, with non-zero year, month and day values

5131

# * A month and day value, with a zero year, e.g. an anniversary

5132

# * A year on its own, with zero month and day values

5133

# * A year and month value, with a zero day, e.g. a credit card expiration date

5134

#

5135

# Related types are google.type.TimeOfDay and `google.protobuf.Timestamp`.

5136

"month": 42, # Month of year. Must be from 1 to 12, or 0 if specifying a year without a

5137

# month and day.

5138

"year": 42, # Year of date. Must be from 1 to 9999, or 0 if specifying a date without

5139

# a year.

5140

"day": 42, # Day of month. Must be from 1 to 31 and valid for the year and month, or 0

5141

# if specifying a year by itself or a year and month where the day is not

5142

# significant.

5143

},

5144

"booleanValue": True or False, # boolean

5145

},

5146

],

5147

"equivalenceClassSize": "A String", # Size of the equivalence class, for example number of rows with the

5148

# above set of values.

5149

},

5150

],

5151

"equivalenceClassSizeLowerBound": "A String", # Lower bound on the size of the equivalence classes in this bucket.

5152

"equivalenceClassSizeUpperBound": "A String", # Upper bound on the size of the equivalence classes in this bucket.

5153

"bucketSize": "A String", # Total number of equivalence classes in this bucket.

5154

"bucketValueCount": "A String", # Total number of distinct equivalence classes in this bucket.

5155

},

5156

],

5157

},
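To make the terms in `kAnonymityResult` concrete, the sketch below groups rows into equivalence classes by their quasi-identifier values and counts how many classes have each size. It is an illustration only, not the service's algorithm: the function names and sample rows are hypothetical, and it uses one bucket per exact class size rather than the lower and upper bounds the API reports.

```python
from collections import Counter


def equivalence_class_sizes(rows, quasi_ids):
    """Map each quasi-identifier tuple to the number of rows sharing it."""
    return Counter(tuple(row[q] for q in quasi_ids) for row in rows)


def size_histogram(sizes):
    """Map each class size to the number of equivalence classes of that size."""
    return dict(sorted(Counter(sizes.values()).items()))


rows = [
    {"zip": "94043", "age": 30},
    {"zip": "94043", "age": 30},
    {"zip": "10001", "age": 41},
]
sizes = equivalence_class_sizes(rows, ["zip", "age"])
print(min(sizes.values()))    # 1, so this table is only 1-anonymous
print(size_histogram(sizes))  # {1: 1, 2: 1}
```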

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame]

5158

"requestedPrivacyMetric": { # Privacy metric to compute for reidentification risk analysis. # Privacy metric to compute.

Bu Sun Kim

2020-07-22 17:02:09 -0700

[diff] [blame^]

5159

"categoricalStatsConfig": { # Compute numerical stats over an individual column, including # Categorical stats

5160

# number of distinct values and value count distribution.

5161

"field": { # General identifier of a data field in a storage service. # Field to compute categorical stats on. All column types are

5162

# supported except for arrays and structs. However, it may be more

5163

# informative to use NumericalStats when the field type is supported,

5164

# depending on the data.

5165

"name": "A String", # Name describing the field.

5166

},

5167

},

5168

"lDiversityConfig": { # l-diversity metric, used for analysis of reidentification risk. # l-diversity

5169

"sensitiveAttribute": { # General identifier of a data field in a storage service. # Sensitive field for computing the l-value.

5170

"name": "A String", # Name describing the field.

5171

},

5172

"quasiIds": [ # Set of quasi-identifiers indicating how equivalence classes are

5173

# defined for the l-diversity computation. When multiple fields are

5174

# specified, they are considered a single composite key.

5175

{ # General identifier of a data field in a storage service.

5176

"name": "A String", # Name describing the field.

},

],

},

"kMapEstimationConfig": { # Reidentifiability metric. This corresponds to a risk model similar to what # k-map

5181

# is called "journalist risk" in the literature, except the attack dataset is

5182

# statistically modeled instead of being perfectly known. This can be done

5183

# using publicly available data (like the US Census), or using a custom

5184

# statistical model (indicated as one or several BigQuery tables), or by

5185

# extrapolating from the distribution of values in the input dataset.

Bu Sun Kim

2020-05-27 12:20:54 -0700

[diff] [blame]

5186

"regionCode": "A String", # ISO 3166-1 alpha-2 region code to use in the statistical modeling.

5187

# Set if no column is tagged with a region-specific InfoType (like

5188

# US_ZIP_5) or a region code.

Bu Sun Kim

2020-07-22 17:02:09 -0700

[diff] [blame^]

5189

"quasiIds": [ # Required. Fields considered to be quasi-identifiers. No two columns can have the

5190

# same tag.

5191

{ # A column with a semantic tag attached.

5192

"field": { # General identifier of a data field in a storage service. # Required. Identifies the column.

5193

"name": "A String", # Name describing the field.

5194

},

5195

"customTag": "A String", # A column can be tagged with a custom tag. In this case, the user must

5196

# indicate an auxiliary table that contains statistical information on

5197

# the possible values of this column (below).

5198

"infoType": { # Type of information detected by the API. # A column can be tagged with an InfoType to use the relevant public

5199

# dataset as a statistical model of population, if available. We

5200

# currently support US ZIP codes, region codes, ages and genders.

5201

# To programmatically obtain the list of supported InfoTypes, use

5202

# ListInfoTypes with the supported_by=RISK_ANALYSIS filter.

5203

"name": "A String", # Name of the information type. Either a name of your choosing when

5204

# creating a CustomInfoType, or one of the names listed

5205

# at https://cloud.google.com/dlp/docs/infotypes-reference when specifying

5206

# a built-in type. When sending Cloud DLP results to Data Catalog, infoType

5207

# names should conform to the pattern `[A-Za-z0-9$-_]{1,64}`.

5208

},

5209

"inferred": { # A generic empty message that you can re-use to avoid defining duplicated # If no semantic tag is indicated, we infer the statistical model from

5210

# the distribution of values in the input data

5211

# empty messages in your APIs. A typical example is to use it as the request

5212

# or the response type of an API method. For instance:

5213

#

5214

# service Foo {

5215

# rpc Bar(google.protobuf.Empty) returns (google.protobuf.Empty);

5216

# }

5217

#

5218

# The JSON representation for `Empty` is empty JSON object `{}`.

5219

},

5220

},

5221

],

Bu Sun Kim

2020-05-27 12:20:54 -0700

[diff] [blame]

5222

"auxiliaryTables": [ # Several auxiliary tables can be used in the analysis. Each custom_tag

Bu Sun Kim

2020-07-22 17:02:09 -0700

[diff] [blame^]

5223

# used to tag a quasi-identifiers column must appear in exactly one column

5224

# of one auxiliary table.

5225

{ # An auxiliary table contains statistical information on the relative

Bu Sun Kim

2020-05-27 12:20:54 -0700

[diff] [blame]

5226

# frequency of different quasi-identifiers values. It has one or several

5227

# quasi-identifiers columns, and one column that indicates the relative

5228

# frequency of each quasi-identifier tuple.

5229

# If a tuple is present in the data but not in the auxiliary table, the

5230

# corresponding relative frequency is assumed to be zero (and thus, the

5231

# tuple is highly reidentifiable).

Bu Sun Kim

2020-05-27 12:20:54 -0700

[diff] [blame]

5232

"table": { # Message defining the location of a BigQuery table. A table is uniquely # Required. Auxiliary table location.

5233

# identified by its project_id, dataset_id, and table_name. Within a query

5234

# a table is often referenced with a string in the format of:

5235

# `<project_id>:<dataset_id>.<table_id>` or

5236

# `<project_id>.<dataset_id>.<table_id>`.

5237

"projectId": "A String", # The Google Cloud Platform project ID of the project containing the table.

5238

# If omitted, project ID is inferred from the API call.

5239

"datasetId": "A String", # Dataset ID of the table.

5240

"tableId": "A String", # Name of the table.

5241

},

Bu Sun Kim

2020-07-22 17:02:09 -0700

[diff] [blame^]

5242

"quasiIds": [ # Required. Quasi-identifier columns.

5243

{ # A quasi-identifier column has a custom_tag, used to know which column

5244

# in the data corresponds to which column in the statistical model.

5245

"customTag": "A String", # An auxiliary field.

5246

"field": { # General identifier of a data field in a storage service. # Identifies the column.

5247

"name": "A String", # Name describing the field.

5248

},

5249

},

5250

],

Bu Sun Kim

2020-05-27 12:20:54 -0700

[diff] [blame]

5251

"relativeFrequency": { # General identifier of a data field in a storage service. # Required. The relative frequency column must contain a floating-point number

5252

# between 0 and 1 (inclusive). Null values are assumed to be zero.

5253

"name": "A String", # Name describing the field.

5254

},

5255

},

5256

],

Bu Sun Kim

2020-07-22 17:02:09 -0700

[diff] [blame^]

5257

},
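The constraint that each custom tag must appear in exactly one column of one auxiliary table is easy to get wrong when assembling a `kMapEstimationConfig` by hand. The following sketch builds such a dict with the field names shown above and checks that constraint locally. `check_custom_tags`, the project, dataset, table, and column names are all hypothetical; the service performs its own validation.

```python
from collections import Counter


def check_custom_tags(config):
    """Return custom tags not found in exactly one auxiliary-table column."""
    seen = Counter(
        quasi_id["customTag"]
        for table in config.get("auxiliaryTables", [])
        for quasi_id in table.get("quasiIds", [])
    )
    used = {q["customTag"] for q in config["quasiIds"] if "customTag" in q}
    return sorted(tag for tag in used if seen[tag] != 1)


k_map_config = {
    "regionCode": "US",
    "quasiIds": [
        {"field": {"name": "zip"}, "infoType": {"name": "US_ZIP_5"}},
        {"field": {"name": "job"}, "customTag": "occupation"},
    ],
    "auxiliaryTables": [
        {
            "table": {
                "projectId": "my-project",  # placeholder
                "datasetId": "stats",       # placeholder
                "tableId": "jobs",          # placeholder
            },
            "quasiIds": [
                {"field": {"name": "job_title"}, "customTag": "occupation"}
            ],
            "relativeFrequency": {"name": "freq"},
        }
    ],
}
print(check_custom_tags(k_map_config))  # []
```

An empty list means every custom tag used in `quasiIds` is backed by exactly one auxiliary column.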

5258

"deltaPresenceEstimationConfig": { # δ-presence metric, used to estimate how likely it is for an attacker to # delta-presence

5259

# figure out that one given individual appears in a de-identified dataset.

5260

# Similarly to the k-map metric, we cannot compute δ-presence exactly without

5261

# knowing the attack dataset, so we use a statistical model instead.

Bu Sun Kim

2020-05-27 12:20:54 -0700

[diff] [blame]

5262

"quasiIds": [ # Required. Fields considered to be quasi-identifiers. No two fields can have the

5263

# same tag.

5264

{ # A column with a semantic tag attached.

5265

"field": { # General identifier of a data field in a storage service. # Required. Identifies the column.

5266

"name": "A String", # Name describing the field.

5267

},

5268

"infoType": { # Type of information detected by the API. # A column can be tagged with an InfoType to use the relevant public

5269

# dataset as a statistical model of population, if available. We

5270

# currently support US ZIP codes, region codes, ages and genders.

5271

# To programmatically obtain the list of supported InfoTypes, use

5272

# ListInfoTypes with the supported_by=RISK_ANALYSIS filter.

5273

"name": "A String", # Name of the information type. Either a name of your choosing when

5274

# creating a CustomInfoType, or one of the names listed

5275

# at https://cloud.google.com/dlp/docs/infotypes-reference when specifying

Bu Sun Kim

2020-07-22 17:02:09 -0700

[diff] [blame^]

5276

# a built-in type. When sending Cloud DLP results to Data Catalog, infoType

5277

# names should conform to the pattern `[A-Za-z0-9$-_]{1,64}`.

Bu Sun Kim

2020-05-27 12:20:54 -0700

[diff] [blame]

5278

},

5279

"customTag": "A String", # A column can be tagged with a custom tag. In this case, the user must

5280

# indicate an auxiliary table that contains statistical information on

5281

# the possible values of this column (below).

5282

"inferred": { # A generic empty message that you can re-use to avoid defining duplicated # If no semantic tag is indicated, we infer the statistical model from

5283

# the distribution of values in the input data

5284

# empty messages in your APIs. A typical example is to use it as the request

5285

# or the response type of an API method. For instance:

5286

#

5287

# service Foo {

5288

# rpc Bar(google.protobuf.Empty) returns (google.protobuf.Empty);

5289

# }

5290

#

5291

# The JSON representation for `Empty` is empty JSON object `{}`.

5292

},

5293

},

5294

],

Bu Sun Kim

2020-07-22 17:02:09 -0700

[diff] [blame^]

5295

"auxiliaryTables": [ # Several auxiliary tables can be used in the analysis. Each custom_tag

5296

# used to tag a quasi-identifiers field must appear in exactly one

5297

# field of one auxiliary table.

5298

{ # An auxiliary table containing statistical information on the relative

5299

# frequency of different quasi-identifiers values. It has one or several

5300

# quasi-identifiers columns, and one column that indicates the relative

5301

# frequency of each quasi-identifier tuple.

5302

# If a tuple is present in the data but not in the auxiliary table, the

5303

# corresponding relative frequency is assumed to be zero (and thus, the

5304

# tuple is highly reidentifiable).

5305

"relativeFrequency": { # General identifier of a data field in a storage service. # Required. The relative frequency column must contain a floating-point number

5306

# between 0 and 1 (inclusive). Null values are assumed to be zero.

5307

"name": "A String", # Name describing the field.

5308

},

5309

"table": { # Message defining the location of a BigQuery table. A table is uniquely # Required. Auxiliary table location.

5310

# identified by its project_id, dataset_id, and table_name. Within a query

5311

# a table is often referenced with a string in the format of:

5312

# `<project_id>:<dataset_id>.<table_id>` or

5313

# `<project_id>.<dataset_id>.<table_id>`.

5314

"projectId": "A String", # The Google Cloud Platform project ID of the project containing the table.

5315

# If omitted, project ID is inferred from the API call.

5316

"datasetId": "A String", # Dataset ID of the table.

5317

"tableId": "A String", # Name of the table.

5318

},

5319

"quasiIds": [ # Required. Quasi-identifier columns.

5320

{ # A quasi-identifier column has a custom_tag, used to know which column

5321

# in the data corresponds to which column in the statistical model.

5322

"field": { # General identifier of a data field in a storage service. # Identifies the column.

5323

"name": "A String", # Name describing the field.

5324

},

5325

"customTag": "A String", # A column can be tagged with a custom tag. In this case, the user must

5326

# indicate an auxiliary table that contains statistical information on

5327

# the possible values of this column (below).

},

],

},

],

"regionCode": "A String", # ISO 3166-1 alpha-2 region code to use in the statistical modeling.

5333

# Set if no column is tagged with a region-specific InfoType (like

5334

# US_ZIP_5) or a region code.

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame]

5335

},
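The infoType name pattern quoted above, `[A-Za-z0-9$-_]{1,64}`, can be tested locally before sending results to Data Catalog. This is a sketch using Python's `re` module; `conforms` is a hypothetical helper, and whether the service interprets the pattern exactly as Python does is an assumption.

```python
import re

# Pattern copied verbatim from the infoType name description.
DATA_CATALOG_NAME = re.compile(r"[A-Za-z0-9$-_]{1,64}")


def conforms(name):
    """Return True if the whole name matches the documented pattern."""
    return DATA_CATALOG_NAME.fullmatch(name) is not None


print(conforms("US_ZIP_5"))   # True
print(conforms("has space"))  # False
print(conforms("A" * 65))     # False
```

Note that Python reads `$-_` inside the character class as a range from `$` to `_`, so characters such as `.` and `-` also match under this reading.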

5336

"kAnonymityConfig": { # k-anonymity metric, used for analysis of reidentification risk. # K-anonymity

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame]

5337

"entityId": { # An entity in a dataset is a field or set of fields that correspond to a # Message indicating that multiple rows might be associated to a

5338

# single individual. If the same entity_id is associated to multiple

5339

# quasi-identifier tuples over distinct rows, we consider the entire

5340

# collection of tuples as the composite quasi-identifier. This collection

5341

# is a multiset: the order in which the different tuples appear in the

5342

# dataset is ignored, but their frequency is taken into account.

5343

#

5344

# Important note: a maximum of 1000 rows can be associated to a single

5345

# entity ID. If more rows are associated with the same entity ID, some

5346

# might be ignored.

5347

# single person. For example, in medical records the `EntityId` might be a

5348

# patient identifier, or for financial records it might be an account

5349

# identifier. This message is used when generalizations or analysis must take

5350

# into account that multiple rows correspond to the same entity.

5351

"field": { # General identifier of a data field in a storage service. # Composite key indicating which field contains the entity identifier.

5352

"name": "A String", # Name describing the field.

5353

},

5354

},

Bu Sun Kim

2020-05-27 12:20:54 -0700

[diff] [blame]

5355

"quasiIds": [ # Set of fields to compute k-anonymity over. When multiple fields are

5356

# specified, they are considered a single composite key. Structs and

5357

# repeated data types are not supported; however, nested fields are

5358

# supported so long as they are not structs themselves or nested within

5359

# a repeated field.

5360

{ # General identifier of a data field in a storage service.

5361

"name": "A String", # Name describing the field.

5362

},

5363

],

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame]

5364

},
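The `entityId` description says that all quasi-identifier tuples belonging to one entity form a multiset: order is ignored, frequency is not. A rough illustration follows. The helper name and sample rows are hypothetical, and the sketch does not model the 1000-row limit per entity ID.

```python
from collections import Counter, defaultdict


def composite_quasi_ids(rows, entity_field, quasi_ids):
    """Map each entity ID to the multiset of its quasi-identifier tuples."""
    groups = defaultdict(Counter)
    for row in rows:
        groups[row[entity_field]][tuple(row[q] for q in quasi_ids)] += 1
    # A frozenset of (tuple, count) pairs ignores order but keeps frequency.
    return {entity: frozenset(c.items()) for entity, c in groups.items()}


rows = [
    {"patient": "p1", "zip": "94043"},
    {"patient": "p1", "zip": "10001"},
    {"patient": "p2", "zip": "10001"},
    {"patient": "p2", "zip": "94043"},
    {"patient": "p3", "zip": "94043"},
    {"patient": "p3", "zip": "94043"},
]
ids = composite_quasi_ids(rows, "patient", ["zip"])
print(ids["p1"] == ids["p2"])  # True: same tuples in a different order
print(ids["p1"] == ids["p3"])  # False: frequencies differ
```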

5365

"numericalStatsConfig": { # Compute numerical stats over an individual column, including # Numerical stats

5366

# min, max, and quantiles.

5367

"field": { # General identifier of a data field in a storage service. # Field to compute numerical stats on. Supported types are

5368

# integer, float, date, datetime, timestamp, time.

5369

"name": "A String", # Name describing the field.

5370

},

5371

},

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame]

5372

},

Bu Sun Kim

2020-07-22 17:02:09 -0700

[diff] [blame^]

5373

"lDiversityResult": { # Result of the l-diversity computation. # L-diversity result

5374

"sensitiveValueFrequencyHistogramBuckets": [ # Histogram of l-diversity equivalence class sensitive value frequencies.

5375

{ # Histogram of l-diversity equivalence class sensitive value frequencies.

5376

"bucketSize": "A String", # Total number of equivalence classes in this bucket.

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame]

5377

"bucketValues": [ # Sample of equivalence classes in this bucket. The total number of

5378

# classes returned per bucket is capped at 20.

Bu Sun Kim

2020-07-22 17:02:09 -0700

[diff] [blame^]

5379

{ # The set of columns' values that share the same l-diversity value.

5380

"quasiIdsValues": [ # Quasi-identifier values defining the k-anonymity equivalence

5381

# class. The order is always the same as the original request.

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame]

5382

{ # Set of primitive values supported by the system.

5383

# Note that for the purposes of inspection or transformation, the number

5384

# of bytes considered to comprise a 'Value' is based on its representation

5385

# as a UTF-8 encoded string. For example, if 'integer_value' is set to

5386

# 123456789, the number of bytes would be counted as 9, even though an

5387

# int64 only holds up to 8 bytes of data.

Bu Sun Kim

2020-07-22 17:02:09 -0700

[diff] [blame^]

5388

"integerValue": "A String", # integer

5389

"timeValue": { # Represents a time of day. The date and time zone are either not significant # time of day

5390

# or are specified elsewhere. An API may choose to allow leap seconds. Related

5391

# types are google.type.Date and `google.protobuf.Timestamp`.

5392

"seconds": 42, # Seconds of minutes of the time. Must normally be from 0 to 59. An API may

5393

# allow the value 60 if it allows leap-seconds.

5394

"nanos": 42, # Fractions of seconds in nanoseconds. Must be from 0 to 999,999,999.

5395

"minutes": 42, # Minutes of hour of day. Must be from 0 to 59.

5396

"hours": 42, # Hours of day in 24 hour format. Should be from 0 to 23. An API may choose

5397

# to allow the value "24:00:00" for scenarios like business closing time.

5398

},

5399

"dayOfWeekValue": "A String", # day of week

5400

"floatValue": 3.14, # float

5401

"stringValue": "A String", # string

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame]

5402

"timestampValue": "A String", # timestamp

5403

"dateValue": { # Represents a whole or partial calendar date, e.g. a birthday. The time of day # date

5404

# and time zone are either specified elsewhere or are not significant. The date

5405

# is relative to the Proleptic Gregorian Calendar. This can represent:

5406

#

5407

# * A full date, with non-zero year, month and day values

5408

# * A month and day value, with a zero year, e.g. an anniversary

5409

# * A year on its own, with zero month and day values

5410

# * A year and month value, with a zero day, e.g. a credit card expiration date

5411

#

5412

# Related types are google.type.TimeOfDay and `google.protobuf.Timestamp`.

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame]

5413

"month": 42, # Month of year. Must be from 1 to 12, or 0 if specifying a year without a

5414

# month and day.

Bu Sun Kim

2020-07-22 17:02:09 -0700

[diff] [blame^]

5415

"year": 42, # Year of date. Must be from 1 to 9999, or 0 if specifying a date without

5416

# a year.

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame]

5417

"day": 42, # Day of month. Must be from 1 to 31 and valid for the year and month, or 0

5418

# if specifying a year by itself or a year and month where the day is not

5419

# significant.

5420

},

Bu Sun Kim

2020-05-27 12:20:54 -0700

[diff] [blame]

5421

"booleanValue": True or False, # boolean

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame]

5422

},

5423

],

Bu Sun Kim

2020-07-22 17:02:09 -0700

[diff] [blame^]

5424

"topSensitiveValues": [ # Estimated frequencies of top sensitive values.

5425

{ # A value of a field, including its frequency.

5426

"count": "A String", # How many times the value is contained in the field.

5427

"value": { # Set of primitive values supported by the system. # A value contained in the field in question.

5428

# Note that for the purposes of inspection or transformation, the number

5429

# of bytes considered to comprise a 'Value' is based on its representation

5430

# as a UTF-8 encoded string. For example, if 'integer_value' is set to

5431

# 123456789, the number of bytes would be counted as 9, even though an

5432

# int64 only holds up to 8 bytes of data.

5433

"integerValue": "A String", # integer

5434

"timeValue": { # Represents a time of day. The date and time zone are either not significant # time of day

5435

# or are specified elsewhere. An API may choose to allow leap seconds. Related

5436

# types are google.type.Date and `google.protobuf.Timestamp`.

5437

"seconds": 42, # Seconds of minutes of the time. Must normally be from 0 to 59. An API may

5438

# allow the value 60 if it allows leap-seconds.

5439

"nanos": 42, # Fractions of seconds in nanoseconds. Must be from 0 to 999,999,999.

5440

"minutes": 42, # Minutes of hour of day. Must be from 0 to 59.

5441

"hours": 42, # Hours of day in 24 hour format. Should be from 0 to 23. An API may choose

5442

# to allow the value "24:00:00" for scenarios like business closing time.

5443

},

5444

"dayOfWeekValue": "A String", # day of week

5445

"floatValue": 3.14, # float

5446

"stringValue": "A String", # string

5447

"timestampValue": "A String", # timestamp

5448

"dateValue": { # Represents a whole or partial calendar date, e.g. a birthday. The time of day # date

5449

# and time zone are either specified elsewhere or are not significant. The date

5450

# is relative to the Proleptic Gregorian Calendar. This can represent:

5451

#

5452

# * A full date, with non-zero year, month and day values

5453

# * A month and day value, with a zero year, e.g. an anniversary

5454

# * A year on its own, with zero month and day values

5455

# * A year and month value, with a zero day, e.g. a credit card expiration date

5456

#

5457

# Related types are google.type.TimeOfDay and `google.protobuf.Timestamp`.

5458

"month": 42, # Month of year. Must be from 1 to 12, or 0 if specifying a year without a

5459

# month and day.

5460

"year": 42, # Year of date. Must be from 1 to 9999, or 0 if specifying a date without

5461

# a year.

5462

"day": 42, # Day of month. Must be from 1 to 31 and valid for the year and month, or 0

5463

# if specifying a year by itself or a year and month where the day is not

5464

# significant.

5465

},

5466

"booleanValue": True or False, # boolean

},

},

],

"equivalenceClassSize": "A String", # Size of the k-anonymity equivalence class.

5471

"numDistinctSensitiveValues": "A String", # Number of distinct sensitive values in this equivalence class.

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame]

5472

},

5473

],

Bu Sun Kim

2020-07-22 17:02:09 -0700

[diff] [blame^]

5474

"sensitiveValueFrequencyUpperBound": "A String", # Upper bound on the sensitive value frequencies of the equivalence

5475

# classes in this bucket.

Bu Sun Kim

2020-05-27 12:20:54 -0700

[diff] [blame]

5476

"bucketValueCount": "A String", # Total number of distinct equivalence classes in this bucket.

Bu Sun Kim

2020-07-22 17:02:09 -0700

[diff] [blame^]

5477

"sensitiveValueFrequencyLowerBound": "A String", # Lower bound on the sensitive value frequencies of the equivalence

5478

# classes in this bucket.

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame]

},

],

},
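As a rough guide to what `numDistinctSensitiveValues` measures, the sketch below counts the distinct sensitive values inside each equivalence class. The names and sample data are hypothetical, and this is an illustration of the definition rather than the service's implementation.

```python
from collections import defaultdict


def distinct_sensitive_values(rows, quasi_ids, sensitive):
    """Map each equivalence class to its number of distinct sensitive values."""
    classes = defaultdict(set)
    for row in rows:
        classes[tuple(row[q] for q in quasi_ids)].add(row[sensitive])
    return {key: len(values) for key, values in classes.items()}


rows = [
    {"zip": "94043", "diagnosis": "flu"},
    {"zip": "94043", "diagnosis": "asthma"},
    {"zip": "10001", "diagnosis": "flu"},
    {"zip": "10001", "diagnosis": "flu"},
]
counts = distinct_sensitive_values(rows, ["zip"], "diagnosis")
print(counts[("94043",)])    # 2
print(min(counts.values()))  # 1, the l-value of the whole table
```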

"requestedSourceTable": { # Message defining the location of a BigQuery table. A table is uniquely # Input dataset to compute metrics over.

5483

# identified by its project_id, dataset_id, and table_name. Within a query

5484

# a table is often referenced with a string in the format of:

5485

# `<project_id>:<dataset_id>.<table_id>` or

5486

# `<project_id>.<dataset_id>.<table_id>`.

5487

"projectId": "A String", # The Google Cloud Platform project ID of the project containing the table.

5488

# If omitted, project ID is inferred from the API call.

5489

"datasetId": "A String", # Dataset ID of the table.

5490

"tableId": "A String", # Name of the table.

5491

},
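The two string forms mentioned for a BigQuery table can be produced from the `projectId`, `datasetId`, and `tableId` fields. This is a minimal sketch: `table_reference` and its `default_project` and `colon_form` parameters are hypothetical and only mirror the note that the project ID may be omitted and inferred elsewhere.

```python
def table_reference(table, default_project=None, colon_form=False):
    """Format a table dict as project.dataset.table or project:dataset.table."""
    project = table.get("projectId") or default_project
    if not project:
        raise ValueError("no projectId in table and no default_project given")
    separator = ":" if colon_form else "."
    return f"{project}{separator}{table['datasetId']}.{table['tableId']}"


table = {"projectId": "my-project", "datasetId": "hr", "tableId": "staff"}
print(table_reference(table))                   # my-project.hr.staff
print(table_reference(table, colon_form=True))  # my-project:hr.staff
```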

Bu Sun Kim

2020-07-22 17:02:09 -0700

[diff] [blame^]

5492

},

5493

"type": "A String", # The type of job.

5494

"endTime": "A String", # Time when the job finished.

5495

"startTime": "A String", # Time when the job started.

5496

"jobTriggerName": "A String", # If created by a job trigger, the resource name of the trigger that

5497

# instantiated the job.

5498

"inspectDetails": { # The results of an inspect DataSource job. # Results from inspecting a data source.

5499

"requestedOptions": { # Snapshot of the inspection configuration. # The configuration used for this job.

5500

"jobConfig": { # Controls what and how to inspect for findings. # Inspect config.

5501

"inspectTemplateName": "A String", # If provided, will be used as the default for all values in InspectConfig.

5502

# `inspect_config` will be merged into the values persisted as part of the

5503

# template.

5504

"actions": [ # Actions to execute at the completion of the job.

5505

{ # A task to execute on the completion of a job.

5506

# See https://cloud.google.com/dlp/docs/concepts-actions to learn more.

"publishToStackdriver": { # Enable Stackdriver metric dlp.googleapis.com/finding_count.
# This will publish a metric to Stackdriver for each infoType requested and
# how many findings were found for it. CustomDetectors will be bucketed
# as 'Custom' under the Stackdriver label 'info_type'.
},

"publishFindingsToCloudDataCatalog": { # Publish findings to Cloud Datahub.
# Publish findings of a DlpJob to Cloud Data Catalog. Labels summarizing the
# results of the DlpJob will be applied to the entry for the resource scanned
# in Cloud Data Catalog. Any labels previously written by another DlpJob will
# be deleted. InfoType naming patterns are strictly enforced when using this
# feature. Note that the findings will be persisted in Cloud Data Catalog
# storage and are governed by Data Catalog service-specific policy, see
# https://cloud.google.com/terms/service-terms
# Only a single instance of this action can be specified and only allowed if
# all resources being scanned are BigQuery tables.
# Compatible with: Inspect
},

"jobNotificationEmails": { # Enable email notification to project owners and editors on the job's
# completion/failure.
},

"pubSub": { # Publish a notification to a Pub/Sub topic.
# Publish a message into the given Pub/Sub topic when the DlpJob has completed. The
# message contains a single field, `DlpJobName`, which is equal to the
# finished job's
# [`DlpJob.name`](https://cloud.google.com/dlp/docs/reference/rest/v2/projects.dlpJobs#DlpJob).
# Compatible with: Inspect, Risk
"topic": "A String", # Cloud Pub/Sub topic to send notifications to. The topic must have given
# publishing access rights to the DLP API service account executing
# the long running DlpJob sending the notifications.
# Format is projects/{project}/topics/{topic}.
},

"saveFindings": { # Save resulting findings in a provided location.
# If set, the detailed findings will be persisted to the specified
# OutputStorageConfig. Only a single instance of this action can be
# specified.
# Compatible with: Inspect, Risk
"outputConfig": { # Location to store findings outside of DLP.
# Cloud repository for storing output.
"outputSchema": "A String", # Schema used for writing the findings for Inspect jobs. This field is only
# used for Inspect and must be unspecified for Risk jobs. Columns are derived
# from the `Finding` object. If appending to an existing table, any columns
# from the predefined schema that are missing will be added. No columns in
# the existing table will be deleted.
#
# If unspecified, then all available columns will be used for a new table or
# an (existing) table with no schema, and no changes will be made to an
# existing table that has a schema.
# Only for use with external storage.
"table": { # Store findings in an existing table or a new table in an existing
# dataset. If table_id is not set a new one will be generated
# for you with the following format:
# dlp_googleapis_yyyy_mm_dd_[dlp_job_id]. Pacific timezone will be used for
# generating the date details.
#
# For Inspect, each column in an existing output table must have the same
# name, type, and mode of a field in the `Finding` object.
#
# For Risk, an existing output table should be the output of a previous
# Risk analysis job run on the same source table, with the same privacy
# metric and quasi-identifiers. Risk jobs that analyze the same table but
# compute a different privacy metric, or use different sets of
# quasi-identifiers, cannot store their results in the same table.
#
# Message defining the location of a BigQuery table. A table is uniquely
# identified by its project_id, dataset_id, and table_name. Within a query
# a table is often referenced with a string in the format of:
# `<project_id>:<dataset_id>.<table_id>` or
# `<project_id>.<dataset_id>.<table_id>`.
"projectId": "A String", # The Google Cloud Platform project ID of the project containing the table.
# If omitted, project ID is inferred from the API call.
"datasetId": "A String", # Dataset ID of the table.
"tableId": "A String", # Name of the table.
},
},
},
"publishSummaryToCscc": { # Publish summary to Cloud Security Command Center (Alpha).
# Publish the result summary of a DlpJob to the Cloud Security
# Command Center (CSCC Alpha).
# This action is only available for projects which are parts of
# an organization and whitelisted for the alpha Cloud Security Command
# Center.
# The action will publish count of finding instances and their info types.
# The summary of findings will be persisted in CSCC and are governed by CSCC
# service-specific policy, see https://cloud.google.com/terms/service-terms
# Only a single instance of this action can be specified.
# Compatible with: Inspect
},

},

],

5590

"storageConfig": { # Shared message indicating Cloud storage type. # The data to scan.

"cloudStorageOptions": { # Google Cloud Storage options.
# Options defining a file or a set of files within a Google Cloud Storage
# bucket.

5593

"bytesLimitPerFilePercent": 42, # Max percentage of bytes to scan from a file. The rest are omitted. The

5594

# number of bytes scanned is rounded down. Must be between 0 and 100,
# inclusive. Both 0 and 100 mean no limit. Defaults to 0. Only one

5596

# of bytes_limit_per_file and bytes_limit_per_file_percent can be specified.

5597

"fileTypes": [ # List of file type groups to include in the scan.

5598

# If empty, all files are scanned and available data format processors

5599

# are applied. In addition, the binary content of the selected files

5600

# is always scanned as well.

5601

# Images are scanned only as binary if the specified region

5602

# does not support image inspection and no file_types were specified.

5603

# Image inspection is restricted to 'global', 'us', 'asia', and 'europe'.

5604

"A String",

5605

],

5606

"bytesLimitPerFile": "A String", # Max number of bytes to scan from a file. If a scanned file's size is bigger

5607

# than this value then the rest of the bytes are omitted. Only one

5608

# of bytes_limit_per_file and bytes_limit_per_file_percent can be specified.

5609

"filesLimitPercent": 42, # Limits the number of files to scan to this percentage of the input FileSet.

5610

# Number of files scanned is rounded down. Must be between 0 and 100,
# inclusive. Both 0 and 100 mean no limit. Defaults to 0.

5612

"fileSet": { # Set of files to scan. # The set of one or more files to scan.

"regexFileSet": { # The regex-filtered set of files to scan. Exactly one of `url` or
# `regex_file_set` must be set.
# Message representing a set of files in a Cloud Storage bucket. Regular
# expressions are used to allow fine-grained control over which files in the
# bucket to include.

5617

#

5618

# Included files are those that match at least one item in `include_regex` and

5619

# do not match any items in `exclude_regex`. Note that a file that matches

5620

# items from both lists will _not_ be included. For a match to occur, the

5621

# entire file path (i.e., everything in the url after the bucket name) must

5622

# match the regular expression.

5623

#

5624

# For example, given the input `{bucket_name: "mybucket", include_regex:

5625

# ["directory1/.*"], exclude_regex:

5626

# ["directory1/excluded.*"]}`:

5627

#

5628

# * `gs://mybucket/directory1/myfile` will be included

5629

# * `gs://mybucket/directory1/directory2/myfile` will be included (`.*` matches

5630

# across `/`)

5631

# * `gs://mybucket/directory0/directory1/myfile` will _not_ be included (the

5632

# full path doesn't match any items in `include_regex`)

5633

# * `gs://mybucket/directory1/excludedfile` will _not_ be included (the path

5634

# matches an item in `exclude_regex`)

5635

#

5636

# If `include_regex` is left empty, it will match all files by default

5637

# (this is equivalent to setting `include_regex: [".*"]`).

5638

#

5639

# Some other common use cases:

5640

#

5641

# * `{bucket_name: "mybucket", exclude_regex: [".*\.pdf"]}` will include all

5642

# files in `mybucket` except for .pdf files

5643

# * `{bucket_name: "mybucket", include_regex: ["directory/[^/]+"]}` will

5644

# include all files directly under `gs://mybucket/directory/`, without matching

5645

# across `/`

5646

"bucketName": "A String", # The name of a Cloud Storage bucket. Required.

5647

"excludeRegex": [ # A list of regular expressions matching file paths to exclude. All files in

5648

# the bucket that match at least one of these regular expressions will be

5649

# excluded from the scan.

5650

#

5651

# Regular expressions use RE2

5652

# [syntax](https://github.com/google/re2/wiki/Syntax); a guide can be found

5653

# under the google/re2 repository on GitHub.

5654

"A String",

5655

],

5656

"includeRegex": [ # A list of regular expressions matching file paths to include. All files in

5657

# the bucket that match at least one of these regular expressions will be

5658

# included in the set of files, except for those that also match an item in

5659

# `exclude_regex`. Leaving this field empty will match all files by default

5660

# (this is equivalent to including `.*` in the list).

5661

#

5662

# Regular expressions use RE2

5663

# [syntax](https://github.com/google/re2/wiki/Syntax); a guide can be found

5664

# under the google/re2 repository on GitHub.

"A String",

],

},

"url": "A String", # The Cloud Storage url of the file(s) to scan, in the format

5669

# `gs://<bucket>/<path>`. Trailing wildcard in the path is allowed.

5670

#

5671

# If the url ends in a trailing slash, the bucket or directory represented

5672

# by the url will be scanned non-recursively (content in sub-directories

5673

# will not be scanned). This means that `gs://mybucket/` is equivalent to

5674

# `gs://mybucket/*`, and `gs://mybucket/directory/` is equivalent to

5675

# `gs://mybucket/directory/*`.

5676

#

5677

# Exactly one of `url` or `regex_file_set` must be set.

5678

},

5679

"sampleMethod": "A String",

},

"bigQueryOptions": { # Options defining BigQuery table and row identifiers. # BigQuery options.

5682

"sampleMethod": "A String",

"tableReference": { # Complete BigQuery table reference.
# Message defining the location of a BigQuery table. A table is uniquely
# identified by its project_id, dataset_id, and table_name. Within a query
# a table is often referenced with a string in the format of:
# `<project_id>:<dataset_id>.<table_id>` or
# `<project_id>.<dataset_id>.<table_id>`.

5688

"projectId": "A String", # The Google Cloud Platform project ID of the project containing the table.

5689

# If omitted, project ID is inferred from the API call.

5690

"datasetId": "A String", # Dataset ID of the table.

5691

"tableId": "A String", # Name of the table.

5692

},

5693

"rowsLimitPercent": 42, # Max percentage of rows to scan. The rest are omitted. The number of rows

5694

# scanned is rounded down. Must be between 0 and 100, inclusive. Both 0 and
# 100 mean no limit. Defaults to 0. Only one of rows_limit and

5696

# rows_limit_percent can be specified. Cannot be used in conjunction with

5697

# TimespanConfig.

5698

"rowsLimit": "A String", # Max number of rows to scan. If the table has more rows than this value, the

5699

# rest of the rows are omitted. If not set, or if set to 0, all rows will be

5700

# scanned. Only one of rows_limit and rows_limit_percent can be specified.

5701

# Cannot be used in conjunction with TimespanConfig.

5702

"identifyingFields": [ # Table fields that may uniquely identify a row within the table. When

5703

# `actions.saveFindings.outputConfig.table` is specified, the values of

5704

# columns specified here are available in the output table under

5705

# `location.content_locations.record_location.record_key.id_values`. Nested

5706

# fields such as `person.birthdate.year` are allowed.

5707

{ # General identifier of a data field in a storage service.

5708

"name": "A String", # Name describing the field.

5709

},

5710

],

5711

"excludedFields": [ # References to fields excluded from scanning. This allows you to skip

5712

# inspection of entire columns which you know have no findings.

5713

{ # General identifier of a data field in a storage service.

5714

"name": "A String", # Name describing the field.

},

],

},

"timespanConfig": { # Configuration of the timespan of the items to include in scanning.

5719

# Currently only supported when inspecting Google Cloud Storage and BigQuery.

5720

"timestampField": { # General identifier of a data field in a storage service. # Specification of the field containing the timestamp of scanned items.

5721

# Used for data sources like Datastore and BigQuery.

5722

#

5723

# For BigQuery:

5724

# Required to filter out rows based on the given start and

5725

# end times. If not specified and the table was modified between the given

5726

# start and end times, the entire table will be scanned.

5727

# The valid data types of the timestamp field are: `INTEGER`, `DATE`,

5728

# `TIMESTAMP`, or `DATETIME` BigQuery column.

5729

#

5730

# For Datastore:

5731

# Valid data types of the timestamp field are: `TIMESTAMP`.

5732

# Datastore entity will be scanned if the timestamp property does not

5733

# exist or its value is empty or invalid.

5734

"name": "A String", # Name describing the field.

5735

},

5736

"enableAutoPopulationOfTimespanConfig": True or False, # When the job is started by a JobTrigger we will automatically figure out

5737

# a valid start_time to avoid scanning files that have not been modified

5738

# since the last time the JobTrigger executed. This will be based on the

5739

# time of the execution of the last run of the JobTrigger.

5740

"startTime": "A String", # Exclude files or rows older than this value.

5741

"endTime": "A String", # Exclude files or rows newer than this value.

5742

# If set to zero, no upper time limit is applied.

5743

},

5744

"datastoreOptions": { # Options defining a data set within Google Cloud Datastore. # Google Cloud Datastore options.

5745

"kind": { # A representation of a Datastore kind. # The kind to process.

5746

"name": "A String", # The name of the kind.

5747

},

"partitionId": { # Datastore partition ID.
# A partition ID identifies a grouping of entities. The grouping is always
# by project and namespace, however the namespace ID may be empty.
#
# A partition ID contains several dimensions:
# project ID and namespace ID.

5755

"namespaceId": "A String", # If not empty, the ID of the namespace to which the entities belong.

5756

"projectId": "A String", # The ID of the project to which the entities belong.

5757

},

5758

},

"hybridOptions": { # Hybrid inspection options.
# Configuration to control jobs where the content being inspected is outside
# of Google Cloud Platform.
# Early access feature is in a pre-release state and might change or have
# limited support. For more information, see
# https://cloud.google.com/products#product-launch-stages.

5764

"tableOptions": { # Instructions regarding the table content being inspected. # If the container is a table, additional information to make findings

5765

# meaningful such as the columns that are primary keys.

5766

"identifyingFields": [ # The columns that are the primary keys for table objects included in

5767

# ContentItem. A copy of this cell's value will be stored alongside

5768

# each finding so that the finding can be traced to the specific row it came

5769

# from. No more than 3 may be provided.

5770

{ # General identifier of a data field in a storage service.

5771

"name": "A String", # Name describing the field.

},

],

},

"requiredFindingLabelKeys": [ # These are labels that each inspection request must include within their

5776

# 'finding_labels' map. A request may contain other labels, but any request missing one of
# these will be rejected.

5778

#

5779

# Label keys must be between 1 and 63 characters long and must conform

5780

# to the following regular expression: `[a-z]([-a-z0-9]*[a-z0-9])?`.

5781

#

5782

# No more than 10 keys can be required.

5783

"A String",

5784

],

5785

"labels": { # To organize findings, these labels will be added to each finding.

5786

#

5787

# Label keys must be between 1 and 63 characters long and must conform

5788

# to the following regular expression: `[a-z]([-a-z0-9]*[a-z0-9])?`.

5789

#

5790

# Label values must be between 0 and 63 characters long and must conform

5791

# to the regular expression `([a-z]([-a-z0-9]*[a-z0-9])?)?`.

5792

#

5793

# No more than 10 labels can be associated with a given finding.

5794

#

5795

# Examples:

5796

# * `"environment" : "production"`

5797

# * `"pipeline" : "etl"`

5798

"a_key": "A String",

5799

},

5800

"description": "A String", # A short description of where the data is coming from. Will be stored once

5801

# in the job. 256 max length.

5802

},

},

"inspectConfig": { # Configuration description of the scanning process. # How and what to scan for.

5805

# When used with redactContent only info_types and min_likelihood are currently

5806

# used.

5807

"customInfoTypes": [ # CustomInfoTypes provided by the user. See

5808

# https://cloud.google.com/dlp/docs/creating-custom-infotypes to learn more.

5809

{ # Custom information type provided by the user. Used to find domain-specific

5810

# sensitive information configurable to the data in question.

"dictionary": { # A list of phrases to detect as a CustomInfoType.
# Custom information type based on a dictionary of words or phrases. This can
# be used to match sensitive information specific to the data, such as a list
# of employee IDs or job titles.

5814

#

5815

# Dictionary words are case-insensitive and all characters other than letters

5816

# and digits in the unicode [Basic Multilingual

5817

# Plane](https://en.wikipedia.org/wiki/Plane_%28Unicode%29#Basic_Multilingual_Plane)

5818

# will be replaced with whitespace when scanning for matches, so the

5819

# dictionary phrase "Sam Johnson" will match all three phrases "sam johnson",

5820

# "Sam, Johnson", and "Sam (Johnson)". Additionally, the characters

5821

# surrounding any match must be of a different type than the adjacent

5822

# characters within the word, so letters must be next to non-letters and

5823

# digits next to non-digits. For example, the dictionary word "jen" will

5824

# match the first three letters of the text "jen123" but will return no

5825

# matches for "jennifer".

5826

#

5827

# Dictionary words containing a large number of characters that are not

5828

# letters or digits may result in unexpected findings because such characters

5829

# are treated as whitespace. The

5830

# [limits](https://cloud.google.com/dlp/limits) page contains details about

5831

# the size limits of dictionaries. For dictionaries that do not fit within

5832

# these constraints, consider using `LargeCustomDictionaryConfig` in the

5833

# `StoredInfoType` API.

5834

"cloudStoragePath": { # Message representing a single file or path in Cloud Storage. # Newline-delimited file of words in Cloud Storage. Only a single file

5835

# is accepted.

5836

"path": "A String", # A url representing a file or path (no wildcards) in Cloud Storage.

5837

# Example: gs://[BUCKET_NAME]/dictionary.txt

5838

},

5839

"wordList": { # Message defining a list of words or phrases to search for in the data. # List of words or phrases to search for.

5840

"words": [ # Words or phrases defining the dictionary. The dictionary must contain

5841

# at least one phrase and every phrase must contain at least 2 characters

5842

# that are letters or digits. [required]

"A String",

],

},

},

"infoType": { # Type of information detected by the API. # CustomInfoType can either be a new infoType, or an extension of built-in

5848

# infoType, when the name matches one of existing infoTypes and that infoType

5849

# is specified in `InspectContent.info_types` field. Specifying the latter

5850

# adds findings to the one detected by the system. If built-in info type is

5851

# not specified in `InspectContent.info_types` list then the name is treated

5852

# as a custom info type.

5853

"name": "A String", # Name of the information type. Either a name of your choosing when

5854

# creating a CustomInfoType, or one of the names listed

5855

# at https://cloud.google.com/dlp/docs/infotypes-reference when specifying

5856

# a built-in type. When sending Cloud DLP results to Data Catalog, infoType

5857

# names should conform to the pattern `[A-Za-z0-9$-_]{1,64}`.

5858

},

5859

"likelihood": "A String", # Likelihood to return for this CustomInfoType. This base value can be

5860

# altered by a detection rule if the finding meets the criteria specified by

5861

# the rule. Defaults to `VERY_LIKELY` if not specified.

5862

"detectionRules": [ # Set of detection rules to apply to all findings of this CustomInfoType.

5863

# Rules are applied in order that they are specified. Not supported for the

5864

# `surrogate_type` CustomInfoType.

5865

{ # Deprecated; use `InspectionRuleSet` instead. Rule for modifying a

5866

# `CustomInfoType` to alter behavior under certain circumstances, depending

5867

# on the specific details of the rule. Not supported for the `surrogate_type`

5868

# custom infoType.

"hotwordRule": { # Hotword-based detection rule.
# The rule that adjusts the likelihood of findings within a certain
# proximity of hotwords.
"proximity": { # Proximity of the finding within which the entire hotword must reside.
# The total length of the window cannot exceed 1000 characters. Note that
# the finding itself will be included in the window, so that hotwords may
# be used to match substrings of the finding itself. For example, the
# certainty of a phone number regex "\(\d{3}\) \d{3}-\d{4}" could be
# adjusted upwards if the area code is known to be the local area code of
# a company office using the hotword regex "\(xxx\)", where "xxx"
# is the area code in question.
# Message for specifying a window around a finding to apply a detection
# rule.

5880

"windowAfter": 42, # Number of characters after the finding to consider.

5881

"windowBefore": 42, # Number of characters before the finding to consider.

},

"likelihoodAdjustment": { # Likelihood adjustment to apply to all matching findings.
# Message for specifying an adjustment to the likelihood of a finding as
# part of a detection rule.

5885

"fixedLikelihood": "A String", # Set the likelihood of a finding to a fixed value.

5886

"relativeLikelihood": 42, # Increase or decrease the likelihood by the specified number of

5887

# levels. For example, if a finding would be `POSSIBLE` without the

5888

# detection rule and `relative_likelihood` is 1, then it is upgraded to

5889

# `LIKELY`, while a value of -1 would downgrade it to `UNLIKELY`.

5890

# Likelihood may never drop below `VERY_UNLIKELY` or exceed

5891

# `VERY_LIKELY`, so applying an adjustment of 1 followed by an

5892

# adjustment of -1 when base likelihood is `VERY_LIKELY` will result in

5893

# a final likelihood of `LIKELY`.

},

"hotwordRegex": { # Message defining a custom regular expression. # Regular expression pattern defining what qualifies as a hotword.

5896

"groupIndexes": [ # The index of the submatch to extract as findings. When not

5897

# specified, the entire match is returned. No more than 3 may be included.

5898

42,

5899

],

5900

"pattern": "A String", # Pattern defining the regular expression. Its syntax

5901

# (https://github.com/google/re2/wiki/Syntax) can be found under the

5902

# google/re2 repository on GitHub.

5903

},

},

5905

},

5906

],

"surrogateType": { # Message for detecting output from deidentification transformations that
# support reversing, such as
# [`CryptoReplaceFfxFpeConfig`](https://cloud.google.com/dlp/docs/reference/rest/v2/organizations.deidentifyTemplates#cryptoreplaceffxfpeconfig).
# These types of transformations are
# those that perform pseudonymization, thereby producing a "surrogate" as
# output. This should be used in conjunction with a field on the
# transformation such as `surrogate_info_type`. This CustomInfoType does
# not support the use of `detection_rules`.

5916

},

5917

"regex": { # Message defining a custom regular expression. # Regular expression based CustomInfoType.

5918

"groupIndexes": [ # The index of the submatch to extract as findings. When not

5919

# specified, the entire match is returned. No more than 3 may be included.

5920

42,

5921

],

5922

"pattern": "A String", # Pattern defining the regular expression. Its syntax

5923

# (https://github.com/google/re2/wiki/Syntax) can be found under the

5924

# google/re2 repository on GitHub.

5925

},

5926

"storedType": { # A reference to a StoredInfoType to use with scanning. # Load an existing `StoredInfoType` resource for use in

5927

# `InspectDataSource`. Not currently supported in `InspectContent`.

5928

"name": "A String", # Resource name of the requested `StoredInfoType`, for example

5929

# `organizations/433245324/storedInfoTypes/432452342` or

5930

# `projects/project-id/storedInfoTypes/432452342`.

5931

"createTime": "A String", # Timestamp indicating when the version of the `StoredInfoType` used for

5932

# inspection was created. Output-only field, populated by the system.

5933

},

5934

"exclusionType": "A String", # If set to EXCLUSION_TYPE_EXCLUDE this infoType will not cause a finding

5935

# to be returned. It still can be used for rules matching.

},

5937

],

"minLikelihood": "A String", # Only returns findings equal to or above this threshold. The default is

5939

# POSSIBLE.

5940

# See https://cloud.google.com/dlp/docs/likelihood to learn more.

"limits": { # Configuration to control the number of findings returned.

5942

"maxFindingsPerRequest": 42, # Max number of findings that will be returned per request/job.

5943

# When set within `InspectContentRequest`, the maximum returned is 2000

5944

# regardless of whether this is set higher.

5945

"maxFindingsPerInfoType": [ # Configuration of findings limit given for specified infoTypes.

5946

{ # Max findings configuration per infoType, per content item or long

5947

# running DlpJob.

5948

"infoType": { # Type of information detected by the API. # Type of information the findings limit applies to. Only one limit per

5949

# info_type should be provided. If InfoTypeLimit does not have an

5950

# info_type, the DLP API applies the limit against all info_types that

5951

# are found but not specified in another InfoTypeLimit.

5952

"name": "A String", # Name of the information type. Either a name of your choosing when

5953

# creating a CustomInfoType, or one of the names listed

5954

# at https://cloud.google.com/dlp/docs/infotypes-reference when specifying

5955

# a built-in type. When sending Cloud DLP results to Data Catalog, infoType

5956

# names should conform to the pattern `[A-Za-z0-9$-_]{1,64}`.

5957

},

5958

"maxFindings": 42, # Max findings limit for the given infoType.

5959

},

5960

],

5961

"maxFindingsPerItem": 42, # Max number of findings that will be returned for each item scanned.

5962

# When set within `InspectJobConfig`,

5963

# the maximum returned is 2000 regardless of whether this is set higher.

5964

# When set within `InspectContentRequest`, this field is ignored.

5965

},

5966

"excludeInfoTypes": True or False, # When true, excludes type information of the findings.

5967

"includeQuote": True or False, # When true, a contextual quote from the data that triggered a finding is

5968

# included in the response; see Finding.quote.

5969

"ruleSet": [ # Set of rules to apply to the findings for this InspectConfig.

5970

# Exclusion rules contained in the set are executed last; other
# rules are executed in the order they are specified for each info type.

5972

{ # Rule set for modifying a set of infoTypes to alter behavior under certain

5973

# circumstances, depending on the specific details of the rules within the set.

5974

"infoTypes": [ # List of infoTypes this rule set is applied to.

5975

{ # Type of information detected by the API.

5976

"name": "A String", # Name of the information type. Either a name of your choosing when

5977

# creating a CustomInfoType, or one of the names listed

5978

# at https://cloud.google.com/dlp/docs/infotypes-reference when specifying

5979

# a built-in type. When sending Cloud DLP results to Data Catalog, infoType

5980

# names should conform to the pattern `[A-Za-z0-9$-_]{1,64}`.

5981

},

5982

],

5983

"rules": [ # Set of rules to be applied to infoTypes. The rules are applied in order.

5984

{ # A single inspection rule to be applied to infoTypes, specified in

5985

# `InspectionRuleSet`.

"hotwordRule": { # Hotword-based detection rule.
# The rule that adjusts the likelihood of findings within a certain
# proximity of hotwords.
"proximity": { # Proximity of the finding within which the entire hotword must reside.
# The total length of the window cannot exceed 1000 characters. Note that
# the finding itself will be included in the window, so that hotwords may
# be used to match substrings of the finding itself. For example, the
# certainty of a phone number regex "\(\d{3}\) \d{3}-\d{4}" could be
# adjusted upwards if the area code is known to be the local area code of
# a company office using the hotword regex "\(xxx\)", where "xxx"
# is the area code in question.
# Message for specifying a window around a finding to apply a detection
# rule.

5997

"windowAfter": 42, # Number of characters after the finding to consider.

5998

"windowBefore": 42, # Number of characters before the finding to consider.

5999

},

"likelihoodAdjustment": { # Likelihood adjustment to apply to all matching findings.
# Message for specifying an adjustment to the likelihood of a finding as
# part of a detection rule.

6002

"fixedLikelihood": "A String", # Set the likelihood of a finding to a fixed value.

6003

"relativeLikelihood": 42, # Increase or decrease the likelihood by the specified number of

6004

# levels. For example, if a finding would be `POSSIBLE` without the

6005

# detection rule and `relative_likelihood` is 1, then it is upgraded to

6006

# `LIKELY`, while a value of -1 would downgrade it to `UNLIKELY`.

6007

# Likelihood may never drop below `VERY_UNLIKELY` or exceed

6008

# `VERY_LIKELY`, so applying an adjustment of 1 followed by an

6009

# adjustment of -1 when base likelihood is `VERY_LIKELY` will result in

6010

# a final likelihood of `LIKELY`.

6011

},

6012

"hotwordRegex": { # Message defining a custom regular expression. # Regular expression pattern defining what qualifies as a hotword.

6013

"groupIndexes": [ # The index of the submatch to extract as findings. When not

6014

# specified, the entire match is returned. No more than 3 may be included.

6015

42,

6016

],

6017

"pattern": "A String", # Pattern defining the regular expression. Its syntax

6018

# (https://github.com/google/re2/wiki/Syntax) can be found under the

6019

# google/re2 repository on GitHub.

6020

},

6021

},

6022

"exclusionRule": { # The rule that specifies conditions when findings of infoTypes specified in # Exclusion rule.

6023

# `InspectionRuleSet` are removed from results.

6024

"matchingType": "A String", # How the rule is applied, see MatchingType documentation for details.

6025

"dictionary": { # Custom information type based on a dictionary of words or phrases. This can # Dictionary which defines the rule.

6026

# be used to match sensitive information specific to the data, such as a list

6027

# of employee IDs or job titles.

6028

#

6029

# Dictionary words are case-insensitive and all characters other than letters

6030

# and digits in the unicode [Basic Multilingual

6031

# Plane](https://en.wikipedia.org/wiki/Plane_%28Unicode%29#Basic_Multilingual_Plane)

6032

# will be replaced with whitespace when scanning for matches, so the

6033

# dictionary phrase "Sam Johnson" will match all three phrases "sam johnson",

6034

# "Sam, Johnson", and "Sam (Johnson)". Additionally, the characters

6035

# surrounding any match must be of a different type than the adjacent

6036

# characters within the word, so letters must be next to non-letters and

6037

# digits next to non-digits. For example, the dictionary word "jen" will

6038

# match the first three letters of the text "jen123" but will return no

6039

# matches for "jennifer".

6040

#

6041

# Dictionary words containing a large number of characters that are not

6042

# letters or digits may result in unexpected findings because such characters

6043

# are treated as whitespace. The

6044

# [limits](https://cloud.google.com/dlp/limits) page contains details about

6045

# the size limits of dictionaries. For dictionaries that do not fit within

6046

# these constraints, consider using `LargeCustomDictionaryConfig` in the

6047

# `StoredInfoType` API.

6048

"cloudStoragePath": { # Message representing a single file or path in Cloud Storage. # Newline-delimited file of words in Cloud Storage. Only a single file

6049

# is accepted.

6050

"path": "A String", # A url representing a file or path (no wildcards) in Cloud Storage.

6051

# Example: gs://[BUCKET_NAME]/dictionary.txt

6052

},

6053

"wordList": { # Message defining a list of words or phrases to search for in the data. # List of words or phrases to search for.

6054

"words": [ # Words or phrases defining the dictionary. The dictionary must contain

6055

# at least one phrase and every phrase must contain at least 2 characters

6056

# that are letters or digits. [required]

"A String",

],

},

},

"excludeInfoTypes": { # List of exclude infoTypes. # Set of infoTypes for which findings would affect this rule.

6062

"infoTypes": [ # InfoType list in ExclusionRule rule drops a finding when it overlaps or

6063

# is contained within a finding of an infoType from this list. For

# example, if `InspectionRuleSet.info_types` contains "PHONE_NUMBER" and

# `exclusion_rule` contains `exclude_info_types.info_types` with

# "EMAIL_ADDRESS", the phone number findings are dropped if they overlap

# with an EMAIL_ADDRESS finding.

# As a result, "555-222-2222@example.org" generates only a single

# finding, namely the email address.

6070

{ # Type of information detected by the API.

6071

"name": "A String", # Name of the information type. Either a name of your choosing when

6072

# creating a CustomInfoType, or one of the names listed

6073

# at https://cloud.google.com/dlp/docs/infotypes-reference when specifying

6074

# a built-in type. When sending Cloud DLP results to Data Catalog, infoType

6075

# names should conform to the pattern `[A-Za-z0-9$-_]{1,64}`.

},

],

},

"regex": { # Message defining a custom regular expression. # Regular expression which defines the rule.

6080

"groupIndexes": [ # The index of the submatch to extract as findings. When not

6081

# specified, the entire match is returned. No more than 3 may be included.

6082

42,

6083

],

6084

"pattern": "A String", # Pattern defining the regular expression. Its syntax

6085

# (https://github.com/google/re2/wiki/Syntax) can be found under the

6086

# google/re2 repository on GitHub.

},

},

},

],

},

],

"contentOptions": [ # List of options defining data content to scan.

6094

# If empty, text, images, and other content will be included.

6095

"A String",

6096

],

6097

"infoTypes": [ # Restricts what info_types to look for. The values must correspond to

6098

# InfoType values returned by ListInfoTypes or listed at

6099

# https://cloud.google.com/dlp/docs/infotypes-reference.

6100

#

6101

# When no InfoTypes or CustomInfoTypes are specified in a request, the

6102

# system may automatically choose what detectors to run. By default this may

6103

# be all types, but may change over time as detectors are updated.

6104

#

6105

# If you need precise control and predictability as to what detectors are

6106

# run you should specify specific InfoTypes listed in the reference,

6107

# otherwise a default list will be used, which may change over time.

6108

{ # Type of information detected by the API.

6109

"name": "A String", # Name of the information type. Either a name of your choosing when

6110

# creating a CustomInfoType, or one of the names listed

6111

# at https://cloud.google.com/dlp/docs/infotypes-reference when specifying

6112

# a built-in type. When sending Cloud DLP results to Data Catalog, infoType

6113

# names should conform to the pattern `[A-Za-z0-9$-_]{1,64}`.

},

],

},

},

"snapshotInspectTemplate": { # The inspectTemplate contains a configuration (set of types of sensitive data # If run with an InspectTemplate, a snapshot of its state at the time of

6119

# this run.

6120

# to be detected) to be used anywhere you otherwise would normally specify

6121

# InspectConfig. See https://cloud.google.com/dlp/docs/concepts-templates

6122

# to learn more.

6123

"description": "A String", # Short description (max 256 chars).

6124

"displayName": "A String", # Display name (max 256 chars).

6125

"createTime": "A String", # Output only. The creation timestamp of an inspectTemplate.

6126

"updateTime": "A String", # Output only. The last update timestamp of an inspectTemplate.

6127

"name": "A String", # Output only. The template name.

6128

#

6129

# The template will have one of the following formats:

6130

# `projects/PROJECT_ID/inspectTemplates/TEMPLATE_ID` OR

6131

# `organizations/ORGANIZATION_ID/inspectTemplates/TEMPLATE_ID`;

6132

"inspectConfig": { # Configuration description of the scanning process. # The core content of the template. Configuration of the scanning process.

6133

# When used with redactContent only info_types and min_likelihood are currently

6134

# used.

6135

"customInfoTypes": [ # CustomInfoTypes provided by the user. See

6136

# https://cloud.google.com/dlp/docs/creating-custom-infotypes to learn more.

6137

{ # Custom information type provided by the user. Used to find domain-specific

6138

# sensitive information configurable to the data in question.

6139

"dictionary": { # Custom information type based on a dictionary of words or phrases. This can # A list of phrases to detect as a CustomInfoType.

6140

# be used to match sensitive information specific to the data, such as a list

6141

# of employee IDs or job titles.

6142

#

6143

# Dictionary words are case-insensitive and all characters other than letters

6144

# and digits in the unicode [Basic Multilingual

6145

# Plane](https://en.wikipedia.org/wiki/Plane_%28Unicode%29#Basic_Multilingual_Plane)

6146

# will be replaced with whitespace when scanning for matches, so the

6147

# dictionary phrase "Sam Johnson" will match all three phrases "sam johnson",

6148

# "Sam, Johnson", and "Sam (Johnson)". Additionally, the characters

6149

# surrounding any match must be of a different type than the adjacent

6150

# characters within the word, so letters must be next to non-letters and

6151

# digits next to non-digits. For example, the dictionary word "jen" will

6152

# match the first three letters of the text "jen123" but will return no

6153

# matches for "jennifer".

6154

#

6155

# Dictionary words containing a large number of characters that are not

6156

# letters or digits may result in unexpected findings because such characters

6157

# are treated as whitespace. The

6158

# [limits](https://cloud.google.com/dlp/limits) page contains details about

6159

# the size limits of dictionaries. For dictionaries that do not fit within

6160

# these constraints, consider using `LargeCustomDictionaryConfig` in the

6161

# `StoredInfoType` API.

6162

"cloudStoragePath": { # Message representing a single file or path in Cloud Storage. # Newline-delimited file of words in Cloud Storage. Only a single file

6163

# is accepted.

6164

"path": "A String", # A url representing a file or path (no wildcards) in Cloud Storage.

6165

# Example: gs://[BUCKET_NAME]/dictionary.txt

6166

},

6167

"wordList": { # Message defining a list of words or phrases to search for in the data. # List of words or phrases to search for.

6168

"words": [ # Words or phrases defining the dictionary. The dictionary must contain

6169

# at least one phrase and every phrase must contain at least 2 characters

6170

# that are letters or digits. [required]

"A String",

],

},

},

"infoType": { # Type of information detected by the API. # CustomInfoType can either be a new infoType, or an extension of built-in

6176

# infoType, when the name matches one of existing infoTypes and that infoType

6177

# is specified in `InspectContent.info_types` field. Specifying the latter

6178

# adds findings to the ones detected by the system. If the built-in infoType is

6179

# not specified in `InspectContent.info_types` list then the name is treated

6180

# as a custom info type.

6181

"name": "A String", # Name of the information type. Either a name of your choosing when

6182

# creating a CustomInfoType, or one of the names listed

6183

# at https://cloud.google.com/dlp/docs/infotypes-reference when specifying

6184

# a built-in type. When sending Cloud DLP results to Data Catalog, infoType

6185

# names should conform to the pattern `[A-Za-z0-9$-_]{1,64}`.

6186

},

6187

"likelihood": "A String", # Likelihood to return for this CustomInfoType. This base value can be

6188

# altered by a detection rule if the finding meets the criteria specified by

6189

# the rule. Defaults to `VERY_LIKELY` if not specified.

6190

"detectionRules": [ # Set of detection rules to apply to all findings of this CustomInfoType.

6191

# Rules are applied in the order in which they are specified. Not supported for the

6192

# `surrogate_type` CustomInfoType.

6193

{ # Deprecated; use `InspectionRuleSet` instead. Rule for modifying a

6194

# `CustomInfoType` to alter behavior under certain circumstances, depending

6195

# on the specific details of the rule. Not supported for the `surrogate_type`

6196

# custom infoType.

6197

"hotwordRule": { # The rule that adjusts the likelihood of findings within a certain # Hotword-based detection rule.

6198

# proximity of hotwords.

6199

"proximity": { # Message for specifying a window around a finding to apply a detection # Proximity of the finding within which the entire hotword must reside.

6200

# The total length of the window cannot exceed 1000 characters. Note that

6201

# the finding itself will be included in the window, so that hotwords may

6202

# be used to match substrings of the finding itself. For example, the

6203

# certainty of a phone number regex "\(\d{3}\) \d{3}-\d{4}" could be

6204

# adjusted upwards if the area code is known to be the local area code of

6205

# a company office using the hotword regex "\(xxx\)", where "xxx"

6206

# is the area code in question.

6207

# rule.

6208

"windowAfter": 42, # Number of characters after the finding to consider.

6209

"windowBefore": 42, # Number of characters before the finding to consider.

6210

},

6211

"likelihoodAdjustment": { # Message for specifying an adjustment to the likelihood of a finding as # Likelihood adjustment to apply to all matching findings.

6212

# part of a detection rule.

6213

"fixedLikelihood": "A String", # Set the likelihood of a finding to a fixed value.

6214

"relativeLikelihood": 42, # Increase or decrease the likelihood by the specified number of

6215

# levels. For example, if a finding would be `POSSIBLE` without the

6216

# detection rule and `relative_likelihood` is 1, then it is upgraded to

6217

# `LIKELY`, while a value of -1 would downgrade it to `UNLIKELY`.

6218

# Likelihood may never drop below `VERY_UNLIKELY` or exceed

6219

# `VERY_LIKELY`, so applying an adjustment of 1 followed by an

6220

# adjustment of -1 when base likelihood is `VERY_LIKELY` will result in

6221

# a final likelihood of `LIKELY`.

6222

},

6223

"hotwordRegex": { # Message defining a custom regular expression. # Regular expression pattern defining what qualifies as a hotword.

6224

"groupIndexes": [ # The index of the submatch to extract as findings. When not

6225

# specified, the entire match is returned. No more than 3 may be included.

6226

42,

6227

],

6228

"pattern": "A String", # Pattern defining the regular expression. Its syntax

6229

# (https://github.com/google/re2/wiki/Syntax) can be found under the

6230

# google/re2 repository on GitHub.

},

},

},

],

"surrogateType": { # Message for detecting output from deidentification transformations # Message for detecting output from deidentification transformations that

6236

# support reversing.

6237

# such as

6238

# [`CryptoReplaceFfxFpeConfig`](https://cloud.google.com/dlp/docs/reference/rest/v2/organizations.deidentifyTemplates#cryptoreplaceffxfpeconfig).

6239

# These types of transformations are

6240

# those that perform pseudonymization, thereby producing a "surrogate" as

6241

# output. This should be used in conjunction with a field on the

6242

# transformation such as `surrogate_info_type`. This CustomInfoType does

6243

# not support the use of `detection_rules`.

6244

},

6245

"regex": { # Message defining a custom regular expression. # Regular expression based CustomInfoType.

6246

"groupIndexes": [ # The index of the submatch to extract as findings. When not

6247

# specified, the entire match is returned. No more than 3 may be included.

6248

42,

6249

],

6250

"pattern": "A String", # Pattern defining the regular expression. Its syntax

6251

# (https://github.com/google/re2/wiki/Syntax) can be found under the

6252

# google/re2 repository on GitHub.

6253

},

6254

"storedType": { # A reference to a StoredInfoType to use with scanning. # Load an existing `StoredInfoType` resource for use in

6255

# `InspectDataSource`. Not currently supported in `InspectContent`.

6256

"name": "A String", # Resource name of the requested `StoredInfoType`, for example

6257

# `organizations/433245324/storedInfoTypes/432452342` or

6258

# `projects/project-id/storedInfoTypes/432452342`.

6259

"createTime": "A String", # Timestamp indicating when the version of the `StoredInfoType` used for

6260

# inspection was created. Output-only field, populated by the system.

6261

},

6262

"exclusionType": "A String", # If set to EXCLUSION_TYPE_EXCLUDE this infoType will not cause a finding

6263

# to be returned. It still can be used for rules matching.

6264

},

6265

],

6266

"minLikelihood": "A String", # Only returns findings equal or above this threshold. The default is

6267

# POSSIBLE.

6268

# See https://cloud.google.com/dlp/docs/likelihood to learn more.

6269

"limits": { # Configuration to control the number of findings returned. # Configuration to control the number of findings returned.

6270

"maxFindingsPerRequest": 42, # Max number of findings that will be returned per request/job.

6271

# When set within `InspectContentRequest`, the maximum returned is 2000

6272

# even if this is set higher.

6273

"maxFindingsPerInfoType": [ # Configuration of findings limit given for specified infoTypes.

6274

{ # Max findings configuration per infoType, per content item or long

6275

# running DlpJob.

6276

"infoType": { # Type of information detected by the API. # Type of information the findings limit applies to. Only one limit per

6277

# info_type should be provided. If InfoTypeLimit does not have an

6278

# info_type, the DLP API applies the limit against all info_types that

6279

# are found but not specified in another InfoTypeLimit.

6280

"name": "A String", # Name of the information type. Either a name of your choosing when

6281

# creating a CustomInfoType, or one of the names listed

6282

# at https://cloud.google.com/dlp/docs/infotypes-reference when specifying

6283

# a built-in type. When sending Cloud DLP results to Data Catalog, infoType

6284

# names should conform to the pattern `[A-Za-z0-9$-_]{1,64}`.

6285

},

6286

"maxFindings": 42, # Max findings limit for the given infoType.

6287

},

6288

],

6289

"maxFindingsPerItem": 42, # Max number of findings that will be returned for each item scanned.

6290

# When set within `InspectJobConfig`,

6291

# the maximum returned is 2000 even if this is set higher.

6292

# When set within `InspectContentRequest`, this field is ignored.

6293

},

6294

"excludeInfoTypes": True or False, # When true, excludes type information of the findings.

6295

"includeQuote": True or False, # When true, a contextual quote from the data that triggered a finding is

6296

# included in the response; see Finding.quote.

6297

"ruleSet": [ # Set of rules to apply to the findings for this InspectConfig.

6298

# Exclusion rules contained in the set are executed last; other

6299

# rules are executed in the order they are specified for each info type.

6300

{ # Rule set for modifying a set of infoTypes to alter behavior under certain

6301

# circumstances, depending on the specific details of the rules within the set.

6302

"infoTypes": [ # List of infoTypes this rule set is applied to.

6303

{ # Type of information detected by the API.

6304

"name": "A String", # Name of the information type. Either a name of your choosing when

6305

# creating a CustomInfoType, or one of the names listed

6306

# at https://cloud.google.com/dlp/docs/infotypes-reference when specifying

6307

# a built-in type. When sending Cloud DLP results to Data Catalog, infoType

6308

# names should conform to the pattern `[A-Za-z0-9$-_]{1,64}`.

6309

},

6310

],

6311

"rules": [ # Set of rules to be applied to infoTypes. The rules are applied in order.

6312

{ # A single inspection rule to be applied to infoTypes, specified in

6313

# `InspectionRuleSet`.

6314

"hotwordRule": { # The rule that adjusts the likelihood of findings within a certain # Hotword-based detection rule.

6315

# proximity of hotwords.

6316

"proximity": { # Message for specifying a window around a finding to apply a detection # Proximity of the finding within which the entire hotword must reside.

6317

# The total length of the window cannot exceed 1000 characters. Note that

6318

# the finding itself will be included in the window, so that hotwords may

6319

# be used to match substrings of the finding itself. For example, the

6320

# certainty of a phone number regex "\(\d{3}\) \d{3}-\d{4}" could be

6321

# adjusted upwards if the area code is known to be the local area code of

6322

# a company office using the hotword regex "\(xxx\)", where "xxx"

6323

# is the area code in question.

6324

# rule.

6325

"windowAfter": 42, # Number of characters after the finding to consider.

6326

"windowBefore": 42, # Number of characters before the finding to consider.

6327

},

6328

"likelihoodAdjustment": { # Message for specifying an adjustment to the likelihood of a finding as # Likelihood adjustment to apply to all matching findings.

6329

# part of a detection rule.

6330

"fixedLikelihood": "A String", # Set the likelihood of a finding to a fixed value.

6331

"relativeLikelihood": 42, # Increase or decrease the likelihood by the specified number of

6332

# levels. For example, if a finding would be `POSSIBLE` without the

6333

# detection rule and `relative_likelihood` is 1, then it is upgraded to

6334

# `LIKELY`, while a value of -1 would downgrade it to `UNLIKELY`.

6335

# Likelihood may never drop below `VERY_UNLIKELY` or exceed

6336

# `VERY_LIKELY`, so applying an adjustment of 1 followed by an

6337

# adjustment of -1 when base likelihood is `VERY_LIKELY` will result in

6338

# a final likelihood of `LIKELY`.

6339

},

6340

"hotwordRegex": { # Message defining a custom regular expression. # Regular expression pattern defining what qualifies as a hotword.

6341

"groupIndexes": [ # The index of the submatch to extract as findings. When not

6342

# specified, the entire match is returned. No more than 3 may be included.

6343

42,

6344

],

6345

"pattern": "A String", # Pattern defining the regular expression. Its syntax

6346

# (https://github.com/google/re2/wiki/Syntax) can be found under the

6347

# google/re2 repository on GitHub.

6348

},

6349

},

6350

"exclusionRule": { # The rule that specifies conditions when findings of infoTypes specified in # Exclusion rule.

6351

# `InspectionRuleSet` are removed from results.

6352

"matchingType": "A String", # How the rule is applied, see MatchingType documentation for details.

6353

"dictionary": { # Custom information type based on a dictionary of words or phrases. This can # Dictionary which defines the rule.

6354

# be used to match sensitive information specific to the data, such as a list

6355

# of employee IDs or job titles.

6356

#

6357

# Dictionary words are case-insensitive and all characters other than letters

6358

# and digits in the unicode [Basic Multilingual

6359

# Plane](https://en.wikipedia.org/wiki/Plane_%28Unicode%29#Basic_Multilingual_Plane)

6360

# will be replaced with whitespace when scanning for matches, so the

6361

# dictionary phrase "Sam Johnson" will match all three phrases "sam johnson",

6362

# "Sam, Johnson", and "Sam (Johnson)". Additionally, the characters

6363

# surrounding any match must be of a different type than the adjacent

6364

# characters within the word, so letters must be next to non-letters and

6365

# digits next to non-digits. For example, the dictionary word "jen" will

6366

# match the first three letters of the text "jen123" but will return no

6367

# matches for "jennifer".

6368

#

6369

# Dictionary words containing a large number of characters that are not

6370

# letters or digits may result in unexpected findings because such characters

6371

# are treated as whitespace. The

6372

# [limits](https://cloud.google.com/dlp/limits) page contains details about

6373

# the size limits of dictionaries. For dictionaries that do not fit within

6374

# these constraints, consider using `LargeCustomDictionaryConfig` in the

6375

# `StoredInfoType` API.

6376

"cloudStoragePath": { # Message representing a single file or path in Cloud Storage. # Newline-delimited file of words in Cloud Storage. Only a single file

6377

# is accepted.

6378

"path": "A String", # A url representing a file or path (no wildcards) in Cloud Storage.

6379

# Example: gs://[BUCKET_NAME]/dictionary.txt

6380

},

6381

"wordList": { # Message defining a list of words or phrases to search for in the data. # List of words or phrases to search for.

6382

"words": [ # Words or phrases defining the dictionary. The dictionary must contain

6383

# at least one phrase and every phrase must contain at least 2 characters

6384

# that are letters or digits. [required]

"A String",

],

},

},

"excludeInfoTypes": { # List of exclude infoTypes. # Set of infoTypes for which findings would affect this rule.

6390

"infoTypes": [ # InfoType list in ExclusionRule rule drops a finding when it overlaps or

6391

# is contained within a finding of an infoType from this list. For

# example, if `InspectionRuleSet.info_types` contains "PHONE_NUMBER" and

# `exclusion_rule` contains `exclude_info_types.info_types` with

# "EMAIL_ADDRESS", the phone number findings are dropped if they overlap

# with an EMAIL_ADDRESS finding.

# As a result, "555-222-2222@example.org" generates only a single

# finding, namely the email address.

6398

{ # Type of information detected by the API.

6399

"name": "A String", # Name of the information type. Either a name of your choosing when

6400

# creating a CustomInfoType, or one of the names listed

6401

# at https://cloud.google.com/dlp/docs/infotypes-reference when specifying

6402

# a built-in type. When sending Cloud DLP results to Data Catalog, infoType

6403

# names should conform to the pattern `[A-Za-z0-9$-_]{1,64}`.

},

],

},

"regex": { # Message defining a custom regular expression. # Regular expression which defines the rule.

6408

"groupIndexes": [ # The index of the submatch to extract as findings. When not

6409

# specified, the entire match is returned. No more than 3 may be included.

6410

42,

6411

],

6412

"pattern": "A String", # Pattern defining the regular expression. Its syntax

6413

# (https://github.com/google/re2/wiki/Syntax) can be found under the

6414

# google/re2 repository on GitHub.

},

},

},

],

},

],

"contentOptions": [ # List of options defining data content to scan.

6422

# If empty, text, images, and other content will be included.

6423

"A String",

6424

],

6425

"infoTypes": [ # Restricts what info_types to look for. The values must correspond to

6426

# InfoType values returned by ListInfoTypes or listed at

6427

# https://cloud.google.com/dlp/docs/infotypes-reference.

6428

#

6429

# When no InfoTypes or CustomInfoTypes are specified in a request, the

6430

# system may automatically choose what detectors to run. By default this may

6431

# be all types, but may change over time as detectors are updated.

6432

#

6433

# If you need precise control and predictability as to what detectors are

6434

# run you should specify specific InfoTypes listed in the reference,

6435

# otherwise a default list will be used, which may change over time.

6436

{ # Type of information detected by the API.

6437

"name": "A String", # Name of the information type. Either a name of your choosing when

6438

# creating a CustomInfoType, or one of the names listed

6439

# at https://cloud.google.com/dlp/docs/infotypes-reference when specifying

6440

# a built-in type. When sending Cloud DLP results to Data Catalog, infoType

6441

# names should conform to the pattern `[A-Za-z0-9$-_]{1,64}`.

},

],

},

},

},

"result": { # All result fields mentioned below are updated while the job is processing. # A summary of the outcome of this inspect job.

6448

"hybridStats": { # Statistics related to processing hybrid inspect requests. # Statistics related to the processing of hybrid inspect.

6449

# Early access feature is in a pre-release state and might change or have

6450

# limited support. For more information, see

6451

# https://cloud.google.com/products#product-launch-stages.

6452

"processedCount": "A String", # The number of hybrid inspection requests processed within this job.

6453

"abortedCount": "A String", # The number of hybrid inspection requests aborted because the job ran

6454

# out of quota or was ended before they could be processed.

6455

"pendingCount": "A String", # The number of hybrid requests currently being processed. Only populated

6456

# when called via method `getDlpJob`.

6457

# A burst of traffic may cause hybrid inspect requests to be enqueued.

6458

# Processing will take place as quickly as possible, but resource limitations

6459

# may impact how long a request is enqueued for.

6460

},

6461

"totalEstimatedBytes": "A String", # Estimate of the number of bytes to process.

6462

"infoTypeStats": [ # Statistics of how many instances of each info type were found during

6463

# the inspect job.

6464

{ # Statistics regarding a specific InfoType.

6465

"infoType": { # Type of information detected by the API. # The type of finding this stat is for.

6466

"name": "A String", # Name of the information type. Either a name of your choosing when

6467

# creating a CustomInfoType, or one of the names listed

6468

# at https://cloud.google.com/dlp/docs/infotypes-reference when specifying

6469

# a built-in type. When sending Cloud DLP results to Data Catalog, infoType

6470

# names should conform to the pattern `[A-Za-z0-9$-_]{1,64}`.

6471

},

6472

"count": "A String", # Number of findings for this infoType.

Bu Sun Kim

2020-05-27 12:20:54 -0700

[diff] [blame]

6473

},

6474

],

Bu Sun Kim

2020-07-22 17:02:09 -0700

[diff] [blame^]

6475

"processedBytes": "A String", # Total size in bytes that were processed.

Bu Sun Kim

2020-05-27 12:20:54 -0700

[diff] [blame]

6476

},

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame]

6477

},

Bu Sun Kim

2020-05-27 12:20:54 -0700

[diff] [blame]

6478

"name": "A String", # The server-assigned name.

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

6479

},

6480

],

Bu Sun Kim

2020-07-22 17:02:09 -0700

[diff] [blame^]

6481

"nextPageToken": "A String", # The standard List next-page token.

Bu Sun Kim

2019-06-14 16:50:42 -0700

[diff] [blame]

}</pre>
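The response above reports per-infoType counts as strings ("count": "A String"), so callers usually need to convert them before doing arithmetic. The following is a minimal sketch of that conversion for a single job dictionary. It assumes the stats live under `inspectDetails.result.infoTypeStats`; the `inspectDetails` key is taken from the v2 DlpJob schema rather than from the excerpt shown here, and `info_type_counts` is a hypothetical helper name, not part of the client library.

```python
from collections import Counter
from typing import Any, Dict


def info_type_counts(job: Dict[str, Any]) -> Dict[str, int]:
    """Return {infoType name: integer count} for one DlpJob dictionary.

    Assumption: stats are nested under job["inspectDetails"]["result"].
    Missing keys are treated as "no findings" instead of raising.
    """
    result = job.get("inspectDetails", {}).get("result", {})
    counts: Counter = Counter()
    for stat in result.get("infoTypeStats", []):
        name = stat.get("infoType", {}).get("name", "UNKNOWN")
        # "count" arrives as a decimal string, so convert it explicitly.
        counts[name] += int(stat.get("count", "0"))
    return dict(counts)
```

A job that has no inspect result yet (for example a risk analysis job) simply yields an empty dictionary under this sketch.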

</div>
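The `nextPageToken` field in the list response and the `list_next` method can be combined into a simple paging loop. The sketch below assumes that `list_next` returns `None` once there are no more pages, which is the usual google-api-python-client behavior, and that the jobs are returned under a `jobs` key (taken from the v2 ListDlpJobsResponse schema, not from the excerpt shown here). `iter_dlp_jobs` is a hypothetical helper name.

```python
from typing import Any, Dict, Iterator


def iter_dlp_jobs(dlp_jobs: Any, parent: str, **list_kwargs: Any) -> Iterator[Dict[str, Any]]:
    """Yield every DlpJob under `parent`, following pages until exhausted.

    `dlp_jobs` is expected to expose `list(...)` and `list_next(...)`
    in the same way as the dlpJobs() resource documented on this page.
    """
    request = dlp_jobs.list(parent=parent, **list_kwargs)
    while request is not None:
        response = request.execute()
        # Assumption: the jobs of one page are returned under "jobs".
        for job in response.get("jobs", []):
            yield job
        # Assumption: list_next returns None when no further page exists.
        request = dlp_jobs.list_next(
            previous_request=request, previous_response=response
        )
```

With the real client, `dlp_jobs` would typically be obtained as `googleapiclient.discovery.build("dlp", "v2").projects().dlpJobs()`, and optional arguments such as `filter` or `pageSize` can be passed through `list_kwargs`.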

<code class="details" id="list_next">list_next(previous_request, previous_response)</code>

<pre>Retrieves the next page of results.

Args:

previous_request: The request for the previous page. (required)

previous_response: The response from the request for the previous page. (required)


Returns:

Bu Sun Kim

2020-05-20 12:08:20 -0700

[diff] [blame]

6494

A request object that you can call 'execute()' on to request the next

Bu Sun Kim