<h1><a href="dlp_v2.html">Cloud Data Loss Prevention (DLP) API</a> . <a href="dlp_v2.projects.html">projects</a> . <a href="dlp_v2.projects.dlpJobs.html">dlpJobs</a></h1>

<h2>Instance Methods</h2>
<p class="toc_element">
<code><a href="#cancel">cancel(name, body=None, x__xgafv=None)</a></code></p>
<p class="firstline">Starts asynchronous cancellation on a long-running DlpJob.</p>
<p class="toc_element">
<code><a href="#create">create(parent, body, x__xgafv=None)</a></code></p>
<p class="firstline">Creates a new job to inspect storage or calculate risk metrics.</p>
<p class="toc_element">
<code><a href="#delete">delete(name, x__xgafv=None)</a></code></p>
<p class="firstline">Deletes a long-running DlpJob.</p>
<p class="toc_element">
<code><a href="#get">get(name, x__xgafv=None)</a></code></p>
<p class="firstline">Gets the latest state of a long-running DlpJob.</p>
<p class="toc_element">
<code><a href="#list">list(parent, orderBy=None, type=None, pageSize=None, pageToken=None, x__xgafv=None, filter=None)</a></code></p>
<p class="firstline">Lists DlpJobs that match the specified filter in the request.</p>
<p class="toc_element">
<code><a href="#list_next">list_next(previous_request, previous_response)</a></code></p>
<p class="firstline">Retrieves the next page of results.</p>
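<p>The <code>list</code> and <code>list_next</code> methods listed above are usually combined to walk every page of results. The following is a minimal sketch, not a definitive implementation. It assumes that <code>dlp_jobs</code> is the resource object exposing these methods (for example, something obtained via <code>service.projects().dlpJobs()</code>), and that each response page carries its jobs under a <code>jobs</code> key. The helper name <code>iter_dlp_jobs</code> is hypothetical.</p>

```python
def iter_dlp_jobs(dlp_jobs, parent, **list_kwargs):
    """Yield every job under `parent`, following list_next until it is exhausted.

    Assumptions: `dlp_jobs` exposes list() and list_next() with the signatures
    documented on this page, and each page stores its jobs under "jobs".
    """
    request = dlp_jobs.list(parent=parent, **list_kwargs)
    while request is not None:
        response = request.execute()
        for job in response.get("jobs", []):
            yield job
        # list_next is expected to return None once there are no more pages.
        request = dlp_jobs.list_next(
            previous_request=request, previous_response=response
        )
```

<p>Because the helper is a generator, callers can stop early without fetching the remaining pages, for example <code>next(iter_dlp_jobs(dlp_jobs, "projects/my-project-id"), None)</code>.</p>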

<h3>Method Details</h3>

<div class="method">

<code class="details" id="cancel">cancel(name, body=None, x__xgafv=None)</code>

<pre>Starts asynchronous cancellation on a long-running DlpJob. The server
makes a best effort to cancel the DlpJob, but success is not
guaranteed.
See https://cloud.google.com/dlp/docs/inspecting-storage and
https://cloud.google.com/dlp/docs/compute-risk-analysis to learn more.

Args:
body: object, The request body.
The object takes the form of:

{ # The request message for canceling a DLP job.
}

x__xgafv: string, V1 error format.
Allowed values
1 - v1 error format
2 - v2 error format

Returns:
An object of the form:

{ # A generic empty message that you can re-use to avoid defining duplicated
# empty messages in your APIs. A typical example is to use it as the request
# or the response type of an API method. For instance:
#
# service Foo {
# rpc Bar(google.protobuf.Empty) returns (google.protobuf.Empty);
# }
#
# The JSON representation for `Empty` is the empty JSON object `{}`.
}</pre>

</div>
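<p>Since cancellation is best effort and the call returns only an empty object, a caller will usually want to read the job back afterwards. The sketch below illustrates that pattern under stated assumptions: <code>dlp_jobs</code> is the resource object exposing the <code>cancel</code> and <code>get</code> methods documented on this page, and the helper name <code>cancel_and_fetch</code> and the job name used in the usage note are hypothetical.</p>

```python
def cancel_and_fetch(dlp_jobs, name):
    """Request cancellation of the job `name`, then return its latest state.

    Assumption: `dlp_jobs` exposes cancel() and get() as documented on this page.
    The cancel call returns an empty object, so it says nothing about whether
    the job actually stopped; the follow-up get() is what reports that.
    """
    # The cancel request message has no fields, so an empty body is sent.
    dlp_jobs.cancel(name=name, body={}).execute()
    return dlp_jobs.get(name=name).execute()
```

<p>A call might look like <code>cancel_and_fetch(dlp_jobs, "projects/my-project-id/dlpJobs/my-job")</code>. The returned object is whatever <code>get</code> reports at that moment, which may still show the job as running.</p>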

<code class="details" id="create">create(parent, body, x__xgafv=None)</code>

<pre>Creates a new job to inspect storage or calculate risk metrics.
See https://cloud.google.com/dlp/docs/inspecting-storage and
https://cloud.google.com/dlp/docs/compute-risk-analysis to learn more.

When no InfoTypes or CustomInfoTypes are specified in inspect jobs, the
system will automatically choose what detectors to run. By default this may
be all types, but may change over time as detectors are updated.

Args:
parent: string, The parent resource name, for example projects/my-project-id. (required)
body: object, The request body. (required)
The object takes the form of:

{ # Request message for CreateDlpJobRequest. Used to initiate long-running
# jobs such as calculating risk metrics or inspecting Google Cloud
# Storage.
"riskJob": { # Configuration for a risk analysis job. See
# https://cloud.google.com/dlp/docs/concepts-risk-analysis to learn more.
"privacyMetric": { # Privacy metric to compute for reidentification risk analysis.
"numericalStatsConfig": { # Compute numerical stats over an individual column, including
# min, max, and quantiles.
"field": { # General identifier of a data field in a storage service. # Field to compute numerical stats on. Supported types are
# integer, float, date, datetime, timestamp, time.
"name": "A String", # Name describing the field.
},
},

"kMapEstimationConfig": { # Reidentifiability metric. This corresponds to a risk model similar to what
# is called "journalist risk" in the literature, except the attack dataset is
# statistically modeled instead of being perfectly known. This can be done
# using publicly available data (like the US Census), or using a custom
# statistical model (indicated as one or several BigQuery tables), or by
# extrapolating from the distribution of values in the input dataset.
"regionCode": "A String", # ISO 3166-1 alpha-2 region code to use in the statistical modeling.
# Required if no column is tagged with a region-specific InfoType (like
# US_ZIP_5) or a region code.
"quasiIds": [ # Fields considered to be quasi-identifiers. No two columns can have the
# same tag. [required]
{ # A column with a semantic tag attached.
"field": { # General identifier of a data field in a storage service. # Identifies the column. [required]
"name": "A String", # Name describing the field.
},
"customTag": "A String", # A column can be tagged with a custom tag. In this case, the user must
# indicate an auxiliary table that contains statistical information on
# the possible values of this column (below).
"infoType": { # Type of information detected by the API. # A column can be tagged with an InfoType to use the relevant public
# dataset as a statistical model of population, if available. We
# currently support US ZIP codes, region codes, ages and genders.
# To programmatically obtain the list of supported InfoTypes, use
# ListInfoTypes with the supported_by=RISK_ANALYSIS filter.
"name": "A String", # Name of the information type. Either a name of your choosing when
# creating a CustomInfoType, or one of the names listed
# at https://cloud.google.com/dlp/docs/infotypes-reference when specifying
# a built-in type. InfoType names should conform to the pattern
# [a-zA-Z0-9_]{1,64}.
},
"inferred": { # If no semantic tag is indicated, we infer the statistical model from
# the distribution of values in the input data.
#
# A generic empty message that you can re-use to avoid defining duplicated
# empty messages in your APIs. A typical example is to use it as the request
# or the response type of an API method. For instance:
#
# service Foo {
# rpc Bar(google.protobuf.Empty) returns (google.protobuf.Empty);
# }
#
# The JSON representation for `Empty` is the empty JSON object `{}`.
},
},
],

"auxiliaryTables": [ # Several auxiliary tables can be used in the analysis. Each custom_tag

# used to tag a quasi-identifier column must appear in exactly one column
# of one auxiliary table.
{ # An auxiliary table contains statistical information on the relative
# frequency of different quasi-identifier values. It has one or several
# quasi-identifier columns, and one column that indicates the relative
# frequency of each quasi-identifier tuple.
# If a tuple is present in the data but not in the auxiliary table, the
# corresponding relative frequency is assumed to be zero (and thus, the
# tuple is highly reidentifiable).
"relativeFrequency": { # General identifier of a data field in a storage service. # The relative frequency column must contain a floating-point number
# between 0 and 1 (inclusive). Null values are assumed to be zero.
# [required]
"name": "A String", # Name describing the field.
},
"quasiIds": [ # Quasi-identifier columns. [required]
{ # A quasi-identifier column has a custom_tag, used to know which column
# in the data corresponds to which column in the statistical model.
"field": { # General identifier of a data field in a storage service.
"name": "A String", # Name describing the field.
},
"customTag": "A String",
},
],
"table": { # Auxiliary table location. [required]
# Message defining the location of a BigQuery table. A table is uniquely
# identified by its project_id, dataset_id, and table_name. Within a query
# a table is often referenced with a string in the format of:
# `<project_id>:<dataset_id>.<table_id>` or
# `<project_id>.<dataset_id>.<table_id>`.
"projectId": "A String", # The Google Cloud Platform project ID of the project containing the table.
# If omitted, project ID is inferred from the API call.
"tableId": "A String", # Name of the table.
"datasetId": "A String", # Dataset ID of the table.
},
},
],
},

"lDiversityConfig": { # l-diversity metric, used for analysis of reidentification risk.

"sensitiveAttribute": { # General identifier of a data field in a storage service. # Sensitive field for computing the l-value.
"name": "A String", # Name describing the field.
},
"quasiIds": [ # Set of quasi-identifiers indicating how equivalence classes are
# defined for the l-diversity computation. When multiple fields are
# specified, they are considered a single composite key.
{ # General identifier of a data field in a storage service.
"name": "A String", # Name describing the field.
},
],
},

"deltaPresenceEstimationConfig": { # δ-presence metric, used to estimate how likely it is for an attacker to

# figure out that one given individual appears in a de-identified dataset.
# Similarly to the k-map metric, we cannot compute δ-presence exactly without
# knowing the attack dataset, so we use a statistical model instead.
"regionCode": "A String", # ISO 3166-1 alpha-2 region code to use in the statistical modeling.
# Required if no column is tagged with a region-specific InfoType (like
# US_ZIP_5) or a region code.
"quasiIds": [ # Fields considered to be quasi-identifiers. No two fields can have the
# same tag. [required]
{ # A column with a semantic tag attached.
"field": { # General identifier of a data field in a storage service. # Identifies the column. [required]
"name": "A String", # Name describing the field.
},
"customTag": "A String", # A column can be tagged with a custom tag. In this case, the user must
# indicate an auxiliary table that contains statistical information on
# the possible values of this column (below).
"infoType": { # Type of information detected by the API. # A column can be tagged with an InfoType to use the relevant public
# dataset as a statistical model of population, if available. We
# currently support US ZIP codes, region codes, ages and genders.
# To programmatically obtain the list of supported InfoTypes, use
# ListInfoTypes with the supported_by=RISK_ANALYSIS filter.
"name": "A String", # Name of the information type. Either a name of your choosing when
# creating a CustomInfoType, or one of the names listed
# at https://cloud.google.com/dlp/docs/infotypes-reference when specifying
# a built-in type. InfoType names should conform to the pattern
# [a-zA-Z0-9_]{1,64}.
},
"inferred": { # If no semantic tag is indicated, we infer the statistical model from
# the distribution of values in the input data.
#
# A generic empty message that you can re-use to avoid defining duplicated
# empty messages in your APIs. A typical example is to use it as the request
# or the response type of an API method. For instance:
#
# service Foo {
# rpc Bar(google.protobuf.Empty) returns (google.protobuf.Empty);
# }
#
# The JSON representation for `Empty` is the empty JSON object `{}`.
},
},
],

"auxiliaryTables": [ # Several auxiliary tables can be used in the analysis. Each custom_tag

# used to tag a quasi-identifier field must appear in exactly one
# field of one auxiliary table.
{ # An auxiliary table containing statistical information on the relative
# frequency of different quasi-identifier values. It has one or several
# quasi-identifier columns, and one column that indicates the relative
# frequency of each quasi-identifier tuple.
# If a tuple is present in the data but not in the auxiliary table, the
# corresponding relative frequency is assumed to be zero (and thus, the
# tuple is highly reidentifiable).
"relativeFrequency": { # General identifier of a data field in a storage service. # The relative frequency column must contain a floating-point number
# between 0 and 1 (inclusive). Null values are assumed to be zero.
# [required]
"name": "A String", # Name describing the field.
},
"quasiIds": [ # Quasi-identifier columns. [required]
{ # A quasi-identifier column has a custom_tag, used to know which column
# in the data corresponds to which column in the statistical model.
"field": { # General identifier of a data field in a storage service.
"name": "A String", # Name describing the field.
},
"customTag": "A String",
},
],
"table": { # Auxiliary table location. [required]
# Message defining the location of a BigQuery table. A table is uniquely
# identified by its project_id, dataset_id, and table_name. Within a query
# a table is often referenced with a string in the format of:
# `<project_id>:<dataset_id>.<table_id>` or
# `<project_id>.<dataset_id>.<table_id>`.
"projectId": "A String", # The Google Cloud Platform project ID of the project containing the table.
# If omitted, project ID is inferred from the API call.
"tableId": "A String", # Name of the table.
"datasetId": "A String", # Dataset ID of the table.
},
},
],
},
"categoricalStatsConfig": { # Compute categorical stats over an individual column, including
# number of distinct values and value count distribution.
"field": { # General identifier of a data field in a storage service. # Field to compute categorical stats on. All column types are
# supported except for arrays and structs. However, it may be more
# informative to use NumericalStats when the field type is supported,
# depending on the data.
"name": "A String", # Name describing the field.
},
},

"kAnonymityConfig": { # k-anonymity metric, used for analysis of reidentification risk.
"entityId": { # Optional message indicating that multiple rows might be associated with a
# single individual. If the same entity_id is associated with multiple
# quasi-identifier tuples over distinct rows, we consider the entire
# collection of tuples as the composite quasi-identifier. This collection
# is a multiset: the order in which the different tuples appear in the
# dataset is ignored, but their frequency is taken into account.
#
# Important note: a maximum of 1000 rows can be associated with a single
# entity ID. If more rows are associated with the same entity ID, some
# might be ignored.
#
# An entity in a dataset is a field or set of fields that correspond to a
# single person. For example, in medical records the `EntityId` might be a
# patient identifier, or for financial records it might be an account
# identifier. This message is used when generalizations or analysis must take
# into account that multiple rows correspond to the same entity.
"field": { # General identifier of a data field in a storage service. # Composite key indicating which field contains the entity identifier.
"name": "A String", # Name describing the field.
},
},
"quasiIds": [ # Set of fields to compute k-anonymity over. When multiple fields are
# specified, they are considered a single composite key. Structs and
# repeated data types are not supported; however, nested fields are
# supported so long as they are not structs themselves or nested within
# a repeated field.
{ # General identifier of a data field in a storage service.
"name": "A String", # Name describing the field.
},
],
},
},
"sourceTable": { # Input dataset to compute metrics over.
# Message defining the location of a BigQuery table. A table is uniquely
# identified by its project_id, dataset_id, and table_name. Within a query
# a table is often referenced with a string in the format of:
# `<project_id>:<dataset_id>.<table_id>` or
# `<project_id>.<dataset_id>.<table_id>`.
"projectId": "A String", # The Google Cloud Platform project ID of the project containing the table.
# If omitted, project ID is inferred from the API call.
"tableId": "A String", # Name of the table.
"datasetId": "A String", # Dataset ID of the table.
},

"actions": [ # Actions to execute at the completion of the job. Actions are executed in the order
# provided.
{ # A task to execute on the completion of a job.
# See https://cloud.google.com/dlp/docs/concepts-actions to learn more.
"saveFindings": { # Save resulting findings in a provided location.
# If set, the detailed findings will be persisted to the specified
# OutputStorageConfig. Only a single instance of this action can be
# specified.
# Compatible with: Inspect, Risk
"outputConfig": { # Cloud repository for storing output.
"table": { # Store findings in an existing table or a new table in an existing
# dataset. If table_id is not set a new one will be generated
# for you with the following format:
# dlp_googleapis_yyyy_mm_dd_[dlp_job_id]. Pacific timezone will be used for
# generating the date details.
#
# For Inspect, each column in an existing output table must have the same
# name, type, and mode of a field in the `Finding` object.
#
# For Risk, an existing output table should be the output of a previous
# Risk analysis job run on the same source table, with the same privacy
# metric and quasi-identifiers. Risk jobs that analyze the same table but
# compute a different privacy metric, or use different sets of
# quasi-identifiers, cannot store their results in the same table.
#
# Message defining the location of a BigQuery table. A table is uniquely
# identified by its project_id, dataset_id, and table_name. Within a query
# a table is often referenced with a string in the format of:
# `<project_id>:<dataset_id>.<table_id>` or
# `<project_id>.<dataset_id>.<table_id>`.
"projectId": "A String", # The Google Cloud Platform project ID of the project containing the table.
# If omitted, project ID is inferred from the API call.
"tableId": "A String", # Name of the table.
"datasetId": "A String", # Dataset ID of the table.
},
"outputSchema": "A String", # Schema used for writing the findings for Inspect jobs. This field is only
# used for Inspect and must be unspecified for Risk jobs. Columns are derived
# from the `Finding` object. If appending to an existing table, any columns
# from the predefined schema that are missing will be added. No columns in
# the existing table will be deleted.
#
# If unspecified, then all available columns will be used for a new table or
# an (existing) table with no schema, and no changes will be made to an
# existing table that has a schema.
},
},

"jobNotificationEmails": { # Enable email notification to project owners and editors on the job's
# completion/failure.
},
"publishSummaryToCscc": { # Publish summary to Cloud Security Command Center (Alpha).
# Publish the result summary of a DlpJob to the Cloud Security
# Command Center (CSCC Alpha).
# This action is only available for projects that are part of
# an organization and whitelisted for the alpha Cloud Security Command
# Center.
# The action will publish the count of finding instances and their info types.
# The summary of findings will be persisted in CSCC and is governed by CSCC
# service-specific policy, see https://cloud.google.com/terms/service-terms
# Only a single instance of this action can be specified.
# Compatible with: Inspect
},
"pubSub": { # Publish a notification to a pubsub topic.
# Publish a message into the given Pub/Sub topic when the DlpJob has completed. The
# message contains a single field, `DlpJobName`, which is equal to the
# finished job's
# [`DlpJob.name`](/dlp/docs/reference/rest/v2/projects.dlpJobs#DlpJob).
# Compatible with: Inspect, Risk
"topic": "A String", # Cloud Pub/Sub topic to send notifications to. The topic must have given
# publishing access rights to the DLP API service account executing
# the long running DlpJob sending the notifications.
# Format is projects/{project}/topics/{topic}.
},
},
],
},
"jobId": "A String", # The job id can contain uppercase and lowercase letters,
# numbers, hyphens, and underscores; that is, it must match the regular
# expression: `[a-zA-Z\d-_]+`. The maximum length is 100
# characters. Can be empty to allow the system to generate one.

"inspectJob": {

"storageConfig": { # Shared message indicating Cloud storage type. # The data to scan.
"datastoreOptions": { # Options defining a data set within Google Cloud Datastore. # Google Cloud Datastore options specification.
"partitionId": { # Datastore partition ID.
# A partition ID identifies a grouping of entities. The grouping is always
# by project and namespace, however the namespace ID may be empty.
#
# A partition ID contains several dimensions:
# project ID and namespace ID.
"projectId": "A String", # The ID of the project to which the entities belong.
"namespaceId": "A String", # If not empty, the ID of the namespace to which the entities belong.
},
"kind": { # A representation of a Datastore kind. # The kind to process.
"name": "A String", # The name of the kind.
},
},

"bigQueryOptions": { # Options defining BigQuery table and row identifiers. # BigQuery options specification.
"excludedFields": [ # References to fields excluded from scanning. This allows you to skip
# inspection of entire columns which you know have no findings.
{ # General identifier of a data field in a storage service.
"name": "A String", # Name describing the field.
},
],
"rowsLimit": "A String", # Max number of rows to scan. If the table has more rows than this value, the
# rest of the rows are omitted. If not set, or if set to 0, all rows will be
# scanned. Only one of rows_limit and rows_limit_percent can be specified.
# Cannot be used in conjunction with TimespanConfig.
"sampleMethod": "A String",
"identifyingFields": [ # References to fields uniquely identifying rows within the table.
# Nested fields, like `person.birthdate.year`, are allowed.
{ # General identifier of a data field in a storage service.
"name": "A String", # Name describing the field.
},
],
"rowsLimitPercent": 42, # Max percentage of rows to scan. The rest are omitted. The number of rows
# scanned is rounded down. Must be between 0 and 100, inclusive. Both 0 and
# 100 mean no limit. Defaults to 0. Only one of rows_limit and
# rows_limit_percent can be specified. Cannot be used in conjunction with
# TimespanConfig.
"tableReference": { # Complete BigQuery table reference.
# Message defining the location of a BigQuery table. A table is uniquely
# identified by its project_id, dataset_id, and table_name. Within a query
# a table is often referenced with a string in the format of:
# `<project_id>:<dataset_id>.<table_id>` or
# `<project_id>.<dataset_id>.<table_id>`.
"projectId": "A String", # The Google Cloud Platform project ID of the project containing the table.
# If omitted, project ID is inferred from the API call.
"tableId": "A String", # Name of the table.
"datasetId": "A String", # Dataset ID of the table.
},
},

"timespanConfig": { # Configuration of the timespan of the items to include in scanning.
# Currently only supported when inspecting Google Cloud Storage and BigQuery.
"timestampField": { # General identifier of a data field in a storage service. # Specification of the field containing the timestamp of scanned items.
# Used for data sources like Datastore or BigQuery.
# If not specified for BigQuery, the table's last modification timestamp
# is checked against the given time span.
# The valid data types of the timestamp field are:
# for BigQuery - timestamp, date, datetime;
# for Datastore - timestamp.
# A Datastore entity will be scanned if the timestamp property does not exist
# or its value is empty or invalid.
"name": "A String", # Name describing the field.
},
"endTime": "A String", # Exclude files or rows newer than this value.
# If set to zero, no upper time limit is applied.
"startTime": "A String", # Exclude files or rows older than this value.
"enableAutoPopulationOfTimespanConfig": True or False, # When the job is started by a JobTrigger we will automatically figure out
# a valid start_time to avoid scanning files that have not been modified
# since the last time the JobTrigger executed. This will be based on the
# time of the execution of the last run of the JobTrigger.
},

"cloudStorageOptions": { # Google Cloud Storage options specification.
# Options defining a file or a set of files within a Google Cloud Storage
# bucket.
"bytesLimitPerFile": "A String", # Max number of bytes to scan from a file. If a scanned file's size is bigger
# than this value then the rest of the bytes are omitted. Only one
# of bytes_limit_per_file and bytes_limit_per_file_percent can be specified.
"sampleMethod": "A String",
"fileSet": { # Set of files to scan. # The set of one or more files to scan.
"url": "A String", # The Cloud Storage url of the file(s) to scan, in the format
# `gs://<bucket>/<path>`. Trailing wildcard in the path is allowed.
#
# If the url ends in a trailing slash, the bucket or directory represented
# by the url will be scanned non-recursively (content in sub-directories
# will not be scanned). This means that `gs://mybucket/` is equivalent to
# `gs://mybucket/*`, and `gs://mybucket/directory/` is equivalent to
# `gs://mybucket/directory/*`.
#
# Exactly one of `url` or `regex_file_set` must be set.

"regexFileSet": { # The regex-filtered set of files to scan. Exactly one of `url` or
# `regex_file_set` must be set.
#
# Message representing a set of files in a Cloud Storage bucket. Regular
# expressions are used to allow fine-grained control over which files in the
# bucket to include.
#
# Included files are those that match at least one item in `include_regex` and
# do not match any items in `exclude_regex`. Note that a file that matches
# items from both lists will _not_ be included. For a match to occur, the
# entire file path (i.e., everything in the url after the bucket name) must
# match the regular expression.
#
# For example, given the input `{bucket_name: "mybucket", include_regex:
# ["directory1/.*"], exclude_regex:
# ["directory1/excluded.*"]}`:
#
# * `gs://mybucket/directory1/myfile` will be included
# * `gs://mybucket/directory1/directory2/myfile` will be included (`.*` matches
# across `/`)
# * `gs://mybucket/directory0/directory1/myfile` will _not_ be included (the
# full path doesn't match any items in `include_regex`)
# * `gs://mybucket/directory1/excludedfile` will _not_ be included (the path
# matches an item in `exclude_regex`)
#
# If `include_regex` is left empty, it will match all files by default
# (this is equivalent to setting `include_regex: [".*"]`).
#
# Some other common use cases:
#
# * `{bucket_name: "mybucket", exclude_regex: [".*\.pdf"]}` will include all
# files in `mybucket` except for .pdf files
# * `{bucket_name: "mybucket", include_regex: ["directory/[^/]+"]}` will
# include all files directly under `gs://mybucket/directory/`, without matching
# across `/`
"excludeRegex": [ # A list of regular expressions matching file paths to exclude. All files in
# the bucket that match at least one of these regular expressions will be
# excluded from the scan.
#
# Regular expressions use RE2
# [syntax](https://github.com/google/re2/wiki/Syntax); a guide can be found
# under the google/re2 repository on GitHub.
"A String",
],
"bucketName": "A String", # The name of a Cloud Storage bucket. Required.
"includeRegex": [ # A list of regular expressions matching file paths to include. All files in
# the bucket that match at least one of these regular expressions will be
# included in the set of files, except for those that also match an item in
# `exclude_regex`. Leaving this field empty will match all files by default
# (this is equivalent to including `.*` in the list).
#
# Regular expressions use RE2
# [syntax](https://github.com/google/re2/wiki/Syntax); a guide can be found
# under the google/re2 repository on GitHub.
"A String",
],
},
},

"bytesLimitPerFilePercent": 42, # Max percentage of bytes to scan from a file. The rest are omitted. The

# number of bytes scanned is rounded down. Must be between 0 and 100,
# inclusive. Both 0 and 100 mean no limit. Defaults to 0. Only one
# of bytes_limit_per_file and bytes_limit_per_file_percent can be specified.
"filesLimitPercent": 42, # Limits the number of files to scan to this percentage of the input FileSet.
# Number of files scanned is rounded down. Must be between 0 and 100,
# inclusive. Both 0 and 100 mean no limit. Defaults to 0.
"fileTypes": [ # List of file type groups to include in the scan.
# If empty, all files are scanned and available data format processors
# are applied. In addition, the binary content of the selected files
# is always scanned as well.
"A String",
],
},
},

"inspectConfig": { # Configuration description of the scanning process. # How and what to scan for.

# When used with redactContent only info_types and min_likelihood are currently
# used.
"excludeInfoTypes": True or False, # When true, excludes type information of the findings.
"limits": {
"maxFindingsPerRequest": 42, # Max number of findings that will be returned per request/job.
# When set within `InspectContentRequest`, the maximum returned is 2000
# regardless of whether this is set higher.
"maxFindingsPerInfoType": [ # Configuration of findings limit given for specified infoTypes.
{ # Max findings configuration per infoType, per content item or long
# running DlpJob.
"infoType": { # Type of information detected by the API. # Type of information the findings limit applies to. Only one limit per
# info_type should be provided. If InfoTypeLimit does not have an
# info_type, the DLP API applies the limit against all info_types that
# are found but not specified in another InfoTypeLimit.
"name": "A String", # Name of the information type. Either a name of your choosing when
# creating a CustomInfoType, or one of the names listed
# at https://cloud.google.com/dlp/docs/infotypes-reference when specifying
# a built-in type. InfoType names should conform to the pattern
# [a-zA-Z0-9_]{1,64}.
},
"maxFindings": 42, # Max findings limit for the given infoType.
},
],
"maxFindingsPerItem": 42, # Max number of findings that will be returned for each item scanned.
# When set within `InspectDataSourceRequest`,
# the maximum returned is 2000 regardless of whether this is set higher.
# When set within `InspectContentRequest`, this field is ignored.
},
"minLikelihood": "A String", # Only returns findings equal to or above this threshold. The default is
# POSSIBLE.
# See https://cloud.google.com/dlp/docs/likelihood to learn more.
"customInfoTypes": [ # CustomInfoTypes provided by the user. See
# https://cloud.google.com/dlp/docs/creating-custom-infotypes to learn more.
{ # Custom information type provided by the user. Used to find domain-specific
# sensitive information configurable to the data in question.
"regex": { # Message defining a custom regular expression. # Regular expression based CustomInfoType.
"pattern": "A String", # Pattern defining the regular expression. Its syntax
# (https://github.com/google/re2/wiki/Syntax) can be found under the
# google/re2 repository on GitHub.
"groupIndexes": [ # The index of the submatch to extract as findings. When not
# specified, the entire match is returned. No more than 3 may be included.
42,
],
},

"surrogateType": { # Message for detecting output from deidentification transformations # Message for detecting output from deidentification transformations that

659

# support reversing.

660

# such as

661

# [`CryptoReplaceFfxFpeConfig`](/dlp/docs/reference/rest/v2/organizations.deidentifyTemplates#cryptoreplaceffxfpeconfig).

662

# These types of transformations are

663

# those that perform pseudonymization, thereby producing a "surrogate" as

664

# output. This should be used in conjunction with a field on the

665

# transformation such as `surrogate_info_type`. This CustomInfoType does

666

# not support the use of `detection_rules`.

667

},

668

"infoType": { # Type of information detected by the API. # CustomInfoType can either be a new infoType, or an extension of built-in

669

# infoType, when the name matches one of existing infoTypes and that infoType

670

# is specified in `InspectContent.info_types` field. Specifying the latter

671

# adds findings to the one detected by the system. If built-in info type is

672

# not specified in `InspectContent.info_types` list then the name is treated

673

# as a custom info type.

674

"name": "A String", # Name of the information type. Either a name of your choosing when

675

# creating a CustomInfoType, or one of the names listed

676

# at https://cloud.google.com/dlp/docs/infotypes-reference when specifying

677

# a built-in type. InfoType names should conform to the pattern

678

# [a-zA-Z0-9_]{1,64}.

679

},

680

"dictionary": { # Custom information type based on a dictionary of words or phrases. This can # A list of phrases to detect as a CustomInfoType.

681

# be used to match sensitive information specific to the data, such as a list

682

# of employee IDs or job titles.

683

#

684

# Dictionary words are case-insensitive and all characters other than letters

685

# and digits in the unicode [Basic Multilingual

686

# Plane](https://en.wikipedia.org/wiki/Plane_%28Unicode%29#Basic_Multilingual_Plane)

687

# will be replaced with whitespace when scanning for matches, so the

688

# dictionary phrase "Sam Johnson" will match all three phrases "sam johnson",

689

# "Sam, Johnson", and "Sam (Johnson)". Additionally, the characters

690

# surrounding any match must be of a different type than the adjacent

691

# characters within the word, so letters must be next to non-letters and

692

# digits next to non-digits. For example, the dictionary word "jen" will

693

# match the first three letters of the text "jen123" but will return no

694

# matches for "jennifer".

695

#

696

# Dictionary words containing a large number of characters that are not

697

# letters or digits may result in unexpected findings because such characters

698

# are treated as whitespace. The

699

# [limits](https://cloud.google.com/dlp/limits) page contains details about

700

# the size limits of dictionaries. For dictionaries that do not fit within

701

# these constraints, consider using `LargeCustomDictionaryConfig` in the

702

# `StoredInfoType` API.

703

"wordList": { # Message defining a list of words or phrases to search for in the data. # List of words or phrases to search for.

704

"words": [ # Words or phrases defining the dictionary. The dictionary must contain

705

# at least one phrase and every phrase must contain at least 2 characters

706

# that are letters or digits. [required]

"A String",

],

},

"cloudStoragePath": { # Message representing a single file or path in Cloud Storage. # Newline-delimited file of words in Cloud Storage. Only a single file

711

# is accepted.

712

"path": "A String", # A url representing a file or path (no wildcards) in Cloud Storage.

713

# Example: gs://[BUCKET_NAME]/dictionary.txt

714

},

715

},

716

"storedType": { # A reference to a StoredInfoType to use with scanning. # Load an existing `StoredInfoType` resource for use in

717

# `InspectDataSource`. Not currently supported in `InspectContent`.

718

"name": "A String", # Resource name of the requested `StoredInfoType`, for example

719

# `organizations/433245324/storedInfoTypes/432452342` or

720

# `projects/project-id/storedInfoTypes/432452342`.

721

"createTime": "A String", # Timestamp indicating when the version of the `StoredInfoType` used for

722

# inspection was created. Output-only field, populated by the system.

723

},

724

"detectionRules": [ # Set of detection rules to apply to all findings of this CustomInfoType.

725

# Rules are applied in order that they are specified. Not supported for the

726

# `surrogate_type` CustomInfoType.

727

{ # Deprecated; use `InspectionRuleSet` instead. Rule for modifying a

728

# `CustomInfoType` to alter behavior under certain circumstances, depending

729

# on the specific details of the rule. Not supported for the `surrogate_type`

730

# custom infoType.

731

"hotwordRule": { # The rule that adjusts the likelihood of findings within a certain # Hotword-based detection rule.

732

# proximity of hotwords.

733

"proximity": { # Message for specifying a window around a finding to apply a detection # Proximity of the finding within which the entire hotword must reside.

734

# The total length of the window cannot exceed 1000 characters. Note that

735

# the finding itself will be included in the window, so that hotwords may

736

# be used to match substrings of the finding itself. For example, the

737

# certainty of a phone number regex "\(\d{3}\) \d{3}-\d{4}" could be

738

# adjusted upwards if the area code is known to be the local area code of

739

# a company office using the hotword regex "\(xxx\)", where "xxx"

740

# is the area code in question.

741

# rule.

742

"windowAfter": 42, # Number of characters after the finding to consider.

743

"windowBefore": 42, # Number of characters before the finding to consider.

744

},

745

"hotwordRegex": { # Message defining a custom regular expression. # Regular expression pattern defining what qualifies as a hotword.

746

"pattern": "A String", # Pattern defining the regular expression. Its syntax

747

# (https://github.com/google/re2/wiki/Syntax) can be found under the

748

# google/re2 repository on GitHub.

749

"groupIndexes": [ # The index of the submatch to extract as findings. When not

750

# specified, the entire match is returned. No more than 3 may be included.

42,

],

},

"likelihoodAdjustment": { # Message for specifying an adjustment to the likelihood of a finding as # Likelihood adjustment to apply to all matching findings.

755

# part of a detection rule.

756

"relativeLikelihood": 42, # Increase or decrease the likelihood by the specified number of

757

# levels. For example, if a finding would be `POSSIBLE` without the

758

# detection rule and `relative_likelihood` is 1, then it is upgraded to

759

# `LIKELY`, while a value of -1 would downgrade it to `UNLIKELY`.

760

# Likelihood may never drop below `VERY_UNLIKELY` or exceed

761

# `VERY_LIKELY`, so applying an adjustment of 1 followed by an

762

# adjustment of -1 when base likelihood is `VERY_LIKELY` will result in

763

# a final likelihood of `LIKELY`.

764

"fixedLikelihood": "A String", # Set the likelihood of a finding to a fixed value.

},

},

},

],

"exclusionType": "A String", # If set to EXCLUSION_TYPE_EXCLUDE this infoType will not cause a finding

770

# to be returned. It still can be used for rules matching.

771

"likelihood": "A String", # Likelihood to return for this CustomInfoType. This base value can be

772

# altered by a detection rule if the finding meets the criteria specified by

773

# the rule. Defaults to `VERY_LIKELY` if not specified.

774

},

775

],

776

"includeQuote": True or False, # When true, a contextual quote from the data that triggered a finding is

777

# included in the response; see Finding.quote.

778

"ruleSet": [ # Set of rules to apply to the findings for this InspectConfig.

779

# Exclusion rules, contained in the set are executed in the end, other

780

# rules are executed in the order they are specified for each info type.

781

{ # Rule set for modifying a set of infoTypes to alter behavior under certain

782

# circumstances, depending on the specific details of the rules within the set.

783

"rules": [ # Set of rules to be applied to infoTypes. The rules are applied in order.

784

{ # A single inspection rule to be applied to infoTypes, specified in

785

# `InspectionRuleSet`.

786

"hotwordRule": { # The rule that adjusts the likelihood of findings within a certain # Hotword-based detection rule.

787

# proximity of hotwords.

788

"proximity": { # Message for specifying a window around a finding to apply a detection # Proximity of the finding within which the entire hotword must reside.

789

# The total length of the window cannot exceed 1000 characters. Note that

790

# the finding itself will be included in the window, so that hotwords may

791

# be used to match substrings of the finding itself. For example, the

792

# certainty of a phone number regex "\(\d{3}\) \d{3}-\d{4}" could be

793

# adjusted upwards if the area code is known to be the local area code of

794

# a company office using the hotword regex "\(xxx\)", where "xxx"

795

# is the area code in question.

796

# rule.

797

"windowAfter": 42, # Number of characters after the finding to consider.

798

"windowBefore": 42, # Number of characters before the finding to consider.

799

},

800

"hotwordRegex": { # Message defining a custom regular expression. # Regular expression pattern defining what qualifies as a hotword.

801

"pattern": "A String", # Pattern defining the regular expression. Its syntax

802

# (https://github.com/google/re2/wiki/Syntax) can be found under the

803

# google/re2 repository on GitHub.

804

"groupIndexes": [ # The index of the submatch to extract as findings. When not

805

# specified, the entire match is returned. No more than 3 may be included.

42,

],

},

"likelihoodAdjustment": { # Message for specifying an adjustment to the likelihood of a finding as # Likelihood adjustment to apply to all matching findings.

810

# part of a detection rule.

811

"relativeLikelihood": 42, # Increase or decrease the likelihood by the specified number of

812

# levels. For example, if a finding would be `POSSIBLE` without the

813

# detection rule and `relative_likelihood` is 1, then it is upgraded to

814

# `LIKELY`, while a value of -1 would downgrade it to `UNLIKELY`.

815

# Likelihood may never drop below `VERY_UNLIKELY` or exceed

816

# `VERY_LIKELY`, so applying an adjustment of 1 followed by an

817

# adjustment of -1 when base likelihood is `VERY_LIKELY` will result in

818

# a final likelihood of `LIKELY`.

819

"fixedLikelihood": "A String", # Set the likelihood of a finding to a fixed value.

820

},

821

},

822

"exclusionRule": { # The rule that specifies conditions when findings of infoTypes specified in # Exclusion rule.

823

# `InspectionRuleSet` are removed from results.

824

"regex": { # Message defining a custom regular expression. # Regular expression which defines the rule.

825

"pattern": "A String", # Pattern defining the regular expression. Its syntax

826

# (https://github.com/google/re2/wiki/Syntax) can be found under the

827

# google/re2 repository on GitHub.

828

"groupIndexes": [ # The index of the submatch to extract as findings. When not

829

# specified, the entire match is returned. No more than 3 may be included.

42,

],

},

"excludeInfoTypes": { # List of exclude infoTypes. # Set of infoTypes for which findings would affect this rule.

834

"infoTypes": [ # InfoType list in ExclusionRule rule drops a finding when it overlaps or

835

# contained within with a finding of an infoType from this list. For

836

# example, for `InspectionRuleSet.info_types` containing "PHONE_NUMBER"` and

837

# `exclusion_rule` containing `exclude_info_types.info_types` with

838

# "EMAIL_ADDRESS" the phone number findings are dropped if they overlap

839

# with EMAIL_ADDRESS finding.

840

# That leads to "555-222-2222@example.org" to generate only a single

841

# finding, namely email address.

842

{ # Type of information detected by the API.

843

"name": "A String", # Name of the information type. Either a name of your choosing when

844

# creating a CustomInfoType, or one of the names listed

845

# at https://cloud.google.com/dlp/docs/infotypes-reference when specifying

846

# a built-in type. InfoType names should conform to the pattern

847

# [a-zA-Z0-9_]{1,64}.

},

],

},

"dictionary": { # Custom information type based on a dictionary of words or phrases. This can # Dictionary which defines the rule.

852

# be used to match sensitive information specific to the data, such as a list

853

# of employee IDs or job titles.

854

#

855

# Dictionary words are case-insensitive and all characters other than letters

856

# and digits in the unicode [Basic Multilingual

857

# Plane](https://en.wikipedia.org/wiki/Plane_%28Unicode%29#Basic_Multilingual_Plane)

858

# will be replaced with whitespace when scanning for matches, so the

859

# dictionary phrase "Sam Johnson" will match all three phrases "sam johnson",

860

# "Sam, Johnson", and "Sam (Johnson)". Additionally, the characters

861

# surrounding any match must be of a different type than the adjacent

862

# characters within the word, so letters must be next to non-letters and

863

# digits next to non-digits. For example, the dictionary word "jen" will

864

# match the first three letters of the text "jen123" but will return no

865

# matches for "jennifer".

866

#

867

# Dictionary words containing a large number of characters that are not

868

# letters or digits may result in unexpected findings because such characters

869

# are treated as whitespace. The

870

# [limits](https://cloud.google.com/dlp/limits) page contains details about

871

# the size limits of dictionaries. For dictionaries that do not fit within

872

# these constraints, consider using `LargeCustomDictionaryConfig` in the

873

# `StoredInfoType` API.

874

"wordList": { # Message defining a list of words or phrases to search for in the data. # List of words or phrases to search for.

875

"words": [ # Words or phrases defining the dictionary. The dictionary must contain

876

# at least one phrase and every phrase must contain at least 2 characters

877

# that are letters or digits. [required]

"A String",

],

},

"cloudStoragePath": { # Message representing a single file or path in Cloud Storage. # Newline-delimited file of words in Cloud Storage. Only a single file

882

# is accepted.

883

"path": "A String", # A url representing a file or path (no wildcards) in Cloud Storage.

884

# Example: gs://[BUCKET_NAME]/dictionary.txt

885

},

886

},

887

"matchingType": "A String", # How the rule is applied, see MatchingType documentation for details.

},

},

],

"infoTypes": [ # List of infoTypes this rule set is applied to.

892

{ # Type of information detected by the API.

893

"name": "A String", # Name of the information type. Either a name of your choosing when

894

# creating a CustomInfoType, or one of the names listed

895

# at https://cloud.google.com/dlp/docs/infotypes-reference when specifying

896

# a built-in type. InfoType names should conform to the pattern

897

# [a-zA-Z0-9_]{1,64}.

},

],

},

],

"contentOptions": [ # List of options defining data content to scan.

903

# If empty, text, images, and other content will be included.

904

"A String",

905

],

906

"infoTypes": [ # Restricts what info_types to look for. The values must correspond to

907

# InfoType values returned by ListInfoTypes or listed at

908

# https://cloud.google.com/dlp/docs/infotypes-reference.

909

#

910

# When no InfoTypes or CustomInfoTypes are specified in a request, the

911

# system may automatically choose what detectors to run. By default this may

912

# be all types, but may change over time as detectors are updated.

913

#

914

# The special InfoType name "ALL_BASIC" can be used to trigger all detectors,

915

# but may change over time as new InfoTypes are added. If you need precise

916

# control and predictability as to what detectors are run you should specify

917

# specific InfoTypes listed in the reference.

918

{ # Type of information detected by the API.

919

"name": "A String", # Name of the information type. Either a name of your choosing when

920

# creating a CustomInfoType, or one of the names listed

921

# at https://cloud.google.com/dlp/docs/infotypes-reference when specifying

922

# a built-in type. InfoType names should conform to the pattern

923

# [a-zA-Z0-9_]{1,64}.

},

],

},

"inspectTemplateName": "A String", # If provided, will be used as the default for all values in InspectConfig.

928

# `inspect_config` will be merged into the values persisted as part of the

929

# template.

930

"actions": [ # Actions to execute at the completion of the job.

931

{ # A task to execute on the completion of a job.

932

# See https://cloud.google.com/dlp/docs/concepts-actions to learn more.

933

"saveFindings": { # If set, the detailed findings will be persisted to the specified # Save resulting findings in a provided location.

934

# OutputStorageConfig. Only a single instance of this action can be

935

# specified.

936

# Compatible with: Inspect, Risk

937

"outputConfig": { # Cloud repository for storing output.

938

"table": { # Message defining the location of a BigQuery table. A table is uniquely # Store findings in an existing table or a new table in an existing

939

# dataset. If table_id is not set a new one will be generated

940

# for you with the following format:

941

# dlp_googleapis_yyyy_mm_dd_[dlp_job_id]. Pacific timezone will be used for

942

# generating the date details.

943

#

944

# For Inspect, each column in an existing output table must have the same

945

# name, type, and mode of a field in the `Finding` object.

946

#

947

# For Risk, an existing output table should be the output of a previous

948

# Risk analysis job run on the same source table, with the same privacy

949

# metric and quasi-identifiers. Risk jobs that analyze the same table but

950

# compute a different privacy metric, or use different sets of

951

# quasi-identifiers, cannot store their results in the same table.

952

# identified by its project_id, dataset_id, and table_name. Within a query

953

# a table is often referenced with a string in the format of:

954

# `<project_id>:<dataset_id>.<table_id>` or

955

# `<project_id>.<dataset_id>.<table_id>`.

956

"projectId": "A String", # The Google Cloud Platform project ID of the project containing the table.

957

# If omitted, project ID is inferred from the API call.

958

"tableId": "A String", # Name of the table.

959

"datasetId": "A String", # Dataset ID of the table.

960

},

961

"outputSchema": "A String", # Schema used for writing the findings for Inspect jobs. This field is only

962

# used for Inspect and must be unspecified for Risk jobs. Columns are derived

963

# from the `Finding` object. If appending to an existing table, any columns

964

# from the predefined schema that are missing will be added. No columns in

965

# the existing table will be deleted.

966

#

967

# If unspecified, then all available columns will be used for a new table or

968

# an (existing) table with no schema, and no changes will be made to an

969

# existing table that has a schema.

970

},

971

},

972

"jobNotificationEmails": { # Enable email notification to project owners and editors on jobs's # Enable email notification to project owners and editors on job's

973

# completion/failure.

974

# completion/failure.

975

},

976

"publishSummaryToCscc": { # Publish the result summary of a DlpJob to the Cloud Security # Publish summary to Cloud Security Command Center (Alpha).

977

# Command Center (CSCC Alpha).

978

# This action is only available for projects which are parts of

979

# an organization and whitelisted for the alpha Cloud Security Command

980

# Center.

981

# The action will publish count of finding instances and their info types.

982

# The summary of findings will be persisted in CSCC and are governed by CSCC

983

# service-specific policy, see https://cloud.google.com/terms/service-terms

984

# Only a single instance of this action can be specified.

985

# Compatible with: Inspect

986

},

987

"pubSub": { # Publish a message into given Pub/Sub topic when DlpJob has completed. The # Publish a notification to a pubsub topic.

988

# message contains a single field, `DlpJobName`, which is equal to the

989

# finished job's

990

# [`DlpJob.name`](/dlp/docs/reference/rest/v2/projects.dlpJobs#DlpJob).

991

# Compatible with: Inspect, Risk

992

"topic": "A String", # Cloud Pub/Sub topic to send notifications to. The topic must have given

993

# publishing access rights to the DLP API service account executing

994

# the long running DlpJob sending the notifications.

995

# Format is projects/{project}/topics/{topic}.

},

},

],

},

}

x__xgafv: string, V1 error format.

Allowed values

1 - v1 error format

2 - v2 error format
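
As a rough illustration of the request body described above, the following sketch assembles the `inspectConfig` and `actions` portions as a plain Python dict and models the `relativeLikelihood` clamping rule. It is not a complete job definition: the storage configuration that precedes this excerpt is omitted, the enclosing `inspectJob` key is taken from the DLP v2 REST reference rather than from the text shown here, and the helper names (`adjust_likelihood`, `info_type`, `build_body`), the `EMPLOYEE_ID` custom type, the regex patterns, and the topic name are all hypothetical.

```python
import re

# Ordered from lowest to highest, using the level names from the descriptions above.
LIKELIHOOD_LEVELS = ["VERY_UNLIKELY", "UNLIKELY", "POSSIBLE", "LIKELY", "VERY_LIKELY"]

# Documented naming pattern for infoType names.
INFO_TYPE_NAME = re.compile(r"[a-zA-Z0-9_]{1,64}")


def adjust_likelihood(base, relative):
    """Model `relativeLikelihood`: shift by `relative` levels, clamped at both ends."""
    index = LIKELIHOOD_LEVELS.index(base) + relative
    return LIKELIHOOD_LEVELS[max(0, min(index, len(LIKELIHOOD_LEVELS) - 1))]


def info_type(name):
    """Return an infoType object, rejecting names outside the documented pattern."""
    if not INFO_TYPE_NAME.fullmatch(name):
        raise ValueError("invalid infoType name: %r" % name)
    return {"name": name}


def build_body(topic):
    """Assemble a partial, hypothetical request body for dlpJobs().create()."""
    inspect_config = {
        "infoTypes": [info_type("PHONE_NUMBER"), info_type("EMAIL_ADDRESS")],
        "minLikelihood": "POSSIBLE",
        "includeQuote": True,
        "limits": {"maxFindingsPerRequest": 100},
        "customInfoTypes": [
            {
                "infoType": info_type("EMPLOYEE_ID"),  # hypothetical custom name
                "regex": {"pattern": r"E-\d{6}"},  # hypothetical RE2 pattern
                "likelihood": "LIKELY",
            }
        ],
        "ruleSet": [
            {
                "infoTypes": [info_type("PHONE_NUMBER")],
                "rules": [
                    {
                        "hotwordRule": {
                            # Hypothetical local area code, following the
                            # "\(xxx\)" example in the proximity description.
                            "hotwordRegex": {"pattern": r"\(415\)"},
                            "proximity": {"windowBefore": 50, "windowAfter": 50},
                            "likelihoodAdjustment": {"relativeLikelihood": 1},
                        }
                    }
                ],
            }
        ],
    }
    # "inspectJob" is assumed from the DLP v2 REST reference; storageConfig is omitted.
    return {
        "inspectJob": {
            "inspectConfig": inspect_config,
            "actions": [{"pubSub": {"topic": topic}}],
        }
    }
```

With a service object built by the client library, a completed body of this shape would be passed as `service.projects().dlpJobs().create(parent="projects/<project-id>", body=body).execute()`. The `adjust_likelihood` helper only mirrors the documented clamping behaviour locally; the server applies the actual adjustment.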

Returns:

An object of the form:

{ # Combines all of the information about a DLP job.

1011

"errors": [ # A stream of errors encountered running the job.

1012

{ # Detailed information about an error encountered during job execution or

# the results of an unsuccessful activation of the JobTrigger.

# Output only field.

1015

"timestamps": [ # The times the error occurred.

1016

"A String",

1017

],

1018

"details": { # The `Status` type defines a logical error model that is suitable for

1019

# different programming environments, including REST APIs and RPC APIs. It is

1020

# used by [gRPC](https://github.com/grpc). Each `Status` message contains

1021

# three pieces of data: error code, error message, and error details.

1022

#

1023

# You can find out more about this error model and how to work with it in the

1024

# [API Design Guide](https://cloud.google.com/apis/design/errors).

1025

"message": "A String", # A developer-facing error message, which should be in English. Any

1026

# user-facing error message should be localized and sent in the

1027

# google.rpc.Status.details field, or localized by the client.

1028

"code": 42, # The status code, which should be an enum value of google.rpc.Code.

1029

"details": [ # A list of messages that carry the error details. There is a common set of

1030

# message types for APIs to use.

1031

{

1032

"a_key": "", # Properties of the object. Contains field @type with type URL.

},

],

},

},

],

"name": "A String", # The server-assigned name.

1039

"inspectDetails": { # The results of an inspect DataSource job. # Results from inspecting a data source.

1040

"requestedOptions": { # The configuration used for this job.

1041

"snapshotInspectTemplate": { # The inspectTemplate contains a configuration (set of types of sensitive data # If run with an InspectTemplate, a snapshot of its state at the time of

1042

# this run.

1043

# to be detected) to be used anywhere you otherwise would normally specify

1044

# InspectConfig. See https://cloud.google.com/dlp/docs/concepts-templates

1045

# to learn more.

1046

"updateTime": "A String", # The last update timestamp of a inspectTemplate, output only field.

1047

"displayName": "A String", # Display name (max 256 chars).

1048

"description": "A String", # Short description (max 256 chars).

1049

"inspectConfig": { # Configuration description of the scanning process. # The core content of the template. Configuration of the scanning process.

1050

# When used with redactContent only info_types and min_likelihood are currently

1051

# used.

1052

"excludeInfoTypes": True or False, # When true, excludes type information of the findings.

1053

"limits": {

1054

"maxFindingsPerRequest": 42, # Max number of findings that will be returned per request/job.

1055

# When set within `InspectContentRequest`, the maximum returned is 2000

1056

# regardless if this is set higher.

1057

"maxFindingsPerInfoType": [ # Configuration of findings limit given for specified infoTypes.

1058

{ # Max findings configuration per infoType, per content item or long

1059

# running DlpJob.

1060

"infoType": { # Type of information detected by the API. # Type of information the findings limit applies to. Only one limit per

1061

# info_type should be provided. If InfoTypeLimit does not have an

1062

# info_type, the DLP API applies the limit against all info_types that

1063

# are found but not specified in another InfoTypeLimit.

1064

"name": "A String", # Name of the information type. Either a name of your choosing when

1065

# creating a CustomInfoType, or one of the names listed

1066

# at https://cloud.google.com/dlp/docs/infotypes-reference when specifying

1067

# a built-in type. InfoType names should conform to the pattern

1068

# [a-zA-Z0-9_]{1,64}.

1069

},

1070

"maxFindings": 42, # Max findings limit for the given infoType.

1071

},

1072

],

1073

"maxFindingsPerItem": 42, # Max number of findings that will be returned for each item scanned.

1074

# When set within `InspectDataSourceRequest`,

1075

# the maximum returned is 2000 regardless if this is set higher.

1076

# When set within `InspectContentRequest`, this field is ignored.

1077

},

1078

"minLikelihood": "A String", # Only returns findings equal or above this threshold. The default is

1079

# POSSIBLE.

1080

# See https://cloud.google.com/dlp/docs/likelihood to learn more.

1081

"customInfoTypes": [ # CustomInfoTypes provided by the user. See

1082

# https://cloud.google.com/dlp/docs/creating-custom-infotypes to learn more.

1083

{ # Custom information type provided by the user. Used to find domain-specific

1084

# sensitive information configurable to the data in question.

1085

"regex": { # Message defining a custom regular expression. # Regular expression based CustomInfoType.

1086

"pattern": "A String", # Pattern defining the regular expression. Its syntax

1087

# (https://github.com/google/re2/wiki/Syntax) can be found under the

1088

# google/re2 repository on GitHub.

1089

"groupIndexes": [ # The index of the submatch to extract as findings. When not

1090

# specified, the entire match is returned. No more than 3 may be included.

42,

],

},

"surrogateType": { # Message for detecting output from deidentification transformations # Message for detecting output from deidentification transformations that

1095

# support reversing.

1096

# such as

1097

# [`CryptoReplaceFfxFpeConfig`](/dlp/docs/reference/rest/v2/organizations.deidentifyTemplates#cryptoreplaceffxfpeconfig).

1098

# These types of transformations are

1099

# those that perform pseudonymization, thereby producing a "surrogate" as

1100

# output. This should be used in conjunction with a field on the

1101

# transformation such as `surrogate_info_type`. This CustomInfoType does

1102

# not support the use of `detection_rules`.

1103

},

1104

"infoType": { # Type of information detected by the API. # CustomInfoType can either be a new infoType, or an extension of built-in

1105

# infoType, when the name matches one of existing infoTypes and that infoType

1106

# is specified in `InspectContent.info_types` field. Specifying the latter

1107

# adds findings to the one detected by the system. If built-in info type is

1108

# not specified in `InspectContent.info_types` list then the name is treated

1109

# as a custom info type.

1110

"name": "A String", # Name of the information type. Either a name of your choosing when

1111

# creating a CustomInfoType, or one of the names listed

1112

# at https://cloud.google.com/dlp/docs/infotypes-reference when specifying

1113

# a built-in type. InfoType names should conform to the pattern

1114

# [a-zA-Z0-9_]{1,64}.

1115

},

1116

"dictionary": { # Custom information type based on a dictionary of words or phrases. This can # A list of phrases to detect as a CustomInfoType.

1117

# be used to match sensitive information specific to the data, such as a list

1118

# of employee IDs or job titles.

1119

#

1120

# Dictionary words are case-insensitive and all characters other than letters

1121

# and digits in the unicode [Basic Multilingual

1122

# Plane](https://en.wikipedia.org/wiki/Plane_%28Unicode%29#Basic_Multilingual_Plane)

1123

# will be replaced with whitespace when scanning for matches, so the

1124

# dictionary phrase "Sam Johnson" will match all three phrases "sam johnson",

1125

# "Sam, Johnson", and "Sam (Johnson)". Additionally, the characters

1126

# surrounding any match must be of a different type than the adjacent

1127

# characters within the word, so letters must be next to non-letters and

1128

# digits next to non-digits. For example, the dictionary word "jen" will

1129

# match the first three letters of the text "jen123" but will return no

1130

# matches for "jennifer".

1131

#

1132

# Dictionary words containing a large number of characters that are not

1133

# letters or digits may result in unexpected findings because such characters

1134

# are treated as whitespace. The

1135

# [limits](https://cloud.google.com/dlp/limits) page contains details about

1136

# the size limits of dictionaries. For dictionaries that do not fit within

1137

# these constraints, consider using `LargeCustomDictionaryConfig` in the

1138

# `StoredInfoType` API.

1139

"wordList": { # Message defining a list of words or phrases to search for in the data. # List of words or phrases to search for.

1140

"words": [ # Words or phrases defining the dictionary. The dictionary must contain

1141

# at least one phrase and every phrase must contain at least 2 characters

1142

# that are letters or digits. [required]

"A String",

],

},

"cloudStoragePath": { # Message representing a single file or path in Cloud Storage. # Newline-delimited file of words in Cloud Storage. Only a single file

# is accepted.

"path": "A String", # A url representing a file or path (no wildcards) in Cloud Storage.

# Example: gs://[BUCKET_NAME]/dictionary.txt

},

},
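
The dictionary matching rules described above can be approximated in a few lines of Python. This is a rough sketch under stated assumptions, not the service's implementation: the helper name `dictionary_matches` is hypothetical, and Python's `\W`, `str.isalpha`, and `str.isdigit` only approximate the documented treatment of letters and digits in the Basic Multilingual Plane.

```python
import re


def _kind(ch):
    """Classify a character as a letter, a digit, or anything else."""
    if ch.isalpha():
        return "letter"
    if ch.isdigit():
        return "digit"
    return "other"


def _normalize(s):
    # Lowercase, then turn every run of non-letter/non-digit characters into one space.
    return re.sub(r"[\W_]+", " ", s.lower())


def dictionary_matches(phrase, text):
    """Return True if `phrase` occurs in `text` under the documented rules (approximation)."""
    needle = _normalize(phrase).strip()
    haystack = _normalize(text)
    if not needle:
        return False
    start = haystack.find(needle)
    while start != -1:
        end = start + len(needle)
        before = haystack[start - 1] if start > 0 else " "
        after = haystack[end] if end < len(haystack) else " "
        # Neighbours must differ in type from the adjacent character inside the match.
        if _kind(before) != _kind(needle[0]) and _kind(after) != _kind(needle[-1]):
            return True
        start = haystack.find(needle, start + 1)
    return False
```

With this sketch, the phrase "Sam Johnson" matches "Sam, Johnson" and "Sam (Johnson)", and the word "jen" matches inside "jen123" but not inside "jennifer", mirroring the examples in the field description.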

"storedType": { # A reference to a StoredInfoType to use with scanning. # Load an existing `StoredInfoType` resource for use in

# `InspectDataSource`. Not currently supported in `InspectContent`.

"name": "A String", # Resource name of the requested `StoredInfoType`, for example

# `organizations/433245324/storedInfoTypes/432452342` or

# `projects/project-id/storedInfoTypes/432452342`.

"createTime": "A String", # Timestamp indicating when the version of the `StoredInfoType` used for

# inspection was created. Output-only field, populated by the system.

},

"detectionRules": [ # Set of detection rules to apply to all findings of this CustomInfoType.

# Rules are applied in the order in which they are specified. Not supported for the

# `surrogate_type` CustomInfoType.

{ # Deprecated; use `InspectionRuleSet` instead. Rule for modifying a

# `CustomInfoType` to alter behavior under certain circumstances, depending

# on the specific details of the rule. Not supported for the `surrogate_type`

# custom infoType.

"hotwordRule": { # The rule that adjusts the likelihood of findings within a certain proximity of hotwords. # Hotword-based detection rule.

"proximity": { # Message for specifying a window around a finding to apply a detection rule. # Proximity of the finding within which the entire hotword must reside.

# The total length of the window cannot exceed 1000 characters. Note that

# the finding itself will be included in the window, so that hotwords may

# be used to match substrings of the finding itself. For example, the

# certainty of a phone number regex "\(\d{3}\) \d{3}-\d{4}" could be

# adjusted upwards if the area code is known to be the local area code of

# a company office using the hotword regex "\(xxx\)", where "xxx"

# is the area code in question.

"windowAfter": 42, # Number of characters after the finding to consider.

"windowBefore": 42, # Number of characters before the finding to consider.

},

"hotwordRegex": { # Message defining a custom regular expression. # Regular expression pattern defining what qualifies as a hotword.

"pattern": "A String", # Pattern defining the regular expression. Its syntax

# (https://github.com/google/re2/wiki/Syntax) can be found under the

# google/re2 repository on GitHub.

"groupIndexes": [ # The index of the submatch to extract as findings. When not

# specified, the entire match is returned. No more than 3 may be included.

42,

],

},

"likelihoodAdjustment": { # Message for specifying an adjustment to the likelihood of a finding as part of a detection rule. # Likelihood adjustment to apply to all matching findings.

"relativeLikelihood": 42, # Increase or decrease the likelihood by the specified number of

# levels. For example, if a finding would be `POSSIBLE` without the

# detection rule and `relative_likelihood` is 1, then it is upgraded to

# `LIKELY`, while a value of -1 would downgrade it to `UNLIKELY`.

# Likelihood may never drop below `VERY_UNLIKELY` or exceed

# `VERY_LIKELY`, so applying an adjustment of 1 followed by an

# adjustment of -1 when base likelihood is `VERY_LIKELY` will result in

# a final likelihood of `LIKELY`.

"fixedLikelihood": "A String", # Set the likelihood of a finding to a fixed value.

},

},

},

],
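
The clamping behaviour documented for `relativeLikelihood` can be illustrated with a small sketch. The function name `adjust_likelihood` and the `LEVELS` list are hypothetical helpers introduced here; only the five level names come from the description above, and the real adjustment happens server-side.

```python
# Likelihood levels named in the description above, ordered from lowest to highest.
LEVELS = ["VERY_UNLIKELY", "UNLIKELY", "POSSIBLE", "LIKELY", "VERY_LIKELY"]


def adjust_likelihood(likelihood, relative_likelihood):
    """Shift a likelihood by a number of levels, clamped to the valid range."""
    index = LEVELS.index(likelihood) + relative_likelihood
    index = max(0, min(index, len(LEVELS) - 1))
    return LEVELS[index]


# An adjustment of +1 on VERY_LIKELY is clamped, so a following -1 yields LIKELY.
final = adjust_likelihood(adjust_likelihood("VERY_LIKELY", 1), -1)
```

Because each adjustment is clamped as it is applied, a +1 followed by a -1 is not a no-op at the top of the scale.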

"exclusionType": "A String", # If set to EXCLUSION_TYPE_EXCLUDE this infoType will not cause a finding

# to be returned. It still can be used for rules matching.

"likelihood": "A String", # Likelihood to return for this CustomInfoType. This base value can be

# altered by a detection rule if the finding meets the criteria specified by

# the rule. Defaults to `VERY_LIKELY` if not specified.

},

],

"includeQuote": True or False, # When true, a contextual quote from the data that triggered a finding is

# included in the response; see Finding.quote.

"ruleSet": [ # Set of rules to apply to the findings for this InspectConfig.

# Exclusion rules contained in the set are executed last; other

# rules are executed in the order they are specified for each info type.

{ # Rule set for modifying a set of infoTypes to alter behavior under certain

# circumstances, depending on the specific details of the rules within the set.

"rules": [ # Set of rules to be applied to infoTypes. The rules are applied in order.

{ # A single inspection rule to be applied to infoTypes, specified in

# `InspectionRuleSet`.

"hotwordRule": { # The rule that adjusts the likelihood of findings within a certain proximity of hotwords. # Hotword-based detection rule.

"proximity": { # Message for specifying a window around a finding to apply a detection rule. # Proximity of the finding within which the entire hotword must reside.

# The total length of the window cannot exceed 1000 characters. Note that

# the finding itself will be included in the window, so that hotwords may

# be used to match substrings of the finding itself. For example, the

# certainty of a phone number regex "\(\d{3}\) \d{3}-\d{4}" could be

# adjusted upwards if the area code is known to be the local area code of

# a company office using the hotword regex "\(xxx\)", where "xxx"

# is the area code in question.

"windowAfter": 42, # Number of characters after the finding to consider.

"windowBefore": 42, # Number of characters before the finding to consider.

},

"hotwordRegex": { # Message defining a custom regular expression. # Regular expression pattern defining what qualifies as a hotword.

"pattern": "A String", # Pattern defining the regular expression. Its syntax

# (https://github.com/google/re2/wiki/Syntax) can be found under the

# google/re2 repository on GitHub.

"groupIndexes": [ # The index of the submatch to extract as findings. When not

# specified, the entire match is returned. No more than 3 may be included.

42,

],

},

"likelihoodAdjustment": { # Message for specifying an adjustment to the likelihood of a finding as part of a detection rule. # Likelihood adjustment to apply to all matching findings.

"relativeLikelihood": 42, # Increase or decrease the likelihood by the specified number of

# levels. For example, if a finding would be `POSSIBLE` without the

# detection rule and `relative_likelihood` is 1, then it is upgraded to

# `LIKELY`, while a value of -1 would downgrade it to `UNLIKELY`.

# Likelihood may never drop below `VERY_UNLIKELY` or exceed

# `VERY_LIKELY`, so applying an adjustment of 1 followed by an

# adjustment of -1 when base likelihood is `VERY_LIKELY` will result in

# a final likelihood of `LIKELY`.

"fixedLikelihood": "A String", # Set the likelihood of a finding to a fixed value.

},

},

"exclusionRule": { # The rule that specifies conditions when findings of infoTypes specified in `InspectionRuleSet` are removed from results. # Exclusion rule.

"regex": { # Message defining a custom regular expression. # Regular expression which defines the rule.

"pattern": "A String", # Pattern defining the regular expression. Its syntax

# (https://github.com/google/re2/wiki/Syntax) can be found under the

# google/re2 repository on GitHub.

"groupIndexes": [ # The index of the submatch to extract as findings. When not

# specified, the entire match is returned. No more than 3 may be included.

42,

],

},

"excludeInfoTypes": { # List of exclude infoTypes. # Set of infoTypes for which findings would affect this rule.

"infoTypes": [ # An InfoType list in an ExclusionRule drops a finding when it overlaps with or

# is contained within a finding of an infoType from this list. For

# example, for `InspectionRuleSet.info_types` containing "PHONE_NUMBER" and

# `exclusion_rule` containing `exclude_info_types.info_types` with

# "EMAIL_ADDRESS", the phone number findings are dropped if they overlap

# with an EMAIL_ADDRESS finding.

# As a result, "555-222-2222@example.org" generates only a single

# finding, namely the email address.

{ # Type of information detected by the API.

"name": "A String", # Name of the information type. Either a name of your choosing when

# creating a CustomInfoType, or one of the names listed

# at https://cloud.google.com/dlp/docs/infotypes-reference when specifying

# a built-in type. InfoType names should conform to the pattern

# [a-zA-Z0-9_]{1,64}.

},

],

},

"dictionary": { # Dictionary which defines the rule. # Custom information type based on a dictionary of words or phrases. This can

# be used to match sensitive information specific to the data, such as a list

# of employee IDs or job titles.

#

# Dictionary words are case-insensitive and all characters other than letters

# and digits in the unicode [Basic Multilingual

# Plane](https://en.wikipedia.org/wiki/Plane_%28Unicode%29#Basic_Multilingual_Plane)

# will be replaced with whitespace when scanning for matches, so the

# dictionary phrase "Sam Johnson" will match all three phrases "sam johnson",

# "Sam, Johnson", and "Sam (Johnson)". Additionally, the characters

# surrounding any match must be of a different type than the adjacent

# characters within the word, so letters must be next to non-letters and

# digits next to non-digits. For example, the dictionary word "jen" will

# match the first three letters of the text "jen123" but will return no

# matches for "jennifer".

#

# Dictionary words containing a large number of characters that are not

# letters or digits may result in unexpected findings because such characters

# are treated as whitespace. The

# [limits](https://cloud.google.com/dlp/limits) page contains details about

# the size limits of dictionaries. For dictionaries that do not fit within

# these constraints, consider using `LargeCustomDictionaryConfig` in the

# `StoredInfoType` API.

"wordList": { # Message defining a list of words or phrases to search for in the data. # List of words or phrases to search for.

"words": [ # Words or phrases defining the dictionary. The dictionary must contain

# at least one phrase and every phrase must contain at least 2 characters

# that are letters or digits. [required]

"A String",

],

},

"cloudStoragePath": { # Message representing a single file or path in Cloud Storage. # Newline-delimited file of words in Cloud Storage. Only a single file

# is accepted.

"path": "A String", # A url representing a file or path (no wildcards) in Cloud Storage.

# Example: gs://[BUCKET_NAME]/dictionary.txt

},

},

"matchingType": "A String", # How the rule is applied, see MatchingType documentation for details.

},

},

],

"infoTypes": [ # List of infoTypes this rule set is applied to.

{ # Type of information detected by the API.

"name": "A String", # Name of the information type. Either a name of your choosing when

# creating a CustomInfoType, or one of the names listed

# at https://cloud.google.com/dlp/docs/infotypes-reference when specifying

# a built-in type. InfoType names should conform to the pattern

# [a-zA-Z0-9_]{1,64}.

},

],

},

],
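
As an illustration of the `ruleSet` structure above, the following sketch builds the PHONE_NUMBER / EMAIL_ADDRESS exclusion from the `excludeInfoTypes` example as a plain Python dict. The variable names are hypothetical, and the `matchingType` value shown is an assumption; consult the MatchingType documentation for the values your use case needs.

```python
# Rule set applied to PHONE_NUMBER findings: drop any that overlap an EMAIL_ADDRESS finding.
rule_set = [
    {
        "infoTypes": [{"name": "PHONE_NUMBER"}],
        "rules": [
            {
                "exclusionRule": {
                    "excludeInfoTypes": {"infoTypes": [{"name": "EMAIL_ADDRESS"}]},
                    # Assumed value; see the MatchingType documentation.
                    "matchingType": "MATCHING_TYPE_PARTIAL_MATCH",
                }
            }
        ],
    }
]

# Both infoTypes must be inspected for the exclusion to have anything to compare against.
inspect_config = {
    "infoTypes": [{"name": "PHONE_NUMBER"}, {"name": "EMAIL_ADDRESS"}],
    "ruleSet": rule_set,
}
```

With a configuration along these lines, text such as "555-222-2222@example.org" would be expected to produce only the email address finding, as described in the `excludeInfoTypes` comment.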

"contentOptions": [ # List of options defining data content to scan.

# If empty, text, images, and other content will be included.

"A String",

],

"infoTypes": [ # Restricts what info_types to look for. The values must correspond to

# InfoType values returned by ListInfoTypes or listed at

# https://cloud.google.com/dlp/docs/infotypes-reference.

#

# When no InfoTypes or CustomInfoTypes are specified in a request, the

# system may automatically choose what detectors to run. By default this may

# be all types, but may change over time as detectors are updated.

#

# The special InfoType name "ALL_BASIC" can be used to trigger all detectors,

# but may change over time as new InfoTypes are added. If you need precise

# control and predictability as to what detectors are run you should specify

# specific InfoTypes listed in the reference.

{ # Type of information detected by the API.

"name": "A String", # Name of the information type. Either a name of your choosing when

# creating a CustomInfoType, or one of the names listed

# at https://cloud.google.com/dlp/docs/infotypes-reference when specifying

# a built-in type. InfoType names should conform to the pattern

# [a-zA-Z0-9_]{1,64}.

},

],
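
The naming pattern `[a-zA-Z0-9_]{1,64}` given for InfoType names can be checked client-side before a request is sent. A minimal sketch, assuming a hypothetical helper named `is_valid_info_type_name`; the service remains the authority on which names it accepts.

```python
import re

# Pattern quoted in the InfoType "name" description.
_INFO_TYPE_NAME = re.compile(r"[a-zA-Z0-9_]{1,64}")


def is_valid_info_type_name(name):
    """Return True if the whole name conforms to the documented pattern."""
    return _INFO_TYPE_NAME.fullmatch(name) is not None
```

`fullmatch` is used so that the entire name, not just a prefix, has to satisfy the pattern.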

},

"createTime": "A String", # The creation timestamp of an inspectTemplate, output only field.

"name": "A String", # The template name. Output only.

#

# The template will have one of the following formats:

# `projects/PROJECT_ID/inspectTemplates/TEMPLATE_ID` OR

# `organizations/ORGANIZATION_ID/inspectTemplates/TEMPLATE_ID`

},

"jobConfig": {

"storageConfig": { # Shared message indicating Cloud storage type. # The data to scan.

"datastoreOptions": { # Options defining a data set within Google Cloud Datastore. # Google Cloud Datastore options specification.

"partitionId": { # Datastore partition ID. # A partition ID identifies a grouping of entities. The grouping is always

# by project and namespace, however the namespace ID may be empty.

#

# A partition ID contains several dimensions:

# project ID and namespace ID.

"projectId": "A String", # The ID of the project to which the entities belong.

"namespaceId": "A String", # If not empty, the ID of the namespace to which the entities belong.

},

"kind": { # A representation of a Datastore kind. # The kind to process.

"name": "A String", # The name of the kind.

},

},

"bigQueryOptions": { # Options defining BigQuery table and row identifiers. # BigQuery options specification.

"excludedFields": [ # References to fields excluded from scanning. This allows you to skip

# inspection of entire columns which you know have no findings.

{ # General identifier of a data field in a storage service.

"name": "A String", # Name describing the field.

},

],

"rowsLimit": "A String", # Max number of rows to scan. If the table has more rows than this value, the

# rest of the rows are omitted. If not set, or if set to 0, all rows will be

# scanned. Only one of rows_limit and rows_limit_percent can be specified.

# Cannot be used in conjunction with TimespanConfig.

"sampleMethod": "A String",

"identifyingFields": [ # References to fields uniquely identifying rows within the table.

# Nested fields, such as `person.birthdate.year`, are allowed.

{ # General identifier of a data field in a storage service.

"name": "A String", # Name describing the field.

},

],

"rowsLimitPercent": 42, # Max percentage of rows to scan. The rest are omitted. The number of rows

# scanned is rounded down. Must be between 0 and 100, inclusive. Both 0 and

# 100 mean no limit. Defaults to 0. Only one of rows_limit and

# rows_limit_percent can be specified. Cannot be used in conjunction with

# TimespanConfig.

"tableReference": { # Complete BigQuery table reference. # Message defining the location of a BigQuery table. A table is uniquely

# identified by its project_id, dataset_id, and table_name. Within a query

# a table is often referenced with a string in the format of:

# `<project_id>:<dataset_id>.<table_id>` or

# `<project_id>.<dataset_id>.<table_id>`.

"projectId": "A String", # The Google Cloud Platform project ID of the project containing the table.

# If omitted, project ID is inferred from the API call.

"tableId": "A String", # Name of the table.

"datasetId": "A String", # Dataset ID of the table.

},

},
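
The arithmetic behind `rowsLimitPercent` (round down; 0 and 100 both mean no limit) can be sketched as follows. The function name `rows_to_scan` is hypothetical and the sketch only mirrors the documented rule; which rows are actually sampled is decided by the service.

```python
def rows_to_scan(total_rows, rows_limit_percent=0):
    """Number of rows scanned for a given percentage, per the documented rule."""
    if not 0 <= rows_limit_percent <= 100:
        raise ValueError("rows_limit_percent must be between 0 and 100, inclusive")
    # Both 0 (the default) and 100 mean "no limit".
    if rows_limit_percent in (0, 100):
        return total_rows
    # The number of rows scanned is rounded down.
    return total_rows * rows_limit_percent // 100
```

For a 999-row table and a limit of 10 percent, this gives 99 rows rather than 100, because the result is rounded down.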

"timespanConfig": { # Configuration of the timespan of the items to include in scanning.

# Currently only supported when inspecting Google Cloud Storage and BigQuery.

"timestampField": { # General identifier of a data field in a storage service. # Specification of the field containing the timestamp of scanned items.

# Used for data sources like Datastore or BigQuery.

# If not specified for BigQuery, table last modification timestamp

# is checked against given time span.

# The valid data types of the timestamp field are:

# for BigQuery - timestamp, date, datetime;

# for Datastore - timestamp.

# Datastore entity will be scanned if the timestamp property does not exist

# or its value is empty or invalid.

"name": "A String", # Name describing the field.

},

"endTime": "A String", # Exclude files or rows newer than this value.

# If set to zero, no upper time limit is applied.

"startTime": "A String", # Exclude files or rows older than this value.

"enableAutoPopulationOfTimespanConfig": True or False, # When the job is started by a JobTrigger we will automatically figure out

# a valid start_time to avoid scanning files that have not been modified

# since the last time the JobTrigger executed. This will be based on the

# time of the execution of the last run of the JobTrigger.

},

"cloudStorageOptions": { # Options defining a file or a set of files within a Google Cloud Storage bucket. # Google Cloud Storage options specification.

"bytesLimitPerFile": "A String", # Max number of bytes to scan from a file. If a scanned file's size is bigger

# than this value then the rest of the bytes are omitted. Only one

# of bytes_limit_per_file and bytes_limit_per_file_percent can be specified.

"sampleMethod": "A String",

"fileSet": { # Set of files to scan. # The set of one or more files to scan.

"url": "A String", # The Cloud Storage url of the file(s) to scan, in the format

# `gs://<bucket>/<path>`. Trailing wildcard in the path is allowed.

#

# If the url ends in a trailing slash, the bucket or directory represented

# by the url will be scanned non-recursively (content in sub-directories

# will not be scanned). This means that `gs://mybucket/` is equivalent to

# `gs://mybucket/*`, and `gs://mybucket/directory/` is equivalent to

# `gs://mybucket/directory/*`.

#

# Exactly one of `url` or `regex_file_set` must be set.

"regexFileSet": { # The regex-filtered set of files to scan. Exactly one of `url` or `regex_file_set` must be set. # Message representing a set of files in a Cloud Storage bucket. Regular

# expressions are used to allow fine-grained control over which files in the

# bucket to include.

#

# Included files are those that match at least one item in `include_regex` and

# do not match any items in `exclude_regex`. Note that a file that matches

# items from both lists will _not_ be included. For a match to occur, the

# entire file path (i.e., everything in the url after the bucket name) must

# match the regular expression.

#

# For example, given the input `{bucket_name: "mybucket", include_regex:

# ["directory1/.*"], exclude_regex:

# ["directory1/excluded.*"]}`:

#

# * `gs://mybucket/directory1/myfile` will be included

# * `gs://mybucket/directory1/directory2/myfile` will be included (`.*` matches

# across `/`)

# * `gs://mybucket/directory0/directory1/myfile` will _not_ be included (the

# full path doesn't match any items in `include_regex`)

# * `gs://mybucket/directory1/excludedfile` will _not_ be included (the path

# matches an item in `exclude_regex`)

#

# If `include_regex` is left empty, it will match all files by default

# (this is equivalent to setting `include_regex: [".*"]`).

#

# Some other common use cases:

#

# * `{bucket_name: "mybucket", exclude_regex: [".*\.pdf"]}` will include all

# files in `mybucket` except for .pdf files

# * `{bucket_name: "mybucket", include_regex: ["directory/[^/]+"]}` will

# include all files directly under `gs://mybucket/directory/`, without matching

# across `/`

"excludeRegex": [ # A list of regular expressions matching file paths to exclude. All files in

# the bucket that match at least one of these regular expressions will be

# excluded from the scan.

#

# Regular expressions use RE2

# [syntax](https://github.com/google/re2/wiki/Syntax); a guide can be found

# under the google/re2 repository on GitHub.

"A String",

],

"bucketName": "A String", # The name of a Cloud Storage bucket. Required.

"includeRegex": [ # A list of regular expressions matching file paths to include. All files in

# the bucket that match at least one of these regular expressions will be

# included in the set of files, except for those that also match an item in

# `exclude_regex`. Leaving this field empty will match all files by default

# (this is equivalent to including `.*` in the list).

#

# Regular expressions use RE2

# [syntax](https://github.com/google/re2/wiki/Syntax); a guide can be found

# under the google/re2 repository on GitHub.

"A String",

],

},
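
The include/exclude semantics of `regexFileSet` can be sketched in Python as below. This is an approximation under stated assumptions: the helper name `is_included` is hypothetical, `path` is everything in the url after the bucket name, and Python's `re` module stands in for RE2, whose syntax differs in some features.

```python
import re


def is_included(path, include_regex=(), exclude_regex=()):
    """Approximate the documented selection rule for a single file path."""
    # An empty include list matches all files, as if it were [".*"].
    includes = list(include_regex) or [".*"]
    # A file matching any exclude pattern is dropped, even if it also matches an include.
    if any(re.fullmatch(pattern, path) for pattern in exclude_regex):
        return False
    # The entire path must match at least one include pattern.
    return any(re.fullmatch(pattern, path) for pattern in includes)
```

`re.fullmatch` is used because the description requires the entire file path to match the regular expression, not just part of it.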

},

"bytesLimitPerFilePercent": 42, # Max percentage of bytes to scan from a file. The rest are omitted. The

# number of bytes scanned is rounded down. Must be between 0 and 100,

# inclusive. Both 0 and 100 mean no limit. Defaults to 0. Only one

# of bytes_limit_per_file and bytes_limit_per_file_percent can be specified.

"filesLimitPercent": 42, # Limits the number of files to scan to this percentage of the input FileSet.

# Number of files scanned is rounded down. Must be between 0 and 100,

# inclusive. Both 0 and 100 mean no limit. Defaults to 0.

"fileTypes": [ # List of file type groups to include in the scan.

# If empty, all files are scanned and available data format processors

# are applied. In addition, the binary content of the selected files

# is always scanned as well.

"A String",

],

},

},

"inspectConfig": { # Configuration description of the scanning process. # How and what to scan for.

# When used with redactContent only info_types and min_likelihood are currently

# used.

"excludeInfoTypes": True or False, # When true, excludes type information of the findings.

"limits": {

"maxFindingsPerRequest": 42, # Max number of findings that will be returned per request/job.

# When set within `InspectContentRequest`, the maximum returned is 2000

# even if this is set higher.

"maxFindingsPerInfoType": [ # Configuration of findings limit given for specified infoTypes.

{ # Max findings configuration per infoType, per content item or long

# running DlpJob.

"infoType": { # Type of information detected by the API. # Type of information the findings limit applies to. Only one limit per

# info_type should be provided. If InfoTypeLimit does not have an

# info_type, the DLP API applies the limit against all info_types that

# are found but not specified in another InfoTypeLimit.

"name": "A String", # Name of the information type. Either a name of your choosing when

# creating a CustomInfoType, or one of the names listed

# at https://cloud.google.com/dlp/docs/infotypes-reference when specifying

# a built-in type. InfoType names should conform to the pattern

# [a-zA-Z0-9_]{1,64}.

},

"maxFindings": 42, # Max findings limit for the given infoType.

},

],

"maxFindingsPerItem": 42, # Max number of findings that will be returned for each item scanned.

# When set within `InspectDataSourceRequest`,

# the maximum returned is 2000 even if this is set higher.

# When set within `InspectContentRequest`, this field is ignored.

},

"minLikelihood": "A String", # Only returns findings equal to or above this threshold. The default is

# POSSIBLE.

# See https://cloud.google.com/dlp/docs/likelihood to learn more.

"customInfoTypes": [ # CustomInfoTypes provided by the user. See

# https://cloud.google.com/dlp/docs/creating-custom-infotypes to learn more.

{ # Custom information type provided by the user. Used to find domain-specific

# sensitive information configurable to the data in question.

"regex": { # Message defining a custom regular expression. # Regular expression based CustomInfoType.

"pattern": "A String", # Pattern defining the regular expression. Its syntax

# (https://github.com/google/re2/wiki/Syntax) can be found under the

# google/re2 repository on GitHub.

"groupIndexes": [ # The index of the submatch to extract as findings. When not

# specified, the entire match is returned. No more than 3 may be included.

42,

],

},

"surrogateType": { # Message for detecting output from deidentification transformations that support reversing. # Message for detecting output from deidentification transformations

# such as

# [`CryptoReplaceFfxFpeConfig`](/dlp/docs/reference/rest/v2/organizations.deidentifyTemplates#cryptoreplaceffxfpeconfig).

# These types of transformations are

# those that perform pseudonymization, thereby producing a "surrogate" as

# output. This should be used in conjunction with a field on the

# transformation such as `surrogate_info_type`. This CustomInfoType does

# not support the use of `detection_rules`.

},

"infoType": { # Type of information detected by the API. # CustomInfoType can either be a new infoType, or an extension of built-in

# infoType, when the name matches one of existing infoTypes and that infoType

# is specified in `InspectContent.info_types` field. Specifying the latter

# adds findings to those detected by the system. If the built-in info type is

# not specified in `InspectContent.info_types` list then the name is treated

# as a custom info type.

"name": "A String", # Name of the information type. Either a name of your choosing when

# creating a CustomInfoType, or one of the names listed

# at https://cloud.google.com/dlp/docs/infotypes-reference when specifying

# a built-in type. InfoType names should conform to the pattern

# [a-zA-Z0-9_]{1,64}.

},

"dictionary": { # A list of phrases to detect as a CustomInfoType. # Custom information type based on a dictionary of words or phrases. This can

# be used to match sensitive information specific to the data, such as a list

# of employee IDs or job titles.

#

# Dictionary words are case-insensitive and all characters other than letters

# and digits in the unicode [Basic Multilingual

# Plane](https://en.wikipedia.org/wiki/Plane_%28Unicode%29#Basic_Multilingual_Plane)

# will be replaced with whitespace when scanning for matches, so the

# dictionary phrase "Sam Johnson" will match all three phrases "sam johnson",

# "Sam, Johnson", and "Sam (Johnson)". Additionally, the characters

# surrounding any match must be of a different type than the adjacent

# characters within the word, so letters must be next to non-letters and

# digits next to non-digits. For example, the dictionary word "jen" will

# match the first three letters of the text "jen123" but will return no

# matches for "jennifer".

#

# Dictionary words containing a large number of characters that are not

# letters or digits may result in unexpected findings because such characters

# are treated as whitespace. The

# [limits](https://cloud.google.com/dlp/limits) page contains details about

# the size limits of dictionaries. For dictionaries that do not fit within

# these constraints, consider using `LargeCustomDictionaryConfig` in the

# `StoredInfoType` API.

"wordList": { # Message defining a list of words or phrases to search for in the data. # List of words or phrases to search for.

"words": [ # Words or phrases defining the dictionary. The dictionary must contain

# at least one phrase and every phrase must contain at least 2 characters

# that are letters or digits. [required]

"A String",

],

},

"cloudStoragePath": { # Message representing a single file or path in Cloud Storage. # Newline-delimited file of words in Cloud Storage. Only a single file

# is accepted.

"path": "A String", # A url representing a file or path (no wildcards) in Cloud Storage.

# Example: gs://[BUCKET_NAME]/dictionary.txt

},

},

"storedType": { # A reference to a StoredInfoType to use with scanning. # Load an existing `StoredInfoType` resource for use in

# `InspectDataSource`. Not currently supported in `InspectContent`.

"name": "A String", # Resource name of the requested `StoredInfoType`, for example

# `organizations/433245324/storedInfoTypes/432452342` or

# `projects/project-id/storedInfoTypes/432452342`.

"createTime": "A String", # Timestamp indicating when the version of the `StoredInfoType` used for

# inspection was created. Output-only field, populated by the system.

},

"detectionRules": [ # Set of detection rules to apply to all findings of this CustomInfoType.

# Rules are applied in the order in which they are specified. Not supported for the

# `surrogate_type` CustomInfoType.

{ # Deprecated; use `InspectionRuleSet` instead. Rule for modifying a

# `CustomInfoType` to alter behavior under certain circumstances, depending

# on the specific details of the rule. Not supported for the `surrogate_type`

# custom infoType.

"hotwordRule": { # The rule that adjusts the likelihood of findings within a certain proximity of hotwords. # Hotword-based detection rule.

"proximity": { # Message for specifying a window around a finding to apply a detection rule. # Proximity of the finding within which the entire hotword must reside.

# The total length of the window cannot exceed 1000 characters. Note that

# the finding itself will be included in the window, so that hotwords may

# be used to match substrings of the finding itself. For example, the

# certainty of a phone number regex "\(\d{3}\) \d{3}-\d{4}" could be

# adjusted upwards if the area code is known to be the local area code of

# a company office using the hotword regex "\(xxx\)", where "xxx"

# is the area code in question.

"windowAfter": 42, # Number of characters after the finding to consider.

"windowBefore": 42, # Number of characters before the finding to consider.

},

"hotwordRegex": { # Message defining a custom regular expression. # Regular expression pattern defining what qualifies as a hotword.

"pattern": "A String", # Pattern defining the regular expression. Its syntax

# (https://github.com/google/re2/wiki/Syntax) can be found under the

# google/re2 repository on GitHub.

"groupIndexes": [ # The index of the submatch to extract as findings. When not

# specified, the entire match is returned. No more than 3 may be included.

42,

],

},

"likelihoodAdjustment": { # Message for specifying an adjustment to the likelihood of a finding as part of a detection rule. # Likelihood adjustment to apply to all matching findings.

"relativeLikelihood": 42, # Increase or decrease the likelihood by the specified number of

# levels. For example, if a finding would be `POSSIBLE` without the

# detection rule and `relative_likelihood` is 1, then it is upgraded to

# `LIKELY`, while a value of -1 would downgrade it to `UNLIKELY`.

# Likelihood may never drop below `VERY_UNLIKELY` or exceed

# `VERY_LIKELY`, so applying an adjustment of 1 followed by an

# adjustment of -1 when base likelihood is `VERY_LIKELY` will result in

# a final likelihood of `LIKELY`.

"fixedLikelihood": "A String", # Set the likelihood of a finding to a fixed value.

},

},

},

],

"exclusionType": "A String", # If set to EXCLUSION_TYPE_EXCLUDE, this infoType will not cause a finding

# to be returned. It can still be used for rule matching.

"likelihood": "A String", # Likelihood to return for this CustomInfoType. This base value can be

# altered by a detection rule if the finding meets the criteria specified by

# the rule. Defaults to `VERY_LIKELY` if not specified.

},

],

"includeQuote": True or False, # When true, a contextual quote from the data that triggered a finding is

# included in the response; see Finding.quote.

"ruleSet": [ # Set of rules to apply to the findings for this InspectConfig.

# Exclusion rules contained in the set are executed last; other

# rules are executed in the order they are specified for each info type.

{ # Rule set for modifying a set of infoTypes to alter behavior under certain

# circumstances, depending on the specific details of the rules within the set.

"rules": [ # Set of rules to be applied to infoTypes. The rules are applied in order.

{ # A single inspection rule to be applied to infoTypes, specified in

# `InspectionRuleSet`.

"hotwordRule": { # The rule that adjusts the likelihood of findings within a certain proximity of hotwords. # Hotword-based detection rule.

"proximity": { # Message for specifying a window around a finding to apply a detection rule. # Proximity of the finding within which the entire hotword must reside.

# The total length of the window cannot exceed 1000 characters. Note that

# the finding itself will be included in the window, so that hotwords may

# be used to match substrings of the finding itself. For example, the

# certainty of a phone number regex "\(\d{3}\) \d{3}-\d{4}" could be

# adjusted upwards if the area code is known to be the local area code of

# a company office using the hotword regex "\(xxx\)", where "xxx"

# is the area code in question.

"windowAfter": 42, # Number of characters after the finding to consider.

"windowBefore": 42, # Number of characters before the finding to consider.

},

"hotwordRegex": { # Message defining a custom regular expression. # Regular expression pattern defining what qualifies as a hotword.

"pattern": "A String", # Pattern defining the regular expression. Its syntax

# (https://github.com/google/re2/wiki/Syntax) can be found under the

# google/re2 repository on GitHub.

"groupIndexes": [ # The index of the submatch to extract as findings. When not

# specified, the entire match is returned. No more than 3 may be included.

42,

],

},

"likelihoodAdjustment": { # Message for specifying an adjustment to the likelihood of a finding as part of a detection rule. # Likelihood adjustment to apply to all matching findings.

"relativeLikelihood": 42, # Increase or decrease the likelihood by the specified number of

# levels. For example, if a finding would be `POSSIBLE` without the

# detection rule and `relative_likelihood` is 1, then it is upgraded to

# `LIKELY`, while a value of -1 would downgrade it to `UNLIKELY`.

# Likelihood may never drop below `VERY_UNLIKELY` or exceed

# `VERY_LIKELY`, so applying an adjustment of 1 followed by an

# adjustment of -1 when base likelihood is `VERY_LIKELY` will result in

# a final likelihood of `LIKELY`.

"fixedLikelihood": "A String", # Set the likelihood of a finding to a fixed value.

},
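The clamping behavior described for `relativeLikelihood` can be sketched as follows. This is a minimal illustration, not the service's implementation: the helper name `adjust_likelihood` is hypothetical, and the level ordering is taken from the likelihood names used in the comment above.

```python
# Ordered likelihood levels, lowest to highest, as named in the text above.
LEVELS = ["VERY_UNLIKELY", "UNLIKELY", "POSSIBLE", "LIKELY", "VERY_LIKELY"]


def adjust_likelihood(current, relative_likelihood):
    """Shift `current` by `relative_likelihood` levels, clamped to the valid range.

    Hypothetical helper for illustration only.
    """
    index = LEVELS.index(current) + relative_likelihood
    # Never drop below VERY_UNLIKELY or exceed VERY_LIKELY.
    index = max(0, min(index, len(LEVELS) - 1))
    return LEVELS[index]


print(adjust_likelihood("POSSIBLE", 1))   # LIKELY
print(adjust_likelihood("POSSIBLE", -1))  # UNLIKELY
# +1 on VERY_LIKELY is clamped, so the following -1 lands on LIKELY.
print(adjust_likelihood(adjust_likelihood("VERY_LIKELY", 1), -1))  # LIKELY
```

Because each adjustment is clamped on its own, adjustments do not cancel out at the ends of the scale, which is the case the comment calls out.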

},

"exclusionRule": { # The rule that specifies conditions when findings of infoTypes specified in `InspectionRuleSet` are removed from results. # Exclusion rule.

"regex": { # Message defining a custom regular expression. # Regular expression which defines the rule.

"pattern": "A String", # Pattern defining the regular expression. Its syntax

# (https://github.com/google/re2/wiki/Syntax) can be found under the

# google/re2 repository on GitHub.

"groupIndexes": [ # The index of the submatch to extract as findings. When not

# specified, the entire match is returned. No more than 3 may be included.

42,

],

},

"excludeInfoTypes": { # List of exclude infoTypes. # Set of infoTypes for which findings would affect this rule.

"infoTypes": [ # The ExclusionRule drops a finding when it overlaps with or is

# contained within a finding of an infoType from this list. For

# example, for `InspectionRuleSet.info_types` containing "PHONE_NUMBER" and

# `exclusion_rule` containing `exclude_info_types.info_types` with

# "EMAIL_ADDRESS", the phone number findings are dropped if they overlap

# with an EMAIL_ADDRESS finding.

# As a result, "555-222-2222@example.org" generates only a single

# finding, namely the email address.

{ # Type of information detected by the API.

"name": "A String", # Name of the information type. Either a name of your choosing when

# creating a CustomInfoType, or one of the names listed

# at https://cloud.google.com/dlp/docs/infotypes-reference when specifying

# a built-in type. InfoType names should conform to the pattern

# [a-zA-Z0-9_]{1,64}.

},

],

},
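The overlap-based exclusion described for `excludeInfoTypes` can be sketched as below. This is an illustration under stated assumptions, not the service's algorithm: the `Finding` tuple, the character offsets, and the `apply_exclusion` helper are all hypothetical, and `matchingType` is not modeled.

```python
from typing import NamedTuple


class Finding(NamedTuple):
    """Hypothetical finding record: infoType name plus a character span."""
    info_type: str
    start: int  # inclusive character offset
    end: int    # exclusive character offset


def overlaps(a, b):
    """True when the two spans share at least one character (containment included)."""
    return a.start < b.end and b.start < a.end


def apply_exclusion(findings, rule_info_types, exclude_info_types):
    """Drop findings of `rule_info_types` that overlap a finding of `exclude_info_types`."""
    kept = []
    for finding in findings:
        excluded = finding.info_type in rule_info_types and any(
            other is not finding
            and other.info_type in exclude_info_types
            and overlaps(finding, other)
            for other in findings
        )
        if not excluded:
            kept.append(finding)
    return kept


text = "555-222-2222@example.org"
phone = Finding("PHONE_NUMBER", 0, 12)          # "555-222-2222"
email = Finding("EMAIL_ADDRESS", 0, len(text))  # the whole string
print(apply_exclusion([phone, email], {"PHONE_NUMBER"}, {"EMAIL_ADDRESS"}))
```

With the example string from the comment above, only the `EMAIL_ADDRESS` finding remains.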

"dictionary": { # Custom information type based on a dictionary of words or phrases. This can # Dictionary which defines the rule.

# be used to match sensitive information specific to the data, such as a list

# of employee IDs or job titles.

#

# Dictionary words are case-insensitive and all characters other than letters

# and digits in the unicode [Basic Multilingual

# Plane](https://en.wikipedia.org/wiki/Plane_%28Unicode%29#Basic_Multilingual_Plane)

# will be replaced with whitespace when scanning for matches, so the

# dictionary phrase "Sam Johnson" will match all three phrases "sam johnson",

# "Sam, Johnson", and "Sam (Johnson)". Additionally, the characters

# surrounding any match must be of a different type than the adjacent

# characters within the word, so letters must be next to non-letters and

# digits next to non-digits. For example, the dictionary word "jen" will

# match the first three letters of the text "jen123" but will return no

# matches for "jennifer".

#

# Dictionary words containing a large number of characters that are not

# letters or digits may result in unexpected findings because such characters

# are treated as whitespace. The

# [limits](https://cloud.google.com/dlp/limits) page contains details about

# the size limits of dictionaries. For dictionaries that do not fit within

# these constraints, consider using `LargeCustomDictionaryConfig` in the

# `StoredInfoType` API.

"wordList": { # Message defining a list of words or phrases to search for in the data. # List of words or phrases to search for.

"words": [ # Words or phrases defining the dictionary. The dictionary must contain

# at least one phrase and every phrase must contain at least 2 characters

# that are letters or digits. [required]

"A String",

],

},

"cloudStoragePath": { # Message representing a single file or path in Cloud Storage. # Newline-delimited file of words in Cloud Storage. Only a single file

# is accepted.

"path": "A String", # A url representing a file or path (no wildcards) in Cloud Storage.

# Example: gs://[BUCKET_NAME]/dictionary.txt

},

},
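The dictionary matching rules described above (case-insensitive comparison, non-alphanumeric characters treated as whitespace, and the boundary rule for adjacent characters) can be approximated with the sketch below. It is an assumption-laden illustration rather than the service's matcher: `normalize` and `dictionary_matches` are hypothetical helpers, and collapsing runs of whitespace is an assumption made so that the documented examples line up.

```python
def _kind(ch):
    """Classify a character as a letter, a digit, or anything else."""
    if ch.isalpha():
        return "letter"
    if ch.isdigit():
        return "digit"
    return "other"


def normalize(text):
    """Lowercase, replace non-letter/digit characters with spaces, collapse runs."""
    cleaned = "".join(
        ch if (ch.isalpha() or ch.isdigit()) else " " for ch in text.lower()
    )
    return " ".join(cleaned.split())


def dictionary_matches(phrase, text):
    """True if `phrase` occurs in `text` under the rules sketched above."""
    needle = normalize(phrase)
    haystack = normalize(text)
    if not needle:
        return False
    start = haystack.find(needle)
    while start != -1:
        end = start + len(needle)
        # Neighbouring characters must differ in kind from the match's edges:
        # letters must be next to non-letters and digits next to non-digits.
        before_ok = start == 0 or _kind(haystack[start - 1]) != _kind(needle[0])
        after_ok = end == len(haystack) or _kind(haystack[end]) != _kind(needle[-1])
        if before_ok and after_ok:
            return True
        start = haystack.find(needle, start + 1)
    return False


print(dictionary_matches("Sam Johnson", "Sam (Johnson)"))  # True
print(dictionary_matches("jen", "jen123"))                 # True
print(dictionary_matches("jen", "jennifer"))               # False
```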

"matchingType": "A String", # How the rule is applied, see MatchingType documentation for details.

},

},

],

"infoTypes": [ # List of infoTypes this rule set is applied to.

{ # Type of information detected by the API.

"name": "A String", # Name of the information type. Either a name of your choosing when

# creating a CustomInfoType, or one of the names listed

# at https://cloud.google.com/dlp/docs/infotypes-reference when specifying

# a built-in type. InfoType names should conform to the pattern

# [a-zA-Z0-9_]{1,64}.

},

],

},

],

"contentOptions": [ # List of options defining data content to scan.

# If empty, text, images, and other content will be included.

"A String",

],

"infoTypes": [ # Restricts what info_types to look for. The values must correspond to

# InfoType values returned by ListInfoTypes or listed at

# https://cloud.google.com/dlp/docs/infotypes-reference.

#

# When no InfoTypes or CustomInfoTypes are specified in a request, the

# system may automatically choose what detectors to run. By default this may

# be all types, but may change over time as detectors are updated.

#

# The special InfoType name "ALL_BASIC" can be used to trigger all detectors,

# but may change over time as new InfoTypes are added. If you need precise

# control and predictability as to what detectors are run you should specify

# specific InfoTypes listed in the reference.

{ # Type of information detected by the API.

"name": "A String", # Name of the information type. Either a name of your choosing when

# creating a CustomInfoType, or one of the names listed

# at https://cloud.google.com/dlp/docs/infotypes-reference when specifying

# a built-in type. InfoType names should conform to the pattern

# [a-zA-Z0-9_]{1,64}.

},

],

},

"inspectTemplateName": "A String", # If provided, will be used as the default for all values in InspectConfig.

# `inspect_config` will be merged into the values persisted as part of the

# template.

"actions": [ # Actions to execute at the completion of the job.

{ # A task to execute on the completion of a job.

# See https://cloud.google.com/dlp/docs/concepts-actions to learn more.

"saveFindings": { # If set, the detailed findings will be persisted to the specified OutputStorageConfig. # Save resulting findings in a provided location.

# Only a single instance of this action can be specified.

# Compatible with: Inspect, Risk

"outputConfig": { # Cloud repository for storing output.

"table": { # Message defining the location of a BigQuery table. A table is uniquely

# identified by its project_id, dataset_id, and table_name. Within a query

# a table is often referenced with a string in the format of:

# `<project_id>:<dataset_id>.<table_id>` or

# `<project_id>.<dataset_id>.<table_id>`.

#

# Store findings in an existing table or a new table in an existing

# dataset. If table_id is not set a new one will be generated

# for you with the following format:

# dlp_googleapis_yyyy_mm_dd_[dlp_job_id]. Pacific timezone will be used for

# generating the date details.

#

# For Inspect, each column in an existing output table must have the same

# name, type, and mode of a field in the `Finding` object.

#

# For Risk, an existing output table should be the output of a previous

# Risk analysis job run on the same source table, with the same privacy

# metric and quasi-identifiers. Risk jobs that analyze the same table but

# compute a different privacy metric, or use different sets of

# quasi-identifiers, cannot store their results in the same table.

"projectId": "A String", # The Google Cloud Platform project ID of the project containing the table.

# If omitted, project ID is inferred from the API call.

"tableId": "A String", # Name of the table.

"datasetId": "A String", # Dataset ID of the table.

},

"outputSchema": "A String", # Schema used for writing the findings for Inspect jobs. This field is only

# used for Inspect and must be unspecified for Risk jobs. Columns are derived

# from the `Finding` object. If appending to an existing table, any columns

# from the predefined schema that are missing will be added. No columns in

# the existing table will be deleted.

#

# If unspecified, then all available columns will be used for a new table or

# an (existing) table with no schema, and no changes will be made to an

# existing table that has a schema.

},

},

"jobNotificationEmails": { # Enable email notification to project owners and editors on job's completion/failure.

},

"publishSummaryToCscc": { # Publish the result summary of a DlpJob to the Cloud Security Command Center (CSCC Alpha). # Publish summary to Cloud Security Command Center (Alpha).

# This action is only available for projects that are part of

# an organization and whitelisted for the alpha Cloud Security Command

# Center.

# The action will publish the count of finding instances and their info types.

# The summary of findings will be persisted in CSCC and is governed by CSCC

# service-specific policy; see https://cloud.google.com/terms/service-terms.

# Only a single instance of this action can be specified.

# Compatible with: Inspect

},

"pubSub": { # Publish a notification to a Pub/Sub topic.

# Publishes a message to the given Pub/Sub topic when the DlpJob has completed. The

# message contains a single field, `DlpJobName`, which is equal to the

# finished job's

# [`DlpJob.name`](/dlp/docs/reference/rest/v2/projects.dlpJobs#DlpJob).

# Compatible with: Inspect, Risk

"topic": "A String", # Cloud Pub/Sub topic to send notifications to. The topic must grant

# publishing access rights to the DLP API service account executing

# the long-running DlpJob sending the notifications.

# Format is projects/{project}/topics/{topic}.

},

},

],

},

},

"result": { # All result fields mentioned below are updated while the job is processing. # A summary of the outcome of this inspect job.

"infoTypeStats": [ # Statistics of how many instances of each info type were found during

# inspect job.

{ # Statistics regarding a specific InfoType.

"count": "A String", # Number of findings for this infoType.

"infoType": { # Type of information detected by the API. # The type of finding this stat is for.

"name": "A String", # Name of the information type. Either a name of your choosing when

# creating a CustomInfoType, or one of the names listed

# at https://cloud.google.com/dlp/docs/infotypes-reference when specifying

# a built-in type. InfoType names should conform to the pattern

# [a-zA-Z0-9_]{1,64}.

},

},

],

"totalEstimatedBytes": "A String", # Estimate of the number of bytes to process.

"processedBytes": "A String", # Total size in bytes that were processed.

},

},

"riskDetails": { # Result of a risk analysis operation request. # Results from analyzing risk of a data source.

"numericalStatsResult": { # Result of the numerical stats computation.

"quantileValues": [ # List of 99 values that partition the set of field values into 100 equal

# sized buckets.

{ # Set of primitive values supported by the system.

# Note that for the purposes of inspection or transformation, the number

# of bytes considered to comprise a 'Value' is based on its representation

# as a UTF-8 encoded string. For example, if 'integer_value' is set to

# 123456789, the number of bytes would be counted as 9, even though an

# int64 only holds up to 8 bytes of data.

"floatValue": 3.14,

"timestampValue": "A String",

"dayOfWeekValue": "A String",

"timeValue": { # Represents a time of day. The date and time zone are either not significant

# or are specified elsewhere. An API may choose to allow leap seconds. Related

# types are google.type.Date and `google.protobuf.Timestamp`.

"hours": 42, # Hours of day in 24 hour format. Should be from 0 to 23. An API may choose

# to allow the value "24:00:00" for scenarios like business closing time.

"nanos": 42, # Fractions of seconds in nanoseconds. Must be from 0 to 999,999,999.

"seconds": 42, # Seconds of minutes of the time. Must normally be from 0 to 59. An API may

# allow the value 60 if it allows leap-seconds.

"minutes": 42, # Minutes of hour of day. Must be from 0 to 59.

},

"dateValue": { # Represents a whole or partial calendar date, e.g. a birthday. The time of day

# and time zone are either specified elsewhere or are not significant. The date

# is relative to the Proleptic Gregorian Calendar. This can represent:

#

# * A full date, with non-zero year, month and day values

# * A month and day value, with a zero year, e.g. an anniversary

# * A year on its own, with zero month and day values

# * A year and month value, with a zero day, e.g. a credit card expiration date

#

# Related types are google.type.TimeOfDay and `google.protobuf.Timestamp`.

"year": 42, # Year of date. Must be from 1 to 9999, or 0 if specifying a date without

# a year.

"day": 42, # Day of month. Must be from 1 to 31 and valid for the year and month, or 0

# if specifying a year by itself or a year and month where the day is not

# significant.

"month": 42, # Month of year. Must be from 1 to 12, or 0 if specifying a year without a

# month and day.

},

"stringValue": "A String",

"booleanValue": True or False,

"integerValue": "A String",

},

],

"maxValue": { # Set of primitive values supported by the system. # Maximum value appearing in the column.

# Note that for the purposes of inspection or transformation, the number

# of bytes considered to comprise a 'Value' is based on its representation

# as a UTF-8 encoded string. For example, if 'integer_value' is set to

# 123456789, the number of bytes would be counted as 9, even though an

# int64 only holds up to 8 bytes of data.

"floatValue": 3.14,

"timestampValue": "A String",

"dayOfWeekValue": "A String",

"timeValue": { # Represents a time of day. The date and time zone are either not significant

# or are specified elsewhere. An API may choose to allow leap seconds. Related

# types are google.type.Date and `google.protobuf.Timestamp`.

"hours": 42, # Hours of day in 24 hour format. Should be from 0 to 23. An API may choose

# to allow the value "24:00:00" for scenarios like business closing time.

"nanos": 42, # Fractions of seconds in nanoseconds. Must be from 0 to 999,999,999.

"seconds": 42, # Seconds of minutes of the time. Must normally be from 0 to 59. An API may

# allow the value 60 if it allows leap-seconds.

"minutes": 42, # Minutes of hour of day. Must be from 0 to 59.

},

"dateValue": { # Represents a whole or partial calendar date, e.g. a birthday. The time of day

# and time zone are either specified elsewhere or are not significant. The date

# is relative to the Proleptic Gregorian Calendar. This can represent:

#

# * A full date, with non-zero year, month and day values

# * A month and day value, with a zero year, e.g. an anniversary

# * A year on its own, with zero month and day values

# * A year and month value, with a zero day, e.g. a credit card expiration date

#

# Related types are google.type.TimeOfDay and `google.protobuf.Timestamp`.

"year": 42, # Year of date. Must be from 1 to 9999, or 0 if specifying a date without

# a year.

"day": 42, # Day of month. Must be from 1 to 31 and valid for the year and month, or 0

# if specifying a year by itself or a year and month where the day is not

# significant.

"month": 42, # Month of year. Must be from 1 to 12, or 0 if specifying a year without a

# month and day.

},

"stringValue": "A String",

"booleanValue": True or False,

"integerValue": "A String",

},

"minValue": { # Set of primitive values supported by the system. # Minimum value appearing in the column.

# Note that for the purposes of inspection or transformation, the number

# of bytes considered to comprise a 'Value' is based on its representation

# as a UTF-8 encoded string. For example, if 'integer_value' is set to

# 123456789, the number of bytes would be counted as 9, even though an

# int64 only holds up to 8 bytes of data.

"floatValue": 3.14,

"timestampValue": "A String",

"dayOfWeekValue": "A String",

"timeValue": { # Represents a time of day. The date and time zone are either not significant

# or are specified elsewhere. An API may choose to allow leap seconds. Related

# types are google.type.Date and `google.protobuf.Timestamp`.

"hours": 42, # Hours of day in 24 hour format. Should be from 0 to 23. An API may choose

# to allow the value "24:00:00" for scenarios like business closing time.

"nanos": 42, # Fractions of seconds in nanoseconds. Must be from 0 to 999,999,999.

"seconds": 42, # Seconds of minutes of the time. Must normally be from 0 to 59. An API may

# allow the value 60 if it allows leap-seconds.

"minutes": 42, # Minutes of hour of day. Must be from 0 to 59.

},

"dateValue": { # Represents a whole or partial calendar date, e.g. a birthday. The time of day

# and time zone are either specified elsewhere or are not significant. The date

# is relative to the Proleptic Gregorian Calendar. This can represent:

#

# * A full date, with non-zero year, month and day values

# * A month and day value, with a zero year, e.g. an anniversary

# * A year on its own, with zero month and day values

# * A year and month value, with a zero day, e.g. a credit card expiration date

#

# Related types are google.type.TimeOfDay and `google.protobuf.Timestamp`.

"year": 42, # Year of date. Must be from 1 to 9999, or 0 if specifying a date without

# a year.

"day": 42, # Day of month. Must be from 1 to 31 and valid for the year and month, or 0

# if specifying a year by itself or a year and month where the day is not

# significant.

"month": 42, # Month of year. Must be from 1 to 12, or 0 if specifying a year without a

# month and day.

},

"stringValue": "A String",

"booleanValue": True or False,

"integerValue": "A String",

},

},
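The byte-counting note that accompanies each `Value` above can be illustrated with a short sketch. The helper `value_size_in_bytes` is hypothetical and only covers integers and strings; how other value types are rendered as strings is not stated in this reference, so they are left out.

```python
def value_size_in_bytes(value):
    """Size of a value's string form when encoded as UTF-8.

    Hypothetical helper; covers int and str values only.
    """
    return len(str(value).encode("utf-8"))


# Nine digits are counted as 9 bytes, even though an int64 holds 8 bytes.
print(value_size_in_bytes(123456789))  # 9
# Non-ASCII characters may take more than one byte each in UTF-8.
print(value_size_in_bytes("é"))        # 2
```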

"kMapEstimationResult": { # Result of the reidentifiability analysis. Note that these results are an

# estimation, not exact values.

"kMapEstimationHistogram": [ # The intervals [min_anonymity, max_anonymity] do not overlap. If a value

# doesn't correspond to any such interval, the associated frequency is

# zero. For example, the following records:

# {min_anonymity: 1, max_anonymity: 1, frequency: 17}

# {min_anonymity: 2, max_anonymity: 3, frequency: 42}

# {min_anonymity: 5, max_anonymity: 10, frequency: 99}

# mean that there are no records with an estimated anonymity of 4 or

# larger than 10.

{ # A KMapEstimationHistogramBucket message with the following values:

# min_anonymity: 3

# max_anonymity: 5

# frequency: 42

# means that there are 42 records whose quasi-identifier values correspond

# to 3, 4 or 5 people in the overlying population. An important particular

# case is when min_anonymity = max_anonymity = 1: the frequency field then

# corresponds to the number of uniquely identifiable records.

"bucketValues": [ # Sample of quasi-identifier tuple values in this bucket. The total

# number of classes returned per bucket is capped at 20.

{ # A tuple of values for the quasi-identifier columns.

"estimatedAnonymity": "A String", # The estimated anonymity for these quasi-identifier values.

"quasiIdsValues": [ # The quasi-identifier values.

{ # Set of primitive values supported by the system.

# Note that for the purposes of inspection or transformation, the number

# of bytes considered to comprise a 'Value' is based on its representation

# as a UTF-8 encoded string. For example, if 'integer_value' is set to

# 123456789, the number of bytes would be counted as 9, even though an

# int64 only holds up to 8 bytes of data.

"floatValue": 3.14,

"timestampValue": "A String",

"dayOfWeekValue": "A String",

"timeValue": { # Represents a time of day. The date and time zone are either not significant

# or are specified elsewhere. An API may choose to allow leap seconds. Related

# types are google.type.Date and `google.protobuf.Timestamp`.

"hours": 42, # Hours of day in 24 hour format. Should be from 0 to 23. An API may choose

# to allow the value "24:00:00" for scenarios like business closing time.

"nanos": 42, # Fractions of seconds in nanoseconds. Must be from 0 to 999,999,999.

"seconds": 42, # Seconds of minutes of the time. Must normally be from 0 to 59. An API may

# allow the value 60 if it allows leap-seconds.

"minutes": 42, # Minutes of hour of day. Must be from 0 to 59.

},

"dateValue": { # Represents a whole or partial calendar date, e.g. a birthday. The time of day

# and time zone are either specified elsewhere or are not significant. The date

# is relative to the Proleptic Gregorian Calendar. This can represent:

#

# * A full date, with non-zero year, month and day values

# * A month and day value, with a zero year, e.g. an anniversary

# * A year on its own, with zero month and day values

# * A year and month value, with a zero day, e.g. a credit card expiration date

#

# Related types are google.type.TimeOfDay and `google.protobuf.Timestamp`.

"year": 42, # Year of date. Must be from 1 to 9999, or 0 if specifying a date without

# a year.

"day": 42, # Day of month. Must be from 1 to 31 and valid for the year and month, or 0

# if specifying a year by itself or a year and month where the day is not

# significant.

"month": 42, # Month of year. Must be from 1 to 12, or 0 if specifying a year without a

# month and day.

},

"stringValue": "A String",

"booleanValue": True or False,

"integerValue": "A String",

},

],

},

],

"minAnonymity": "A String", # Always positive.

"bucketValueCount": "A String", # Total number of distinct quasi-identifier tuple values in this bucket.

"maxAnonymity": "A String", # Always greater than or equal to min_anonymity.

"bucketSize": "A String", # Number of records within these anonymity bounds.

},

],

},
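Reading the k-map histogram can be sketched as a lookup over non-overlapping intervals. This is a minimal illustration, assuming that the `frequency` mentioned in the comments corresponds to the `bucketSize` field of each bucket and that int64 values arrive as strings, as in the schema above. The helper `records_with_anonymity` is hypothetical.

```python
def records_with_anonymity(histogram, k):
    """Return the record count of the bucket whose interval contains k, else 0."""
    for bucket in histogram:
        low = int(bucket["minAnonymity"])
        high = int(bucket["maxAnonymity"])
        if low <= k <= high:
            return int(bucket["bucketSize"])
    # No interval covers k, so the associated frequency is zero.
    return 0


# The three example records from the comment above, in response-body form.
histogram = [
    {"minAnonymity": "1", "maxAnonymity": "1", "bucketSize": "17"},
    {"minAnonymity": "2", "maxAnonymity": "3", "bucketSize": "42"},
    {"minAnonymity": "5", "maxAnonymity": "10", "bucketSize": "99"},
]

print(records_with_anonymity(histogram, 1))   # 17 uniquely identifiable records
print(records_with_anonymity(histogram, 4))   # 0
print(records_with_anonymity(histogram, 11))  # 0
```

The first bucket, where the minimum and maximum anonymity are both 1, is the particular case the comment highlights: its count is the number of uniquely identifiable records.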

"kAnonymityResult": { # Result of the k-anonymity computation.

"equivalenceClassHistogramBuckets": [ # Histogram of k-anonymity equivalence classes.

{

"bucketValues": [ # Sample of equivalence classes in this bucket. The total number of

# classes returned per bucket is capped at 20.

{ # The set of columns' values that share the same ldiversity value

"quasiIdsValues": [ # Set of values defining the equivalence class. One value per

# quasi-identifier column in the original KAnonymity metric message.

# The order is always the same as the original request.

{ # Set of primitive values supported by the system.

# Note that for the purposes of inspection or transformation, the number

# of bytes considered to comprise a 'Value' is based on its representation

# as a UTF-8 encoded string. For example, if 'integer_value' is set to

# 123456789, the number of bytes would be counted as 9, even though an

# int64 only holds up to 8 bytes of data.

"floatValue": 3.14,

"timestampValue": "A String",

"dayOfWeekValue": "A String",

"timeValue": { # Represents a time of day. The date and time zone are either not significant

# or are specified elsewhere. An API may choose to allow leap seconds. Related

# types are google.type.Date and `google.protobuf.Timestamp`.

"hours": 42, # Hours of day in 24 hour format. Should be from 0 to 23. An API may choose

# to allow the value "24:00:00" for scenarios like business closing time.

"nanos": 42, # Fractions of seconds in nanoseconds. Must be from 0 to 999,999,999.

"seconds": 42, # Seconds of minutes of the time. Must normally be from 0 to 59. An API may

# allow the value 60 if it allows leap-seconds.

"minutes": 42, # Minutes of hour of day. Must be from 0 to 59.

},

"dateValue": { # Represents a whole or partial calendar date, e.g. a birthday. The time of day

# and time zone are either specified elsewhere or are not significant. The date

# is relative to the Proleptic Gregorian Calendar. This can represent:

#

# * A full date, with non-zero year, month and day values

# * A month and day value, with a zero year, e.g. an anniversary

# * A year on its own, with zero month and day values

# * A year and month value, with a zero day, e.g. a credit card expiration date

#

# Related types are google.type.TimeOfDay and `google.protobuf.Timestamp`.

"year": 42, # Year of date. Must be from 1 to 9999, or 0 if specifying a date without

# a year.

"day": 42, # Day of month. Must be from 1 to 31 and valid for the year and month, or 0

# if specifying a year by itself or a year and month where the day is not

2181

# significant.

2182

"month": 42, # Month of year. Must be from 1 to 12, or 0 if specifying a year without a

2183

# month and day.

2184

},

2185

"stringValue": "A String",

2186

"booleanValue": True or False,

2187

"integerValue": "A String",

2188

},

2189

],

2190

"equivalenceClassSize": "A String", # Size of the equivalence class, for example number of rows with the

2191

# above set of values.

2192

},

2193

],

2194

"bucketValueCount": "A String", # Total number of distinct equivalence classes in this bucket.

2195

"equivalenceClassSizeLowerBound": "A String", # Lower bound on the size of the equivalence classes in this bucket.

2196

"equivalenceClassSizeUpperBound": "A String", # Upper bound on the size of the equivalence classes in this bucket.

2197

"bucketSize": "A String", # Total number of equivalence classes in this bucket.

},

],

},

"lDiversityResult": { # Result of the l-diversity computation.

2202

"sensitiveValueFrequencyHistogramBuckets": [ # Histogram of l-diversity equivalence class sensitive value frequencies.

2203

{

2204

"bucketValues": [ # Sample of equivalence classes in this bucket. The total number of

2205

# classes returned per bucket is capped at 20.

2206

{ # The set of columns' values that share the same ldiversity value.

2207

"numDistinctSensitiveValues": "A String", # Number of distinct sensitive values in this equivalence class.

2208

"quasiIdsValues": [ # Quasi-identifier values defining the k-anonymity equivalence

2209

# class. The order is always the same as the original request.

2210

{ # Set of primitive values supported by the system.

2211

# Note that for the purposes of inspection or transformation, the number

2212

# of bytes considered to comprise a 'Value' is based on its representation

2213

# as a UTF-8 encoded string. For example, if 'integer_value' is set to

2214

# 123456789, the number of bytes would be counted as 9, even though an

2215

# int64 only holds up to 8 bytes of data.

2216

"floatValue": 3.14,

2217

"timestampValue": "A String",

2218

"dayOfWeekValue": "A String",

2219

"timeValue": { # Represents a time of day. The date and time zone are either not significant

2220

# or are specified elsewhere. An API may choose to allow leap seconds. Related

2221

# types are google.type.Date and `google.protobuf.Timestamp`.

2222

"hours": 42, # Hours of day in 24 hour format. Should be from 0 to 23. An API may choose

2223

# to allow the value "24:00:00" for scenarios like business closing time.

2224

"nanos": 42, # Fractions of seconds in nanoseconds. Must be from 0 to 999,999,999.

2225

"seconds": 42, # Seconds of minutes of the time. Must normally be from 0 to 59. An API may

2226

# allow the value 60 if it allows leap-seconds.

2227

"minutes": 42, # Minutes of hour of day. Must be from 0 to 59.

2228

},

2229

"dateValue": { # Represents a whole or partial calendar date, e.g. a birthday. The time of day

2230

# and time zone are either specified elsewhere or are not significant. The date

2231

# is relative to the Proleptic Gregorian Calendar. This can represent:

2232

#

2233

# * A full date, with non-zero year, month and day values

2234

# * A month and day value, with a zero year, e.g. an anniversary

2235

# * A year on its own, with zero month and day values

2236

# * A year and month value, with a zero day, e.g. a credit card expiration date

2237

#

2238

# Related types are google.type.TimeOfDay and `google.protobuf.Timestamp`.

2239

"year": 42, # Year of date. Must be from 1 to 9999, or 0 if specifying a date without

2240

# a year.

2241

"day": 42, # Day of month. Must be from 1 to 31 and valid for the year and month, or 0

2242

# if specifying a year by itself or a year and month where the day is not

2243

# significant.

2244

"month": 42, # Month of year. Must be from 1 to 12, or 0 if specifying a year without a

2245

# month and day.

2246

},

2247

"stringValue": "A String",

2248

"booleanValue": True or False,

2249

"integerValue": "A String",

2250

},

2251

],

2252

"topSensitiveValues": [ # Estimated frequencies of top sensitive values.

2253

{ # A value of a field, including its frequency.

2254

"count": "A String", # How many times the value is contained in the field.

2255

"value": { # Set of primitive values supported by the system. # A value contained in the field in question.

2256

# Note that for the purposes of inspection or transformation, the number

2257

# of bytes considered to comprise a 'Value' is based on its representation

2258

# as a UTF-8 encoded string. For example, if 'integer_value' is set to

2259

# 123456789, the number of bytes would be counted as 9, even though an

2260

# int64 only holds up to 8 bytes of data.

2261

"floatValue": 3.14,

2262

"timestampValue": "A String",

2263

"dayOfWeekValue": "A String",

2264

"timeValue": { # Represents a time of day. The date and time zone are either not significant

2265

# or are specified elsewhere. An API may choose to allow leap seconds. Related

2266

# types are google.type.Date and `google.protobuf.Timestamp`.

2267

"hours": 42, # Hours of day in 24 hour format. Should be from 0 to 23. An API may choose

2268

# to allow the value "24:00:00" for scenarios like business closing time.

2269

"nanos": 42, # Fractions of seconds in nanoseconds. Must be from 0 to 999,999,999.

2270

"seconds": 42, # Seconds of minutes of the time. Must normally be from 0 to 59. An API may

2271

# allow the value 60 if it allows leap-seconds.

2272

"minutes": 42, # Minutes of hour of day. Must be from 0 to 59.

2273

},

2274

"dateValue": { # Represents a whole or partial calendar date, e.g. a birthday. The time of day

2275

# and time zone are either specified elsewhere or are not significant. The date

2276

# is relative to the Proleptic Gregorian Calendar. This can represent:

2277

#

2278

# * A full date, with non-zero year, month and day values

2279

# * A month and day value, with a zero year, e.g. an anniversary

2280

# * A year on its own, with zero month and day values

2281

# * A year and month value, with a zero day, e.g. a credit card expiration date

2282

#

2283

# Related types are google.type.TimeOfDay and `google.protobuf.Timestamp`.

2284

"year": 42, # Year of date. Must be from 1 to 9999, or 0 if specifying a date without

2285

# a year.

2286

"day": 42, # Day of month. Must be from 1 to 31 and valid for the year and month, or 0

2287

# if specifying a year by itself or a year and month where the day is not

2288

# significant.

2289

"month": 42, # Month of year. Must be from 1 to 12, or 0 if specifying a year without a

2290

# month and day.

2291

},

2292

"stringValue": "A String",

2293

"booleanValue": True or False,

2294

"integerValue": "A String",

},

},

],

"equivalenceClassSize": "A String", # Size of the k-anonymity equivalence class.

2299

},

2300

],

2301

"bucketValueCount": "A String", # Total number of distinct equivalence classes in this bucket.

2302

"bucketSize": "A String", # Total number of equivalence classes in this bucket.

2303

"sensitiveValueFrequencyUpperBound": "A String", # Upper bound on the sensitive value frequencies of the equivalence

2304

# classes in this bucket.

2305

"sensitiveValueFrequencyLowerBound": "A String", # Lower bound on the sensitive value frequencies of the equivalence

2306

# classes in this bucket.

},

],

},

"requestedPrivacyMetric": { # Privacy metric to compute for reidentification risk analysis. # Privacy metric to compute.

2311

"numericalStatsConfig": { # Compute numerical stats over an individual column, including

2312

# min, max, and quantiles.

2313

"field": { # General identifier of a data field in a storage service. # Field to compute numerical stats on. Supported types are

2314

# integer, float, date, datetime, timestamp, time.

2315

"name": "A String", # Name describing the field.

2316

},

2317

},

2318

"kMapEstimationConfig": { # Reidentifiability metric. This corresponds to a risk model similar to what

2319

# is called "journalist risk" in the literature, except the attack dataset is

2320

# statistically modeled instead of being perfectly known. This can be done

2321

# using publicly available data (like the US Census), or using a custom

2322

# statistical model (indicated as one or several BigQuery tables), or by

2323

# extrapolating from the distribution of values in the input dataset.

2324

"regionCode": "A String", # ISO 3166-1 alpha-2 region code to use in the statistical modeling.
# Required if no column is tagged with a region-specific InfoType (like
# US_ZIP_5) or a region code.
"quasiIds": [ # Fields considered to be quasi-identifiers. No two columns can have the
# same tag. [required]
{ # A column with a semantic tag attached.

2331

"field": { # General identifier of a data field in a storage service. # Identifies the column. [required]

2332

"name": "A String", # Name describing the field.

2333

},

2334

"customTag": "A String", # A column can be tagged with a custom tag. In this case, the user must

2335

# indicate an auxiliary table that contains statistical information on

2336

# the possible values of this column (below).

"infoType": { # Type of information detected by the API. # A column can be tagged with an InfoType to use the relevant public

2338

# dataset as a statistical model of population, if available. We

2339

# currently support US ZIP codes, region codes, ages and genders.

2340

# To programmatically obtain the list of supported InfoTypes, use

2341

# ListInfoTypes with the supported_by=RISK_ANALYSIS filter.

2342

"name": "A String", # Name of the information type. Either a name of your choosing when

2343

# creating a CustomInfoType, or one of the names listed

2344

# at https://cloud.google.com/dlp/docs/infotypes-reference when specifying

2345

# a built-in type. InfoType names should conform to the pattern

2346

# [a-zA-Z0-9_]{1,64}.

2347

},

"inferred": { # If no semantic tag is indicated, we infer the statistical model from
# the distribution of values in the input data.
# A generic empty message that you can re-use to avoid defining duplicated
# empty messages in your APIs. A typical example is to use it as the request
# or the response type of an API method. For instance:
#
# service Foo {
# rpc Bar(google.protobuf.Empty) returns (google.protobuf.Empty);
# }
#
# The JSON representation for `Empty` is empty JSON object `{}`.

},

},

],

"auxiliaryTables": [ # Several auxiliary tables can be used in the analysis. Each custom_tag

2362

# used to tag a quasi-identifiers column must appear in exactly one column

2363

# of one auxiliary table.

2364

{ # An auxiliary table contains statistical information on the relative

2365

# frequency of different quasi-identifiers values. It has one or several

2366

# quasi-identifiers columns, and one column that indicates the relative

2367

# frequency of each quasi-identifier tuple.

2368

# If a tuple is present in the data but not in the auxiliary table, the

2369

# corresponding relative frequency is assumed to be zero (and thus, the

2370

# tuple is highly reidentifiable).

2371

"relativeFrequency": { # General identifier of a data field in a storage service. # The relative frequency column must contain a floating-point number

2372

# between 0 and 1 (inclusive). Null values are assumed to be zero.

2373

# [required]

2374

"name": "A String", # Name describing the field.

2375

},

2376

"quasiIds": [ # Quasi-identifier columns. [required]

2377

{ # A quasi-identifier column has a custom_tag, used to know which column

2378

# in the data corresponds to which column in the statistical model.

2379

"field": { # General identifier of a data field in a storage service.

2380

"name": "A String", # Name describing the field.

2381

},

2382

"customTag": "A String",

2383

},

2384

],

"table": { # Auxiliary table location. [required]
# Message defining the location of a BigQuery table. A table is uniquely
# identified by its project_id, dataset_id, and table_name. Within a query
# a table is often referenced with a string in the format of:
# `<project_id>:<dataset_id>.<table_id>` or
# `<project_id>.<dataset_id>.<table_id>`.

2390

"projectId": "A String", # The Google Cloud Platform project ID of the project containing the table.

2391

# If omitted, project ID is inferred from the API call.

2392

"tableId": "A String", # Name of the table.

2393

"datasetId": "A String", # Dataset ID of the table.

},

},

],

},

"lDiversityConfig": { # l-diversity metric, used for analysis of reidentification risk.

2399

"sensitiveAttribute": { # General identifier of a data field in a storage service. # Sensitive field for computing the l-value.

2400

"name": "A String", # Name describing the field.

2401

},

2402

"quasiIds": [ # Set of quasi-identifiers indicating how equivalence classes are

2403

# defined for the l-diversity computation. When multiple fields are

2404

# specified, they are considered a single composite key.

2405

{ # General identifier of a data field in a storage service.

2406

"name": "A String", # Name describing the field.

},

],

},

"deltaPresenceEstimationConfig": { # δ-presence metric, used to estimate how likely it is for an attacker to

2411

# figure out that one given individual appears in a de-identified dataset.

2412

# Similarly to the k-map metric, we cannot compute δ-presence exactly without

2413

# knowing the attack dataset, so we use a statistical model instead.

2414

"regionCode": "A String", # ISO 3166-1 alpha-2 region code to use in the statistical modeling.

2415

# Required if no column is tagged with a region-specific InfoType (like

2416

# US_ZIP_5) or a region code.

2417

"quasiIds": [ # Fields considered to be quasi-identifiers. No two fields can have the

2418

# same tag. [required]

2419

{ # A column with a semantic tag attached.

2420

"field": { # General identifier of a data field in a storage service. # Identifies the column. [required]

2421

"name": "A String", # Name describing the field.

2422

},

2423

"customTag": "A String", # A column can be tagged with a custom tag. In this case, the user must

2424

# indicate an auxiliary table that contains statistical information on

2425

# the possible values of this column (below).

"infoType": { # Type of information detected by the API. # A column can be tagged with an InfoType to use the relevant public

2427

# dataset as a statistical model of population, if available. We

2428

# currently support US ZIP codes, region codes, ages and genders.

2429

# To programmatically obtain the list of supported InfoTypes, use

2430

# ListInfoTypes with the supported_by=RISK_ANALYSIS filter.

2431

"name": "A String", # Name of the information type. Either a name of your choosing when

2432

# creating a CustomInfoType, or one of the names listed

2433

# at https://cloud.google.com/dlp/docs/infotypes-reference when specifying

2434

# a built-in type. InfoType names should conform to the pattern

2435

# [a-zA-Z0-9_]{1,64}.

2436

},

"inferred": { # If no semantic tag is indicated, we infer the statistical model from
# the distribution of values in the input data.
# A generic empty message that you can re-use to avoid defining duplicated
# empty messages in your APIs. A typical example is to use it as the request
# or the response type of an API method. For instance:
#
# service Foo {
# rpc Bar(google.protobuf.Empty) returns (google.protobuf.Empty);
# }
#
# The JSON representation for `Empty` is empty JSON object `{}`.

},

},

],

"auxiliaryTables": [ # Several auxiliary tables can be used in the analysis. Each custom_tag

2451

# used to tag a quasi-identifiers field must appear in exactly one

2452

# field of one auxiliary table.

2453

{ # An auxiliary table containing statistical information on the relative

2454

# frequency of different quasi-identifiers values. It has one or several

2455

# quasi-identifiers columns, and one column that indicates the relative

2456

# frequency of each quasi-identifier tuple.

2457

# If a tuple is present in the data but not in the auxiliary table, the

2458

# corresponding relative frequency is assumed to be zero (and thus, the

2459

# tuple is highly reidentifiable).

2460

"relativeFrequency": { # General identifier of a data field in a storage service. # The relative frequency column must contain a floating-point number

2461

# between 0 and 1 (inclusive). Null values are assumed to be zero.

2462

# [required]

2463

"name": "A String", # Name describing the field.

2464

},

2465

"quasiIds": [ # Quasi-identifier columns. [required]

2466

{ # A quasi-identifier column has a custom_tag, used to know which column

2467

# in the data corresponds to which column in the statistical model.

2468

"field": { # General identifier of a data field in a storage service.

2469

"name": "A String", # Name describing the field.

2470

},

2471

"customTag": "A String",

2472

},

2473

],

"table": { # Auxiliary table location. [required]
# Message defining the location of a BigQuery table. A table is uniquely
# identified by its project_id, dataset_id, and table_name. Within a query
# a table is often referenced with a string in the format of:
# `<project_id>:<dataset_id>.<table_id>` or
# `<project_id>.<dataset_id>.<table_id>`.

2479

"projectId": "A String", # The Google Cloud Platform project ID of the project containing the table.

2480

# If omitted, project ID is inferred from the API call.

2481

"tableId": "A String", # Name of the table.

2482

"datasetId": "A String", # Dataset ID of the table.

},

},

],

},

"categoricalStatsConfig": { # Compute categorical stats over an individual column, including

2488

# number of distinct values and value count distribution.

2489

"field": { # General identifier of a data field in a storage service. # Field to compute categorical stats on. All column types are

2490

# supported except for arrays and structs. However, it may be more

2491

# informative to use NumericalStats when the field type is supported,

2492

# depending on the data.

2493

"name": "A String", # Name describing the field.

2494

},

2495

},

2496

"kAnonymityConfig": { # k-anonymity metric, used for analysis of reidentification risk.

"entityId": { # Optional message indicating that multiple rows might be associated to a
# single individual. If the same entity_id is associated to multiple
# quasi-identifier tuples over distinct rows, we consider the entire
# collection of tuples as the composite quasi-identifier. This collection
# is a multiset: the order in which the different tuples appear in the
# dataset is ignored, but their frequency is taken into account.
#
# Important note: a maximum of 1000 rows can be associated to a single
# entity ID. If more rows are associated with the same entity ID, some
# might be ignored.
#
# An entity in a dataset is a field or set of fields that correspond to a
# single person. For example, in medical records the `EntityId` might be a
# patient identifier, or for financial records it might be an account
# identifier. This message is used when generalizations or analysis must take
# into account that multiple rows correspond to the same entity.

2511

"field": { # General identifier of a data field in a storage service. # Composite key indicating which field contains the entity identifier.

2512

"name": "A String", # Name describing the field.

2513

},

2514

},

2515

"quasiIds": [ # Set of fields to compute k-anonymity over. When multiple fields are

2516

# specified, they are considered a single composite key. Structs and

2517

# repeated data types are not supported; however, nested fields are

2518

# supported so long as they are not structs themselves or nested within

2519

# a repeated field.

2520

{ # General identifier of a data field in a storage service.

2521

"name": "A String", # Name describing the field.

},

],

},

},

"categoricalStatsResult": { # Result of the categorical stats computation.

2527

"valueFrequencyHistogramBuckets": [ # Histogram of value frequencies in the column.

2528

{

2529

"bucketValues": [ # Sample of value frequencies in this bucket. The total number of

2530

# values returned per bucket is capped at 20.

2531

{ # A value of a field, including its frequency.

2532

"count": "A String", # How many times the value is contained in the field.

2533

"value": { # Set of primitive values supported by the system. # A value contained in the field in question.

2534

# Note that for the purposes of inspection or transformation, the number

2535

# of bytes considered to comprise a 'Value' is based on its representation

2536

# as a UTF-8 encoded string. For example, if 'integer_value' is set to

2537

# 123456789, the number of bytes would be counted as 9, even though an

2538

# int64 only holds up to 8 bytes of data.

2539

"floatValue": 3.14,

2540

"timestampValue": "A String",

2541

"dayOfWeekValue": "A String",

2542

"timeValue": { # Represents a time of day. The date and time zone are either not significant

2543

# or are specified elsewhere. An API may choose to allow leap seconds. Related

2544

# types are google.type.Date and `google.protobuf.Timestamp`.

2545

"hours": 42, # Hours of day in 24 hour format. Should be from 0 to 23. An API may choose

2546

# to allow the value "24:00:00" for scenarios like business closing time.

2547

"nanos": 42, # Fractions of seconds in nanoseconds. Must be from 0 to 999,999,999.

2548

"seconds": 42, # Seconds of minutes of the time. Must normally be from 0 to 59. An API may

2549

# allow the value 60 if it allows leap-seconds.

2550

"minutes": 42, # Minutes of hour of day. Must be from 0 to 59.

2551

},

2552

"dateValue": { # Represents a whole or partial calendar date, e.g. a birthday. The time of day

2553

# and time zone are either specified elsewhere or are not significant. The date

2554

# is relative to the Proleptic Gregorian Calendar. This can represent:

2555

#

2556

# * A full date, with non-zero year, month and day values

2557

# * A month and day value, with a zero year, e.g. an anniversary

2558

# * A year on its own, with zero month and day values

2559

# * A year and month value, with a zero day, e.g. a credit card expiration date

2560

#

2561

# Related types are google.type.TimeOfDay and `google.protobuf.Timestamp`.

2562

"year": 42, # Year of date. Must be from 1 to 9999, or 0 if specifying a date without

2563

# a year.

2564

"day": 42, # Day of month. Must be from 1 to 31 and valid for the year and month, or 0

2565

# if specifying a year by itself or a year and month where the day is not

2566

# significant.

2567

"month": 42, # Month of year. Must be from 1 to 12, or 0 if specifying a year without a

2568

# month and day.

2569

},

2570

"stringValue": "A String",

2571

"booleanValue": True or False,

2572

"integerValue": "A String",

},

},

],

"bucketValueCount": "A String", # Total number of distinct values in this bucket.

2577

"valueFrequencyUpperBound": "A String", # Upper bound on the value frequency of the values in this bucket.

2578

"valueFrequencyLowerBound": "A String", # Lower bound on the value frequency of the values in this bucket.

2579

"bucketSize": "A String", # Total number of values in this bucket.

},

],

},

"deltaPresenceEstimationResult": { # Result of the δ-presence computation. Note that these results are an

2584

# estimation, not exact values.

2585

"deltaPresenceEstimationHistogram": [ # The intervals [min_probability, max_probability) do not overlap. If a

2586

# value doesn't correspond to any such interval, the associated frequency

2587

# is zero. For example, the following records:

2588

# {min_probability: 0, max_probability: 0.1, frequency: 17}

2589

# {min_probability: 0.2, max_probability: 0.3, frequency: 42}

2590

# {min_probability: 0.3, max_probability: 0.4, frequency: 99}

2591

# mean that there are no records with an estimated probability in [0.1, 0.2)
# nor greater than or equal to 0.4.

2593

{ # A DeltaPresenceEstimationHistogramBucket message with the following

2594

# values:

2595

# min_probability: 0.1

2596

# max_probability: 0.2

2597

# frequency: 42

2598

# means that there are 42 records for which δ is in [0.1, 0.2). An

2599

# important special case is when min_probability = max_probability = 1:

2600

# then, every individual who shares this quasi-identifier combination is in

2601

# the dataset.

2602

"bucketValues": [ # Sample of quasi-identifier tuple values in this bucket. The total

2603

# number of classes returned per bucket is capped at 20.

2604

{ # A tuple of values for the quasi-identifier columns.

2605

"quasiIdsValues": [ # The quasi-identifier values.

2606

{ # Set of primitive values supported by the system.

2607

# Note that for the purposes of inspection or transformation, the number

2608

# of bytes considered to comprise a 'Value' is based on its representation

2609

# as a UTF-8 encoded string. For example, if 'integer_value' is set to

2610

# 123456789, the number of bytes would be counted as 9, even though an

2611

# int64 only holds up to 8 bytes of data.

2612

"floatValue": 3.14,

2613

"timestampValue": "A String",

2614

"dayOfWeekValue": "A String",

2615

"timeValue": { # Represents a time of day. The date and time zone are either not significant

2616

# or are specified elsewhere. An API may choose to allow leap seconds. Related

2617

# types are google.type.Date and `google.protobuf.Timestamp`.

2618

"hours": 42, # Hours of day in 24 hour format. Should be from 0 to 23. An API may choose

2619

# to allow the value "24:00:00" for scenarios like business closing time.

2620

"nanos": 42, # Fractions of seconds in nanoseconds. Must be from 0 to 999,999,999.

2621

"seconds": 42, # Seconds of minutes of the time. Must normally be from 0 to 59. An API may

2622

# allow the value 60 if it allows leap-seconds.

2623

"minutes": 42, # Minutes of hour of day. Must be from 0 to 59.

2624

},

2625

"dateValue": { # Represents a whole or partial calendar date, e.g. a birthday. The time of day

2626

# and time zone are either specified elsewhere or are not significant. The date

2627

# is relative to the Proleptic Gregorian Calendar. This can represent:

2628

#

2629

# * A full date, with non-zero year, month and day values

2630

# * A month and day value, with a zero year, e.g. an anniversary

2631

# * A year on its own, with zero month and day values

2632

# * A year and month value, with a zero day, e.g. a credit card expiration date

2633

#

2634

# Related types are google.type.TimeOfDay and `google.protobuf.Timestamp`.

2635

"year": 42, # Year of date. Must be from 1 to 9999, or 0 if specifying a date without

2636

# a year.

2637

"day": 42, # Day of month. Must be from 1 to 31 and valid for the year and month, or 0

2638

# if specifying a year by itself or a year and month where the day is not

2639

# significant.

2640

"month": 42, # Month of year. Must be from 1 to 12, or 0 if specifying a year without a

2641

# month and day.

2642

},

2643

"stringValue": "A String",

2644

"booleanValue": True or False,

2645

"integerValue": "A String",

2646

},

2647

],

2648

"estimatedProbability": 3.14, # The estimated probability that a given individual sharing these

2649

# quasi-identifier values is in the dataset. This value, typically called

2650

# δ, is the ratio between the number of records in the dataset with these

2651

# quasi-identifier values, and the total number of individuals (inside

2652

# *and* outside the dataset) with these quasi-identifier values.

2653

# For example, if there are 15 individuals in the dataset who share the

2654

# same quasi-identifier values, and an estimated 100 people in the entire

2655

# population with these values, then δ is 0.15.

2656

},

2657

],

2658

"bucketValueCount": "A String", # Total number of distinct quasi-identifier tuple values in this bucket.

2659

"bucketSize": "A String", # Number of records within these probability bounds.

2660

"maxProbability": 3.14, # Always greater than or equal to min_probability.

2661

"minProbability": 3.14, # Between 0 and 1.

},

],

},

"requestedSourceTable": { # Input dataset to compute metrics over.
# Message defining the location of a BigQuery table. A table is uniquely
# identified by its project_id, dataset_id, and table_name. Within a query
# a table is often referenced with a string in the format of:
# `<project_id>:<dataset_id>.<table_id>` or
# `<project_id>.<dataset_id>.<table_id>`.

2670

"projectId": "A String", # The Google Cloud Platform project ID of the project containing the table.

2671

# If omitted, project ID is inferred from the API call.

2672

"tableId": "A String", # Name of the table.

2673

"datasetId": "A String", # Dataset ID of the table.

2674

},

2675

},

2676

"state": "A String", # State of a job.

2677

"jobTriggerName": "A String", # If created by a job trigger, the resource name of the trigger that

2678

# instantiated the job.

2679

"startTime": "A String", # Time when the job started.

2680

"endTime": "A String", # Time when the job finished.

2681

"type": "A String", # The type of job.

2682

"createTime": "A String", # Time when the job was created.

}</pre>
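As a rough illustration of reading the structure above, the sketch below walks the `lDiversityResult` histogram of a DlpJob held as a plain dict and converts the int64 fields, which arrive as strings, into Python integers. The helper name `summarize_l_diversity` and the sample values are hypothetical, and the `riskDetails` parent key is an assumption about the enclosing object, which is documented outside this excerpt.

```python
def summarize_l_diversity(job):
    """Return one summary dict per l-diversity histogram bucket in a DlpJob dict.

    Assumes the histogram lives under job["riskDetails"]["lDiversityResult"];
    missing keys are treated as empty rather than raising.
    """
    result = job.get("riskDetails", {}).get("lDiversityResult", {})
    summaries = []
    for bucket in result.get("sensitiveValueFrequencyHistogramBuckets", []):
        summaries.append({
            # int64 values are serialized as strings, so convert explicitly.
            "lower": int(bucket.get("sensitiveValueFrequencyLowerBound", 0)),
            "upper": int(bucket.get("sensitiveValueFrequencyUpperBound", 0)),
            "classes": int(bucket.get("bucketSize", 0)),
            # bucketValues is only a sample (capped at 20 per bucket).
            "sampled": len(bucket.get("bucketValues", [])),
        })
    return summaries


# Hypothetical response fragment, for illustration only.
sample_job = {
    "riskDetails": {
        "lDiversityResult": {
            "sensitiveValueFrequencyHistogramBuckets": [
                {
                    "sensitiveValueFrequencyLowerBound": "1",
                    "sensitiveValueFrequencyUpperBound": "3",
                    "bucketSize": "12",
                    "bucketValues": [{"equivalenceClassSize": "4"}],
                }
            ]
        }
    }
}
summary = summarize_l_diversity(sample_job)
```

Because `bucketValues` is a capped sample, `sampled` can be smaller than `classes`; the sketch reports both rather than assuming they match.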

</div>
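The δ-presence comments in the result schema above describe two small calculations: δ as a ratio, and lookup of a value in non-overlapping `[min_probability, max_probability)` intervals. The following sketch works through both using the numbers from those comments. The function names are hypothetical, and the dict keys follow the notation of the comment's example records (`min_probability`, `max_probability`, `frequency`), not the JSON field names.

```python
def estimate_delta(records_in_dataset, estimated_population):
    """Ratio of dataset records sharing the quasi-identifier values to the
    estimated number of individuals (inside and outside the dataset) sharing them."""
    if estimated_population <= 0:
        raise ValueError("estimated_population must be positive")
    return records_in_dataset / estimated_population


def frequency_for(delta, buckets):
    """Return the frequency of the bucket whose interval contains delta, else 0."""
    for bucket in buckets:
        low = bucket["min_probability"]
        high = bucket["max_probability"]
        # Intervals are half-open; a degenerate bucket with min == max
        # (for example both equal to 1) matches that exact value only.
        if low <= delta < high or (low == high == delta):
            return bucket["frequency"]
    # A value outside every interval has an associated frequency of zero.
    return 0


# The example records from the schema comment.
histogram = [
    {"min_probability": 0, "max_probability": 0.1, "frequency": 17},
    {"min_probability": 0.2, "max_probability": 0.3, "frequency": 42},
    {"min_probability": 0.3, "max_probability": 0.4, "frequency": 99},
]

delta = estimate_delta(15, 100)  # 15 records out of an estimated 100 people
```

With this histogram, a δ that falls in the gap between 0.1 and 0.2, or at 0.4 and above, maps to no bucket, which is why `frequency_for` falls through to zero.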

<code class="details" id="delete">delete(name, x__xgafv=None)</code>

<pre>Deletes a long-running DlpJob. This method indicates that the client is
no longer interested in the DlpJob result. The job will be cancelled if
possible.
See https://cloud.google.com/dlp/docs/inspecting-storage and
https://cloud.google.com/dlp/docs/compute-risk-analysis to learn more.

Args:

x__xgafv: string, V1 error format.

Allowed values

1 - v1 error format

2 - v2 error format

Returns:

An object of the form:

{ # A generic empty message that you can re-use to avoid defining duplicated
# empty messages in your APIs. A typical example is to use it as the request
# or the response type of an API method. For instance:
#
# service Foo {
# rpc Bar(google.protobuf.Empty) returns (google.protobuf.Empty);
# }
#
# The JSON representation for `Empty` is empty JSON object `{}`.

}</pre>
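A minimal sketch of invoking this method, assuming `service` is a client object built elsewhere (for example with `googleapiclient.discovery.build("dlp", "v2")`) and already authorized. The wrapper name `delete_dlp_job` and the resource name in the usage comment are hypothetical; the call chain follows the `delete(name, x__xgafv=None)` signature documented above.

```python
def delete_dlp_job(service, job_name):
    """Delete a DlpJob by resource name and return the parsed response.

    `service` is assumed to expose projects().dlpJobs().delete(name=...),
    as the generated client for this API does. On success the response is
    the empty message, which parses to an empty dict.
    """
    request = service.projects().dlpJobs().delete(name=job_name)
    return request.execute()


# Usage (hypothetical resource name; requires a real, authorized client):
#   delete_dlp_job(service, "projects/my-project/dlpJobs/my-job")
```

Taking `service` as a parameter keeps the wrapper free of credential handling, so it can be exercised against a stub object in tests.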

</div>

<code class="details" id="get">get(name, x__xgafv=None)</code>

<pre>Gets the latest state of a long-running DlpJob.
See https://cloud.google.com/dlp/docs/inspecting-storage and
https://cloud.google.com/dlp/docs/compute-risk-analysis to learn more.

Args:

x__xgafv: string, V1 error format.

Allowed values

1 - v1 error format

2 - v2 error format

Returns:

An object of the form:

{ # Combines all of the information about a DLP job.
"errors": [ # A stream of errors encountered running the job.
{ # Detailed information about an error encountered during job execution or
# the results of an unsuccessful activation of the JobTrigger.
# Output only field.

"timestamps": [ # The times the error occurred.
"A String",
],
"details": { # The `Status` type defines a logical error model that is suitable for
# different programming environments, including REST APIs and RPC APIs. It is
# used by [gRPC](https://github.com/grpc). Each `Status` message contains
# three pieces of data: error code, error message, and error details.
#
# You can find out more about this error model and how to work with it in the
# [API Design Guide](https://cloud.google.com/apis/design/errors).
"message": "A String", # A developer-facing error message, which should be in English. Any
# user-facing error message should be localized and sent in the
# google.rpc.Status.details field, or localized by the client.
"code": 42, # The status code, which should be an enum value of google.rpc.Code.
"details": [ # A list of messages that carry the error details. There is a common set of
# message types for APIs to use.
{
"a_key": "", # Properties of the object. Contains field @type with type URL.

},

],

},

},

],

"name": "A String", # The server-assigned name.

2761

"inspectDetails": { # The results of an inspect DataSource job. # Results from inspecting a data source.

2762

"requestedOptions": { # The configuration used for this job.

"snapshotInspectTemplate": { # If run with an InspectTemplate, a snapshot of its state at the time of
# this run.
# The inspectTemplate contains a configuration (set of types of sensitive data
# to be detected) to be used anywhere you otherwise would normally specify
# InspectConfig. See https://cloud.google.com/dlp/docs/concepts-templates
# to learn more.
"updateTime": "A String", # The last update timestamp of an inspectTemplate, output only field.
"displayName": "A String", # Display name (max 256 chars).
"description": "A String", # Short description (max 256 chars).
"inspectConfig": { # The core content of the template. Configuration of the scanning process.
# When used with redactContent, only info_types and min_likelihood are currently
# used.

2774

"excludeInfoTypes": True or False, # When true, excludes type information of the findings.

2775

"limits": {

"maxFindingsPerRequest": 42, # Max number of findings that will be returned per request/job.
# When set within `InspectContentRequest`, the maximum returned is 2000
# regardless of whether this is set higher.

2779

"maxFindingsPerInfoType": [ # Configuration of findings limit given for specified infoTypes.

2780

{ # Max findings configuration per infoType, per content item or long

2781

# running DlpJob.

2782

"infoType": { # Type of information detected by the API. # Type of information the findings limit applies to. Only one limit per

2783

# info_type should be provided. If InfoTypeLimit does not have an

2784

# info_type, the DLP API applies the limit against all info_types that

2785

# are found but not specified in another InfoTypeLimit.

2786

"name": "A String", # Name of the information type. Either a name of your choosing when

2787

# creating a CustomInfoType, or one of the names listed

2788

# at https://cloud.google.com/dlp/docs/infotypes-reference when specifying

2789

# a built-in type. InfoType names should conform to the pattern

2790

# [a-zA-Z0-9_]{1,64}.

2791

},

2792

"maxFindings": 42, # Max findings limit for the given infoType.

2793

},

2794

],

"maxFindingsPerItem": 42, # Max number of findings that will be returned for each item scanned.
# When set within `InspectDataSourceRequest`,
# the maximum returned is 2000 regardless of whether this is set higher.
# When set within `InspectContentRequest`, this field is ignored.
},
"minLikelihood": "A String", # Only returns findings equal to or above this threshold. The default is
# POSSIBLE.
# See https://cloud.google.com/dlp/docs/likelihood to learn more.

2803

"customInfoTypes": [ # CustomInfoTypes provided by the user. See

2804

# https://cloud.google.com/dlp/docs/creating-custom-infotypes to learn more.

2805

{ # Custom information type provided by the user. Used to find domain-specific

2806

# sensitive information configurable to the data in question.

2807

"regex": { # Message defining a custom regular expression. # Regular expression based CustomInfoType.

2808

"pattern": "A String", # Pattern defining the regular expression. Its syntax

2809

# (https://github.com/google/re2/wiki/Syntax) can be found under the

2810

# google/re2 repository on GitHub.

2811

"groupIndexes": [ # The index of the submatch to extract as findings. When not

2812

# specified, the entire match is returned. No more than 3 may be included.

42,

],

},
"surrogateType": { # Message for detecting output from deidentification transformations that
# support reversing, such as
# [`CryptoReplaceFfxFpeConfig`](/dlp/docs/reference/rest/v2/organizations.deidentifyTemplates#cryptoreplaceffxfpeconfig).
# These types of transformations are
# those that perform pseudonymization, thereby producing a "surrogate" as
# output. This should be used in conjunction with a field on the
# transformation such as `surrogate_info_type`. This CustomInfoType does
# not support the use of `detection_rules`.

2825

},

"infoType": { # Type of information detected by the API. # CustomInfoType can either be a new infoType, or an extension of a built-in
# infoType, when the name matches one of the existing infoTypes and that infoType
# is specified in the `InspectContent.info_types` field. Specifying the latter
# adds findings to the ones detected by the system. If the built-in info type is
# not specified in the `InspectContent.info_types` list, then the name is treated
# as a custom info type.

2832

"name": "A String", # Name of the information type. Either a name of your choosing when

2833

# creating a CustomInfoType, or one of the names listed

2834

# at https://cloud.google.com/dlp/docs/infotypes-reference when specifying

2835

# a built-in type. InfoType names should conform to the pattern

2836

# [a-zA-Z0-9_]{1,64}.

2837

},

2838

"dictionary": { # Custom information type based on a dictionary of words or phrases. This can # A list of phrases to detect as a CustomInfoType.

2839

# be used to match sensitive information specific to the data, such as a list

2840

# of employee IDs or job titles.

2841

#

2842

# Dictionary words are case-insensitive and all characters other than letters

2843

# and digits in the unicode [Basic Multilingual

2844

# Plane](https://en.wikipedia.org/wiki/Plane_%28Unicode%29#Basic_Multilingual_Plane)

2845

# will be replaced with whitespace when scanning for matches, so the

2846

# dictionary phrase "Sam Johnson" will match all three phrases "sam johnson",

2847

# "Sam, Johnson", and "Sam (Johnson)". Additionally, the characters

2848

# surrounding any match must be of a different type than the adjacent

2849

# characters within the word, so letters must be next to non-letters and

2850

# digits next to non-digits. For example, the dictionary word "jen" will

2851

# match the first three letters of the text "jen123" but will return no

2852

# matches for "jennifer".

2853

#

2854

# Dictionary words containing a large number of characters that are not

2855

# letters or digits may result in unexpected findings because such characters

2856

# are treated as whitespace. The

2857

# [limits](https://cloud.google.com/dlp/limits) page contains details about

2858

# the size limits of dictionaries. For dictionaries that do not fit within

2859

# these constraints, consider using `LargeCustomDictionaryConfig` in the

2860

# `StoredInfoType` API.

2861

"wordList": { # Message defining a list of words or phrases to search for in the data. # List of words or phrases to search for.

2862

"words": [ # Words or phrases defining the dictionary. The dictionary must contain

2863

# at least one phrase and every phrase must contain at least 2 characters

2864

# that are letters or digits. [required]

"A String",

],

},

"cloudStoragePath": { # Message representing a single file or path in Cloud Storage. # Newline-delimited file of words in Cloud Storage. Only a single file

2869

# is accepted.

2870

"path": "A String", # A url representing a file or path (no wildcards) in Cloud Storage.

2871

# Example: gs://[BUCKET_NAME]/dictionary.txt

2872

},

2873

},

2874

"storedType": { # A reference to a StoredInfoType to use with scanning. # Load an existing `StoredInfoType` resource for use in

2875

# `InspectDataSource`. Not currently supported in `InspectContent`.

2876

"name": "A String", # Resource name of the requested `StoredInfoType`, for example

2877

# `organizations/433245324/storedInfoTypes/432452342` or

2878

# `projects/project-id/storedInfoTypes/432452342`.

2879

"createTime": "A String", # Timestamp indicating when the version of the `StoredInfoType` used for

2880

# inspection was created. Output-only field, populated by the system.

2881

},

"detectionRules": [ # Set of detection rules to apply to all findings of this CustomInfoType.
# Rules are applied in the order that they are specified. Not supported for the
# `surrogate_type` CustomInfoType.

2885

{ # Deprecated; use `InspectionRuleSet` instead. Rule for modifying a

2886

# `CustomInfoType` to alter behavior under certain circumstances, depending

2887

# on the specific details of the rule. Not supported for the `surrogate_type`

2888

# custom infoType.

2889

"hotwordRule": { # The rule that adjusts the likelihood of findings within a certain # Hotword-based detection rule.

2890

# proximity of hotwords.

"proximity": { # Message for specifying a window around a finding to apply a detection
# rule. # Proximity of the finding within which the entire hotword must reside.
# The total length of the window cannot exceed 1000 characters. Note that
# the finding itself will be included in the window, so that hotwords may
# be used to match substrings of the finding itself. For example, the
# certainty of a phone number regex "\(\d{3}\) \d{3}-\d{4}" could be
# adjusted upwards if the area code is known to be the local area code of
# a company office using the hotword regex "\(xxx\)", where "xxx"
# is the area code in question.

2900

"windowAfter": 42, # Number of characters after the finding to consider.

2901

"windowBefore": 42, # Number of characters before the finding to consider.

2902

},

2903

"hotwordRegex": { # Message defining a custom regular expression. # Regular expression pattern defining what qualifies as a hotword.

2904

"pattern": "A String", # Pattern defining the regular expression. Its syntax

2905

# (https://github.com/google/re2/wiki/Syntax) can be found under the

2906

# google/re2 repository on GitHub.

2907

"groupIndexes": [ # The index of the submatch to extract as findings. When not

2908

# specified, the entire match is returned. No more than 3 may be included.

42,

],

},

"likelihoodAdjustment": { # Message for specifying an adjustment to the likelihood of a finding as # Likelihood adjustment to apply to all matching findings.

2913

# part of a detection rule.

2914

"relativeLikelihood": 42, # Increase or decrease the likelihood by the specified number of

2915

# levels. For example, if a finding would be `POSSIBLE` without the

2916

# detection rule and `relative_likelihood` is 1, then it is upgraded to

2917

# `LIKELY`, while a value of -1 would downgrade it to `UNLIKELY`.

2918

# Likelihood may never drop below `VERY_UNLIKELY` or exceed

2919

# `VERY_LIKELY`, so applying an adjustment of 1 followed by an

2920

# adjustment of -1 when base likelihood is `VERY_LIKELY` will result in

2921

# a final likelihood of `LIKELY`.

2922

"fixedLikelihood": "A String", # Set the likelihood of a finding to a fixed value.

},

},

},

],

"exclusionType": "A String", # If set to EXCLUSION_TYPE_EXCLUDE this infoType will not cause a finding

2928

# to be returned. It still can be used for rules matching.

2929

"likelihood": "A String", # Likelihood to return for this CustomInfoType. This base value can be

2930

# altered by a detection rule if the finding meets the criteria specified by

2931

# the rule. Defaults to `VERY_LIKELY` if not specified.

2932

},

2933

],

2934

"includeQuote": True or False, # When true, a contextual quote from the data that triggered a finding is

2935

# included in the response; see Finding.quote.

"ruleSet": [ # Set of rules to apply to the findings for this InspectConfig.
# Exclusion rules contained in the set are executed last; other
# rules are executed in the order they are specified for each info type.

2939

{ # Rule set for modifying a set of infoTypes to alter behavior under certain

2940

# circumstances, depending on the specific details of the rules within the set.

2941

"rules": [ # Set of rules to be applied to infoTypes. The rules are applied in order.

2942

{ # A single inspection rule to be applied to infoTypes, specified in

2943

# `InspectionRuleSet`.

2944

"hotwordRule": { # The rule that adjusts the likelihood of findings within a certain # Hotword-based detection rule.

2945

# proximity of hotwords.

"proximity": { # Message for specifying a window around a finding to apply a detection
# rule. # Proximity of the finding within which the entire hotword must reside.
# The total length of the window cannot exceed 1000 characters. Note that
# the finding itself will be included in the window, so that hotwords may
# be used to match substrings of the finding itself. For example, the
# certainty of a phone number regex "\(\d{3}\) \d{3}-\d{4}" could be
# adjusted upwards if the area code is known to be the local area code of
# a company office using the hotword regex "\(xxx\)", where "xxx"
# is the area code in question.

2955

"windowAfter": 42, # Number of characters after the finding to consider.

2956

"windowBefore": 42, # Number of characters before the finding to consider.

2957

},

2958

"hotwordRegex": { # Message defining a custom regular expression. # Regular expression pattern defining what qualifies as a hotword.

2959

"pattern": "A String", # Pattern defining the regular expression. Its syntax

2960

# (https://github.com/google/re2/wiki/Syntax) can be found under the

2961

# google/re2 repository on GitHub.

2962

"groupIndexes": [ # The index of the submatch to extract as findings. When not

2963

# specified, the entire match is returned. No more than 3 may be included.

42,

],

},

"likelihoodAdjustment": { # Message for specifying an adjustment to the likelihood of a finding as # Likelihood adjustment to apply to all matching findings.

2968

# part of a detection rule.

2969

"relativeLikelihood": 42, # Increase or decrease the likelihood by the specified number of

2970

# levels. For example, if a finding would be `POSSIBLE` without the

2971

# detection rule and `relative_likelihood` is 1, then it is upgraded to

2972

# `LIKELY`, while a value of -1 would downgrade it to `UNLIKELY`.

2973

# Likelihood may never drop below `VERY_UNLIKELY` or exceed

2974

# `VERY_LIKELY`, so applying an adjustment of 1 followed by an

2975

# adjustment of -1 when base likelihood is `VERY_LIKELY` will result in

2976

# a final likelihood of `LIKELY`.

2977

"fixedLikelihood": "A String", # Set the likelihood of a finding to a fixed value.

2978

},

2979

},

2980

"exclusionRule": { # The rule that specifies conditions when findings of infoTypes specified in # Exclusion rule.

2981

# `InspectionRuleSet` are removed from results.

2982

"regex": { # Message defining a custom regular expression. # Regular expression which defines the rule.

2983

"pattern": "A String", # Pattern defining the regular expression. Its syntax

2984

# (https://github.com/google/re2/wiki/Syntax) can be found under the

2985

# google/re2 repository on GitHub.

2986

"groupIndexes": [ # The index of the submatch to extract as findings. When not

2987

# specified, the entire match is returned. No more than 3 may be included.

42,

],

},

"excludeInfoTypes": { # List of exclude infoTypes. # Set of infoTypes for which findings would affect this rule.

"infoTypes": [ # The infoType list in an ExclusionRule drops a finding when it overlaps or
# is contained within a finding of an infoType from this list. For
# example, for `InspectionRuleSet.info_types` containing "PHONE_NUMBER" and
# `exclusion_rule` containing `exclude_info_types.info_types` with
# "EMAIL_ADDRESS", the phone number findings are dropped if they overlap
# with an EMAIL_ADDRESS finding.
# As a result, "555-222-2222@example.org" generates only a single
# finding, namely the email address.

3000

{ # Type of information detected by the API.

3001

"name": "A String", # Name of the information type. Either a name of your choosing when

3002

# creating a CustomInfoType, or one of the names listed

3003

# at https://cloud.google.com/dlp/docs/infotypes-reference when specifying

3004

# a built-in type. InfoType names should conform to the pattern

3005

# [a-zA-Z0-9_]{1,64}.

},

],

},

"dictionary": { # Custom information type based on a dictionary of words or phrases. This can # Dictionary which defines the rule.

3010

# be used to match sensitive information specific to the data, such as a list

3011

# of employee IDs or job titles.

3012

#

3013

# Dictionary words are case-insensitive and all characters other than letters

3014

# and digits in the unicode [Basic Multilingual

3015

# Plane](https://en.wikipedia.org/wiki/Plane_%28Unicode%29#Basic_Multilingual_Plane)

3016

# will be replaced with whitespace when scanning for matches, so the

3017

# dictionary phrase "Sam Johnson" will match all three phrases "sam johnson",

3018

# "Sam, Johnson", and "Sam (Johnson)". Additionally, the characters

3019

# surrounding any match must be of a different type than the adjacent

3020

# characters within the word, so letters must be next to non-letters and

3021

# digits next to non-digits. For example, the dictionary word "jen" will

3022

# match the first three letters of the text "jen123" but will return no

3023

# matches for "jennifer".

3024

#

3025

# Dictionary words containing a large number of characters that are not

3026

# letters or digits may result in unexpected findings because such characters

3027

# are treated as whitespace. The

3028

# [limits](https://cloud.google.com/dlp/limits) page contains details about

3029

# the size limits of dictionaries. For dictionaries that do not fit within

3030

# these constraints, consider using `LargeCustomDictionaryConfig` in the

3031

# `StoredInfoType` API.

3032

"wordList": { # Message defining a list of words or phrases to search for in the data. # List of words or phrases to search for.

3033

"words": [ # Words or phrases defining the dictionary. The dictionary must contain

3034

# at least one phrase and every phrase must contain at least 2 characters

3035

# that are letters or digits. [required]

"A String",

],

},

"cloudStoragePath": { # Message representing a single file or path in Cloud Storage. # Newline-delimited file of words in Cloud Storage. Only a single file

3040

# is accepted.

3041

"path": "A String", # A url representing a file or path (no wildcards) in Cloud Storage.

3042

# Example: gs://[BUCKET_NAME]/dictionary.txt

3043

},

3044

},

3045

"matchingType": "A String", # How the rule is applied, see MatchingType documentation for details.

},

},

],

"infoTypes": [ # List of infoTypes this rule set is applied to.

3050

{ # Type of information detected by the API.

3051

"name": "A String", # Name of the information type. Either a name of your choosing when

3052

# creating a CustomInfoType, or one of the names listed

3053

# at https://cloud.google.com/dlp/docs/infotypes-reference when specifying

3054

# a built-in type. InfoType names should conform to the pattern

3055

# [a-zA-Z0-9_]{1,64}.

},

],

},

],

"contentOptions": [ # List of options defining data content to scan.

3061

# If empty, text, images, and other content will be included.

3062

"A String",

3063

],

3064

"infoTypes": [ # Restricts what info_types to look for. The values must correspond to

3065

# InfoType values returned by ListInfoTypes or listed at

3066

# https://cloud.google.com/dlp/docs/infotypes-reference.

3067

#

3068

# When no InfoTypes or CustomInfoTypes are specified in a request, the

3069

# system may automatically choose what detectors to run. By default this may

3070

# be all types, but may change over time as detectors are updated.

3071

#

3072

# The special InfoType name "ALL_BASIC" can be used to trigger all detectors,

3073

# but may change over time as new InfoTypes are added. If you need precise

3074

# control and predictability as to what detectors are run you should specify

3075

# specific InfoTypes listed in the reference.

3076

{ # Type of information detected by the API.

3077

"name": "A String", # Name of the information type. Either a name of your choosing when

3078

# creating a CustomInfoType, or one of the names listed

3079

# at https://cloud.google.com/dlp/docs/infotypes-reference when specifying

3080

# a built-in type. InfoType names should conform to the pattern

3081

# [a-zA-Z0-9_]{1,64}.

},

],

},
"createTime": "A String", # The creation timestamp of an inspectTemplate, output only field.

3086

"name": "A String", # The template name. Output only.

3087

#

3088

# The template will have one of the following formats:

3089

# `projects/PROJECT_ID/inspectTemplates/TEMPLATE_ID` OR

3090

# `organizations/ORGANIZATION_ID/inspectTemplates/TEMPLATE_ID`

3091

},

3092

"jobConfig": {

3093

"storageConfig": { # Shared message indicating Cloud storage type. # The data to scan.

3094

"datastoreOptions": { # Options defining a data set within Google Cloud Datastore. # Google Cloud Datastore options specification.

"partitionId": { # Datastore partition ID. # A partition ID identifies a grouping of entities. The grouping is always
# by project and namespace, however the namespace ID may be empty.
#
# A partition ID contains several dimensions:
# project ID and namespace ID.

3102

"projectId": "A String", # The ID of the project to which the entities belong.

3103

"namespaceId": "A String", # If not empty, the ID of the namespace to which the entities belong.

3104

},

3105

"kind": { # A representation of a Datastore kind. # The kind to process.

3106

"name": "A String", # The name of the kind.

3107

},

3108

},

3109

"bigQueryOptions": { # Options defining BigQuery table and row identifiers. # BigQuery options specification.

3110

"excludedFields": [ # References to fields excluded from scanning. This allows you to skip

3111

# inspection of entire columns which you know have no findings.

3112

{ # General identifier of a data field in a storage service.

3113

"name": "A String", # Name describing the field.

3114

},

3115

],

3116

"rowsLimit": "A String", # Max number of rows to scan. If the table has more rows than this value, the

3117

# rest of the rows are omitted. If not set, or if set to 0, all rows will be

3118

# scanned. Only one of rows_limit and rows_limit_percent can be specified.

3119

# Cannot be used in conjunction with TimespanConfig.

3120

"sampleMethod": "A String",

"identifyingFields": [ # References to fields uniquely identifying rows within the table.
# Nested fields in dotted format, like `person.birthdate.year`, are allowed.

3123

{ # General identifier of a data field in a storage service.

3124

"name": "A String", # Name describing the field.

3125

},

3126

],

"rowsLimitPercent": 42, # Max percentage of rows to scan. The rest are omitted. The number of rows
# scanned is rounded down. Must be between 0 and 100, inclusive. Both 0 and
# 100 mean no limit. Defaults to 0. Only one of rows_limit and
# rows_limit_percent can be specified. Cannot be used in conjunction with
# TimespanConfig.

3132

"tableReference": { # Message defining the location of a BigQuery table. A table is uniquely # Complete BigQuery table reference.

3133

# identified by its project_id, dataset_id, and table_name. Within a query

3134

# a table is often referenced with a string in the format of:

3135

# `<project_id>:<dataset_id>.<table_id>` or

3136

# `<project_id>.<dataset_id>.<table_id>`.

3137

"projectId": "A String", # The Google Cloud Platform project ID of the project containing the table.

3138

# If omitted, project ID is inferred from the API call.

3139

"tableId": "A String", # Name of the table.

3140

"datasetId": "A String", # Dataset ID of the table.

3141

},

3142

},

3143

"timespanConfig": { # Configuration of the timespan of the items to include in scanning.

3144

# Currently only supported when inspecting Google Cloud Storage and BigQuery.

3145

"timestampField": { # General identifier of a data field in a storage service. # Specification of the field containing the timestamp of scanned items.

3146

# Used for data sources like Datastore or BigQuery.

3147

# If not specified for BigQuery, table last modification timestamp

3148

# is checked against given time span.

3149

# The valid data types of the timestamp field are:

3150

# for BigQuery - timestamp, date, datetime;

3151

# for Datastore - timestamp.

3152

# Datastore entity will be scanned if the timestamp property does not exist

3153

# or its value is empty or invalid.

3154

"name": "A String", # Name describing the field.

3155

},

3156

"endTime": "A String", # Exclude files or rows newer than this value.

3157

# If set to zero, no upper time limit is applied.

3158

"startTime": "A String", # Exclude files or rows older than this value.

3159

"enableAutoPopulationOfTimespanConfig": True or False, # When the job is started by a JobTrigger we will automatically figure out

3160

# a valid start_time to avoid scanning files that have not been modified

3161

# since the last time the JobTrigger executed. This will be based on the

3162

# time of the execution of the last run of the JobTrigger.

3163

},

3164

"cloudStorageOptions": { # Options defining a file or a set of files within a Google Cloud Storage # Google Cloud Storage options specification.

3165

# bucket.

3166

"bytesLimitPerFile": "A String", # Max number of bytes to scan from a file. If a scanned file's size is bigger

3167

# than this value then the rest of the bytes are omitted. Only one

3168

# of bytes_limit_per_file and bytes_limit_per_file_percent can be specified.

3169

"sampleMethod": "A String",

3170

"fileSet": { # Set of files to scan. # The set of one or more files to scan.

3171

"url": "A String", # The Cloud Storage url of the file(s) to scan, in the format

3172

# `gs://<bucket>/<path>`. Trailing wildcard in the path is allowed.

3173

#

3174

# If the url ends in a trailing slash, the bucket or directory represented

3175

# by the url will be scanned non-recursively (content in sub-directories

3176

# will not be scanned). This means that `gs://mybucket/` is equivalent to

3177

# `gs://mybucket/*`, and `gs://mybucket/directory/` is equivalent to

3178

# `gs://mybucket/directory/*`.

3179

#

3180

# Exactly one of `url` or `regex_file_set` must be set.

"regexFileSet": { # The regex-filtered set of files to scan. Exactly one of `url` or
# `regex_file_set` must be set.
# Message representing a set of files in a Cloud Storage bucket. Regular
# expressions are used to allow fine-grained control over which files in the
# bucket to include.

3185

#

3186

# Included files are those that match at least one item in `include_regex` and

3187

# do not match any items in `exclude_regex`. Note that a file that matches

3188

# items from both lists will _not_ be included. For a match to occur, the

3189

# entire file path (i.e., everything in the url after the bucket name) must

3190

# match the regular expression.

3191

#

3192

# For example, given the input `{bucket_name: "mybucket", include_regex:

3193

# ["directory1/.*"], exclude_regex:

3194

# ["directory1/excluded.*"]}`:

3195

#

3196

# * `gs://mybucket/directory1/myfile` will be included

3197

# * `gs://mybucket/directory1/directory2/myfile` will be included (`.*` matches

3198

# across `/`)

3199

# * `gs://mybucket/directory0/directory1/myfile` will _not_ be included (the

3200

# full path doesn't match any items in `include_regex`)

3201

# * `gs://mybucket/directory1/excludedfile` will _not_ be included (the path

3202

# matches an item in `exclude_regex`)

3203

#

3204

# If `include_regex` is left empty, it will match all files by default

3205

# (this is equivalent to setting `include_regex: [".*"]`).

3206

#

3207

# Some other common use cases:

3208

#

3209

# * `{bucket_name: "mybucket", exclude_regex: [".*\.pdf"]}` will include all

3210

# files in `mybucket` except for .pdf files

3211

# * `{bucket_name: "mybucket", include_regex: ["directory/[^/]+"]}` will

3212

# include all files directly under `gs://mybucket/directory/`, without matching

3213

# across `/`

3214

"excludeRegex": [ # A list of regular expressions matching file paths to exclude. All files in

3215

# the bucket that match at least one of these regular expressions will be

3216

# excluded from the scan.

3217

#

3218

# Regular expressions use RE2

3219

# [syntax](https://github.com/google/re2/wiki/Syntax); a guide can be found

3220

# under the google/re2 repository on GitHub.

3221

"A String",

3222

],

3223

"bucketName": "A String", # The name of a Cloud Storage bucket. Required.

3224

"includeRegex": [ # A list of regular expressions matching file paths to include. All files in

3225

# the bucket that match at least one of these regular expressions will be

3226

# included in the set of files, except for those that also match an item in

3227

# `exclude_regex`. Leaving this field empty will match all files by default

3228

# (this is equivalent to including `.*` in the list).

3229

#

3230

# Regular expressions use RE2

3231

# [syntax](https://github.com/google/re2/wiki/Syntax); a guide can be found

3232

# under the google/re2 repository on GitHub.

"A String",

],

},

},
"bytesLimitPerFilePercent": 42, # Max percentage of bytes to scan from a file. The rest are omitted. The
# number of bytes scanned is rounded down. Must be between 0 and 100,
# inclusive. Both 0 and 100 mean no limit. Defaults to 0. Only one
# of bytes_limit_per_file and bytes_limit_per_file_percent can be specified.
"filesLimitPercent": 42, # Limits the number of files to scan to this percentage of the input FileSet.
# Number of files scanned is rounded down. Must be between 0 and 100,
# inclusive. Both 0 and 100 mean no limit. Defaults to 0.

3244

"fileTypes": [ # List of file type groups to include in the scan.

3245

# If empty, all files are scanned and available data format processors

3246

# are applied. In addition, the binary content of the selected files

3247

# is always scanned as well.

"A String",

],

},

},
"inspectConfig": { # Configuration description of the scanning process. # How and what to scan for.
# When used with redactContent, only info_types and min_likelihood are currently
# used.

3255

"excludeInfoTypes": True or False, # When true, excludes type information of the findings.

3256

"limits": {

"maxFindingsPerRequest": 42, # Max number of findings that will be returned per request/job.
# When set within `InspectContentRequest`, the maximum returned is 2000
# regardless of whether this is set higher.

3260

"maxFindingsPerInfoType": [ # Configuration of findings limit given for specified infoTypes.

3261

{ # Max findings configuration per infoType, per content item or long

3262

# running DlpJob.

3263

"infoType": { # Type of information detected by the API. # Type of information the findings limit applies to. Only one limit per

3264

# info_type should be provided. If InfoTypeLimit does not have an

3265

# info_type, the DLP API applies the limit against all info_types that

3266

# are found but not specified in another InfoTypeLimit.

3267

"name": "A String", # Name of the information type. Either a name of your choosing when

3268

# creating a CustomInfoType, or one of the names listed

3269

# at https://cloud.google.com/dlp/docs/infotypes-reference when specifying

3270

# a built-in type. InfoType names should conform to the pattern

3271

# [a-zA-Z0-9_]{1,64}.

3272

},

3273

"maxFindings": 42, # Max findings limit for the given infoType.

3274

},

3275

],

3276

"maxFindingsPerItem": 42, # Max number of findings that will be returned for each item scanned.

3277

# When set within `InspectDataSourceRequest`,

3278

# the maximum returned is 2000 regardless if this is set higher.

3279

# When set within `InspectContentRequest`, this field is ignored.

3280

},

"minLikelihood": "A String", # Only returns findings equal to or above this threshold. The default is
# POSSIBLE.
# See https://cloud.google.com/dlp/docs/likelihood to learn more.
"customInfoTypes": [ # CustomInfoTypes provided by the user. See
# https://cloud.google.com/dlp/docs/creating-custom-infotypes to learn more.
{ # Custom information type provided by the user. Used to find domain-specific
# sensitive information configurable to the data in question.
"regex": { # Message defining a custom regular expression. # Regular expression based CustomInfoType.
"pattern": "A String", # Pattern defining the regular expression. Its syntax
# (https://github.com/google/re2/wiki/Syntax) can be found under the
# google/re2 repository on GitHub.
"groupIndexes": [ # The index of the submatch to extract as findings. When not
# specified, the entire match is returned. No more than 3 may be included.

42,

],

},
"surrogateType": { # Message for detecting output from deidentification transformations that
# support reversing, such as
# [`CryptoReplaceFfxFpeConfig`](/dlp/docs/reference/rest/v2/organizations.deidentifyTemplates#cryptoreplaceffxfpeconfig).
# These types of transformations are
# those that perform pseudonymization, thereby producing a "surrogate" as
# output. This should be used in conjunction with a field on the
# transformation such as `surrogate_info_type`. This CustomInfoType does
# not support the use of `detection_rules`.
},

"infoType": { # Type of information detected by the API. # CustomInfoType can either be a new infoType, or an extension of a built-in
# infoType, when the name matches one of the existing infoTypes and that infoType
# is specified in the `InspectContent.info_types` field. Specifying the latter
# adds findings to those detected by the system. If the built-in info type is
# not specified in the `InspectContent.info_types` list, then the name is treated
# as a custom info type.
"name": "A String", # Name of the information type. Either a name of your choosing when
# creating a CustomInfoType, or one of the names listed
# at https://cloud.google.com/dlp/docs/infotypes-reference when specifying
# a built-in type. InfoType names should conform to the pattern
# [a-zA-Z0-9_]{1,64}.
},

"dictionary": { # A list of phrases to detect as a CustomInfoType.
# Custom information type based on a dictionary of words or phrases. This can
# be used to match sensitive information specific to the data, such as a list
# of employee IDs or job titles.
#
# Dictionary words are case-insensitive and all characters other than letters
# and digits in the Unicode [Basic Multilingual
# Plane](https://en.wikipedia.org/wiki/Plane_%28Unicode%29#Basic_Multilingual_Plane)
# will be replaced with whitespace when scanning for matches, so the
# dictionary phrase "Sam Johnson" will match all three phrases "sam johnson",
# "Sam, Johnson", and "Sam (Johnson)". Additionally, the characters
# surrounding any match must be of a different type than the adjacent
# characters within the word, so letters must be next to non-letters and
# digits next to non-digits. For example, the dictionary word "jen" will
# match the first three letters of the text "jen123" but will return no
# matches for "jennifer".
#
# Dictionary words containing a large number of characters that are not
# letters or digits may result in unexpected findings because such characters
# are treated as whitespace. The
# [limits](https://cloud.google.com/dlp/limits) page contains details about
# the size limits of dictionaries. For dictionaries that do not fit within
# these constraints, consider using `LargeCustomDictionaryConfig` in the
# `StoredInfoType` API.
"wordList": { # Message defining a list of words or phrases to search for in the data. # List of words or phrases to search for.
"words": [ # Words or phrases defining the dictionary. The dictionary must contain
# at least one phrase and every phrase must contain at least 2 characters
# that are letters or digits. [required]

"A String",

],

},
"cloudStoragePath": { # Newline-delimited file of words in Cloud Storage. Only a single file
# is accepted.
# Message representing a single file or path in Cloud Storage.
"path": "A String", # A URL representing a file or path (no wildcards) in Cloud Storage.
# Example: gs://[BUCKET_NAME]/dictionary.txt
},
},
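# A rough Python sketch of the dictionary matching rules described above. It is an
# approximation under stated assumptions, not the service's implementation: runs of
# whitespace are treated as equivalent, Python's `str.isalpha`/`str.isdigit` stand in
# for the letter and digit classes, and `dictionary_matches` is a hypothetical helper.

```python
import re


def _kind(ch):
    """Classify a character as a letter, a digit, or anything else."""
    if ch.isalpha():
        return "letter"
    if ch.isdigit():
        return "digit"
    return "other"


def _normalize(text):
    # Case-insensitive; every non-letter, non-digit character becomes a space.
    return "".join(c if c.isalnum() else " " for c in text.lower())


def dictionary_matches(phrase, text):
    """Return (start, end) spans in the normalized text where the phrase matches."""
    words = _normalize(phrase).split()
    if not words:
        return []
    pattern = re.compile(r" +".join(re.escape(w) for w in words))
    norm = _normalize(text)
    hits = []
    for m in pattern.finditer(norm):
        start, end = m.span()
        # Characters around the match must differ in type from the adjacent
        # characters inside it (letters next to non-letters, digits next to non-digits).
        before_ok = start == 0 or _kind(norm[start - 1]) != _kind(norm[start])
        after_ok = end == len(norm) or _kind(norm[end]) != _kind(norm[end - 1])
        if before_ok and after_ok:
            hits.append((start, end))
    return hits
```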

"storedType": { # Load an existing `StoredInfoType` resource for use in
# `InspectDataSource`. Not currently supported in `InspectContent`.
# A reference to a StoredInfoType to use with scanning.
"name": "A String", # Resource name of the requested `StoredInfoType`, for example
# `organizations/433245324/storedInfoTypes/432452342` or
# `projects/project-id/storedInfoTypes/432452342`.
"createTime": "A String", # Timestamp indicating when the version of the `StoredInfoType` used for
# inspection was created. Output-only field, populated by the system.
},
"detectionRules": [ # Set of detection rules to apply to all findings of this CustomInfoType.
# Rules are applied in the order in which they are specified. Not supported for the
# `surrogate_type` CustomInfoType.
{ # Deprecated; use `InspectionRuleSet` instead. Rule for modifying a
# `CustomInfoType` to alter behavior under certain circumstances, depending
# on the specific details of the rule. Not supported for the `surrogate_type`
# custom infoType.

"hotwordRule": { # Hotword-based detection rule.
# The rule that adjusts the likelihood of findings within a certain
# proximity of hotwords.
"proximity": { # Proximity of the finding within which the entire hotword must reside.
# The total length of the window cannot exceed 1000 characters. Note that
# the finding itself will be included in the window, so that hotwords may
# be used to match substrings of the finding itself. For example, the
# certainty of a phone number regex "\(\d{3}\) \d{3}-\d{4}" could be
# adjusted upwards if the area code is known to be the local area code of
# a company office using the hotword regex "\(xxx\)", where "xxx"
# is the area code in question.
# Message for specifying a window around a finding to apply a detection
# rule.
"windowAfter": 42, # Number of characters after the finding to consider.
"windowBefore": 42, # Number of characters before the finding to consider.
},
"hotwordRegex": { # Message defining a custom regular expression. # Regular expression pattern defining what qualifies as a hotword.
"pattern": "A String", # Pattern defining the regular expression. Its syntax
# (https://github.com/google/re2/wiki/Syntax) can be found under the
# google/re2 repository on GitHub.
"groupIndexes": [ # The index of the submatch to extract as findings. When not
# specified, the entire match is returned. No more than 3 may be included.

42,

],

},
"likelihoodAdjustment": { # Likelihood adjustment to apply to all matching findings.
# Message for specifying an adjustment to the likelihood of a finding as
# part of a detection rule.
"relativeLikelihood": 42, # Increase or decrease the likelihood by the specified number of
# levels. For example, if a finding would be `POSSIBLE` without the
# detection rule and `relative_likelihood` is 1, then it is upgraded to
# `LIKELY`, while a value of -1 would downgrade it to `UNLIKELY`.
# Likelihood may never drop below `VERY_UNLIKELY` or exceed
# `VERY_LIKELY`, so applying an adjustment of 1 followed by an
# adjustment of -1 when base likelihood is `VERY_LIKELY` will result in
# a final likelihood of `LIKELY`.
"fixedLikelihood": "A String", # Set the likelihood of a finding to a fixed value.

},
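# A short Python sketch of the clamped `relative_likelihood` adjustment described
# above. The ordering of the five levels is taken from the field description;
# `adjust` is a hypothetical helper and the unspecified enum value is left out.

```python
LEVELS = ["VERY_UNLIKELY", "UNLIKELY", "POSSIBLE", "LIKELY", "VERY_LIKELY"]


def adjust(likelihood, relative):
    """Shift a likelihood by `relative` levels, clamping at both ends of the scale."""
    index = LEVELS.index(likelihood) + relative
    return LEVELS[max(0, min(index, len(LEVELS) - 1))]


# +1 on VERY_LIKELY is clamped, so the following -1 lands on LIKELY.
final = adjust(adjust("VERY_LIKELY", 1), -1)
```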

},

},

],
"exclusionType": "A String", # If set to EXCLUSION_TYPE_EXCLUDE, this infoType will not cause a finding
# to be returned. It can still be used for rules matching.
"likelihood": "A String", # Likelihood to return for this CustomInfoType. This base value can be
# altered by a detection rule if the finding meets the criteria specified by
# the rule. Defaults to `VERY_LIKELY` if not specified.
},
],
"includeQuote": True or False, # When true, a contextual quote from the data that triggered a finding is
# included in the response; see Finding.quote.
"ruleSet": [ # Set of rules to apply to the findings for this InspectConfig.
# Exclusion rules contained in the set are executed last; other
# rules are executed in the order in which they are specified for each info type.

{ # Rule set for modifying a set of infoTypes to alter behavior under certain
# circumstances, depending on the specific details of the rules within the set.
"rules": [ # Set of rules to be applied to infoTypes. The rules are applied in order.
{ # A single inspection rule to be applied to infoTypes, specified in
# `InspectionRuleSet`.
"hotwordRule": { # Hotword-based detection rule.
# The rule that adjusts the likelihood of findings within a certain
# proximity of hotwords.
"proximity": { # Proximity of the finding within which the entire hotword must reside.
# The total length of the window cannot exceed 1000 characters. Note that
# the finding itself will be included in the window, so that hotwords may
# be used to match substrings of the finding itself. For example, the
# certainty of a phone number regex "\(\d{3}\) \d{3}-\d{4}" could be
# adjusted upwards if the area code is known to be the local area code of
# a company office using the hotword regex "\(xxx\)", where "xxx"
# is the area code in question.
# Message for specifying a window around a finding to apply a detection
# rule.
"windowAfter": 42, # Number of characters after the finding to consider.
"windowBefore": 42, # Number of characters before the finding to consider.
},
"hotwordRegex": { # Message defining a custom regular expression. # Regular expression pattern defining what qualifies as a hotword.
"pattern": "A String", # Pattern defining the regular expression. Its syntax
# (https://github.com/google/re2/wiki/Syntax) can be found under the
# google/re2 repository on GitHub.
"groupIndexes": [ # The index of the submatch to extract as findings. When not
# specified, the entire match is returned. No more than 3 may be included.

42,

],

},
"likelihoodAdjustment": { # Likelihood adjustment to apply to all matching findings.
# Message for specifying an adjustment to the likelihood of a finding as
# part of a detection rule.
"relativeLikelihood": 42, # Increase or decrease the likelihood by the specified number of
# levels. For example, if a finding would be `POSSIBLE` without the
# detection rule and `relative_likelihood` is 1, then it is upgraded to
# `LIKELY`, while a value of -1 would downgrade it to `UNLIKELY`.
# Likelihood may never drop below `VERY_UNLIKELY` or exceed
# `VERY_LIKELY`, so applying an adjustment of 1 followed by an
# adjustment of -1 when base likelihood is `VERY_LIKELY` will result in
# a final likelihood of `LIKELY`.
"fixedLikelihood": "A String", # Set the likelihood of a finding to a fixed value.
},
},

"exclusionRule": { # Exclusion rule.
# The rule that specifies conditions when findings of infoTypes specified in
# `InspectionRuleSet` are removed from results.
"regex": { # Message defining a custom regular expression. # Regular expression which defines the rule.
"pattern": "A String", # Pattern defining the regular expression. Its syntax
# (https://github.com/google/re2/wiki/Syntax) can be found under the
# google/re2 repository on GitHub.
"groupIndexes": [ # The index of the submatch to extract as findings. When not
# specified, the entire match is returned. No more than 3 may be included.

42,

],

},
"excludeInfoTypes": { # List of exclude infoTypes. # Set of infoTypes for which findings would affect this rule.
"infoTypes": [ # The ExclusionRule drops a finding when it overlaps with or is
# contained within a finding of an infoType from this list. For
# example, for `InspectionRuleSet.info_types` containing "PHONE_NUMBER" and
# `exclusion_rule` containing `exclude_info_types.info_types` with
# "EMAIL_ADDRESS", the phone number findings are dropped if they overlap
# with an EMAIL_ADDRESS finding.
# As a result, "555-222-2222@example.org" generates only a single
# finding, namely the email address.
{ # Type of information detected by the API.
"name": "A String", # Name of the information type. Either a name of your choosing when
# creating a CustomInfoType, or one of the names listed
# at https://cloud.google.com/dlp/docs/infotypes-reference when specifying
# a built-in type. InfoType names should conform to the pattern
# [a-zA-Z0-9_]{1,64}.

},

],
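# A minimal Python sketch of the overlap-based exclusion in the example above.
# The `Finding` tuple, its character offsets, and `apply_exclusion` are
# hypothetical; the sketch only models the "drop on overlap" behaviour and
# ignores `matchingType`.

```python
from typing import NamedTuple


class Finding(NamedTuple):
    info_type: str
    start: int  # inclusive character offset
    end: int    # exclusive character offset


def overlaps(a, b):
    """True when the two findings share at least one character."""
    return a.start < b.end and b.start < a.end


def apply_exclusion(findings, target_types, exclude_types):
    """Drop findings of target_types that overlap a finding of exclude_types."""
    blockers = [f for f in findings if f.info_type in exclude_types]
    return [
        f
        for f in findings
        if f.info_type not in target_types
        or not any(overlaps(f, b) for b in blockers)
    ]


text = "555-222-2222@example.org"
findings = [
    Finding("PHONE_NUMBER", 0, 12),
    Finding("EMAIL_ADDRESS", 0, len(text)),
]
kept = apply_exclusion(findings, {"PHONE_NUMBER"}, {"EMAIL_ADDRESS"})
```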

},
"dictionary": { # Dictionary which defines the rule.
# Custom information type based on a dictionary of words or phrases. This can
# be used to match sensitive information specific to the data, such as a list
# of employee IDs or job titles.
#
# Dictionary words are case-insensitive and all characters other than letters
# and digits in the Unicode [Basic Multilingual
# Plane](https://en.wikipedia.org/wiki/Plane_%28Unicode%29#Basic_Multilingual_Plane)
# will be replaced with whitespace when scanning for matches, so the
# dictionary phrase "Sam Johnson" will match all three phrases "sam johnson",
# "Sam, Johnson", and "Sam (Johnson)". Additionally, the characters
# surrounding any match must be of a different type than the adjacent
# characters within the word, so letters must be next to non-letters and
# digits next to non-digits. For example, the dictionary word "jen" will
# match the first three letters of the text "jen123" but will return no
# matches for "jennifer".
#
# Dictionary words containing a large number of characters that are not
# letters or digits may result in unexpected findings because such characters
# are treated as whitespace. The
# [limits](https://cloud.google.com/dlp/limits) page contains details about
# the size limits of dictionaries. For dictionaries that do not fit within
# these constraints, consider using `LargeCustomDictionaryConfig` in the
# `StoredInfoType` API.
"wordList": { # Message defining a list of words or phrases to search for in the data. # List of words or phrases to search for.
"words": [ # Words or phrases defining the dictionary. The dictionary must contain
# at least one phrase and every phrase must contain at least 2 characters
# that are letters or digits. [required]

"A String",

],

},
"cloudStoragePath": { # Newline-delimited file of words in Cloud Storage. Only a single file
# is accepted.
# Message representing a single file or path in Cloud Storage.
"path": "A String", # A URL representing a file or path (no wildcards) in Cloud Storage.
# Example: gs://[BUCKET_NAME]/dictionary.txt
},
},
"matchingType": "A String", # How the rule is applied; see the MatchingType documentation for details.

},

},

],
"infoTypes": [ # List of infoTypes this rule set is applied to.
{ # Type of information detected by the API.
"name": "A String", # Name of the information type. Either a name of your choosing when
# creating a CustomInfoType, or one of the names listed
# at https://cloud.google.com/dlp/docs/infotypes-reference when specifying
# a built-in type. InfoType names should conform to the pattern
# [a-zA-Z0-9_]{1,64}.

},

],

},

],
"contentOptions": [ # List of options defining data content to scan.
# If empty, text, images, and other content will be included.
"A String",
],
"infoTypes": [ # Restricts what info_types to look for. The values must correspond to
# InfoType values returned by ListInfoTypes or listed at
# https://cloud.google.com/dlp/docs/infotypes-reference.
#
# When no InfoTypes or CustomInfoTypes are specified in a request, the
# system may automatically choose what detectors to run. By default this may
# be all types, but may change over time as detectors are updated.
#
# The special InfoType name "ALL_BASIC" can be used to trigger all detectors,
# but may change over time as new InfoTypes are added. If you need precise
# control and predictability as to what detectors are run you should specify
# specific InfoTypes listed in the reference.
{ # Type of information detected by the API.
"name": "A String", # Name of the information type. Either a name of your choosing when
# creating a CustomInfoType, or one of the names listed
# at https://cloud.google.com/dlp/docs/infotypes-reference when specifying
# a built-in type. InfoType names should conform to the pattern
# [a-zA-Z0-9_]{1,64}.

},

],
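# A small Python sketch that checks an infoType name against the pattern
# [a-zA-Z0-9_]{1,64} quoted in the `name` descriptions above.
# `is_valid_info_type_name` is a hypothetical client-side helper; the service
# performs its own validation.

```python
import re

INFO_TYPE_NAME = re.compile(r"[a-zA-Z0-9_]{1,64}")


def is_valid_info_type_name(name):
    """True when the whole name consists of 1 to 64 letters, digits, or underscores."""
    return INFO_TYPE_NAME.fullmatch(name) is not None
```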

},
"inspectTemplateName": "A String", # If provided, will be used as the default for all values in InspectConfig.
# `inspect_config` will be merged into the values persisted as part of the
# template.
"actions": [ # Actions to execute at the completion of the job.
{ # A task to execute on the completion of a job.
# See https://cloud.google.com/dlp/docs/concepts-actions to learn more.
"saveFindings": { # Save resulting findings in a provided location.
# If set, the detailed findings will be persisted to the specified
# OutputStorageConfig. Only a single instance of this action can be
# specified.
# Compatible with: Inspect, Risk

"outputConfig": { # Cloud repository for storing output.
"table": { # Store findings in an existing table or a new table in an existing
# dataset. If table_id is not set a new one will be generated
# for you with the following format:
# dlp_googleapis_yyyy_mm_dd_[dlp_job_id]. Pacific timezone will be used for
# generating the date details.
#
# For Inspect, each column in an existing output table must have the same
# name, type, and mode of a field in the `Finding` object.
#
# For Risk, an existing output table should be the output of a previous
# Risk analysis job run on the same source table, with the same privacy
# metric and quasi-identifiers. Risk jobs that analyze the same table but
# compute a different privacy metric, or use different sets of
# quasi-identifiers, cannot store their results in the same table.
#
# Message defining the location of a BigQuery table. A table is uniquely
# identified by its project_id, dataset_id, and table_name. Within a query
# a table is often referenced with a string in the format of:
# `<project_id>:<dataset_id>.<table_id>` or
# `<project_id>.<dataset_id>.<table_id>`.
"projectId": "A String", # The Google Cloud Platform project ID of the project containing the table.
# If omitted, project ID is inferred from the API call.
"tableId": "A String", # Name of the table.
"datasetId": "A String", # Dataset ID of the table.
},
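# A Python sketch of the shape of the generated table name described above. The
# service generates the real name; this only illustrates the format. Using the
# "America/Los_Angeles" zone for "Pacific timezone" is an assumption, and
# `default_table_id` and the job ID value are hypothetical.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo


def default_table_id(dlp_job_id, now=None):
    """Build a name of the form dlp_googleapis_yyyy_mm_dd_[dlp_job_id]."""
    now = now or datetime.now(timezone.utc)
    # The date part is taken in Pacific time, not UTC.
    local = now.astimezone(ZoneInfo("America/Los_Angeles"))
    return f"dlp_googleapis_{local:%Y_%m_%d}_{dlp_job_id}"


# 03:00 UTC on 2 January is still 1 January in Pacific time.
example = default_table_id("12345", datetime(2020, 1, 2, 3, 0, tzinfo=timezone.utc))
```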

"outputSchema": "A String", # Schema used for writing the findings for Inspect jobs. This field is only
# used for Inspect and must be unspecified for Risk jobs. Columns are derived
# from the `Finding` object. If appending to an existing table, any columns
# from the predefined schema that are missing will be added. No columns in
# the existing table will be deleted.
#
# If unspecified, then all available columns will be used for a new table or
# an (existing) table with no schema, and no changes will be made to an
# existing table that has a schema.
},
},

"jobNotificationEmails": { # Enable email notification to project owners and editors on a job's
# completion/failure.
},
"publishSummaryToCscc": { # Publish summary to Cloud Security Command Center (Alpha).
# Publish the result summary of a DlpJob to the Cloud Security
# Command Center (CSCC Alpha).
# This action is only available for projects which are part of
# an organization and whitelisted for the alpha Cloud Security Command
# Center.
# The action will publish the count of finding instances and their info types.
# The summary of findings will be persisted in CSCC and is governed by CSCC
# service-specific policy, see https://cloud.google.com/terms/service-terms
# Only a single instance of this action can be specified.
# Compatible with: Inspect
},

"pubSub": { # Publish a notification to a Pub/Sub topic.
# Publish a message into the given Pub/Sub topic when the DlpJob has completed. The
# message contains a single field, `DlpJobName`, which is equal to the
# finished job's
# [`DlpJob.name`](/dlp/docs/reference/rest/v2/projects.dlpJobs#DlpJob).
# Compatible with: Inspect, Risk
"topic": "A String", # Cloud Pub/Sub topic to send notifications to. The topic must have given
# publishing access rights to the DLP API service account executing
# the long-running DlpJob sending the notifications.
# Format is projects/{project}/topics/{topic}.

},

},

],

},

},
"result": { # A summary of the outcome of this inspect job.
# All result fields mentioned below are updated while the job is processing.
"infoTypeStats": [ # Statistics of how many instances of each info type were found during
# the inspect job.
{ # Statistics regarding a specific InfoType.
"count": "A String", # Number of findings for this infoType.
"infoType": { # Type of information detected by the API. # The type of finding this stat is for.
"name": "A String", # Name of the information type. Either a name of your choosing when
# creating a CustomInfoType, or one of the names listed
# at https://cloud.google.com/dlp/docs/infotypes-reference when specifying
# a built-in type. InfoType names should conform to the pattern
# [a-zA-Z0-9_]{1,64}.

},

},

],
"totalEstimatedBytes": "A String", # Estimate of the number of bytes to process.
"processedBytes": "A String", # Total size in bytes that were processed.
},
},
"riskDetails": { # Result of a risk analysis operation request. # Results from analyzing risk of a data source.
"numericalStatsResult": { # Result of the numerical stats computation.
"quantileValues": [ # List of 99 values that partition the set of field values into 100 equal
# sized buckets.

{ # Set of primitive values supported by the system.
# Note that for the purposes of inspection or transformation, the number
# of bytes considered to comprise a 'Value' is based on its representation
# as a UTF-8 encoded string. For example, if 'integer_value' is set to
# 123456789, the number of bytes would be counted as 9, even though an
# int64 only holds up to 8 bytes of data.
"floatValue": 3.14,
"timestampValue": "A String",
"dayOfWeekValue": "A String",
"timeValue": { # Represents a time of day. The date and time zone are either not significant
# or are specified elsewhere. An API may choose to allow leap seconds. Related
# types are google.type.Date and `google.protobuf.Timestamp`.
"hours": 42, # Hours of day in 24 hour format. Should be from 0 to 23. An API may choose
# to allow the value "24:00:00" for scenarios like business closing time.
"nanos": 42, # Fractions of seconds in nanoseconds. Must be from 0 to 999,999,999.
"seconds": 42, # Seconds of minutes of the time. Must normally be from 0 to 59. An API may
# allow the value 60 if it allows leap-seconds.
"minutes": 42, # Minutes of hour of day. Must be from 0 to 59.
},
"dateValue": { # Represents a whole or partial calendar date, e.g. a birthday. The time of day
# and time zone are either specified elsewhere or are not significant. The date
# is relative to the Proleptic Gregorian Calendar. This can represent:
#
# * A full date, with non-zero year, month and day values
# * A month and day value, with a zero year, e.g. an anniversary
# * A year on its own, with zero month and day values
# * A year and month value, with a zero day, e.g. a credit card expiration date
#
# Related types are google.type.TimeOfDay and `google.protobuf.Timestamp`.
"year": 42, # Year of date. Must be from 1 to 9999, or 0 if specifying a date without
# a year.
"day": 42, # Day of month. Must be from 1 to 31 and valid for the year and month, or 0
# if specifying a year by itself or a year and month where the day is not
# significant.
"month": 42, # Month of year. Must be from 1 to 12, or 0 if specifying a year without a
# month and day.
},
"stringValue": "A String",
"booleanValue": True or False,
"integerValue": "A String",
},
],
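# A brief Python sketch of what "99 values that partition the set of field values
# into 100 equal sized buckets" looks like, using the standard library's
# `statistics.quantiles`. How the service actually computes its quantiles is not
# stated here, so the interpolation method is an assumption and the sample data
# is made up.

```python
import statistics

# Hypothetical column of numeric field values.
values = list(range(1, 1001))

# n=100 asks for the 99 cut points between 100 equally sized buckets.
cut_points = statistics.quantiles(values, n=100)
```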

"maxValue": { # Set of primitive values supported by the system. # Maximum value appearing in the column.
# Note that for the purposes of inspection or transformation, the number
# of bytes considered to comprise a 'Value' is based on its representation
# as a UTF-8 encoded string. For example, if 'integer_value' is set to
# 123456789, the number of bytes would be counted as 9, even though an
# int64 only holds up to 8 bytes of data.
"floatValue": 3.14,
"timestampValue": "A String",
"dayOfWeekValue": "A String",
"timeValue": { # Represents a time of day. The date and time zone are either not significant
# or are specified elsewhere. An API may choose to allow leap seconds. Related
# types are google.type.Date and `google.protobuf.Timestamp`.
"hours": 42, # Hours of day in 24 hour format. Should be from 0 to 23. An API may choose
# to allow the value "24:00:00" for scenarios like business closing time.
"nanos": 42, # Fractions of seconds in nanoseconds. Must be from 0 to 999,999,999.
"seconds": 42, # Seconds of minutes of the time. Must normally be from 0 to 59. An API may
# allow the value 60 if it allows leap-seconds.
"minutes": 42, # Minutes of hour of day. Must be from 0 to 59.
},
"dateValue": { # Represents a whole or partial calendar date, e.g. a birthday. The time of day
# and time zone are either specified elsewhere or are not significant. The date
# is relative to the Proleptic Gregorian Calendar. This can represent:
#
# * A full date, with non-zero year, month and day values
# * A month and day value, with a zero year, e.g. an anniversary
# * A year on its own, with zero month and day values
# * A year and month value, with a zero day, e.g. a credit card expiration date
#
# Related types are google.type.TimeOfDay and `google.protobuf.Timestamp`.
"year": 42, # Year of date. Must be from 1 to 9999, or 0 if specifying a date without
# a year.
"day": 42, # Day of month. Must be from 1 to 31 and valid for the year and month, or 0
# if specifying a year by itself or a year and month where the day is not
# significant.
"month": 42, # Month of year. Must be from 1 to 12, or 0 if specifying a year without a
# month and day.
},
"stringValue": "A String",
"booleanValue": True or False,
"integerValue": "A String",
},

"minValue": { # Set of primitive values supported by the system. # Minimum value appearing in the column.
# Note that for the purposes of inspection or transformation, the number
# of bytes considered to comprise a 'Value' is based on its representation
# as a UTF-8 encoded string. For example, if 'integer_value' is set to
# 123456789, the number of bytes would be counted as 9, even though an
# int64 only holds up to 8 bytes of data.
"floatValue": 3.14,
"timestampValue": "A String",
"dayOfWeekValue": "A String",
"timeValue": { # Represents a time of day. The date and time zone are either not significant
# or are specified elsewhere. An API may choose to allow leap seconds. Related
# types are google.type.Date and `google.protobuf.Timestamp`.
"hours": 42, # Hours of day in 24 hour format. Should be from 0 to 23. An API may choose
# to allow the value "24:00:00" for scenarios like business closing time.
"nanos": 42, # Fractions of seconds in nanoseconds. Must be from 0 to 999,999,999.
"seconds": 42, # Seconds of minutes of the time. Must normally be from 0 to 59. An API may
# allow the value 60 if it allows leap-seconds.
"minutes": 42, # Minutes of hour of day. Must be from 0 to 59.
},
"dateValue": { # Represents a whole or partial calendar date, e.g. a birthday. The time of day
# and time zone are either specified elsewhere or are not significant. The date
# is relative to the Proleptic Gregorian Calendar. This can represent:
#
# * A full date, with non-zero year, month and day values
# * A month and day value, with a zero year, e.g. an anniversary
# * A year on its own, with zero month and day values
# * A year and month value, with a zero day, e.g. a credit card expiration date
#
# Related types are google.type.TimeOfDay and `google.protobuf.Timestamp`.
"year": 42, # Year of date. Must be from 1 to 9999, or 0 if specifying a date without
# a year.
"day": 42, # Day of month. Must be from 1 to 31 and valid for the year and month, or 0
# if specifying a year by itself or a year and month where the day is not
# significant.
"month": 42, # Month of year. Must be from 1 to 12, or 0 if specifying a year without a
# month and day.
},
"stringValue": "A String",
"booleanValue": True or False,
"integerValue": "A String",
},
},

3787

"kMapEstimationResult": { # Result of the reidentifiability analysis. Note that these results are an

3788

# estimation, not exact values.

3789

"kMapEstimationHistogram": [ # The intervals [min_anonymity, max_anonymity] do not overlap. If a value

3790

# doesn't correspond to any such interval, the associated frequency is

# zero. For example, the following records:

# {min_anonymity: 1, max_anonymity: 1, frequency: 17}

# {min_anonymity: 2, max_anonymity: 3, frequency: 42}

# {min_anonymity: 5, max_anonymity: 10, frequency: 99}

# mean that there are no records with an estimated anonymity of 4, or

# larger than 10.

3797

{ # A KMapEstimationHistogramBucket message with the following values:

# min_anonymity: 3

# max_anonymity: 5

# frequency: 42

# means that there are 42 records whose quasi-identifier values correspond

# to 3, 4 or 5 people in the overlying population. An important particular

# case is when min_anonymity = max_anonymity = 1: the frequency field then

# corresponds to the number of uniquely identifiable records.

3805

"bucketValues": [ # Sample of quasi-identifier tuple values in this bucket. The total

3806

# number of classes returned per bucket is capped at 20.

3807

{ # A tuple of values for the quasi-identifier columns.

3808

"estimatedAnonymity": "A String", # The estimated anonymity for these quasi-identifier values.

3809

"quasiIdsValues": [ # The quasi-identifier values.

3810

{ # Set of primitive values supported by the system.

3811

# Note that for the purposes of inspection or transformation, the number

3812

# of bytes considered to comprise a 'Value' is based on its representation

3813

# as a UTF-8 encoded string. For example, if 'integer_value' is set to

3814

# 123456789, the number of bytes would be counted as 9, even though an

3815

# int64 only holds up to 8 bytes of data.

3816

"floatValue": 3.14,

3817

"timestampValue": "A String",

3818

"dayOfWeekValue": "A String",

3819

"timeValue": { # Represents a time of day. The date and time zone are either not significant

3820

# or are specified elsewhere. An API may choose to allow leap seconds. Related

3821

# types are google.type.Date and `google.protobuf.Timestamp`.

3822

"hours": 42, # Hours of day in 24 hour format. Should be from 0 to 23. An API may choose

3823

# to allow the value "24:00:00" for scenarios like business closing time.

3824

"nanos": 42, # Fractions of seconds in nanoseconds. Must be from 0 to 999,999,999.

3825

"seconds": 42, # Seconds of minutes of the time. Must normally be from 0 to 59. An API may

3826

# allow the value 60 if it allows leap-seconds.

3827

"minutes": 42, # Minutes of hour of day. Must be from 0 to 59.

3828

},

3829

"dateValue": { # Represents a whole or partial calendar date, e.g. a birthday. The time of day

3830

# and time zone are either specified elsewhere or are not significant. The date

3831

# is relative to the Proleptic Gregorian Calendar. This can represent:

3832

#

3833

# * A full date, with non-zero year, month and day values

3834

# * A month and day value, with a zero year, e.g. an anniversary

3835

# * A year on its own, with zero month and day values

3836

# * A year and month value, with a zero day, e.g. a credit card expiration date

3837

#

3838

# Related types are google.type.TimeOfDay and `google.protobuf.Timestamp`.

3839

"year": 42, # Year of date. Must be from 1 to 9999, or 0 if specifying a date without

3840

# a year.

3841

"day": 42, # Day of month. Must be from 1 to 31 and valid for the year and month, or 0

3842

# if specifying a year by itself or a year and month where the day is not

3843

# significant.

3844

"month": 42, # Month of year. Must be from 1 to 12, or 0 if specifying a year without a

3845

# month and day.

3846

},

3847

"stringValue": "A String",

3848

"booleanValue": True or False,

3849

"integerValue": "A String",

},

],

},

],

"minAnonymity": "A String", # Always positive.

3855

"bucketValueCount": "A String", # Total number of distinct quasi-identifier tuple values in this bucket.

3856

"maxAnonymity": "A String", # Always greater than or equal to min_anonymity.

3857

"bucketSize": "A String", # Number of records within these anonymity bounds.

},

],

},

"kAnonymityResult": { # Result of the k-anonymity computation.

3862

"equivalenceClassHistogramBuckets": [ # Histogram of k-anonymity equivalence classes.

3863

{

3864

"bucketValues": [ # Sample of equivalence classes in this bucket. The total number of

3865

# classes returned per bucket is capped at 20.

3866

{ # The set of columns' values that share the same l-diversity value.

3867

"quasiIdsValues": [ # Set of values defining the equivalence class. One value per

3868

# quasi-identifier column in the original KAnonymity metric message.

3869

# The order is always the same as the original request.

3870

{ # Set of primitive values supported by the system.

3871

# Note that for the purposes of inspection or transformation, the number

3872

# of bytes considered to comprise a 'Value' is based on its representation

3873

# as a UTF-8 encoded string. For example, if 'integer_value' is set to

3874

# 123456789, the number of bytes would be counted as 9, even though an

3875

# int64 only holds up to 8 bytes of data.

3876

"floatValue": 3.14,

3877

"timestampValue": "A String",

3878

"dayOfWeekValue": "A String",

3879

"timeValue": { # Represents a time of day. The date and time zone are either not significant

3880

# or are specified elsewhere. An API may choose to allow leap seconds. Related

3881

# types are google.type.Date and `google.protobuf.Timestamp`.

3882

"hours": 42, # Hours of day in 24 hour format. Should be from 0 to 23. An API may choose

3883

# to allow the value "24:00:00" for scenarios like business closing time.

3884

"nanos": 42, # Fractions of seconds in nanoseconds. Must be from 0 to 999,999,999.

3885

"seconds": 42, # Seconds of minutes of the time. Must normally be from 0 to 59. An API may

3886

# allow the value 60 if it allows leap-seconds.

3887

"minutes": 42, # Minutes of hour of day. Must be from 0 to 59.

3888

},

3889

"dateValue": { # Represents a whole or partial calendar date, e.g. a birthday. The time of day

3890

# and time zone are either specified elsewhere or are not significant. The date

3891

# is relative to the Proleptic Gregorian Calendar. This can represent:

3892

#

3893

# * A full date, with non-zero year, month and day values

3894

# * A month and day value, with a zero year, e.g. an anniversary

3895

# * A year on its own, with zero month and day values

3896

# * A year and month value, with a zero day, e.g. a credit card expiration date

3897

#

3898

# Related types are google.type.TimeOfDay and `google.protobuf.Timestamp`.

3899

"year": 42, # Year of date. Must be from 1 to 9999, or 0 if specifying a date without

3900

# a year.

3901

"day": 42, # Day of month. Must be from 1 to 31 and valid for the year and month, or 0

3902

# if specifying a year by itself or a year and month where the day is not

3903

# significant.

3904

"month": 42, # Month of year. Must be from 1 to 12, or 0 if specifying a year without a

3905

# month and day.

3906

},

3907

"stringValue": "A String",

3908

"booleanValue": True or False,

3909

"integerValue": "A String",

3910

},

3911

],

3912

"equivalenceClassSize": "A String", # Size of the equivalence class, for example number of rows with the

3913

# above set of values.

3914

},

3915

],

3916

"bucketValueCount": "A String", # Total number of distinct equivalence classes in this bucket.

3917

"equivalenceClassSizeLowerBound": "A String", # Lower bound on the size of the equivalence classes in this bucket.

3918

"equivalenceClassSizeUpperBound": "A String", # Upper bound on the size of the equivalence classes in this bucket.

3919

"bucketSize": "A String", # Total number of equivalence classes in this bucket.

},

],

},

"lDiversityResult": { # Result of the l-diversity computation.

3924

"sensitiveValueFrequencyHistogramBuckets": [ # Histogram of l-diversity equivalence class sensitive value frequencies.

3925

{

3926

"bucketValues": [ # Sample of equivalence classes in this bucket. The total number of

3927

# classes returned per bucket is capped at 20.

3928

{ # The set of columns' values that share the same l-diversity value.

3929

"numDistinctSensitiveValues": "A String", # Number of distinct sensitive values in this equivalence class.

3930

"quasiIdsValues": [ # Quasi-identifier values defining the k-anonymity equivalence

3931

# class. The order is always the same as the original request.

3932

{ # Set of primitive values supported by the system.

3933

# Note that for the purposes of inspection or transformation, the number

3934

# of bytes considered to comprise a 'Value' is based on its representation

3935

# as a UTF-8 encoded string. For example, if 'integer_value' is set to

3936

# 123456789, the number of bytes would be counted as 9, even though an

3937

# int64 only holds up to 8 bytes of data.

3938

"floatValue": 3.14,

3939

"timestampValue": "A String",

3940

"dayOfWeekValue": "A String",

3941

"timeValue": { # Represents a time of day. The date and time zone are either not significant

3942

# or are specified elsewhere. An API may choose to allow leap seconds. Related

3943

# types are google.type.Date and `google.protobuf.Timestamp`.

3944

"hours": 42, # Hours of day in 24 hour format. Should be from 0 to 23. An API may choose

3945

# to allow the value "24:00:00" for scenarios like business closing time.

3946

"nanos": 42, # Fractions of seconds in nanoseconds. Must be from 0 to 999,999,999.

3947

"seconds": 42, # Seconds of minutes of the time. Must normally be from 0 to 59. An API may

3948

# allow the value 60 if it allows leap-seconds.

3949

"minutes": 42, # Minutes of hour of day. Must be from 0 to 59.

3950

},

3951

"dateValue": { # Represents a whole or partial calendar date, e.g. a birthday. The time of day

3952

# and time zone are either specified elsewhere or are not significant. The date

3953

# is relative to the Proleptic Gregorian Calendar. This can represent:

3954

#

3955

# * A full date, with non-zero year, month and day values

3956

# * A month and day value, with a zero year, e.g. an anniversary

3957

# * A year on its own, with zero month and day values

3958

# * A year and month value, with a zero day, e.g. a credit card expiration date

3959

#

3960

# Related types are google.type.TimeOfDay and `google.protobuf.Timestamp`.

3961

"year": 42, # Year of date. Must be from 1 to 9999, or 0 if specifying a date without

3962

# a year.

3963

"day": 42, # Day of month. Must be from 1 to 31 and valid for the year and month, or 0

3964

# if specifying a year by itself or a year and month where the day is not

3965

# significant.

3966

"month": 42, # Month of year. Must be from 1 to 12, or 0 if specifying a year without a

3967

# month and day.

3968

},

3969

"stringValue": "A String",

3970

"booleanValue": True or False,

3971

"integerValue": "A String",

3972

},

3973

],

3974

"topSensitiveValues": [ # Estimated frequencies of top sensitive values.

3975

{ # A value of a field, including its frequency.

3976

"count": "A String", # How many times the value is contained in the field.

3977

"value": { # Set of primitive values supported by the system. # A value contained in the field in question.

3978

# Note that for the purposes of inspection or transformation, the number

3979

# of bytes considered to comprise a 'Value' is based on its representation

3980

# as a UTF-8 encoded string. For example, if 'integer_value' is set to

3981

# 123456789, the number of bytes would be counted as 9, even though an

3982

# int64 only holds up to 8 bytes of data.

3983

"floatValue": 3.14,

3984

"timestampValue": "A String",

3985

"dayOfWeekValue": "A String",

3986

"timeValue": { # Represents a time of day. The date and time zone are either not significant

3987

# or are specified elsewhere. An API may choose to allow leap seconds. Related

3988

# types are google.type.Date and `google.protobuf.Timestamp`.

3989

"hours": 42, # Hours of day in 24 hour format. Should be from 0 to 23. An API may choose

3990

# to allow the value "24:00:00" for scenarios like business closing time.

3991

"nanos": 42, # Fractions of seconds in nanoseconds. Must be from 0 to 999,999,999.

3992

"seconds": 42, # Seconds of minutes of the time. Must normally be from 0 to 59. An API may

3993

# allow the value 60 if it allows leap-seconds.

3994

"minutes": 42, # Minutes of hour of day. Must be from 0 to 59.

3995

},

3996

"dateValue": { # Represents a whole or partial calendar date, e.g. a birthday. The time of day

3997

# and time zone are either specified elsewhere or are not significant. The date

3998

# is relative to the Proleptic Gregorian Calendar. This can represent:

3999

#

4000

# * A full date, with non-zero year, month and day values

4001

# * A month and day value, with a zero year, e.g. an anniversary

4002

# * A year on its own, with zero month and day values

4003

# * A year and month value, with a zero day, e.g. a credit card expiration date

4004

#

4005

# Related types are google.type.TimeOfDay and `google.protobuf.Timestamp`.

4006

"year": 42, # Year of date. Must be from 1 to 9999, or 0 if specifying a date without

4007

# a year.

4008

"day": 42, # Day of month. Must be from 1 to 31 and valid for the year and month, or 0

4009

# if specifying a year by itself or a year and month where the day is not

4010

# significant.

4011

"month": 42, # Month of year. Must be from 1 to 12, or 0 if specifying a year without a

4012

# month and day.

4013

},

4014

"stringValue": "A String",

4015

"booleanValue": True or False,

4016

"integerValue": "A String",

},

},

],

"equivalenceClassSize": "A String", # Size of the k-anonymity equivalence class.

4021

},

4022

],

4023

"bucketValueCount": "A String", # Total number of distinct equivalence classes in this bucket.

4024

"bucketSize": "A String", # Total number of equivalence classes in this bucket.

4025

"sensitiveValueFrequencyUpperBound": "A String", # Upper bound on the sensitive value frequencies of the equivalence

4026

# classes in this bucket.

4027

"sensitiveValueFrequencyLowerBound": "A String", # Lower bound on the sensitive value frequencies of the equivalence

4028

# classes in this bucket.

},

],

},

"requestedPrivacyMetric": { # Privacy metric to compute for reidentification risk analysis. # Privacy metric to compute.

4033

"numericalStatsConfig": { # Compute numerical stats over an individual column, including

4034

# min, max, and quantiles.

4035

"field": { # General identifier of a data field in a storage service. # Field to compute numerical stats on. Supported types are

4036

# integer, float, date, datetime, timestamp, time.

4037

"name": "A String", # Name describing the field.

4038

},

4039

},

4040

"kMapEstimationConfig": { # Reidentifiability metric. This corresponds to a risk model similar to what

4041

# is called "journalist risk" in the literature, except the attack dataset is

# statistically modeled instead of being perfectly known. This can be done

# using publicly available data (like the US Census), or using a custom

# statistical model (indicated as one or several BigQuery tables), or by

# extrapolating from the distribution of values in the input dataset.

"regionCode": "A String", # ISO 3166-1 alpha-2 region code to use in the statistical modeling.

# Required if no column is tagged with a region-specific InfoType (like

# US_ZIP_5) or a region code.

"quasiIds": [ # Fields considered to be quasi-identifiers. No two columns can have the

# same tag. [required]

{ # A column with a semantic tag attached.

4053

"field": { # General identifier of a data field in a storage service. # Identifies the column. [required]

4054

"name": "A String", # Name describing the field.

4055

},

4056

"customTag": "A String", # A column can be tagged with a custom tag. In this case, the user must

4057

# indicate an auxiliary table that contains statistical information on

4058

# the possible values of this column (below).

4059

"infoType": { # Type of information detected by the API. # A column can be tagged with a InfoType to use the relevant public

4060

# dataset as a statistical model of population, if available. We

4061

# currently support US ZIP codes, region codes, ages and genders.

4062

# To programmatically obtain the list of supported InfoTypes, use

4063

# ListInfoTypes with the supported_by=RISK_ANALYSIS filter.

4064

"name": "A String", # Name of the information type. Either a name of your choosing when

4065

# creating a CustomInfoType, or one of the names listed

4066

# at https://cloud.google.com/dlp/docs/infotypes-reference when specifying

4067

# a built-in type. InfoType names should conform to the pattern

4068

# [a-zA-Z0-9_]{1,64}.

4069

},

4070

"inferred": { # A generic empty message that you can re-use to avoid defining duplicated # If no semantic tag is indicated, we infer the statistical model from

4071

# the distribution of values in the input data

4072

# empty messages in your APIs. A typical example is to use it as the request

4073

# or the response type of an API method. For instance:

4074

#

4075

# service Foo {

4076

# rpc Bar(google.protobuf.Empty) returns (google.protobuf.Empty);

4077

# }

4078

#

4079

# The JSON representation for `Empty` is empty JSON object `{}`.

},

},

],

"auxiliaryTables": [ # Several auxiliary tables can be used in the analysis. Each custom_tag

4084

# used to tag a quasi-identifier column must appear in exactly one column

# of one auxiliary table.

{ # An auxiliary table contains statistical information on the relative

# frequency of different quasi-identifier values. It has one or several

# quasi-identifier columns, and one column that indicates the relative

# frequency of each quasi-identifier tuple.

# If a tuple is present in the data but not in the auxiliary table, the

# corresponding relative frequency is assumed to be zero (and thus, the

# tuple is highly reidentifiable).

4093

"relativeFrequency": { # General identifier of a data field in a storage service. # The relative frequency column must contain a floating-point number

4094

# between 0 and 1 (inclusive). Null values are assumed to be zero.

4095

# [required]

4096

"name": "A String", # Name describing the field.

4097

},

4098

"quasiIds": [ # Quasi-identifier columns. [required]

4099

{ # A quasi-identifier column has a custom_tag, used to know which column

4100

# in the data corresponds to which column in the statistical model.

4101

"field": { # General identifier of a data field in a storage service.

4102

"name": "A String", # Name describing the field.

4103

},

4104

"customTag": "A String",

4105

},

4106

],

4107

"table": { # Message defining the location of a BigQuery table. A table is uniquely # Auxiliary table location. [required]

4108

# identified by its project_id, dataset_id, and table_name. Within a query

4109

# a table is often referenced with a string in the format of:

4110

# `<project_id>:<dataset_id>.<table_id>` or

4111

# `<project_id>.<dataset_id>.<table_id>`.

4112

"projectId": "A String", # The Google Cloud Platform project ID of the project containing the table.

4113

# If omitted, project ID is inferred from the API call.

4114

"tableId": "A String", # Name of the table.

4115

"datasetId": "A String", # Dataset ID of the table.

},

},

],

},

"lDiversityConfig": { # l-diversity metric, used for analysis of reidentification risk.

4121

"sensitiveAttribute": { # General identifier of a data field in a storage service. # Sensitive field for computing the l-value.

4122

"name": "A String", # Name describing the field.

4123

},

4124

"quasiIds": [ # Set of quasi-identifiers indicating how equivalence classes are

4125

# defined for the l-diversity computation. When multiple fields are

4126

# specified, they are considered a single composite key.

4127

{ # General identifier of a data field in a storage service.

4128

"name": "A String", # Name describing the field.

},

],

},

"deltaPresenceEstimationConfig": { # δ-presence metric, used to estimate how likely it is for an attacker to

4133

# figure out that one given individual appears in a de-identified dataset.

# Similarly to the k-map metric, we cannot compute δ-presence exactly without

# knowing the attack dataset, so we use a statistical model instead.

4136

"regionCode": "A String", # ISO 3166-1 alpha-2 region code to use in the statistical modeling.

4137

# Required if no column is tagged with a region-specific InfoType (like

4138

# US_ZIP_5) or a region code.

4139

"quasiIds": [ # Fields considered to be quasi-identifiers. No two fields can have the

4140

# same tag. [required]

4141

{ # A column with a semantic tag attached.

4142

"field": { # General identifier of a data field in a storage service. # Identifies the column. [required]

4143

"name": "A String", # Name describing the field.

4144

},

4145

"customTag": "A String", # A column can be tagged with a custom tag. In this case, the user must

4146

# indicate an auxiliary table that contains statistical information on

4147

# the possible values of this column (below).

4148

"infoType": { # Type of information detected by the API. # A column can be tagged with a InfoType to use the relevant public

4149

# dataset as a statistical model of population, if available. We

4150

# currently support US ZIP codes, region codes, ages and genders.

4151

# To programmatically obtain the list of supported InfoTypes, use

4152

# ListInfoTypes with the supported_by=RISK_ANALYSIS filter.

4153

"name": "A String", # Name of the information type. Either a name of your choosing when

4154

# creating a CustomInfoType, or one of the names listed

4155

# at https://cloud.google.com/dlp/docs/infotypes-reference when specifying

4156

# a built-in type. InfoType names should conform to the pattern

4157

# [a-zA-Z0-9_]{1,64}.

4158

},

4159

"inferred": { # A generic empty message that you can re-use to avoid defining duplicated # If no semantic tag is indicated, we infer the statistical model from

4160

# the distribution of values in the input data

4161

# empty messages in your APIs. A typical example is to use it as the request

4162

# or the response type of an API method. For instance:

4163

#

4164

# service Foo {

4165

# rpc Bar(google.protobuf.Empty) returns (google.protobuf.Empty);

4166

# }

4167

#

4168

# The JSON representation for `Empty` is empty JSON object `{}`.

},

},

],

"auxiliaryTables": [ # Several auxiliary tables can be used in the analysis. Each custom_tag

4173

# used to tag a quasi-identifier field must appear in exactly one

# field of one auxiliary table.

{ # An auxiliary table containing statistical information on the relative

# frequency of different quasi-identifier values. It has one or several

# quasi-identifier columns, and one column that indicates the relative

# frequency of each quasi-identifier tuple.

# If a tuple is present in the data but not in the auxiliary table, the

# corresponding relative frequency is assumed to be zero (and thus, the

# tuple is highly reidentifiable).

4182

"relativeFrequency": { # General identifier of a data field in a storage service. # The relative frequency column must contain a floating-point number

4183

# between 0 and 1 (inclusive). Null values are assumed to be zero.

4184

# [required]

4185

"name": "A String", # Name describing the field.

4186

},

4187

"quasiIds": [ # Quasi-identifier columns. [required]

4188

{ # A quasi-identifier column has a custom_tag, used to know which column

4189

# in the data corresponds to which column in the statistical model.

4190

"field": { # General identifier of a data field in a storage service.

4191

"name": "A String", # Name describing the field.

4192

},

4193

"customTag": "A String",

4194

},

4195

],

4196

"table": { # Message defining the location of a BigQuery table. A table is uniquely # Auxiliary table location. [required]

4197

# identified by its project_id, dataset_id, and table_name. Within a query

4198

# a table is often referenced with a string in the format of:

4199

# `<project_id>:<dataset_id>.<table_id>` or

4200

# `<project_id>.<dataset_id>.<table_id>`.

4201

"projectId": "A String", # The Google Cloud Platform project ID of the project containing the table.

4202

# If omitted, project ID is inferred from the API call.

4203

"tableId": "A String", # Name of the table.

4204

"datasetId": "A String", # Dataset ID of the table.

},

},

],

},

"categoricalStatsConfig": { # Compute numerical stats over an individual column, including

4210

# number of distinct values and value count distribution.

4211

"field": { # General identifier of a data field in a storage service. # Field to compute categorical stats on. All column types are

4212

# supported except for arrays and structs. However, it may be more

4213

# informative to use NumericalStats when the field type is supported,

4214

# depending on the data.

4215

"name": "A String", # Name describing the field.

4216

},

4217

},

4218

"kAnonymityConfig": { # k-anonymity metric, used for analysis of reidentification risk.

4219

"entityId": { # An entity in a dataset is a field or set of fields that correspond to a # Optional message indicating that multiple rows might be associated to a

4220

# single individual. If the same entity_id is associated to multiple

4221

# quasi-identifier tuples over distinct rows, we consider the entire

4222

# collection of tuples as the composite quasi-identifier. This collection

4223

# is a multiset: the order in which the different tuples appear in the

4224

# dataset is ignored, but their frequency is taken into account.

4225

#

4226

# Important note: a maximum of 1000 rows can be associated to a single

4227

# entity ID. If more rows are associated with the same entity ID, some

4228

# might be ignored.

4229

# single person. For example, in medical records the `EntityId` might be a

4230

# patient identifier, or for financial records it might be an account

4231

# identifier. This message is used when generalizations or analysis must take

4232

# into account that multiple rows correspond to the same entity.

4233

"field": { # General identifier of a data field in a storage service. # Composite key indicating which field contains the entity identifier.

4234

"name": "A String", # Name describing the field.

4235

},

4236

},

4237

"quasiIds": [ # Set of fields to compute k-anonymity over. When multiple fields are

4238

# specified, they are considered a single composite key. Structs and

4239

# repeated data types are not supported; however, nested fields are

4240

# supported so long as they are not structs themselves or nested within

4241

# a repeated field.

4242

{ # General identifier of a data field in a storage service.

4243

"name": "A String", # Name describing the field.

},

],

},

},

"categoricalStatsResult": { # Result of the categorical stats computation.

4249

"valueFrequencyHistogramBuckets": [ # Histogram of value frequencies in the column.

4250

{

4251

"bucketValues": [ # Sample of value frequencies in this bucket. The total number of

4252

# values returned per bucket is capped at 20.

4253

{ # A value of a field, including its frequency.

4254

"count": "A String", # How many times the value is contained in the field.

4255

"value": { # Set of primitive values supported by the system. # A value contained in the field in question.

4256

# Note that for the purposes of inspection or transformation, the number

4257

# of bytes considered to comprise a 'Value' is based on its representation

4258

# as a UTF-8 encoded string. For example, if 'integer_value' is set to

4259

# 123456789, the number of bytes would be counted as 9, even though an

4260

# int64 only holds up to 8 bytes of data.

4261

"floatValue": 3.14,

4262

"timestampValue": "A String",

4263

"dayOfWeekValue": "A String",

4264

"timeValue": { # Represents a time of day. The date and time zone are either not significant

4265

# or are specified elsewhere. An API may choose to allow leap seconds. Related

4266

# types are google.type.Date and `google.protobuf.Timestamp`.

4267

"hours": 42, # Hours of day in 24 hour format. Should be from 0 to 23. An API may choose

4268

# to allow the value "24:00:00" for scenarios like business closing time.

4269

"nanos": 42, # Fractions of seconds in nanoseconds. Must be from 0 to 999,999,999.

4270

"seconds": 42, # Seconds of minutes of the time. Must normally be from 0 to 59. An API may

4271

# allow the value 60 if it allows leap-seconds.

4272

"minutes": 42, # Minutes of hour of day. Must be from 0 to 59.

4273

},

4274

"dateValue": { # Represents a whole or partial calendar date, e.g. a birthday. The time of day

4275

# and time zone are either specified elsewhere or are not significant. The date

4276

# is relative to the Proleptic Gregorian Calendar. This can represent:

4277

#

4278

# * A full date, with non-zero year, month and day values

4279

# * A month and day value, with a zero year, e.g. an anniversary

4280

# * A year on its own, with zero month and day values

4281

# * A year and month value, with a zero day, e.g. a credit card expiration date

4282

#

4283

# Related types are google.type.TimeOfDay and `google.protobuf.Timestamp`.

4284

"year": 42, # Year of date. Must be from 1 to 9999, or 0 if specifying a date without

4285

# a year.

4286

"day": 42, # Day of month. Must be from 1 to 31 and valid for the year and month, or 0

4287

# if specifying a year by itself or a year and month where the day is not

4288

# significant.

4289

"month": 42, # Month of year. Must be from 1 to 12, or 0 if specifying a year without a

4290

# month and day.

4291

},

4292

"stringValue": "A String",

4293

"booleanValue": True or False,

4294

"integerValue": "A String",

},

},

],

"bucketValueCount": "A String", # Total number of distinct values in this bucket.

4299

"valueFrequencyUpperBound": "A String", # Upper bound on the value frequency of the values in this bucket.

4300

"valueFrequencyLowerBound": "A String", # Lower bound on the value frequency of the values in this bucket.

4301

"bucketSize": "A String", # Total number of values in this bucket.

},

],

},

"deltaPresenceEstimationResult": { # Result of the δ-presence computation. Note that these results are an

4306

# estimation, not exact values.

"deltaPresenceEstimationHistogram": [ # The intervals [min_probability, max_probability) do not overlap. If a

# value doesn't correspond to any such interval, the associated frequency

# is zero. For example, the following records:

# {min_probability: 0, max_probability: 0.1, frequency: 17}

# {min_probability: 0.2, max_probability: 0.3, frequency: 42}

# {min_probability: 0.3, max_probability: 0.4, frequency: 99}

# mean that there are no records with an estimated probability in [0.1, 0.2)

# nor greater than or equal to 0.4.

{ # A DeltaPresenceEstimationHistogramBucket message with the following

# values:

# min_probability: 0.1

# max_probability: 0.2

# frequency: 42

# means that there are 42 records for which δ is in [0.1, 0.2). An

# important particular case is when min_probability = max_probability = 1:

# then, every individual who shares this quasi-identifier combination is in

# the dataset.

"bucketValues": [ # Sample of quasi-identifier tuple values in this bucket. The total

# number of classes returned per bucket is capped at 20.

{ # A tuple of values for the quasi-identifier columns.

"quasiIdsValues": [ # The quasi-identifier values.

{ # Set of primitive values supported by the system.

# Note that for the purposes of inspection or transformation, the number

# of bytes considered to comprise a 'Value' is based on its representation

# as a UTF-8 encoded string. For example, if 'integer_value' is set to

# 123456789, the number of bytes would be counted as 9, even though an

# int64 only holds up to 8 bytes of data.

"floatValue": 3.14,

"timestampValue": "A String",

"dayOfWeekValue": "A String",

"timeValue": { # Represents a time of day. The date and time zone are either not significant

# or are specified elsewhere. An API may choose to allow leap seconds. Related

# types are google.type.Date and `google.protobuf.Timestamp`.

"hours": 42, # Hours of day in 24 hour format. Should be from 0 to 23. An API may choose

# to allow the value "24:00:00" for scenarios like business closing time.

"nanos": 42, # Fractions of seconds in nanoseconds. Must be from 0 to 999,999,999.

"seconds": 42, # Seconds of minutes of the time. Must normally be from 0 to 59. An API may

# allow the value 60 if it allows leap-seconds.

"minutes": 42, # Minutes of hour of day. Must be from 0 to 59.

},

"dateValue": { # Represents a whole or partial calendar date, e.g. a birthday. The time of day

# and time zone are either specified elsewhere or are not significant. The date

# is relative to the Proleptic Gregorian Calendar. This can represent:

#

# * A full date, with non-zero year, month and day values

# * A month and day value, with a zero year, e.g. an anniversary

# * A year on its own, with zero month and day values

# * A year and month value, with a zero day, e.g. a credit card expiration date

#

# Related types are google.type.TimeOfDay and `google.protobuf.Timestamp`.

"year": 42, # Year of date. Must be from 1 to 9999, or 0 if specifying a date without

# a year.

"day": 42, # Day of month. Must be from 1 to 31 and valid for the year and month, or 0

# if specifying a year by itself or a year and month where the day is not

# significant.

"month": 42, # Month of year. Must be from 1 to 12, or 0 if specifying a year without a

# month and day.

},

"stringValue": "A String",

"booleanValue": True or False,

"integerValue": "A String",

},

],

"estimatedProbability": 3.14, # The estimated probability that a given individual sharing these

# quasi-identifier values is in the dataset. This value, typically called

# δ, is the ratio between the number of records in the dataset with these

# quasi-identifier values, and the total number of individuals (inside

# *and* outside the dataset) with these quasi-identifier values.

# For example, if there are 15 individuals in the dataset who share the

# same quasi-identifier values, and an estimated 100 people in the entire

# population with these values, then δ is 0.15.

},

],

"bucketValueCount": "A String", # Total number of distinct quasi-identifier tuple values in this bucket.

"bucketSize": "A String", # Number of records within these probability bounds.

"maxProbability": 3.14, # Always greater than or equal to min_probability.

"minProbability": 3.14, # Between 0 and 1.

},

],

},

"requestedSourceTable": { # Input dataset to compute metrics over.

# Message defining the location of a BigQuery table. A table is uniquely

# identified by its project_id, dataset_id, and table_name. Within a query

# a table is often referenced with a string in the format of:

# `<project_id>:<dataset_id>.<table_id>` or

# `<project_id>.<dataset_id>.<table_id>`.

"projectId": "A String", # The Google Cloud Platform project ID of the project containing the table.

# If omitted, project ID is inferred from the API call.

"tableId": "A String", # Name of the table.

"datasetId": "A String", # Dataset ID of the table.

},

},

"state": "A String", # State of a job.

"jobTriggerName": "A String", # If created by a job trigger, the resource name of the trigger that

# instantiated the job.

"startTime": "A String", # Time when the job started.

"endTime": "A String", # Time when the job finished.

"type": "A String", # The type of job.

"createTime": "A String", # Time when the job was created.

}</pre>
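<p>The δ-presence histogram in the response above can be read bucket by bucket. The following is a minimal sketch, assuming the <code>deltaPresenceEstimationHistogram</code> list has already been pulled out of a returned job as plain dictionaries. The helper name <code>summarize_delta_presence</code> and the sample values are hypothetical, and treating <code>bucketSize</code> as the per-interval record count (the "frequency" in the field description) is an interpretation of the field comments, not something the reference states outright.</p>

```python
def summarize_delta_presence(buckets):
    """Return (intervals, gaps) for delta-presence histogram buckets.

    intervals: list of (min_probability, max_probability, record_count)
    gaps: list of (low, high) probability ranges that no bucket covers,
          which the field description treats as having frequency zero.
    """
    ordered = sorted(buckets, key=lambda b: b.get("minProbability", 0.0))
    intervals = []
    gaps = []
    previous_max = 0.0
    for bucket in ordered:
        low = bucket.get("minProbability", 0.0)
        high = bucket.get("maxProbability", low)
        # bucketSize is shown as a string in the reference, so parse it.
        count = int(bucket.get("bucketSize", "0"))
        if low > previous_max:
            gaps.append((previous_max, low))
        intervals.append((low, high, count))
        previous_max = max(previous_max, high)
    return intervals, gaps


# Hypothetical sample mirroring the three records in the field description.
sample = [
    {"minProbability": 0.0, "maxProbability": 0.1, "bucketSize": "17"},
    {"minProbability": 0.2, "maxProbability": 0.3, "bucketSize": "42"},
    {"minProbability": 0.3, "maxProbability": 0.4, "bucketSize": "99"},
]
intervals, gaps = summarize_delta_presence(sample)
print(intervals)
print(gaps)
```

<p>With the sample data, the only uncovered range below the highest bucket is [0.1, 0.2), matching the prose description of that example.</p>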

</div>
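<p>The <code>dateValue</code> objects in the response described above may be partial dates, with zero standing in for an unspecified year, month, or day. As an illustration only, the sketch below classifies such a dictionary into the four cases the field description lists. The function name <code>classify_date_value</code> and the returned labels are hypothetical and not part of the API.</p>

```python
def classify_date_value(date_value):
    """Classify a dateValue dict by which of year/month/day are non-zero."""
    year = date_value.get("year", 0)
    month = date_value.get("month", 0)
    day = date_value.get("day", 0)
    if year and month and day:
        return "full date"
    if not year and month and day:
        return "month and day"  # e.g. an anniversary
    if year and not month and not day:
        return "year only"
    if year and month and not day:
        return "year and month"  # e.g. a credit card expiration date
    # Any other mix (such as a day with no month) is not one of the
    # four combinations listed in the field description.
    return "unsupported combination"


print(classify_date_value({"year": 2020, "month": 5, "day": 17}))
print(classify_date_value({"year": 0, "month": 6, "day": 14}))
```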

<code class="details" id="list">list(parent, orderBy=None, type=None, pageSize=None, pageToken=None, x__xgafv=None, filter=None)</code>

<pre>Lists DlpJobs that match the specified filter in the request.

See https://cloud.google.com/dlp/docs/inspecting-storage and

https://cloud.google.com/dlp/docs/compute-risk-analysis to learn more.

Args:

parent: string, The parent resource name, for example projects/my-project-id. (required)

orderBy: string, Optional. Comma-separated list of fields to order by,

followed by an `asc` or `desc` postfix. This list is case-insensitive;

the default sorting order is ascending, and redundant space characters are

insignificant.

Example: `name asc, end_time asc, create_time desc`

Supported fields are:

- `create_time`: corresponds to the time the job was created.

- `end_time`: corresponds to the time the job ended.

- `name`: corresponds to the job's name.

- `state`: corresponds to `state`

type: string, The type of job. Defaults to `DlpJobType.INSPECT`

pageSize: integer, The standard list page size.

pageToken: string, The standard list page token.

x__xgafv: string, V1 error format.

Allowed values

1 - v1 error format

2 - v2 error format

filter: string, Optional. Allows filtering.

Supported syntax:

* Filter expressions are made up of one or more restrictions.

* Restrictions can be combined by `AND` or `OR` logical operators. A

sequence of restrictions implicitly uses `AND`.

* A restriction has the form of `<field> <operator> <value>`.

* Supported fields/values for inspect jobs:

- `state` - PENDING|RUNNING|CANCELED|FINISHED|FAILED

- `inspected_storage` - DATASTORE|CLOUD_STORAGE|BIGQUERY

- `trigger_name` - The resource name of the trigger that created the job.

- `end_time` - Corresponds to the time the job finished.

- `start_time` - Corresponds to the time the job started.

* Supported fields for risk analysis jobs:

- `state` - RUNNING|CANCELED|FINISHED|FAILED

- `end_time` - Corresponds to the time the job finished.

- `start_time` - Corresponds to the time the job started.

* The operator must be `=` or `!=`.

Examples:

* inspected_storage = cloud_storage AND state = done

* inspected_storage = cloud_storage OR inspected_storage = bigquery

* inspected_storage = cloud_storage AND (state = done OR state = canceled)

* end_time > "2017-12-12T00:00:00+00:00"

The length of this field should be no more than 500 characters.

Returns:

An object of the form:

{ # The response message for listing DLP jobs.

"nextPageToken": "A String", # The standard List next-page token.

"jobs": [ # A list of DlpJobs that matches the specified filter in the request.

{ # Combines all of the information about a DLP job.

"errors": [ # A stream of errors encountered running the job.

{ # Detailed information about an error encountered during job execution or

# the results of an unsuccessful activation of the JobTrigger.

# Output only field.

"timestamps": [ # The times the error occurred.

"A String",

],

"details": { # The `Status` type defines a logical error model that is suitable for

# different programming environments, including REST APIs and RPC APIs. It is

# used by [gRPC](https://github.com/grpc). Each `Status` message contains

# three pieces of data: error code, error message, and error details.

#

# You can find out more about this error model and how to work with it in the

# [API Design Guide](https://cloud.google.com/apis/design/errors).

"message": "A String", # A developer-facing error message, which should be in English. Any

# user-facing error message should be localized and sent in the

# google.rpc.Status.details field, or localized by the client.

"code": 42, # The status code, which should be an enum value of google.rpc.Code.

"details": [ # A list of messages that carry the error details. There is a common set of

# message types for APIs to use.

{

"a_key": "", # Properties of the object. Contains field @type with type URL.

},

],

},

},

],

"name": "A String", # The server-assigned name.

"inspectDetails": { # The results of an inspect DataSource job. # Results from inspecting a data source.

"requestedOptions": { # The configuration used for this job.

"snapshotInspectTemplate": { # If run with an InspectTemplate, a snapshot of its state at the time of

# this run.

# The inspectTemplate contains a configuration (set of types of sensitive data

# to be detected) to be used anywhere you otherwise would normally specify

# InspectConfig. See https://cloud.google.com/dlp/docs/concepts-templates

# to learn more.

"updateTime": "A String", # The last update timestamp of an inspectTemplate, output only field.

"displayName": "A String", # Display name (max 256 chars).

"description": "A String", # Short description (max 256 chars).

"inspectConfig": { # Configuration description of the scanning process. # The core content of the template. Configuration of the scanning process.

# When used with redactContent only info_types and min_likelihood are currently

# used.

"excludeInfoTypes": True or False, # When true, excludes type information of the findings.

"limits": {

"maxFindingsPerRequest": 42, # Max number of findings that will be returned per request/job.

# When set within `InspectContentRequest`, the maximum returned is 2000

# regardless if this is set higher.

"maxFindingsPerInfoType": [ # Configuration of findings limit given for specified infoTypes.

{ # Max findings configuration per infoType, per content item or long

# running DlpJob.

"infoType": { # Type of information detected by the API. # Type of information the findings limit applies to. Only one limit per

# info_type should be provided. If InfoTypeLimit does not have an

# info_type, the DLP API applies the limit against all info_types that

# are found but not specified in another InfoTypeLimit.

"name": "A String", # Name of the information type. Either a name of your choosing when

# creating a CustomInfoType, or one of the names listed

# at https://cloud.google.com/dlp/docs/infotypes-reference when specifying

# a built-in type. InfoType names should conform to the pattern

# [a-zA-Z0-9_]{1,64}.

},

"maxFindings": 42, # Max findings limit for the given infoType.

},

],

"maxFindingsPerItem": 42, # Max number of findings that will be returned for each item scanned.

# When set within `InspectDataSourceRequest`,

# the maximum returned is 2000 regardless if this is set higher.

# When set within `InspectContentRequest`, this field is ignored.

},

"minLikelihood": "A String", # Only returns findings equal to or above this threshold. The default is

# POSSIBLE.

# See https://cloud.google.com/dlp/docs/likelihood to learn more.

"customInfoTypes": [ # CustomInfoTypes provided by the user. See

# https://cloud.google.com/dlp/docs/creating-custom-infotypes to learn more.

{ # Custom information type provided by the user. Used to find domain-specific

# sensitive information configurable to the data in question.

"regex": { # Message defining a custom regular expression. # Regular expression based CustomInfoType.

"pattern": "A String", # Pattern defining the regular expression. Its syntax

# (https://github.com/google/re2/wiki/Syntax) can be found under the

# google/re2 repository on GitHub.

"groupIndexes": [ # The index of the submatch to extract as findings. When not

# specified, the entire match is returned. No more than 3 may be included.

42,

],

},

"surrogateType": { # Message for detecting output from deidentification transformations that

# support reversing.

# Message for detecting output from deidentification transformations

# such as

# [`CryptoReplaceFfxFpeConfig`](/dlp/docs/reference/rest/v2/organizations.deidentifyTemplates#cryptoreplaceffxfpeconfig).

# These types of transformations are

# those that perform pseudonymization, thereby producing a "surrogate" as

# output. This should be used in conjunction with a field on the

# transformation such as `surrogate_info_type`. This CustomInfoType does

# not support the use of `detection_rules`.

},

"infoType": { # Type of information detected by the API. # CustomInfoType can either be a new infoType, or an extension of built-in

# infoType, when the name matches one of existing infoTypes and that infoType

# is specified in `InspectContent.info_types` field. Specifying the latter

# adds findings to the one detected by the system. If built-in info type is

# not specified in `InspectContent.info_types` list then the name is treated

# as a custom info type.

"name": "A String", # Name of the information type. Either a name of your choosing when

# creating a CustomInfoType, or one of the names listed

# at https://cloud.google.com/dlp/docs/infotypes-reference when specifying

# a built-in type. InfoType names should conform to the pattern

# [a-zA-Z0-9_]{1,64}.

},

"dictionary": { # Custom information type based on a dictionary of words or phrases. This can # A list of phrases to detect as a CustomInfoType.

# be used to match sensitive information specific to the data, such as a list

# of employee IDs or job titles.

#

# Dictionary words are case-insensitive and all characters other than letters

# and digits in the unicode [Basic Multilingual

# Plane](https://en.wikipedia.org/wiki/Plane_%28Unicode%29#Basic_Multilingual_Plane)

# will be replaced with whitespace when scanning for matches, so the

# dictionary phrase "Sam Johnson" will match all three phrases "sam johnson",

# "Sam, Johnson", and "Sam (Johnson)". Additionally, the characters

# surrounding any match must be of a different type than the adjacent

# characters within the word, so letters must be next to non-letters and

# digits next to non-digits. For example, the dictionary word "jen" will

# match the first three letters of the text "jen123" but will return no

# matches for "jennifer".

#

# Dictionary words containing a large number of characters that are not

# letters or digits may result in unexpected findings because such characters

# are treated as whitespace. The

# [limits](https://cloud.google.com/dlp/limits) page contains details about

# the size limits of dictionaries. For dictionaries that do not fit within

# these constraints, consider using `LargeCustomDictionaryConfig` in the

# `StoredInfoType` API.

"wordList": { # Message defining a list of words or phrases to search for in the data. # List of words or phrases to search for.

"words": [ # Words or phrases defining the dictionary. The dictionary must contain

# at least one phrase and every phrase must contain at least 2 characters

# that are letters or digits. [required]

"A String",

],

},

"cloudStoragePath": { # Message representing a single file or path in Cloud Storage. # Newline-delimited file of words in Cloud Storage. Only a single file

# is accepted.

"path": "A String", # A url representing a file or path (no wildcards) in Cloud Storage.

# Example: gs://[BUCKET_NAME]/dictionary.txt

},

},

"storedType": { # A reference to a StoredInfoType to use with scanning. # Load an existing `StoredInfoType` resource for use in

# `InspectDataSource`. Not currently supported in `InspectContent`.

"name": "A String", # Resource name of the requested `StoredInfoType`, for example

# `organizations/433245324/storedInfoTypes/432452342` or

# `projects/project-id/storedInfoTypes/432452342`.

"createTime": "A String", # Timestamp indicating when the version of the `StoredInfoType` used for

# inspection was created. Output-only field, populated by the system.

},

"detectionRules": [ # Set of detection rules to apply to all findings of this CustomInfoType.

# Rules are applied in order that they are specified. Not supported for the

# `surrogate_type` CustomInfoType.

{ # Deprecated; use `InspectionRuleSet` instead. Rule for modifying a

# `CustomInfoType` to alter behavior under certain circumstances, depending

# on the specific details of the rule. Not supported for the `surrogate_type`

# custom infoType.

"hotwordRule": { # The rule that adjusts the likelihood of findings within a certain # Hotword-based detection rule.

# proximity of hotwords.

"proximity": { # Proximity of the finding within which the entire hotword must reside.

# The total length of the window cannot exceed 1000 characters. Note that

# the finding itself will be included in the window, so that hotwords may

# be used to match substrings of the finding itself. For example, the

# certainty of a phone number regex "\(\d{3}\) \d{3}-\d{4}" could be

# adjusted upwards if the area code is known to be the local area code of

# a company office using the hotword regex "\(xxx\)", where "xxx"

# is the area code in question.

# Message for specifying a window around a finding to apply a detection

# rule.

"windowAfter": 42, # Number of characters after the finding to consider.

"windowBefore": 42, # Number of characters before the finding to consider.

},

"hotwordRegex": { # Message defining a custom regular expression. # Regular expression pattern defining what qualifies as a hotword.

"pattern": "A String", # Pattern defining the regular expression. Its syntax

# (https://github.com/google/re2/wiki/Syntax) can be found under the

# google/re2 repository on GitHub.

"groupIndexes": [ # The index of the submatch to extract as findings. When not

# specified, the entire match is returned. No more than 3 may be included.

42,

],

},

"likelihoodAdjustment": { # Likelihood adjustment to apply to all matching findings.

# Message for specifying an adjustment to the likelihood of a finding as

# part of a detection rule.

"relativeLikelihood": 42, # Increase or decrease the likelihood by the specified number of

# levels. For example, if a finding would be `POSSIBLE` without the

# detection rule and `relative_likelihood` is 1, then it is upgraded to

# `LIKELY`, while a value of -1 would downgrade it to `UNLIKELY`.

# Likelihood may never drop below `VERY_UNLIKELY` or exceed

# `VERY_LIKELY`, so applying an adjustment of 1 followed by an

# adjustment of -1 when base likelihood is `VERY_LIKELY` will result in

# a final likelihood of `LIKELY`.

"fixedLikelihood": "A String", # Set the likelihood of a finding to a fixed value.

},

},

},

],

"exclusionType": "A String", # If set to EXCLUSION_TYPE_EXCLUDE this infoType will not cause a finding

# to be returned. It still can be used for rules matching.

"likelihood": "A String", # Likelihood to return for this CustomInfoType. This base value can be

# altered by a detection rule if the finding meets the criteria specified by

# the rule. Defaults to `VERY_LIKELY` if not specified.

},

],

"includeQuote": True or False, # When true, a contextual quote from the data that triggered a finding is

# included in the response; see Finding.quote.

"ruleSet": [ # Set of rules to apply to the findings for this InspectConfig.

# Exclusion rules contained in the set are executed last; other

# rules are executed in the order they are specified for each info type.

{ # Rule set for modifying a set of infoTypes to alter behavior under certain

# circumstances, depending on the specific details of the rules within the set.

"rules": [ # Set of rules to be applied to infoTypes. The rules are applied in order.

{ # A single inspection rule to be applied to infoTypes, specified in

# `InspectionRuleSet`.

"hotwordRule": { # The rule that adjusts the likelihood of findings within a certain # Hotword-based detection rule.

# proximity of hotwords.

"proximity": { # Proximity of the finding within which the entire hotword must reside.

# The total length of the window cannot exceed 1000 characters. Note that

# the finding itself will be included in the window, so that hotwords may

# be used to match substrings of the finding itself. For example, the

# certainty of a phone number regex "\(\d{3}\) \d{3}-\d{4}" could be

# adjusted upwards if the area code is known to be the local area code of

# a company office using the hotword regex "\(xxx\)", where "xxx"

# is the area code in question.

# Message for specifying a window around a finding to apply a detection

# rule.

"windowAfter": 42, # Number of characters after the finding to consider.

"windowBefore": 42, # Number of characters before the finding to consider.

},

"hotwordRegex": { # Message defining a custom regular expression. # Regular expression pattern defining what qualifies as a hotword.

"pattern": "A String", # Pattern defining the regular expression. Its syntax

# (https://github.com/google/re2/wiki/Syntax) can be found under the

# google/re2 repository on GitHub.

"groupIndexes": [ # The index of the submatch to extract as findings. When not

# specified, the entire match is returned. No more than 3 may be included.

42,

],

},

"likelihoodAdjustment": { # Likelihood adjustment to apply to all matching findings.

# Message for specifying an adjustment to the likelihood of a finding as

# part of a detection rule.

"relativeLikelihood": 42, # Increase or decrease the likelihood by the specified number of

# levels. For example, if a finding would be `POSSIBLE` without the

# detection rule and `relative_likelihood` is 1, then it is upgraded to

# `LIKELY`, while a value of -1 would downgrade it to `UNLIKELY`.

# Likelihood may never drop below `VERY_UNLIKELY` or exceed

# `VERY_LIKELY`, so applying an adjustment of 1 followed by an

# adjustment of -1 when base likelihood is `VERY_LIKELY` will result in

# a final likelihood of `LIKELY`.

"fixedLikelihood": "A String", # Set the likelihood of a finding to a fixed value.

},

},

"exclusionRule": { # The rule that specifies conditions when findings of infoTypes specified in # Exclusion rule.

# `InspectionRuleSet` are removed from results.

"regex": { # Message defining a custom regular expression. # Regular expression which defines the rule.

"pattern": "A String", # Pattern defining the regular expression. Its syntax

# (https://github.com/google/re2/wiki/Syntax) can be found under the

# google/re2 repository on GitHub.

"groupIndexes": [ # The index of the submatch to extract as findings. When not

# specified, the entire match is returned. No more than 3 may be included.

42,

],

},

"excludeInfoTypes": { # List of exclude infoTypes. # Set of infoTypes for which findings would affect this rule.

"infoTypes": [ # An InfoType list in an ExclusionRule drops a finding when it overlaps

# with or is contained within a finding of an infoType from this list. For

# example, for `InspectionRuleSet.info_types` containing "PHONE_NUMBER" and

# `exclusion_rule` containing `exclude_info_types.info_types` with

# "EMAIL_ADDRESS", the phone number findings are dropped if they overlap

# with an EMAIL_ADDRESS finding.

# As a result, "555-222-2222@example.org" generates only a single

# finding, namely the email address.

{ # Type of information detected by the API.

"name": "A String", # Name of the information type. Either a name of your choosing when

# creating a CustomInfoType, or one of the names listed

# at https://cloud.google.com/dlp/docs/infotypes-reference when specifying

# a built-in type. InfoType names should conform to the pattern

# [a-zA-Z0-9_]{1,64}.

},

],

},

"dictionary": { # Custom information type based on a dictionary of words or phrases. This can # Dictionary which defines the rule.

# be used to match sensitive information specific to the data, such as a list

# of employee IDs or job titles.

#

# Dictionary words are case-insensitive and all characters other than letters

# and digits in the unicode [Basic Multilingual

# Plane](https://en.wikipedia.org/wiki/Plane_%28Unicode%29#Basic_Multilingual_Plane)

# will be replaced with whitespace when scanning for matches, so the

# dictionary phrase "Sam Johnson" will match all three phrases "sam johnson",

# "Sam, Johnson", and "Sam (Johnson)". Additionally, the characters

# surrounding any match must be of a different type than the adjacent

# characters within the word, so letters must be next to non-letters and

# digits next to non-digits. For example, the dictionary word "jen" will

# match the first three letters of the text "jen123" but will return no

# matches for "jennifer".

#

# Dictionary words containing a large number of characters that are not

# letters or digits may result in unexpected findings because such characters

# are treated as whitespace. The

# [limits](https://cloud.google.com/dlp/limits) page contains details about

# the size limits of dictionaries. For dictionaries that do not fit within

# these constraints, consider using `LargeCustomDictionaryConfig` in the

# `StoredInfoType` API.

"wordList": { # Message defining a list of words or phrases to search for in the data. # List of words or phrases to search for.

"words": [ # Words or phrases defining the dictionary. The dictionary must contain

# at least one phrase and every phrase must contain at least 2 characters

# that are letters or digits. [required]

"A String",

],

},

"cloudStoragePath": { # Message representing a single file or path in Cloud Storage. # Newline-delimited file of words in Cloud Storage. Only a single file

# is accepted.

"path": "A String", # A url representing a file or path (no wildcards) in Cloud Storage.

# Example: gs://[BUCKET_NAME]/dictionary.txt

},

},

"matchingType": "A String", # How the rule is applied, see MatchingType documentation for details.

},

},

],

"infoTypes": [ # List of infoTypes this rule set is applied to.

{ # Type of information detected by the API.

"name": "A String", # Name of the information type. Either a name of your choosing when

# creating a CustomInfoType, or one of the names listed

# at https://cloud.google.com/dlp/docs/infotypes-reference when specifying

# a built-in type. InfoType names should conform to the pattern

# [a-zA-Z0-9_]{1,64}.

},

],

},

],

"contentOptions": [ # List of options defining data content to scan.

# If empty, text, images, and other content will be included.

"A String",

],

"infoTypes": [ # Restricts what info_types to look for. The values must correspond to

# InfoType values returned by ListInfoTypes or listed at

# https://cloud.google.com/dlp/docs/infotypes-reference.

#

# When no InfoTypes or CustomInfoTypes are specified in a request, the

# system may automatically choose what detectors to run. By default this may

# be all types, but may change over time as detectors are updated.

#

# The special InfoType name "ALL_BASIC" can be used to trigger all detectors,

# but may change over time as new InfoTypes are added. If you need precise

# control and predictability as to what detectors are run you should specify

# specific InfoTypes listed in the reference.

{ # Type of information detected by the API.

"name": "A String", # Name of the information type. Either a name of your choosing when

# creating a CustomInfoType, or one of the names listed

# at https://cloud.google.com/dlp/docs/infotypes-reference when specifying

# a built-in type. InfoType names should conform to the pattern

# [a-zA-Z0-9_]{1,64}.

},

],

},

"createTime": "A String", # The creation timestamp of an inspectTemplate, output only field.

"name": "A String", # The template name. Output only.

#

# The template will have one of the following formats:

# `projects/PROJECT_ID/inspectTemplates/TEMPLATE_ID` OR

# `organizations/ORGANIZATION_ID/inspectTemplates/TEMPLATE_ID`

},

"jobConfig": {

"storageConfig": { # Shared message indicating Cloud storage type. # The data to scan.

"datastoreOptions": { # Options defining a data set within Google Cloud Datastore. # Google Cloud Datastore options specification.

"partitionId": { # Datastore partition ID.

# A partition ID identifies a grouping of entities. The grouping is always

# by project and namespace, however the namespace ID may be empty.

#

# A partition ID contains several dimensions:

# project ID and namespace ID.

4841

"projectId": "A String", # The ID of the project to which the entities belong.

4842

"namespaceId": "A String", # If not empty, the ID of the namespace to which the entities belong.

4843

},

4844

"kind": { # A representation of a Datastore kind. # The kind to process.

4845

"name": "A String", # The name of the kind.

4846

},

4847

},

4848

"bigQueryOptions": { # Options defining BigQuery table and row identifiers. # BigQuery options specification.

4849

"excludedFields": [ # References to fields excluded from scanning. This allows you to skip

4850

# inspection of entire columns which you know have no findings.

4851

{ # General identifier of a data field in a storage service.

4852

"name": "A String", # Name describing the field.

4853

},

4854

],

4855

"rowsLimit": "A String", # Max number of rows to scan. If the table has more rows than this value, the

4856

# rest of the rows are omitted. If not set, or if set to 0, all rows will be

4857

# scanned. Only one of rows_limit and rows_limit_percent can be specified.

4858

# Cannot be used in conjunction with TimespanConfig.

4859

"sampleMethod": "A String",

4860

"identifyingFields": [ # References to fields uniquely identifying rows within the table.
# Nested fields, in a format like `person.birthdate.year`, are allowed.

4862

{ # General identifier of a data field in a storage service.

4863

"name": "A String", # Name describing the field.

4864

},

4865

],

4866

"rowsLimitPercent": 42, # Max percentage of rows to scan. The rest are omitted. The number of rows
# scanned is rounded down. Must be between 0 and 100, inclusive. Both 0 and
# 100 mean no limit. Defaults to 0. Only one of rows_limit and
# rows_limit_percent can be specified. Cannot be used in conjunction with
# TimespanConfig.

4871

"tableReference": { # Complete BigQuery table reference.
# Message defining the location of a BigQuery table. A table is uniquely
# identified by its project_id, dataset_id, and table_name. Within a query
# a table is often referenced with a string in the format of:
# `<project_id>:<dataset_id>.<table_id>` or
# `<project_id>.<dataset_id>.<table_id>`.

4876

"projectId": "A String", # The Google Cloud Platform project ID of the project containing the table.

4877

# If omitted, project ID is inferred from the API call.

4878

"tableId": "A String", # Name of the table.

4879

"datasetId": "A String", # Dataset ID of the table.

4880

},

4881

},
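The sampling constraints described for `bigQueryOptions` can be checked before a job is submitted. This is a sketch under stated assumptions: `check_row_sampling` is a hypothetical helper, a value of 0 (or an absent key) is treated as "not specified", and the service's actual validation may differ.

```python
def check_row_sampling(options: dict, has_timespan_config: bool = False) -> None:
    """Raise ValueError when the documented row-sampling constraints are violated.

    Assumption: 0 or a missing key counts as "not specified".
    """
    # rowsLimit is an int64, which the JSON representation carries as a string.
    rows_limit = int(options.get("rowsLimit") or 0)
    rows_limit_percent = int(options.get("rowsLimitPercent") or 0)

    if not 0 <= rows_limit_percent <= 100:
        raise ValueError("rowsLimitPercent must be between 0 and 100, inclusive")
    if rows_limit and rows_limit_percent:
        raise ValueError("only one of rowsLimit and rowsLimitPercent can be specified")
    if has_timespan_config and (rows_limit or rows_limit_percent):
        raise ValueError("row limits cannot be used in conjunction with TimespanConfig")
```

Calling `check_row_sampling({"rowsLimit": "1000"})` returns without error, whereas supplying both limits raises `ValueError`.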

4882

"timespanConfig": { # Configuration of the timespan of the items to include in scanning.

4883

# Currently only supported when inspecting Google Cloud Storage and BigQuery.

4884

"timestampField": { # General identifier of a data field in a storage service. # Specification of the field containing the timestamp of scanned items.

4885

# Used for data sources like Datastore or BigQuery.

4886

# If not specified for BigQuery, table last modification timestamp

4887

# is checked against given time span.

4888

# The valid data types of the timestamp field are:

4889

# for BigQuery - timestamp, date, datetime;

4890

# for Datastore - timestamp.

4891

# Datastore entity will be scanned if the timestamp property does not exist

4892

# or its value is empty or invalid.

4893

"name": "A String", # Name describing the field.

4894

},

4895

"endTime": "A String", # Exclude files or rows newer than this value.

4896

# If set to zero, no upper time limit is applied.

4897

"startTime": "A String", # Exclude files or rows older than this value.

4898

"enableAutoPopulationOfTimespanConfig": True or False, # When the job is started by a JobTrigger we will automatically figure out

4899

# a valid start_time to avoid scanning files that have not been modified

4900

# since the last time the JobTrigger executed. This will be based on the

4901

# time of the execution of the last run of the JobTrigger.

4902

},

4903

"cloudStorageOptions": { # Options defining a file or a set of files within a Google Cloud Storage bucket. # Google Cloud Storage options specification.

4905

"bytesLimitPerFile": "A String", # Max number of bytes to scan from a file. If a scanned file's size is bigger

4906

# than this value then the rest of the bytes are omitted. Only one

4907

# of bytes_limit_per_file and bytes_limit_per_file_percent can be specified.

4908

"sampleMethod": "A String",

4909

"fileSet": { # Set of files to scan. # The set of one or more files to scan.

4910

"url": "A String", # The Cloud Storage url of the file(s) to scan, in the format

4911

# `gs://<bucket>/<path>`. Trailing wildcard in the path is allowed.

4912

#

4913

# If the url ends in a trailing slash, the bucket or directory represented

4914

# by the url will be scanned non-recursively (content in sub-directories

4915

# will not be scanned). This means that `gs://mybucket/` is equivalent to

4916

# `gs://mybucket/*`, and `gs://mybucket/directory/` is equivalent to

4917

# `gs://mybucket/directory/*`.

4918

#

4919

# Exactly one of `url` or `regex_file_set` must be set.

4920

"regexFileSet": { # The regex-filtered set of files to scan. Exactly one of `url` or
# `regex_file_set` must be set.
#
# Message representing a set of files in a Cloud Storage bucket. Regular
# expressions are used to allow fine-grained control over which files in the
# bucket to include.

4924

#

4925

# Included files are those that match at least one item in `include_regex` and

4926

# do not match any items in `exclude_regex`. Note that a file that matches

4927

# items from both lists will _not_ be included. For a match to occur, the

4928

# entire file path (i.e., everything in the url after the bucket name) must

4929

# match the regular expression.

4930

#

4931

# For example, given the input `{bucket_name: "mybucket", include_regex:

4932

# ["directory1/.*"], exclude_regex:

4933

# ["directory1/excluded.*"]}`:

4934

#

4935

# * `gs://mybucket/directory1/myfile` will be included

4936

# * `gs://mybucket/directory1/directory2/myfile` will be included (`.*` matches

4937

# across `/`)

4938

# * `gs://mybucket/directory0/directory1/myfile` will _not_ be included (the

4939

# full path doesn't match any items in `include_regex`)

4940

# * `gs://mybucket/directory1/excludedfile` will _not_ be included (the path

4941

# matches an item in `exclude_regex`)

4942

#

4943

# If `include_regex` is left empty, it will match all files by default

4944

# (this is equivalent to setting `include_regex: [".*"]`).

4945

#

4946

# Some other common use cases:

4947

#

4948

# * `{bucket_name: "mybucket", exclude_regex: [".*\.pdf"]}` will include all

4949

# files in `mybucket` except for .pdf files

4950

# * `{bucket_name: "mybucket", include_regex: ["directory/[^/]+"]}` will

4951

# include all files directly under `gs://mybucket/directory/`, without matching

4952

# across `/`

4953

"excludeRegex": [ # A list of regular expressions matching file paths to exclude. All files in

4954

# the bucket that match at least one of these regular expressions will be

4955

# excluded from the scan.

4956

#

4957

# Regular expressions use RE2

4958

# [syntax](https://github.com/google/re2/wiki/Syntax); a guide can be found

4959

# under the google/re2 repository on GitHub.

4960

"A String",

4961

],

4962

"bucketName": "A String", # The name of a Cloud Storage bucket. Required.

4963

"includeRegex": [ # A list of regular expressions matching file paths to include. All files in

4964

# the bucket that match at least one of these regular expressions will be

4965

# included in the set of files, except for those that also match an item in

4966

# `exclude_regex`. Leaving this field empty will match all files by default

4967

# (this is equivalent to including `.*` in the list).

4968

#

4969

# Regular expressions use RE2

4970

# [syntax](https://github.com/google/re2/wiki/Syntax); a guide can be found

4971

# under the google/re2 repository on GitHub.

"A String",

],

},
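The include/exclude semantics described for `regexFileSet` can be sketched as follows. This is an illustration only: it uses Python's `re` module as an approximation of RE2 (the two syntaxes differ in places), and `is_included` is a hypothetical helper rather than anything the API exposes.

```python
import re


def is_included(path, include_regex=(), exclude_regex=()):
    """Decide whether a file path (everything after the bucket name) is scanned.

    A path is included when it fully matches at least one include pattern
    and fully matches no exclude pattern.
    """
    # An empty include list is documented as equivalent to [".*"].
    includes = list(include_regex) or [".*"]
    matches_include = any(re.fullmatch(p, path) for p in includes)
    matches_exclude = any(re.fullmatch(p, path) for p in exclude_regex)
    return matches_include and not matches_exclude


include = ["directory1/.*"]
exclude = ["directory1/excluded.*"]
print(is_included("directory1/myfile", include, exclude))
print(is_included("directory1/excludedfile", include, exclude))
```

Because `re.fullmatch` requires the entire path to match, `directory0/directory1/myfile` is rejected by the include pattern `directory1/.*`, mirroring the example in the description.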

},

"bytesLimitPerFilePercent": 42, # Max percentage of bytes to scan from a file. The rest are omitted. The
# number of bytes scanned is rounded down. Must be between 0 and 100,
# inclusive. Both 0 and 100 mean no limit. Defaults to 0. Only one
# of bytes_limit_per_file and bytes_limit_per_file_percent can be specified.
"filesLimitPercent": 42, # Limits the number of files to scan to this percentage of the input FileSet.
# The number of files scanned is rounded down. Must be between 0 and 100,
# inclusive. Both 0 and 100 mean no limit. Defaults to 0.

4983

"fileTypes": [ # List of file type groups to include in the scan.

4984

# If empty, all files are scanned and available data format processors

4985

# are applied. In addition, the binary content of the selected files

4986

# is always scanned as well.

"A String",

],

},
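The percentage-based limits above (`bytesLimitPerFilePercent`, `filesLimitPercent`, and likewise `rowsLimitPercent`) share the same rules: round down, and treat both 0 and 100 as "no limit". A minimal sketch of that arithmetic, assuming simple integer floor division; `apply_percent_limit` is a hypothetical name and the service's exact computation is not documented here.

```python
def apply_percent_limit(total: int, percent: int = 0) -> int:
    """Return how many of `total` units (bytes, files, rows) would be scanned."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100, inclusive")
    # Both 0 and 100 mean no limit, so everything is scanned.
    if percent in (0, 100):
        return total
    # Floor division implements "rounded down".
    return total * percent // 100
```

For a 1001-byte file and a limit of 50 percent, this yields 500 bytes.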

},

"inspectConfig": { # Configuration description of the scanning process. # How and what to scan for.
# When used with redactContent, only info_types and min_likelihood are currently
# used.

4994

"excludeInfoTypes": True or False, # When true, excludes type information of the findings.

4995

"limits": {

4996

"maxFindingsPerRequest": 42, # Max number of findings that will be returned per request/job.
# When set within `InspectContentRequest`, the maximum returned is 2000
# regardless of whether this is set higher.

4999

"maxFindingsPerInfoType": [ # Configuration of findings limit given for specified infoTypes.

5000

{ # Max findings configuration per infoType, per content item or long-running DlpJob.

5002

"infoType": { # Type of information detected by the API. # Type of information the findings limit applies to. Only one limit per

5003

# info_type should be provided. If InfoTypeLimit does not have an

5004

# info_type, the DLP API applies the limit against all info_types that

5005

# are found but not specified in another InfoTypeLimit.

5006

"name": "A String", # Name of the information type. Either a name of your choosing when

5007

# creating a CustomInfoType, or one of the names listed

5008

# at https://cloud.google.com/dlp/docs/infotypes-reference when specifying

5009

# a built-in type. InfoType names should conform to the pattern

5010

# [a-zA-Z0-9_]{1,64}.

5011

},

5012

"maxFindings": 42, # Max findings limit for the given infoType.

5013

},

5014

],

5015

"maxFindingsPerItem": 42, # Max number of findings that will be returned for each item scanned.
# When set within `InspectDataSourceRequest`,
# the maximum returned is 2000 regardless of whether this is set higher.
# When set within `InspectContentRequest`, this field is ignored.

5019

},

5020

"minLikelihood": "A String", # Only returns findings equal to or above this threshold. The default is
# POSSIBLE.
# See https://cloud.google.com/dlp/docs/likelihood to learn more.

5023

"customInfoTypes": [ # CustomInfoTypes provided by the user. See

5024

# https://cloud.google.com/dlp/docs/creating-custom-infotypes to learn more.

5025

{ # Custom information type provided by the user. Used to find domain-specific

5026

# sensitive information configurable to the data in question.

5027

"regex": { # Message defining a custom regular expression. # Regular expression based CustomInfoType.

5028

"pattern": "A String", # Pattern defining the regular expression. Its syntax

5029

# (https://github.com/google/re2/wiki/Syntax) can be found under the

5030

# google/re2 repository on GitHub.

5031

"groupIndexes": [ # The index of the submatch to extract as findings. When not

5032

# specified, the entire match is returned. No more than 3 may be included.

42,

],

},

"surrogateType": { # Message for detecting output from deidentification transformations that
# support reversing, such as
# [`CryptoReplaceFfxFpeConfig`](/dlp/docs/reference/rest/v2/organizations.deidentifyTemplates#cryptoreplaceffxfpeconfig).
# These types of transformations are
# those that perform pseudonymization, thereby producing a "surrogate" as
# output. This should be used in conjunction with a field on the
# transformation such as `surrogate_info_type`. This CustomInfoType does
# not support the use of `detection_rules`.

5045

},

5046

"infoType": { # Type of information detected by the API. # CustomInfoType can either be a new infoType or an extension of a built-in
# infoType, when the name matches one of the existing infoTypes and that infoType
# is specified in the `InspectContent.info_types` field. Specifying the latter
# adds findings to those detected by the system. If the built-in info type is
# not specified in the `InspectContent.info_types` list, then the name is treated
# as a custom info type.

5052

"name": "A String", # Name of the information type. Either a name of your choosing when

5053

# creating a CustomInfoType, or one of the names listed

5054

# at https://cloud.google.com/dlp/docs/infotypes-reference when specifying

5055

# a built-in type. InfoType names should conform to the pattern

5056

# [a-zA-Z0-9_]{1,64}.

5057

},

5058

"dictionary": { # A list of phrases to detect as a CustomInfoType.
# Custom information type based on a dictionary of words or phrases. This can
# be used to match sensitive information specific to the data, such as a list
# of employee IDs or job titles.

5061

#

5062

# Dictionary words are case-insensitive and all characters other than letters

5063

# and digits in the unicode [Basic Multilingual

5064

# Plane](https://en.wikipedia.org/wiki/Plane_%28Unicode%29#Basic_Multilingual_Plane)

5065

# will be replaced with whitespace when scanning for matches, so the

5066

# dictionary phrase "Sam Johnson" will match all three phrases "sam johnson",

5067

# "Sam, Johnson", and "Sam (Johnson)". Additionally, the characters

5068

# surrounding any match must be of a different type than the adjacent

5069

# characters within the word, so letters must be next to non-letters and

5070

# digits next to non-digits. For example, the dictionary word "jen" will

5071

# match the first three letters of the text "jen123" but will return no

5072

# matches for "jennifer".

5073

#

5074

# Dictionary words containing a large number of characters that are not

5075

# letters or digits may result in unexpected findings because such characters

5076

# are treated as whitespace. The

5077

# [limits](https://cloud.google.com/dlp/limits) page contains details about

5078

# the size limits of dictionaries. For dictionaries that do not fit within

5079

# these constraints, consider using `LargeCustomDictionaryConfig` in the

5080

# `StoredInfoType` API.

5081

"wordList": { # Message defining a list of words or phrases to search for in the data. # List of words or phrases to search for.

5082

"words": [ # Words or phrases defining the dictionary. The dictionary must contain

5083

# at least one phrase and every phrase must contain at least 2 characters

5084

# that are letters or digits. [required]

"A String",

],

},

"cloudStoragePath": { # Message representing a single file or path in Cloud Storage. # Newline-delimited file of words in Cloud Storage. Only a single file

5089

# is accepted.

5090

"path": "A String", # A url representing a file or path (no wildcards) in Cloud Storage.

5091

# Example: gs://[BUCKET_NAME]/dictionary.txt

5092

},

5093

},
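The dictionary matching rules described above (case-insensitivity, non-alphanumeric characters treated as whitespace, and the boundary requirement) can be approximated in a few lines. This is a rough sketch, not the service's matcher: it assumes `str.isalnum()` is a close enough stand-in for "letters and digits in the Basic Multilingual Plane", it collapses runs of whitespace, and `dictionary_matches` is a hypothetical helper.

```python
import re


def _kind(ch: str) -> str:
    """Classify a character as a letter, a digit, or anything else."""
    if ch.isalpha():
        return "letter"
    if ch.isdigit():
        return "digit"
    return "other"


def _normalize(text: str) -> str:
    # Lowercase, and replace everything that is not a letter or digit with a space.
    return "".join(c.lower() if c.isalnum() else " " for c in text)


def dictionary_matches(phrase: str, text: str) -> bool:
    """Return True when `phrase` would plausibly match somewhere in `text`."""
    words = _normalize(phrase).split()
    haystack = _normalize(text)
    # Words of the phrase may be separated by any run of (replaced) whitespace.
    pattern = r"\s+".join(re.escape(w) for w in words)
    for m in re.finditer(pattern, haystack):
        before = haystack[m.start() - 1] if m.start() > 0 else " "
        after = haystack[m.end()] if m.end() < len(haystack) else " "
        # Surrounding characters must differ in type from the adjacent match characters.
        if (_kind(before) != _kind(haystack[m.start()])
                and _kind(after) != _kind(haystack[m.end() - 1])):
            return True
    return False
```

With this sketch, the phrase "Sam Johnson" matches "Sam (Johnson)", and the word "jen" matches "jen123" but not "jennifer", in line with the examples in the description.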

5094

"storedType": { # A reference to a StoredInfoType to use with scanning. # Load an existing `StoredInfoType` resource for use in

5095

# `InspectDataSource`. Not currently supported in `InspectContent`.

5096

"name": "A String", # Resource name of the requested `StoredInfoType`, for example

5097

# `organizations/433245324/storedInfoTypes/432452342` or

5098

# `projects/project-id/storedInfoTypes/432452342`.

5099

"createTime": "A String", # Timestamp indicating when the version of the `StoredInfoType` used for

5100

# inspection was created. Output-only field, populated by the system.

5101

},

5102

"detectionRules": [ # Set of detection rules to apply to all findings of this CustomInfoType.
# Rules are applied in the order in which they are specified. Not supported for the
# `surrogate_type` CustomInfoType.
{ # Deprecated; use `InspectionRuleSet` instead. Rule for modifying a
# `CustomInfoType` to alter behavior under certain circumstances, depending
# on the specific details of the rule. Not supported for the `surrogate_type`
# custom infoType.

5109

"hotwordRule": { # The rule that adjusts the likelihood of findings within a certain proximity of hotwords. # Hotword-based detection rule.
"proximity": { # Message for specifying a window around a finding to apply a detection rule. # Proximity of the finding within which the entire hotword must reside.
# The total length of the window cannot exceed 1000 characters. Note that
# the finding itself will be included in the window, so that hotwords may
# be used to match substrings of the finding itself. For example, the
# certainty of a phone number regex "\(\d{3}\) \d{3}-\d{4}" could be
# adjusted upwards if the area code is known to be the local area code of
# a company office using the hotword regex "\(xxx\)", where "xxx"
# is the area code in question.

5120

"windowAfter": 42, # Number of characters after the finding to consider.

5121

"windowBefore": 42, # Number of characters before the finding to consider.

5122

},

5123

"hotwordRegex": { # Message defining a custom regular expression. # Regular expression pattern defining what qualifies as a hotword.

5124

"pattern": "A String", # Pattern defining the regular expression. Its syntax

5125

# (https://github.com/google/re2/wiki/Syntax) can be found under the

5126

# google/re2 repository on GitHub.

5127

"groupIndexes": [ # The index of the submatch to extract as findings. When not

5128

# specified, the entire match is returned. No more than 3 may be included.

42,

],

},

"likelihoodAdjustment": { # Message for specifying an adjustment to the likelihood of a finding as part of a detection rule. # Likelihood adjustment to apply to all matching findings.

5134

"relativeLikelihood": 42, # Increase or decrease the likelihood by the specified number of

5135

# levels. For example, if a finding would be `POSSIBLE` without the

5136

# detection rule and `relative_likelihood` is 1, then it is upgraded to

5137

# `LIKELY`, while a value of -1 would downgrade it to `UNLIKELY`.

5138

# Likelihood may never drop below `VERY_UNLIKELY` or exceed

5139

# `VERY_LIKELY`, so applying an adjustment of 1 followed by an

5140

# adjustment of -1 when base likelihood is `VERY_LIKELY` will result in

5141

# a final likelihood of `LIKELY`.

5142

"fixedLikelihood": "A String", # Set the likelihood of a finding to a fixed value.

},
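The clamping behaviour described for `relativeLikelihood` can be illustrated with a short sketch. The ordering of levels is taken from the names used in the description; `adjust_likelihood` is a hypothetical helper, and the real service may apply adjustments differently.

```python
# Likelihood levels in ascending order, as named in the description above.
LEVELS = ["VERY_UNLIKELY", "UNLIKELY", "POSSIBLE", "LIKELY", "VERY_LIKELY"]


def adjust_likelihood(likelihood: str, relative_likelihood: int) -> str:
    """Shift a likelihood by a number of levels, clamped to the valid range."""
    index = LEVELS.index(likelihood) + relative_likelihood
    # Never drop below VERY_UNLIKELY or exceed VERY_LIKELY.
    index = max(0, min(index, len(LEVELS) - 1))
    return LEVELS[index]


# +1 on VERY_LIKELY is clamped, so the following -1 lands on LIKELY.
step_one = adjust_likelihood("VERY_LIKELY", 1)
print(adjust_likelihood(step_one, -1))
```

Because each adjustment is clamped as it is applied, +1 followed by -1 is not a no-op at the top of the scale, which is the case the description calls out.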

},

},

],

"exclusionType": "A String", # If set to EXCLUSION_TYPE_EXCLUDE, this infoType will not cause a finding
# to be returned. It can still be used for rule matching.

5149

"likelihood": "A String", # Likelihood to return for this CustomInfoType. This base value can be

5150

# altered by a detection rule if the finding meets the criteria specified by

5151

# the rule. Defaults to `VERY_LIKELY` if not specified.

5152

},

5153

],

5154

"includeQuote": True or False, # When true, a contextual quote from the data that triggered a finding is

5155

# included in the response; see Finding.quote.

5156

"ruleSet": [ # Set of rules to apply to the findings for this InspectConfig.
# Exclusion rules contained in the set are executed last; other
# rules are executed in the order in which they are specified for each info type.

5159

{ # Rule set for modifying a set of infoTypes to alter behavior under certain

5160

# circumstances, depending on the specific details of the rules within the set.

5161

"rules": [ # Set of rules to be applied to infoTypes. The rules are applied in order.

5162

{ # A single inspection rule to be applied to infoTypes, specified in

5163

# `InspectionRuleSet`.

5164

"hotwordRule": { # The rule that adjusts the likelihood of findings within a certain proximity of hotwords. # Hotword-based detection rule.
"proximity": { # Message for specifying a window around a finding to apply a detection rule. # Proximity of the finding within which the entire hotword must reside.
# The total length of the window cannot exceed 1000 characters. Note that
# the finding itself will be included in the window, so that hotwords may
# be used to match substrings of the finding itself. For example, the
# certainty of a phone number regex "\(\d{3}\) \d{3}-\d{4}" could be
# adjusted upwards if the area code is known to be the local area code of
# a company office using the hotword regex "\(xxx\)", where "xxx"
# is the area code in question.

5175

"windowAfter": 42, # Number of characters after the finding to consider.

5176

"windowBefore": 42, # Number of characters before the finding to consider.

5177

},

5178

"hotwordRegex": { # Message defining a custom regular expression. # Regular expression pattern defining what qualifies as a hotword.

5179

"pattern": "A String", # Pattern defining the regular expression. Its syntax

5180

# (https://github.com/google/re2/wiki/Syntax) can be found under the

5181

# google/re2 repository on GitHub.

5182

"groupIndexes": [ # The index of the submatch to extract as findings. When not

5183

# specified, the entire match is returned. No more than 3 may be included.

42,

],

},

"likelihoodAdjustment": { # Message for specifying an adjustment to the likelihood of a finding as part of a detection rule. # Likelihood adjustment to apply to all matching findings.

5189

"relativeLikelihood": 42, # Increase or decrease the likelihood by the specified number of

5190

# levels. For example, if a finding would be `POSSIBLE` without the

5191

# detection rule and `relative_likelihood` is 1, then it is upgraded to

5192

# `LIKELY`, while a value of -1 would downgrade it to `UNLIKELY`.

5193

# Likelihood may never drop below `VERY_UNLIKELY` or exceed

5194

# `VERY_LIKELY`, so applying an adjustment of 1 followed by an

5195

# adjustment of -1 when base likelihood is `VERY_LIKELY` will result in

5196

# a final likelihood of `LIKELY`.

5197

"fixedLikelihood": "A String", # Set the likelihood of a finding to a fixed value.

5198

},

5199

},

5200

"exclusionRule": { # The rule that specifies conditions when findings of infoTypes specified in `InspectionRuleSet` are removed from results. # Exclusion rule.

5202

"regex": { # Message defining a custom regular expression. # Regular expression which defines the rule.

5203

"pattern": "A String", # Pattern defining the regular expression. Its syntax

5204

# (https://github.com/google/re2/wiki/Syntax) can be found under the

5205

# google/re2 repository on GitHub.

5206

"groupIndexes": [ # The index of the submatch to extract as findings. When not

5207

# specified, the entire match is returned. No more than 3 may be included.

42,

],

},

"excludeInfoTypes": { # List of exclude infoTypes. # Set of infoTypes for which findings would affect this rule.

5212

"infoTypes": [ # An infoType list in an ExclusionRule drops a finding when it overlaps or
# is contained within a finding of an infoType from this list. For
# example, for `InspectionRuleSet.info_types` containing "PHONE_NUMBER" and
# `exclusion_rule` containing `exclude_info_types.info_types` with
# "EMAIL_ADDRESS", the phone number findings are dropped if they overlap
# with an EMAIL_ADDRESS finding.
# As a result, "555-222-2222@example.org" generates only a single
# finding, namely the email address.

5220

{ # Type of information detected by the API.

5221

"name": "A String", # Name of the information type. Either a name of your choosing when

5222

# creating a CustomInfoType, or one of the names listed

5223

# at https://cloud.google.com/dlp/docs/infotypes-reference when specifying

5224

# a built-in type. InfoType names should conform to the pattern

5225

# [a-zA-Z0-9_]{1,64}.

},

],

},
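The overlap-based exclusion described for `excludeInfoTypes` can be sketched with character spans. This is an illustration under assumptions: findings are modelled as half-open `[start, end)` offsets, any overlap triggers the drop (the effect of `matchingType` is ignored), and `Finding` and `apply_exclude_info_types` are hypothetical names.

```python
from typing import List, NamedTuple, Set


class Finding(NamedTuple):
    info_type: str
    start: int  # inclusive character offset
    end: int    # exclusive character offset


def _overlaps(a: Finding, b: Finding) -> bool:
    # Two half-open spans overlap when each starts before the other ends.
    return a.start < b.end and b.start < a.end


def apply_exclude_info_types(findings: List[Finding],
                             rule_set_info_types: Set[str],
                             exclude_info_types: Set[str]) -> List[Finding]:
    """Drop rule-set findings that overlap a finding of an excluding infoType."""
    excluders = [f for f in findings if f.info_type in exclude_info_types]
    return [
        f for f in findings
        if not (f.info_type in rule_set_info_types
                and any(_overlaps(f, e) for e in excluders))
    ]


# "555-222-2222@example.org": the phone number span sits inside the email span.
findings = [Finding("PHONE_NUMBER", 0, 12), Finding("EMAIL_ADDRESS", 0, 24)]
print(apply_exclude_info_types(findings, {"PHONE_NUMBER"}, {"EMAIL_ADDRESS"}))
```

Only the `EMAIL_ADDRESS` finding survives, matching the single-finding outcome given in the description.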

"dictionary": { # Dictionary which defines the rule.
# Custom information type based on a dictionary of words or phrases. This can
# be used to match sensitive information specific to the data, such as a list
# of employee IDs or job titles.

5232

#

5233

# Dictionary words are case-insensitive and all characters other than letters

5234

# and digits in the unicode [Basic Multilingual

5235

# Plane](https://en.wikipedia.org/wiki/Plane_%28Unicode%29#Basic_Multilingual_Plane)

5236

# will be replaced with whitespace when scanning for matches, so the

5237

# dictionary phrase "Sam Johnson" will match all three phrases "sam johnson",

5238

# "Sam, Johnson", and "Sam (Johnson)". Additionally, the characters

5239

# surrounding any match must be of a different type than the adjacent

5240

# characters within the word, so letters must be next to non-letters and

5241

# digits next to non-digits. For example, the dictionary word "jen" will

5242

# match the first three letters of the text "jen123" but will return no

5243

# matches for "jennifer".

5244

#

5245

# Dictionary words containing a large number of characters that are not

5246

# letters or digits may result in unexpected findings because such characters

5247

# are treated as whitespace. The

5248

# [limits](https://cloud.google.com/dlp/limits) page contains details about

5249

# the size limits of dictionaries. For dictionaries that do not fit within

5250

# these constraints, consider using `LargeCustomDictionaryConfig` in the

5251

# `StoredInfoType` API.

5252

"wordList": { # Message defining a list of words or phrases to search for in the data. # List of words or phrases to search for.

5253

"words": [ # Words or phrases defining the dictionary. The dictionary must contain

5254

# at least one phrase and every phrase must contain at least 2 characters

5255

# that are letters or digits. [required]

"A String",

],

},

"cloudStoragePath": { # Message representing a single file or path in Cloud Storage. # Newline-delimited file of words in Cloud Storage. Only a single file

5260

# is accepted.

5261

"path": "A String", # A url representing a file or path (no wildcards) in Cloud Storage.

5262

# Example: gs://[BUCKET_NAME]/dictionary.txt

5263

},

5264

},

5265

"matchingType": "A String", # How the rule is applied, see MatchingType documentation for details.

},

},

],

"infoTypes": [ # List of infoTypes this rule set is applied to.

5270

{ # Type of information detected by the API.

5271

"name": "A String", # Name of the information type. Either a name of your choosing when

5272

# creating a CustomInfoType, or one of the names listed

5273

# at https://cloud.google.com/dlp/docs/infotypes-reference when specifying

5274

# a built-in type. InfoType names should conform to the pattern

5275

# [a-zA-Z0-9_]{1,64}.

},

],

},

],

"contentOptions": [ # List of options defining data content to scan.

5281

# If empty, text, images, and other content will be included.

5282

"A String",

5283

],

5284

"infoTypes": [ # Restricts what info_types to look for. The values must correspond to

5285

# InfoType values returned by ListInfoTypes or listed at

5286

# https://cloud.google.com/dlp/docs/infotypes-reference.

5287

#

5288

# When no InfoTypes or CustomInfoTypes are specified in a request, the

5289

# system may automatically choose what detectors to run. By default this may

5290

# be all types, but may change over time as detectors are updated.

5291

#

5292

# The special InfoType name "ALL_BASIC" can be used to trigger all detectors,

5293

# but may change over time as new InfoTypes are added. If you need precise

5294

# control and predictability as to what detectors are run you should specify

5295

# specific InfoTypes listed in the reference.

5296

{ # Type of information detected by the API.

5297

"name": "A String", # Name of the information type. Either a name of your choosing when

5298

# creating a CustomInfoType, or one of the names listed

5299

# at https://cloud.google.com/dlp/docs/infotypes-reference when specifying

5300

# a built-in type. InfoType names should conform to the pattern

5301

# [a-zA-Z0-9_]{1,64}.

},

],

},

"inspectTemplateName": "A String", # If provided, will be used as the default for all values in InspectConfig.

5306

# `inspect_config` will be merged into the values persisted as part of the

5307

# template.

5308

"actions": [ # Actions to execute at the completion of the job.

5309

{ # A task to execute on the completion of a job.

5310

# See https://cloud.google.com/dlp/docs/concepts-actions to learn more.

5311

"saveFindings": { # Save resulting findings in a provided location.
# If set, the detailed findings will be persisted to the specified
# OutputStorageConfig. Only a single instance of this action can be
# specified.
# Compatible with: Inspect, Risk

5315

"outputConfig": { # Cloud repository for storing output.

5316

"table": { # Store findings in an existing table or a new table in an existing
# dataset. If table_id is not set, a new one will be generated
# for you with the following format:
# dlp_googleapis_yyyy_mm_dd_[dlp_job_id]. Pacific timezone will be used for
# generating the date details.
#
# For Inspect, each column in an existing output table must have the same
# name, type, and mode of a field in the `Finding` object.
#
# For Risk, an existing output table should be the output of a previous
# Risk analysis job run on the same source table, with the same privacy
# metric and quasi-identifiers. Risk jobs that analyze the same table but
# compute a different privacy metric, or use different sets of
# quasi-identifiers, cannot store their results in the same table.
#
# Message defining the location of a BigQuery table. A table is uniquely
# identified by its project_id, dataset_id, and table_name. Within a query
# a table is often referenced with a string in the format of:
# `<project_id>:<dataset_id>.<table_id>` or
# `<project_id>.<dataset_id>.<table_id>`.

5334

"projectId": "A String", # The Google Cloud Platform project ID of the project containing the table.

5335

# If omitted, project ID is inferred from the API call.

5336

"tableId": "A String", # Name of the table.

5337

"datasetId": "A String", # Dataset ID of the table.

5338

},

5339

"outputSchema": "A String", # Schema used for writing the findings for Inspect jobs. This field is only

5340

# used for Inspect and must be unspecified for Risk jobs. Columns are derived

5341

# from the `Finding` object. If appending to an existing table, any columns

5342

# from the predefined schema that are missing will be added. No columns in

5343

# the existing table will be deleted.

5344

#

5345

# If unspecified, then all available columns will be used for a new table or

5346

# an (existing) table with no schema, and no changes will be made to an

5347

# existing table that has a schema.

5348

},

5349

},

5350

"jobNotificationEmails": { # Enable email notification to project owners and editors on job's
# completion/failure.

},

"publishSummaryToCscc": { # Publish summary to Cloud Security Command Center (Alpha).

# Publishes the result summary of a DlpJob to the Cloud Security

# Command Center (CSCC Alpha).

# This action is only available for projects that are part of

# an organization and whitelisted for the alpha Cloud Security Command

# Center.

# The action publishes the count of finding instances and their info types.

# The summary of findings is persisted in CSCC and is governed by the CSCC

# service-specific policy; see https://cloud.google.com/terms/service-terms

# Only a single instance of this action can be specified.

# Compatible with: Inspect

},

"pubSub": { # Publish a notification to a Pub/Sub topic.

# Publishes a message into the given Pub/Sub topic when the DlpJob has completed. The

# message contains a single field, `DlpJobName`, which is equal to the

# finished job's [`DlpJob.name`](/dlp/docs/reference/rest/v2/projects.dlpJobs#DlpJob).

# Compatible with: Inspect, Risk

"topic": "A String", # Cloud Pub/Sub topic to send notifications to. The topic must have granted

# publishing access rights to the DLP API service account that executes

# the long-running DlpJob sending the notifications.

# Format is projects/{project}/topics/{topic}.

},

},

],
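
# The `topic` format above can be illustrated with a small sketch. The helper
# name `pubsub_action` and the project and topic IDs are placeholders, not part
# of the API; only the `pubSub`/`topic` keys and the path format come from the
# schema.

```python
def pubsub_action(project, topic):
    """Build a `pubSub` action dict; `project` and `topic` are placeholder IDs."""
    return {"pubSub": {"topic": "projects/{}/topics/{}".format(project, topic)}}


action = pubsub_action("my-project", "dlp-job-done")
print(action["pubSub"]["topic"])  # projects/my-project/topics/dlp-job-done
```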

},

},

"result": { # All result fields mentioned below are updated while the job is processing. # A summary of the outcome of this inspect job.

"infoTypeStats": [ # Statistics of how many instances of each info type were found during

# the inspect job.

{ # Statistics regarding a specific InfoType.

"count": "A String", # Number of findings for this infoType.

"infoType": { # Type of information detected by the API. # The type of finding this stat is for.

"name": "A String", # Name of the information type. Either a name of your choosing when

# creating a CustomInfoType, or one of the names listed

# at https://cloud.google.com/dlp/docs/infotypes-reference when specifying

# a built-in type. InfoType names should conform to the pattern

# [a-zA-Z0-9_]{1,64}.

},

},

],
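
# The InfoType name pattern quoted above can be checked client-side before a
# request is sent. This is only a sketch: the helper name is hypothetical, and
# the schema says names "should" conform, so the service may apply further
# rules that are not shown here.

```python
import re

# Pattern quoted in the schema comment for InfoType names.
INFO_TYPE_NAME = re.compile(r"[a-zA-Z0-9_]{1,64}")


def is_valid_info_type_name(name):
    """Return True if `name` fully matches the documented pattern."""
    return INFO_TYPE_NAME.fullmatch(name) is not None


print(is_valid_info_type_name("US_ZIP_5"))  # True
print(is_valid_info_type_name("bad-name"))  # False: hyphen is outside the pattern
print(is_valid_info_type_name("x" * 65))    # False: longer than 64 characters
```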

"totalEstimatedBytes": "A String", # Estimate of the number of bytes to process.

"processedBytes": "A String", # Total size in bytes that were processed.

},

},

"riskDetails": { # Result of a risk analysis operation request. # Results from analyzing risk of a data source.

"numericalStatsResult": { # Result of the numerical stats computation.

"quantileValues": [ # List of 99 values that partition the set of field values into 100

# equal-sized buckets.

{ # Set of primitive values supported by the system.

# Note that for the purposes of inspection or transformation, the number

# of bytes considered to comprise a 'Value' is based on its representation

# as a UTF-8 encoded string. For example, if 'integer_value' is set to

# 123456789, the number of bytes would be counted as 9, even though an

# int64 only holds up to 8 bytes of data.

"floatValue": 3.14,

"timestampValue": "A String",

"dayOfWeekValue": "A String",

"timeValue": { # Represents a time of day. The date and time zone are either not significant

# or are specified elsewhere. An API may choose to allow leap seconds. Related

# types are google.type.Date and `google.protobuf.Timestamp`.

"hours": 42, # Hours of day in 24 hour format. Should be from 0 to 23. An API may choose

# to allow the value "24:00:00" for scenarios like business closing time.

"nanos": 42, # Fractions of seconds in nanoseconds. Must be from 0 to 999,999,999.

"seconds": 42, # Seconds of minutes of the time. Must normally be from 0 to 59. An API may

# allow the value 60 if it allows leap-seconds.

"minutes": 42, # Minutes of hour of day. Must be from 0 to 59.

},

"dateValue": { # Represents a whole or partial calendar date, e.g. a birthday. The time of day

# and time zone are either specified elsewhere or are not significant. The date

# is relative to the Proleptic Gregorian Calendar. This can represent:

#

# * A full date, with non-zero year, month and day values

# * A month and day value, with a zero year, e.g. an anniversary

# * A year on its own, with zero month and day values

# * A year and month value, with a zero day, e.g. a credit card expiration date

#

# Related types are google.type.TimeOfDay and `google.protobuf.Timestamp`.

"year": 42, # Year of date. Must be from 1 to 9999, or 0 if specifying a date without

# a year.

"day": 42, # Day of month. Must be from 1 to 31 and valid for the year and month, or 0

# if specifying a year by itself or a year and month where the day is not

# significant.

"month": 42, # Month of year. Must be from 1 to 12, or 0 if specifying a year without a

# month and day.

},

"stringValue": "A String",

"booleanValue": True or False,

"integerValue": "A String",

},

],
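
# To see why 99 values are enough to describe 100 buckets, the sketch below
# uses the standard library on a made-up column. The sample data is
# hypothetical, and the schema does not say which quantile method the service
# uses, so the cut points here need not match what the API returns.

```python
import statistics

# Hypothetical column values; any numeric sample with at least two points works.
values = list(range(1, 1001))

# n=100 asks for the cut points between 100 equal-probability groups,
# so the result has 99 entries, mirroring the length of `quantileValues`.
cut_points = statistics.quantiles(values, n=100)
print(len(cut_points))  # 99
```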

"maxValue": { # Set of primitive values supported by the system. # Maximum value appearing in the column.

# Note that for the purposes of inspection or transformation, the number

# of bytes considered to comprise a 'Value' is based on its representation

# as a UTF-8 encoded string. For example, if 'integer_value' is set to

# 123456789, the number of bytes would be counted as 9, even though an

# int64 only holds up to 8 bytes of data.

"floatValue": 3.14,

"timestampValue": "A String",

"dayOfWeekValue": "A String",

"timeValue": { # Represents a time of day. The date and time zone are either not significant

# or are specified elsewhere. An API may choose to allow leap seconds. Related

# types are google.type.Date and `google.protobuf.Timestamp`.

"hours": 42, # Hours of day in 24 hour format. Should be from 0 to 23. An API may choose

# to allow the value "24:00:00" for scenarios like business closing time.

"nanos": 42, # Fractions of seconds in nanoseconds. Must be from 0 to 999,999,999.

"seconds": 42, # Seconds of minutes of the time. Must normally be from 0 to 59. An API may

# allow the value 60 if it allows leap-seconds.

"minutes": 42, # Minutes of hour of day. Must be from 0 to 59.

},

"dateValue": { # Represents a whole or partial calendar date, e.g. a birthday. The time of day

# and time zone are either specified elsewhere or are not significant. The date

# is relative to the Proleptic Gregorian Calendar. This can represent:

#

# * A full date, with non-zero year, month and day values

# * A month and day value, with a zero year, e.g. an anniversary

# * A year on its own, with zero month and day values

# * A year and month value, with a zero day, e.g. a credit card expiration date

#

# Related types are google.type.TimeOfDay and `google.protobuf.Timestamp`.

"year": 42, # Year of date. Must be from 1 to 9999, or 0 if specifying a date without

# a year.

"day": 42, # Day of month. Must be from 1 to 31 and valid for the year and month, or 0

# if specifying a year by itself or a year and month where the day is not

# significant.

"month": 42, # Month of year. Must be from 1 to 12, or 0 if specifying a year without a

# month and day.

},

"stringValue": "A String",

"booleanValue": True or False,

"integerValue": "A String",

},
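
# The byte-counting note on `Value` can be reproduced in a few lines. This is
# a sketch under an assumption: the helper name is hypothetical, and Python's
# `str()` is taken as a stand-in for the service's string representation,
# which may format some types (floats, timestamps) differently.

```python
def value_size_in_bytes(value):
    """Size of a primitive value counted as its UTF-8 encoded string form."""
    return len(str(value).encode("utf-8"))


# The schema's own example: nine digits count as 9 bytes, not the 8 bytes of an int64.
print(value_size_in_bytes(123456789))  # 9
```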

"minValue": { # Set of primitive values supported by the system. # Minimum value appearing in the column.

# Note that for the purposes of inspection or transformation, the number

# of bytes considered to comprise a 'Value' is based on its representation

# as a UTF-8 encoded string. For example, if 'integer_value' is set to

# 123456789, the number of bytes would be counted as 9, even though an

# int64 only holds up to 8 bytes of data.

"floatValue": 3.14,

"timestampValue": "A String",

"dayOfWeekValue": "A String",

"timeValue": { # Represents a time of day. The date and time zone are either not significant

# or are specified elsewhere. An API may choose to allow leap seconds. Related

# types are google.type.Date and `google.protobuf.Timestamp`.

"hours": 42, # Hours of day in 24 hour format. Should be from 0 to 23. An API may choose

# to allow the value "24:00:00" for scenarios like business closing time.

"nanos": 42, # Fractions of seconds in nanoseconds. Must be from 0 to 999,999,999.

"seconds": 42, # Seconds of minutes of the time. Must normally be from 0 to 59. An API may

# allow the value 60 if it allows leap-seconds.

"minutes": 42, # Minutes of hour of day. Must be from 0 to 59.

},

"dateValue": { # Represents a whole or partial calendar date, e.g. a birthday. The time of day

# and time zone are either specified elsewhere or are not significant. The date

# is relative to the Proleptic Gregorian Calendar. This can represent:

#

# * A full date, with non-zero year, month and day values

# * A month and day value, with a zero year, e.g. an anniversary

# * A year on its own, with zero month and day values

# * A year and month value, with a zero day, e.g. a credit card expiration date

#

# Related types are google.type.TimeOfDay and `google.protobuf.Timestamp`.

"year": 42, # Year of date. Must be from 1 to 9999, or 0 if specifying a date without

# a year.

"day": 42, # Day of month. Must be from 1 to 31 and valid for the year and month, or 0

# if specifying a year by itself or a year and month where the day is not

# significant.

"month": 42, # Month of year. Must be from 1 to 12, or 0 if specifying a year without a

# month and day.

},

"stringValue": "A String",

"booleanValue": True or False,

"integerValue": "A String",

},

},

"kMapEstimationResult": { # Result of the reidentifiability analysis. Note that these results are an

# estimation, not exact values.

"kMapEstimationHistogram": [ # The intervals [min_anonymity, max_anonymity] do not overlap. If a value

# doesn't correspond to any such interval, the associated frequency is

# zero. For example, the following records:

# {min_anonymity: 1, max_anonymity: 1, frequency: 17}

# {min_anonymity: 2, max_anonymity: 3, frequency: 42}

# {min_anonymity: 5, max_anonymity: 10, frequency: 99}

# mean that there are no records with an estimated anonymity of 4 or

# larger than 10.

{ # A KMapEstimationHistogramBucket message with the following values:

# min_anonymity: 3

# max_anonymity: 5

# frequency: 42

# means that there are 42 records whose quasi-identifier values correspond

# to 3, 4 or 5 people in the overlying population. An important particular

# case is when min_anonymity = max_anonymity = 1: the frequency field then

# corresponds to the number of uniquely identifiable records.

"bucketValues": [ # Sample of quasi-identifier tuple values in this bucket. The total

# number of classes returned per bucket is capped at 20.

{ # A tuple of values for the quasi-identifier columns.

"estimatedAnonymity": "A String", # The estimated anonymity for these quasi-identifier values.

"quasiIdsValues": [ # The quasi-identifier values.

{ # Set of primitive values supported by the system.

# Note that for the purposes of inspection or transformation, the number

# of bytes considered to comprise a 'Value' is based on its representation

# as a UTF-8 encoded string. For example, if 'integer_value' is set to

# 123456789, the number of bytes would be counted as 9, even though an

# int64 only holds up to 8 bytes of data.

"floatValue": 3.14,

"timestampValue": "A String",

"dayOfWeekValue": "A String",

"timeValue": { # Represents a time of day. The date and time zone are either not significant

# or are specified elsewhere. An API may choose to allow leap seconds. Related

# types are google.type.Date and `google.protobuf.Timestamp`.

"hours": 42, # Hours of day in 24 hour format. Should be from 0 to 23. An API may choose

# to allow the value "24:00:00" for scenarios like business closing time.

"nanos": 42, # Fractions of seconds in nanoseconds. Must be from 0 to 999,999,999.

"seconds": 42, # Seconds of minutes of the time. Must normally be from 0 to 59. An API may

# allow the value 60 if it allows leap-seconds.

"minutes": 42, # Minutes of hour of day. Must be from 0 to 59.

},

"dateValue": { # Represents a whole or partial calendar date, e.g. a birthday. The time of day

# and time zone are either specified elsewhere or are not significant. The date

# is relative to the Proleptic Gregorian Calendar. This can represent:

#

# * A full date, with non-zero year, month and day values

# * A month and day value, with a zero year, e.g. an anniversary

# * A year on its own, with zero month and day values

# * A year and month value, with a zero day, e.g. a credit card expiration date

#

# Related types are google.type.TimeOfDay and `google.protobuf.Timestamp`.

"year": 42, # Year of date. Must be from 1 to 9999, or 0 if specifying a date without

# a year.

"day": 42, # Day of month. Must be from 1 to 31 and valid for the year and month, or 0

# if specifying a year by itself or a year and month where the day is not

# significant.

"month": 42, # Month of year. Must be from 1 to 12, or 0 if specifying a year without a

# month and day.

},

"stringValue": "A String",

"booleanValue": True or False,

"integerValue": "A String",

},

],

},

],

"minAnonymity": "A String", # Always positive.

"bucketValueCount": "A String", # Total number of distinct quasi-identifier tuple values in this bucket.

"maxAnonymity": "A String", # Always greater than or equal to min_anonymity.

"bucketSize": "A String", # Number of records within these anonymity bounds.

},

],

},
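
# One way to read a k-map histogram is sketched below. The bucket numbers
# reuse the example from the `kMapEstimationHistogram` comment; treating that
# comment's `frequency` as the `bucketSize` field is an assumption, and the
# helper name is hypothetical. The int64 fields are parsed from strings
# because the schema types them as "A String".

```python
# Buckets shaped like `kMapEstimationHistogram` entries.
histogram = [
    {"minAnonymity": "1", "maxAnonymity": "1", "bucketSize": "17"},
    {"minAnonymity": "2", "maxAnonymity": "3", "bucketSize": "42"},
    {"minAnonymity": "5", "maxAnonymity": "10", "bucketSize": "99"},
]


def records_at_or_below(buckets, k):
    """Count records in buckets whose whole anonymity interval lies at or below k."""
    return sum(int(b["bucketSize"]) for b in buckets if int(b["maxAnonymity"]) <= k)


unique = records_at_or_below(histogram, 1)     # uniquely identifiable records
at_most_3 = records_at_or_below(histogram, 3)  # estimated anonymity of 3 or less
print(unique, at_most_3)  # 17 59
```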

"kAnonymityResult": { # Result of the k-anonymity computation.

"equivalenceClassHistogramBuckets": [ # Histogram of k-anonymity equivalence classes.

{

"bucketValues": [ # Sample of equivalence classes in this bucket. The total number of

# classes returned per bucket is capped at 20.

{ # The set of columns' values that share the same ldiversity value

"quasiIdsValues": [ # Set of values defining the equivalence class. One value per

# quasi-identifier column in the original KAnonymity metric message.

# The order is always the same as the original request.

{ # Set of primitive values supported by the system.

# Note that for the purposes of inspection or transformation, the number

# of bytes considered to comprise a 'Value' is based on its representation

# as a UTF-8 encoded string. For example, if 'integer_value' is set to

# 123456789, the number of bytes would be counted as 9, even though an

# int64 only holds up to 8 bytes of data.

"floatValue": 3.14,

"timestampValue": "A String",

"dayOfWeekValue": "A String",

"timeValue": { # Represents a time of day. The date and time zone are either not significant

# or are specified elsewhere. An API may choose to allow leap seconds. Related

# types are google.type.Date and `google.protobuf.Timestamp`.

"hours": 42, # Hours of day in 24 hour format. Should be from 0 to 23. An API may choose

# to allow the value "24:00:00" for scenarios like business closing time.

"nanos": 42, # Fractions of seconds in nanoseconds. Must be from 0 to 999,999,999.

"seconds": 42, # Seconds of minutes of the time. Must normally be from 0 to 59. An API may

# allow the value 60 if it allows leap-seconds.

"minutes": 42, # Minutes of hour of day. Must be from 0 to 59.

},

"dateValue": { # Represents a whole or partial calendar date, e.g. a birthday. The time of day

# and time zone are either specified elsewhere or are not significant. The date

# is relative to the Proleptic Gregorian Calendar. This can represent:

#

# * A full date, with non-zero year, month and day values

# * A month and day value, with a zero year, e.g. an anniversary

# * A year on its own, with zero month and day values

# * A year and month value, with a zero day, e.g. a credit card expiration date

#

# Related types are google.type.TimeOfDay and `google.protobuf.Timestamp`.

"year": 42, # Year of date. Must be from 1 to 9999, or 0 if specifying a date without

# a year.

"day": 42, # Day of month. Must be from 1 to 31 and valid for the year and month, or 0

# if specifying a year by itself or a year and month where the day is not

# significant.

"month": 42, # Month of year. Must be from 1 to 12, or 0 if specifying a year without a

# month and day.

},

"stringValue": "A String",

"booleanValue": True or False,

"integerValue": "A String",

},

],

"equivalenceClassSize": "A String", # Size of the equivalence class, for example number of rows with the

# above set of values.

},

],

"bucketValueCount": "A String", # Total number of distinct equivalence classes in this bucket.

"equivalenceClassSizeLowerBound": "A String", # Lower bound on the size of the equivalence classes in this bucket.

"equivalenceClassSizeUpperBound": "A String", # Upper bound on the size of the equivalence classes in this bucket.

"bucketSize": "A String", # Total number of equivalence classes in this bucket.

},

],

},
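
# The notions of equivalence class and class-size histogram used in
# `kAnonymityResult` can be sketched locally. The rows and column names below
# are made up for illustration, and this is not how the service computes its
# result; it only shows the textbook definition on in-memory data.

```python
from collections import Counter

# Hypothetical rows; "zip" and "age" play the role of quasi-identifiers.
rows = [
    {"zip": "94043", "age": 30, "diagnosis": "flu"},
    {"zip": "94043", "age": 30, "diagnosis": "cold"},
    {"zip": "94043", "age": 41, "diagnosis": "flu"},
    {"zip": "10001", "age": 30, "diagnosis": "flu"},
    {"zip": "10001", "age": 30, "diagnosis": "flu"},
    {"zip": "10001", "age": 30, "diagnosis": "asthma"},
]
quasi_ids = ["zip", "age"]

# One equivalence class per distinct quasi-identifier tuple.
class_sizes = Counter(tuple(row[q] for q in quasi_ids) for row in rows)

# The dataset is k-anonymous for k equal to the smallest class size.
k = min(class_sizes.values())

# Histogram: class size -> number of equivalence classes of that size.
size_histogram = Counter(class_sizes.values())
print(k, dict(size_histogram))
```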

"lDiversityResult": { # Result of the l-diversity computation.

"sensitiveValueFrequencyHistogramBuckets": [ # Histogram of l-diversity equivalence class sensitive value frequencies.

{

"bucketValues": [ # Sample of equivalence classes in this bucket. The total number of

# classes returned per bucket is capped at 20.

{ # The set of columns' values that share the same ldiversity value.

"numDistinctSensitiveValues": "A String", # Number of distinct sensitive values in this equivalence class.

"quasiIdsValues": [ # Quasi-identifier values defining the k-anonymity equivalence

# class. The order is always the same as the original request.

{ # Set of primitive values supported by the system.

# Note that for the purposes of inspection or transformation, the number

# of bytes considered to comprise a 'Value' is based on its representation

# as a UTF-8 encoded string. For example, if 'integer_value' is set to

# 123456789, the number of bytes would be counted as 9, even though an

# int64 only holds up to 8 bytes of data.

"floatValue": 3.14,

"timestampValue": "A String",

"dayOfWeekValue": "A String",

"timeValue": { # Represents a time of day. The date and time zone are either not significant

# or are specified elsewhere. An API may choose to allow leap seconds. Related

# types are google.type.Date and `google.protobuf.Timestamp`.

"hours": 42, # Hours of day in 24 hour format. Should be from 0 to 23. An API may choose

# to allow the value "24:00:00" for scenarios like business closing time.

"nanos": 42, # Fractions of seconds in nanoseconds. Must be from 0 to 999,999,999.

"seconds": 42, # Seconds of minutes of the time. Must normally be from 0 to 59. An API may

# allow the value 60 if it allows leap-seconds.

"minutes": 42, # Minutes of hour of day. Must be from 0 to 59.

},

"dateValue": { # Represents a whole or partial calendar date, e.g. a birthday. The time of day

# and time zone are either specified elsewhere or are not significant. The date

# is relative to the Proleptic Gregorian Calendar. This can represent:

#

# * A full date, with non-zero year, month and day values

# * A month and day value, with a zero year, e.g. an anniversary

# * A year on its own, with zero month and day values

# * A year and month value, with a zero day, e.g. a credit card expiration date

#

# Related types are google.type.TimeOfDay and `google.protobuf.Timestamp`.

"year": 42, # Year of date. Must be from 1 to 9999, or 0 if specifying a date without

# a year.

"day": 42, # Day of month. Must be from 1 to 31 and valid for the year and month, or 0

# if specifying a year by itself or a year and month where the day is not

# significant.

"month": 42, # Month of year. Must be from 1 to 12, or 0 if specifying a year without a

# month and day.

},

"stringValue": "A String",

"booleanValue": True or False,

"integerValue": "A String",

},

],

"topSensitiveValues": [ # Estimated frequencies of top sensitive values.

{ # A value of a field, including its frequency.

"count": "A String", # How many times the value is contained in the field.

"value": { # Set of primitive values supported by the system. # A value contained in the field in question.

# Note that for the purposes of inspection or transformation, the number

# of bytes considered to comprise a 'Value' is based on its representation

# as a UTF-8 encoded string. For example, if 'integer_value' is set to

# 123456789, the number of bytes would be counted as 9, even though an

# int64 only holds up to 8 bytes of data.

"floatValue": 3.14,

"timestampValue": "A String",

"dayOfWeekValue": "A String",

"timeValue": { # Represents a time of day. The date and time zone are either not significant

# or are specified elsewhere. An API may choose to allow leap seconds. Related

# types are google.type.Date and `google.protobuf.Timestamp`.

"hours": 42, # Hours of day in 24 hour format. Should be from 0 to 23. An API may choose

# to allow the value "24:00:00" for scenarios like business closing time.

"nanos": 42, # Fractions of seconds in nanoseconds. Must be from 0 to 999,999,999.

"seconds": 42, # Seconds of minutes of the time. Must normally be from 0 to 59. An API may

# allow the value 60 if it allows leap-seconds.

"minutes": 42, # Minutes of hour of day. Must be from 0 to 59.

},

"dateValue": { # Represents a whole or partial calendar date, e.g. a birthday. The time of day

# and time zone are either specified elsewhere or are not significant. The date

# is relative to the Proleptic Gregorian Calendar. This can represent:

#

# * A full date, with non-zero year, month and day values

# * A month and day value, with a zero year, e.g. an anniversary

# * A year on its own, with zero month and day values

# * A year and month value, with a zero day, e.g. a credit card expiration date

#

# Related types are google.type.TimeOfDay and `google.protobuf.Timestamp`.

"year": 42, # Year of date. Must be from 1 to 9999, or 0 if specifying a date without

# a year.

"day": 42, # Day of month. Must be from 1 to 31 and valid for the year and month, or 0

# if specifying a year by itself or a year and month where the day is not

# significant.

"month": 42, # Month of year. Must be from 1 to 12, or 0 if specifying a year without a

# month and day.

},

"stringValue": "A String",

"booleanValue": True or False,

"integerValue": "A String",

},

},

],

"equivalenceClassSize": "A String", # Size of the k-anonymity equivalence class.

},

],

"bucketValueCount": "A String", # Total number of distinct equivalence classes in this bucket.

"bucketSize": "A String", # Total number of equivalence classes in this bucket.

"sensitiveValueFrequencyUpperBound": "A String", # Upper bound on the sensitive value frequencies of the equivalence

# classes in this bucket.

"sensitiveValueFrequencyLowerBound": "A String", # Lower bound on the sensitive value frequencies of the equivalence

# classes in this bucket.

},

],

},
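
# The per-class fields of `lDiversityResult` (`equivalenceClassSize` and
# `numDistinctSensitiveValues`) can be illustrated on in-memory data. The rows
# are hypothetical, and this sketch shows the textbook definition only, not
# the service's implementation.

```python
from collections import defaultdict

# Hypothetical rows: (zip, age) are quasi-identifiers, diagnosis is sensitive.
rows = [
    ("94043", 30, "flu"),
    ("94043", 30, "cold"),
    ("10001", 30, "flu"),
    ("10001", 30, "flu"),
]

# Group sensitive values by quasi-identifier tuple.
sensitive_by_class = defaultdict(list)
for zip_code, age, diagnosis in rows:
    sensitive_by_class[(zip_code, age)].append(diagnosis)

# Per class: its size and its number of distinct sensitive values.
summary = {
    key: {
        "equivalenceClassSize": len(vals),
        "numDistinctSensitiveValues": len(set(vals)),
    }
    for key, vals in sensitive_by_class.items()
}

# The dataset is l-diverse for l equal to the smallest distinct-value count.
l_value = min(s["numDistinctSensitiveValues"] for s in summary.values())
print(l_value)  # 1
```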

"requestedPrivacyMetric": { # Privacy metric to compute for reidentification risk analysis. # Privacy metric to compute.

"numericalStatsConfig": { # Compute numerical stats over an individual column, including

# min, max, and quantiles.

"field": { # General identifier of a data field in a storage service. # Field to compute numerical stats on. Supported types are

# integer, float, date, datetime, timestamp, time.

"name": "A String", # Name describing the field.

},

},

"kMapEstimationConfig": { # Reidentifiability metric. This corresponds to a risk model similar to what

# is called "journalist risk" in the literature, except the attack dataset is

# statistically modeled instead of being perfectly known. This can be done

# using publicly available data (like the US Census), or using a custom

# statistical model (indicated as one or several BigQuery tables), or by

# extrapolating from the distribution of values in the input dataset.

"regionCode": "A String", # ISO 3166-1 alpha-2 region code to use in the statistical modeling.

# Required if no column is tagged with a region-specific InfoType (like

# US_ZIP_5) or a region code.

"quasiIds": [ # Fields considered to be quasi-identifiers. No two columns can have the

# same tag. [required]

{ # A column with a semantic tag attached.

"field": { # General identifier of a data field in a storage service. # Identifies the column. [required]

"name": "A String", # Name describing the field.

},

"customTag": "A String", # A column can be tagged with a custom tag. In this case, the user must

# indicate an auxiliary table that contains statistical information on

# the possible values of this column (below).

"infoType": { # Type of information detected by the API. # A column can be tagged with an InfoType to use the relevant public

# dataset as a statistical model of population, if available. We

# currently support US ZIP codes, region codes, ages and genders.

# To programmatically obtain the list of supported InfoTypes, use

# ListInfoTypes with the supported_by=RISK_ANALYSIS filter.

"name": "A String", # Name of the information type. Either a name of your choosing when

# creating a CustomInfoType, or one of the names listed

# at https://cloud.google.com/dlp/docs/infotypes-reference when specifying

# a built-in type. InfoType names should conform to the pattern

# [a-zA-Z0-9_]{1,64}.

},

"inferred": { # If no semantic tag is indicated, we infer the statistical model from

# the distribution of values in the input data.

# A generic empty message that you can re-use to avoid defining duplicated

# empty messages in your APIs. A typical example is to use it as the request

# or the response type of an API method. For instance:

#

# service Foo {

# rpc Bar(google.protobuf.Empty) returns (google.protobuf.Empty);

# }

#

# The JSON representation for `Empty` is empty JSON object `{}`.

},

},

],

"auxiliaryTables": [ # Several auxiliary tables can be used in the analysis. Each custom_tag

# used to tag a quasi-identifiers column must appear in exactly one column

# of one auxiliary table.

{ # An auxiliary table contains statistical information on the relative

# frequency of different quasi-identifiers values. It has one or several

# quasi-identifiers columns, and one column that indicates the relative

# frequency of each quasi-identifier tuple.

# If a tuple is present in the data but not in the auxiliary table, the

# corresponding relative frequency is assumed to be zero (and thus, the

# tuple is highly reidentifiable).

"relativeFrequency": { # General identifier of a data field in a storage service. # The relative frequency column must contain a floating-point number

# between 0 and 1 (inclusive). Null values are assumed to be zero.

# [required]

"name": "A String", # Name describing the field.

},

"quasiIds": [ # Quasi-identifier columns. [required]

{ # A quasi-identifier column has a custom_tag, used to know which column

# in the data corresponds to which column in the statistical model.

"field": { # General identifier of a data field in a storage service.

"name": "A String", # Name describing the field.

},

"customTag": "A String",

},

],

"table": { # Auxiliary table location. [required]

# Message defining the location of a BigQuery table. A table is uniquely

# identified by its project_id, dataset_id, and table_name. Within a query

# a table is often referenced with a string in the format of:

# `<project_id>:<dataset_id>.<table_id>` or

# `<project_id>.<dataset_id>.<table_id>`.

"projectId": "A String", # The Google Cloud Platform project ID of the project containing the table.

# If omitted, project ID is inferred from the API call.

"tableId": "A String", # Name of the table.

"datasetId": "A String", # Dataset ID of the table.

},

},

],

},
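
# The rule that each custom_tag must appear in exactly one column of one
# auxiliary table can be checked before a job is created. The sketch below is
# an assumption-laden illustration: `check_custom_tags`, the project, dataset,
# table, and field names are all hypothetical; only the dict keys follow the
# schema above.

```python
from collections import Counter


def check_custom_tags(config):
    """Return custom tags that do not appear exactly once across auxiliary tables."""
    used = {q["customTag"] for q in config.get("quasiIds", []) if "customTag" in q}
    seen = Counter(
        q["customTag"]
        for table in config.get("auxiliaryTables", [])
        for q in table.get("quasiIds", [])
    )
    return sorted(tag for tag in used if seen[tag] != 1)


config = {
    "regionCode": "US",
    "quasiIds": [
        {"field": {"name": "zip"}, "infoType": {"name": "US_ZIP_5"}},
        {"field": {"name": "job"}, "customTag": "occupation"},
    ],
    "auxiliaryTables": [{
        "table": {"projectId": "my-project", "datasetId": "stats", "tableId": "jobs"},
        "quasiIds": [{"field": {"name": "job_title"}, "customTag": "occupation"}],
        "relativeFrequency": {"name": "freq"},
    }],
}
print(check_custom_tags(config))  # []
```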

"lDiversityConfig": { # l-diversity metric, used for analysis of reidentification risk.

"sensitiveAttribute": { # General identifier of a data field in a storage service. # Sensitive field for computing the l-value.

"name": "A String", # Name describing the field.

},

"quasiIds": [ # Set of quasi-identifiers indicating how equivalence classes are

# defined for the l-diversity computation. When multiple fields are

# specified, they are considered a single composite key.

{ # General identifier of a data field in a storage service.

"name": "A String", # Name describing the field.

},

],

},

"deltaPresenceEstimationConfig": { # δ-presence metric, used to estimate how likely it is for an attacker to

# figure out that one given individual appears in a de-identified dataset.

# Similarly to the k-map metric, we cannot compute δ-presence exactly without

# knowing the attack dataset, so we use a statistical model instead.

"regionCode": "A String", # ISO 3166-1 alpha-2 region code to use in the statistical modeling.

# Required if no column is tagged with a region-specific InfoType (like

# US_ZIP_5) or a region code.

"quasiIds": [ # Fields considered to be quasi-identifiers. No two fields can have the

# same tag. [required]

{ # A column with a semantic tag attached.

"field": { # General identifier of a data field in a storage service. # Identifies the column. [required]

"name": "A String", # Name describing the field.

},

"customTag": "A String", # A column can be tagged with a custom tag. In this case, the user must

# indicate an auxiliary table that contains statistical information on

5886

# the possible values of this column (below).

5887

"infoType": { # Type of information detected by the API. # A column can be tagged with a InfoType to use the relevant public

5888

# dataset as a statistical model of population, if available. We

5889

# currently support US ZIP codes, region codes, ages and genders.

5890

# To programmatically obtain the list of supported InfoTypes, use

5891

# ListInfoTypes with the supported_by=RISK_ANALYSIS filter.

5892

"name": "A String", # Name of the information type. Either a name of your choosing when

5893

# creating a CustomInfoType, or one of the names listed

5894

# at https://cloud.google.com/dlp/docs/infotypes-reference when specifying

5895

# a built-in type. InfoType names should conform to the pattern

5896

# [a-zA-Z0-9_]{1,64}.

5897

},

"inferred": { # If no semantic tag is indicated, we infer the statistical model from
# the distribution of values in the input data.
#
# A generic empty message that you can re-use to avoid defining duplicated
# empty messages in your APIs. A typical example is to use it as the request
# or the response type of an API method. For instance:
#
# service Foo {
# rpc Bar(google.protobuf.Empty) returns (google.protobuf.Empty);
# }
#
# The JSON representation for `Empty` is the empty JSON object `{}`.

},

},

],

"auxiliaryTables": [ # Several auxiliary tables can be used in the analysis. Each custom_tag

# used to tag a quasi-identifier field must appear in exactly one
# field of one auxiliary table.
{ # An auxiliary table containing statistical information on the relative
# frequency of different quasi-identifier values. It has one or several
# quasi-identifier columns, and one column that indicates the relative
# frequency of each quasi-identifier tuple.
# If a tuple is present in the data but not in the auxiliary table, the
# corresponding relative frequency is assumed to be zero (and thus, the
# tuple is highly reidentifiable).
"relativeFrequency": { # General identifier of a data field in a storage service. # The relative frequency column must contain a floating-point number
# between 0 and 1 (inclusive). Null values are assumed to be zero.
# [required]
"name": "A String", # Name describing the field.
},
"quasiIds": [ # Quasi-identifier columns. [required]
{ # A quasi-identifier column has a custom_tag, used to know which column
# in the data corresponds to which column in the statistical model.
"field": { # General identifier of a data field in a storage service.
"name": "A String", # Name describing the field.
},
"customTag": "A String",
},
],

"table": { # Auxiliary table location. [required]
# Message defining the location of a BigQuery table. A table is uniquely
# identified by its project_id, dataset_id, and table_name. Within a query
# a table is often referenced with a string in the format of:
# `<project_id>:<dataset_id>.<table_id>` or
# `<project_id>.<dataset_id>.<table_id>`.
"projectId": "A String", # The Google Cloud Platform project ID of the project containing the table.
# If omitted, project ID is inferred from the API call.
"tableId": "A String", # Name of the table.
"datasetId": "A String", # Dataset ID of the table.

},

},

],

},

"categoricalStatsConfig": { # Compute numerical stats over an individual column, including

# number of distinct values and value count distribution.
"field": { # General identifier of a data field in a storage service. # Field to compute categorical stats on. All column types are
# supported except for arrays and structs. However, it may be more
# informative to use NumericalStats when the field type is supported,
# depending on the data.
"name": "A String", # Name describing the field.
},
},

"kAnonymityConfig": { # k-anonymity metric, used for analysis of reidentification risk.

"entityId": { # Optional message indicating that multiple rows might be associated with a
# single individual. If the same entity_id is associated with multiple
# quasi-identifier tuples over distinct rows, we consider the entire
# collection of tuples as the composite quasi-identifier. This collection
# is a multiset: the order in which the different tuples appear in the
# dataset is ignored, but their frequency is taken into account.
#
# Important note: a maximum of 1000 rows can be associated with a single
# entity ID. If more rows are associated with the same entity ID, some
# might be ignored.
#
# An entity in a dataset is a field or set of fields that correspond to a
# single person. For example, in medical records the `EntityId` might be a
# patient identifier, or for financial records it might be an account
# identifier. This message is used when generalizations or analysis must take
# into account that multiple rows correspond to the same entity.
"field": { # General identifier of a data field in a storage service. # Composite key indicating which field contains the entity identifier.
"name": "A String", # Name describing the field.
},
},

"quasiIds": [ # Set of fields to compute k-anonymity over. When multiple fields are
# specified, they are considered a single composite key. Structs and
# repeated data types are not supported; however, nested fields are
# supported so long as they are not structs themselves or nested within
# a repeated field.
{ # General identifier of a data field in a storage service.
"name": "A String", # Name describing the field.

},

],

},

},

"categoricalStatsResult": { # Result of the categorical stats computation.

"valueFrequencyHistogramBuckets": [ # Histogram of value frequencies in the column.
{
"bucketValues": [ # Sample of value frequencies in this bucket. The total number of
# values returned per bucket is capped at 20.
{ # A value of a field, including its frequency.
"count": "A String", # How many times the value is contained in the field.
"value": { # Set of primitive values supported by the system. # A value contained in the field in question.
# Note that for the purposes of inspection or transformation, the number
# of bytes considered to comprise a 'Value' is based on its representation
# as a UTF-8 encoded string. For example, if 'integer_value' is set to
# 123456789, the number of bytes would be counted as 9, even though an
# int64 only holds up to 8 bytes of data.
"floatValue": 3.14,
"timestampValue": "A String",
"dayOfWeekValue": "A String",
"timeValue": { # Represents a time of day. The date and time zone are either not significant
# or are specified elsewhere. An API may choose to allow leap seconds. Related
# types are google.type.Date and `google.protobuf.Timestamp`.
"hours": 42, # Hours of day in 24 hour format. Should be from 0 to 23. An API may choose
# to allow the value "24:00:00" for scenarios like business closing time.
"nanos": 42, # Fractions of seconds in nanoseconds. Must be from 0 to 999,999,999.
"seconds": 42, # Seconds of minutes of the time. Must normally be from 0 to 59. An API may
# allow the value 60 if it allows leap-seconds.
"minutes": 42, # Minutes of hour of day. Must be from 0 to 59.
},

"dateValue": { # Represents a whole or partial calendar date, e.g. a birthday. The time of day
# and time zone are either specified elsewhere or are not significant. The date
# is relative to the Proleptic Gregorian Calendar. This can represent:
#
# * A full date, with non-zero year, month and day values
# * A month and day value, with a zero year, e.g. an anniversary
# * A year on its own, with zero month and day values
# * A year and month value, with a zero day, e.g. a credit card expiration date
#
# Related types are google.type.TimeOfDay and `google.protobuf.Timestamp`.
"year": 42, # Year of date. Must be from 1 to 9999, or 0 if specifying a date without
# a year.
"day": 42, # Day of month. Must be from 1 to 31 and valid for the year and month, or 0
# if specifying a year by itself or a year and month where the day is not
# significant.
"month": 42, # Month of year. Must be from 1 to 12, or 0 if specifying a year without a
# month and day.
},
"stringValue": "A String",
"booleanValue": True or False,
"integerValue": "A String",

},

},

],

"bucketValueCount": "A String", # Total number of distinct values in this bucket.

"valueFrequencyUpperBound": "A String", # Upper bound on the value frequency of the values in this bucket.
"valueFrequencyLowerBound": "A String", # Lower bound on the value frequency of the values in this bucket.
"bucketSize": "A String", # Total number of values in this bucket.

},

],

},

"deltaPresenceEstimationResult": { # Result of the δ-presence computation. Note that these results are an

# estimation, not exact values.
"deltaPresenceEstimationHistogram": [ # The intervals [min_probability, max_probability) do not overlap. If a
# value doesn't correspond to any such interval, the associated frequency
# is zero. For example, the following records:
# {min_probability: 0, max_probability: 0.1, frequency: 17}
# {min_probability: 0.2, max_probability: 0.3, frequency: 42}
# {min_probability: 0.3, max_probability: 0.4, frequency: 99}
# mean that there are no records with an estimated probability in [0.1, 0.2)
# or greater than or equal to 0.4.
{ # A DeltaPresenceEstimationHistogramBucket message with the following
# values:
# min_probability: 0.1
# max_probability: 0.2
# frequency: 42
# means that there are 42 records for which δ is in [0.1, 0.2). An
# important particular case is when min_probability = max_probability = 1:
# then, every individual who shares this quasi-identifier combination is in
# the dataset.

"bucketValues": [ # Sample of quasi-identifier tuple values in this bucket. The total
# number of classes returned per bucket is capped at 20.
{ # A tuple of values for the quasi-identifier columns.
"quasiIdsValues": [ # The quasi-identifier values.
{ # Set of primitive values supported by the system.
# Note that for the purposes of inspection or transformation, the number
# of bytes considered to comprise a 'Value' is based on its representation
# as a UTF-8 encoded string. For example, if 'integer_value' is set to
# 123456789, the number of bytes would be counted as 9, even though an
# int64 only holds up to 8 bytes of data.
"floatValue": 3.14,
"timestampValue": "A String",
"dayOfWeekValue": "A String",
"timeValue": { # Represents a time of day. The date and time zone are either not significant
# or are specified elsewhere. An API may choose to allow leap seconds. Related
# types are google.type.Date and `google.protobuf.Timestamp`.
"hours": 42, # Hours of day in 24 hour format. Should be from 0 to 23. An API may choose
# to allow the value "24:00:00" for scenarios like business closing time.
"nanos": 42, # Fractions of seconds in nanoseconds. Must be from 0 to 999,999,999.
"seconds": 42, # Seconds of minutes of the time. Must normally be from 0 to 59. An API may
# allow the value 60 if it allows leap-seconds.
"minutes": 42, # Minutes of hour of day. Must be from 0 to 59.
},

"dateValue": { # Represents a whole or partial calendar date, e.g. a birthday. The time of day
# and time zone are either specified elsewhere or are not significant. The date
# is relative to the Proleptic Gregorian Calendar. This can represent:
#
# * A full date, with non-zero year, month and day values
# * A month and day value, with a zero year, e.g. an anniversary
# * A year on its own, with zero month and day values
# * A year and month value, with a zero day, e.g. a credit card expiration date
#
# Related types are google.type.TimeOfDay and `google.protobuf.Timestamp`.
"year": 42, # Year of date. Must be from 1 to 9999, or 0 if specifying a date without
# a year.
"day": 42, # Day of month. Must be from 1 to 31 and valid for the year and month, or 0
# if specifying a year by itself or a year and month where the day is not
# significant.
"month": 42, # Month of year. Must be from 1 to 12, or 0 if specifying a year without a
# month and day.
},
"stringValue": "A String",
"booleanValue": True or False,
"integerValue": "A String",
},
],

"estimatedProbability": 3.14, # The estimated probability that a given individual sharing these
# quasi-identifier values is in the dataset. This value, typically called
# δ, is the ratio between the number of records in the dataset with these
# quasi-identifier values, and the total number of individuals (inside
# *and* outside the dataset) with these quasi-identifier values.
# For example, if there are 15 individuals in the dataset who share the
# same quasi-identifier values, and an estimated 100 people in the entire
# population with these values, then δ is 0.15.
},
],
"bucketValueCount": "A String", # Total number of distinct quasi-identifier tuple values in this bucket.
"bucketSize": "A String", # Number of records within these probability bounds.
"maxProbability": 3.14, # Always greater than or equal to min_probability.
"minProbability": 3.14, # Between 0 and 1.

},

],

},
"requestedSourceTable": { # Input dataset to compute metrics over.
# Message defining the location of a BigQuery table. A table is uniquely
# identified by its project_id, dataset_id, and table_name. Within a query
# a table is often referenced with a string in the format of:
# `<project_id>:<dataset_id>.<table_id>` or
# `<project_id>.<dataset_id>.<table_id>`.
"projectId": "A String", # The Google Cloud Platform project ID of the project containing the table.
# If omitted, project ID is inferred from the API call.
"tableId": "A String", # Name of the table.
"datasetId": "A String", # Dataset ID of the table.
},
},

"state": "A String", # State of a job.
"jobTriggerName": "A String", # If created by a job trigger, the resource name of the trigger that
# instantiated the job.
"startTime": "A String", # Time when the job started.
"endTime": "A String", # Time when the job finished.
"type": "A String", # The type of job.
"createTime": "A String", # Time when the job was created.

},

],

}</pre>
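The δ-presence histogram described above can be reduced to a single risk figure on the client side. The following is a minimal sketch, not part of the generated reference: the helper name `summarize_delta_presence` and the `threshold` parameter are hypothetical, and the function expects the `deltaPresenceEstimationResult` object of one job as a plain `dict`. It assumes that `bucketSize` arrives as a string-encoded integer (it is shown as "A String" above) and that a missing `minProbability` means 0.

```python
def summarize_delta_presence(result, threshold=0.5):
    """Summarize a deltaPresenceEstimationResult dict.

    Counts all records covered by the histogram, and those in buckets whose
    lower probability bound (minProbability) is at or above `threshold`.
    `threshold` is an illustrative parameter, not an API field.
    """
    total = 0
    at_risk = 0
    for bucket in result.get("deltaPresenceEstimationHistogram", []):
        # bucketSize is documented as a string, so convert it before summing.
        size = int(bucket.get("bucketSize", "0"))
        total += size
        # Assumption: an absent minProbability is treated as 0.0.
        if float(bucket.get("minProbability", 0.0)) >= threshold:
            at_risk += size
    return {"total_records": total, "records_at_or_above_threshold": at_risk}
```

Because buckets are half-open intervals [min_probability, max_probability), comparing against the lower bound counts only buckets that lie entirely at or above the threshold.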

</div>
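The `kAnonymityConfig` object documented above is a small nested structure, and building it by hand is easy to get wrong. The sketch below shows one possible way to assemble it from column names. The helper `k_anonymity_config` and its parameters are hypothetical and exist only for illustration; where the resulting object is placed inside a `create` request body is not covered here.

```python
def k_anonymity_config(quasi_id_fields, entity_id_field=None):
    """Build a {"kAnonymityConfig": ...} dict from column names.

    quasi_id_fields: column names treated together as one composite key.
    entity_id_field: optional column that identifies the individual when
        several rows can belong to the same person.
    """
    config = {"quasiIds": [{"name": name} for name in quasi_id_fields]}
    if entity_id_field is not None:
        # entityId wraps a single field identifier, as in the schema above.
        config["entityId"] = {"field": {"name": entity_id_field}}
    return {"kAnonymityConfig": config}
```

For example, `k_anonymity_config(["age", "zip_code"], entity_id_field="patient_id")` yields a config with two quasi-identifiers and an entity ID; the column names are placeholders.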

<code class="details" id="list_next">list_next(previous_request, previous_response)</code>

<pre>Retrieves the next page of results.

Args:
  previous_request: The request for the previous page. (required)
  previous_response: The response from the request for the previous page. (required)

Returns:
  A request object that you can call 'execute()' on to request the next
  page. Returns None if there are no more items in the collection.

</pre>

</div>
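`list` and `list_next` are typically combined in a loop that stops when `list_next` returns None. The following is a minimal sketch of that loop, not an official helper: the generator name `iter_dlp_jobs` is hypothetical, `dlp_jobs` stands for the `dlpJobs()` collection object, and the response key `"jobs"` is an assumption about the list response that should be checked against the `list` method details.

```python
def iter_dlp_jobs(dlp_jobs, parent, **list_kwargs):
    """Yield every DlpJob under `parent`, following pages via list_next.

    dlp_jobs: the dlpJobs() collection resource, for example (assumed usage)
        googleapiclient.discovery.build("dlp", "v2").projects().dlpJobs()
    list_kwargs: optional arguments passed through to list(), such as
        filter, type, pageSize or orderBy.
    """
    request = dlp_jobs.list(parent=parent, **list_kwargs)
    while request is not None:
        response = request.execute()
        # Assumption: the jobs of the current page are under the "jobs" key.
        for job in response.get("jobs", []):
            yield job
        # list_next returns None when there are no more pages.
        request = dlp_jobs.list_next(
            previous_request=request, previous_response=response
        )
```

Writing the loop as a generator keeps only one page in memory at a time, and the caller can stop iterating early without requesting further pages.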

</body></html>