Confidential matching uses confidential computing to match first-party data from advertisers with Google data to create audience lists. This feature is now available to all users of Google Ads Data Manager for Customer Match, and is now the default technique to create audience lists using Ads Data Manager. Confidential matching also supports creating audience lists from encrypted data as an optional feature, which is not required by Google. If your organization requires you to encrypt data before using it with Customer Match, you can use this guide to learn how to prepare your environment and encrypt your data.
These instructions are provided for customers using Google Cloud. Today, confidential matching supports data encrypted using customer-owned encryption keys managed by Google Cloud Key Management Service (GCP KMS).
Process overview
The following is a high-level overview of the steps required to prepare and use encrypted data with Data Manager:
- Set up your encryption environment. See Set up your environment.
- Hash and encrypt personally identifiable information (PII) data fields. See Encrypt your data.
- Upload the encrypted data to a supported data source. See Supported data sources.
- Connect to your encrypted data source. See Use your encrypted data.
Set up your environment
This section prepares your encryption environment. This is a required step that creates a Key Encryption Key (KEK) in Cloud KMS. The KEK is used to encrypt the Data Encryption Key (DEK), and the DEK is used to encrypt the individual values in your dataset. It also sets up a Workload Identity Pool (WIP) provider where attestation conditions are set. You only need to set up an encryption environment once, and you may re-use the environment to encrypt new data every time you refresh your dataset.
Before you begin
This guide describes the environment setup using gcloud and assumes that you are working in a Linux environment. In order to continue, you’ll need to ensure that gcloud is installed and initialized:
- Set your project. The gcloud commands in the setup steps that follow use the project in gcloud’s configuration. To see which project will be used, run:
gcloud config list
To update this value, run:
gcloud config set project PROJECT_ID
Alternatively, pass the flag --project PROJECT_ID on the individual commands.
Step 1. Create key resources
1.1 Create a key ring:
gcloud kms keyrings create KEY_RING --location LOCATION
Replace the following:
- KEY_RING: the name chosen for the key ring.
- LOCATION: the Cloud KMS location where the key ring will be created.
Example
gcloud kms keyrings create customer-match-data-encryption-key-ring --location us-central1
To learn more, see Create a key ring in the Google Cloud documentation.
1.2 Create a key:
gcloud kms keys create KEY_NAME --keyring KEY_RING --location LOCATION \
--purpose PURPOSE --rotation-period ROTATION_PERIOD \
--next-rotation-time NEXT_ROTATION_TIME
Replace the following:
- KEY_NAME: the name chosen for the key.
- KEY_RING: the name of the key ring that contains the key.
- LOCATION: the Cloud KMS location of the key ring.
- PURPOSE: must be set to "encryption".
- ROTATION_PERIOD: the interval at which the key will be rotated.
- NEXT_ROTATION_TIME: the timestamp for the first rotation.
Example
gcloud kms keys create customer-match-data-encryption-key --keyring customer-match-data-encryption-key-ring --location us-central1 --purpose "encryption" --rotation-period 365d --next-rotation-time "$(date --utc --date="next week" +"%Y-%m-%dT%H:%M:%SZ")"
To learn more, see REST Resource: projects.locations.keyRings.cryptoKeys in the Google Cloud documentation.
Step 2. Create workload identity pool resources
2.1 Create the Workload Identity Pool (WIP):
gcloud iam workload-identity-pools create ID --location=LOCATION --display-name=DISPLAY_NAME --description=DESCRIPTION
Replace the following:
- ID: the ID chosen for the WIP.
- LOCATION: the location chosen for the WIP. This must be set to global. See Method: projects.locations.workloadIdentityPools.create.
- DISPLAY_NAME: the display name chosen for the WIP. This can have the same value as ID.
- DESCRIPTION: the description chosen for the WIP.
Example
gcloud iam workload-identity-pools create customer-match-wip --location=global --display-name="Customer Match WIP" --description="Customer Match WIP"
To learn more, see Manage workload identity pools and providers and gcloud iam workload-identity-pools create.
2.2 Create the workload identity pool provider:
gcloud iam workload-identity-pools providers create-oidc ID \
--location=LOCATION --workload-identity-pool=WIP_ID \
--display-name=DISPLAY_NAME --description=DESCRIPTION \
--attribute-mapping="google.subject=assertion.sub,google.groups=[\"ID\"]" \
--attribute-condition=ATTRIBUTE_CONDITION \
--issuer-uri="https://confidentialcomputing.googleapis.com" \
--allowed-audiences="https://sts.googleapis.com"
Replace the following:
- ID: the ID chosen for the WIP provider. Note that this also appears in the --attribute-mapping flag’s value.
- LOCATION: the location of the workload identity pool containing this provider.
- WIP_ID: the ID for the workload identity pool containing this provider.
- DISPLAY_NAME: the display name chosen for the WIP provider. This can have the same value as ID.
- DESCRIPTION: the description chosen for the WIP provider.
- ATTRIBUTE_CONDITION: the attestation condition which verifies that the caller is a confidential match service account.
Example
This attestation condition verifies that the caller, a confidential match service account ([email protected]), is running inside a TEE (CONFIDENTIAL_SPACE), and that the code is audited and signed by Google with a fingerprint (6b1f357b59e9407fb017ca0e3e783b2bd5acbfea6c83dd82971a4150df5b25f9):
gcloud iam workload-identity-pools providers \
create-oidc customer-match-wip-provider \
--location=global \
--workload-identity-pool=customer-match-wip \
--display-name="Customer Match WIP Provider" \
--description="Customer Match WIP Provider" \
--attribute-mapping="google.subject=assertion.submods.container.image_digest,google.groups=[\"customer-match-wip-provider\"]" \
--attribute-condition="assertion.swname == 'CONFIDENTIAL_SPACE' && 'STABLE' in assertion.submods.confidential_space.support_attributes && ['[email protected]'].exists(a, a in assertion.google_service_accounts) && 'ECDSA_P256_SHA256:6b1f357b59e9407fb017ca0e3e783b2bd5acbfea6c83dd82971a4150df5b25f9' in assertion.submods.container.image_signatures.map(sig, sig.signature_algorithm+':'+sig.key_id)" \
--issuer-uri="https://confidentialcomputing.googleapis.com" \
--allowed-audiences="https://sts.googleapis.com"
To learn more, see Manage workload identity pools and providers and gcloud iam workload-identity-pools providers create-oidc in the Google Cloud documentation.
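The attribute condition in the example above is one long expression, which makes it easy to mistype. As a minimal sketch, not part of the original instructions, the clauses can be assembled one at a time in shell and printed for review before being passed to --attribute-condition. SERVICE_ACCOUNT_EMAIL is a placeholder for the confidential match service account shown in the example, and the variable names are illustrative.

```shell
# Sketch: build the attestation condition clause by clause.
# SERVICE_ACCOUNT_EMAIL is a placeholder; substitute the confidential match
# service account from the example above. Variable names are illustrative.
SA_EMAIL="SERVICE_ACCOUNT_EMAIL"
FINGERPRINT="6b1f357b59e9407fb017ca0e3e783b2bd5acbfea6c83dd82971a4150df5b25f9"

# 1. The workload runs in Confidential Space.
COND="assertion.swname == 'CONFIDENTIAL_SPACE'"
# 2. The Confidential Space image reports the STABLE support attribute.
COND="${COND} && 'STABLE' in assertion.submods.confidential_space.support_attributes"
# 3. The caller is the expected service account.
COND="${COND} && ['${SA_EMAIL}'].exists(a, a in assertion.google_service_accounts)"
# 4. The container image carries a signature with the expected key fingerprint.
COND="${COND} && 'ECDSA_P256_SHA256:${FINGERPRINT}' in assertion.submods.container.image_signatures.map(sig, sig.signature_algorithm+':'+sig.key_id)"

# Print the assembled expression so it can be compared with the example.
printf '%s\n' "${COND}"
```

The printed string can then be supplied as --attribute-condition="${COND}" in the create-oidc command.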
Step 3. Configure key decrypter permission
Use the WIP to configure key decrypter permission:
gcloud kms keys add-iam-policy-binding KEY --keyring KEY_RING --location LOCATION --member MEMBER --role "roles/cloudkms.cryptoKeyDecrypter"
Replace the following:
- KEY: the name of the key.
- KEY_RING: the name of the key ring.
- LOCATION: the location of the key ring.
- MEMBER: must be set to principalSet://iam.googleapis.com/WIP_NAME/group/PROVIDER_ID
  - WIP_NAME: the full name of the workload identity pool, formatted as: projects/{PROJECT_NUMBER}/locations/{WIP_LOCATION}/workloadIdentityPools/{WIP_ID}. Find WIP_NAME with the command:
    gcloud iam workload-identity-pools describe {WIP_ID} --location={WIP_LOCATION} --format='value(name)'
  - PROVIDER_ID: the ID of the workload identity pool provider.
Example
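A minimal sketch of how the MEMBER value is put together, assuming the example key, key ring, pool, and provider names used in the earlier steps. The project number is a made-up value, and the command is printed for review instead of being run.

```shell
# Sketch: assemble the --member value from its parts.
# PROJECT_NUMBER is a made-up value; the other names reuse the examples
# from Steps 1 and 2.
PROJECT_NUMBER="123456789012"
WIP_LOCATION="global"
WIP_ID="customer-match-wip"
PROVIDER_ID="customer-match-wip-provider"

# Full pool name, in the format returned by the describe command.
WIP_NAME="projects/${PROJECT_NUMBER}/locations/${WIP_LOCATION}/workloadIdentityPools/${WIP_ID}"
# Principal set for the group that the provider's attribute mapping assigns.
MEMBER="principalSet://iam.googleapis.com/${WIP_NAME}/group/${PROVIDER_ID}"

# Build the full command as text and print it for review.
CMD="gcloud kms keys add-iam-policy-binding customer-match-data-encryption-key"
CMD="${CMD} --keyring customer-match-data-encryption-key-ring --location us-central1"
CMD="${CMD} --member \"${MEMBER}\" --role \"roles/cloudkms.cryptoKeyDecrypter\""
printf '%s\n' "${CMD}"
```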
To learn more, see Access control with IAM in the Google Cloud documentation.
Step 4. Optional: Enable audit logs
Setting up audit logs on your project is optional. To enable audit logs on your project, follow the instructions in Enable Data Access audit logs for the following services:
- Identity and Access Management (IAM) API
- Cloud Key Management Service (KMS) API
- Security Token Service API
Step 5. Get WIP provider name
When you use encrypted data to create a Data Manager connection, you will need to enter the WIP provider name.
5.1 Get the WIP provider name:
gcloud iam workload-identity-pools providers describe PROVIDER_ID --workload-identity-pool=POOL_ID --location=LOCATION --format='value(name)'
Replace the following:
-
PROVIDER_ID
: the ID of the WIP provider. -
POOL_ID
: the ID of the WIP. -
LOCATION
: the location of the WIP.
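Example input
gcloud iam workload-identity-pools providers describe customer-match-wip-provider --workload-identity-pool=customer-match-wip --location=global --format='value(name)'
Example output
projects/PROJECT_NUMBER/locations/global/workloadIdentityPools/customer-match-wip/providers/customer-match-wip-provider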
5.2 Enter WIP provider name during connection setup:
- On the Select data screen, expand Encryption keys.
- Select “This data set contains encrypted data”.
- Enter the WIP provider name, then click Continue.
Encrypt your data
Data Manager requires personally identifiable information (PII) data to be properly formatted and then encrypted using envelope encryption. To prepare your data, perform the following steps:
- Format PII fields (email, phone, first name, last name, country code, and zip code) according to these guidelines.
- Hash the PII fields using the SHA-256 hash function.
- Encode the hashed fields as Base64.
- Encrypt the Base64-encoded hashed strings using the XChaCha20Poly1305 DEK.
- Encode once more as Base64.
You may choose the method and programming language to meet these requirements. For reference, we provide a Java example in the next section.
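The first three steps above (format, hash, Base64-encode) can be sketched in shell for a single email value. This is an illustration under assumptions, not the reference implementation: the normalization shown (strip whitespace, lowercase) is assumed, the Base64 step is assumed to apply to the raw digest bytes, and the openssl command-line tool is assumed to be installed. The function names are hypothetical. The encryption step is not shown because it needs a library that implements XChaCha20Poly1305, such as Tink.

```shell
# Sketch of the format, hash, and Base64 steps for one email value.
# The normalization here (strip whitespace, lowercase) is an assumption;
# follow the formatting guidelines referenced above for each field type.
normalize_email() {
  printf '%s' "$1" | tr -d '[:space:]' | tr '[:upper:]' '[:lower:]'
}

# SHA-256 the input, then Base64-encode the raw 32-byte digest.
# Assumes the Base64 step applies to the digest bytes, not to a hex string.
hash_b64() {
  printf '%s' "$1" | openssl dgst -sha256 -binary | openssl base64 -A
}

EMAIL_HASH="$(hash_b64 "$(normalize_email '  Test@Example.com ')")"
printf '%s\n' "${EMAIL_HASH}"
```

The resulting 44-character string is the value that would then be encrypted with the DEK and Base64-encoded once more.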
Encryption application example
This example application takes a CSV file as input and produces a formatted and encrypted CSV file which works with Data Manager’s file-based connectors, such as Google Cloud Storage, SFTP, and HTTPS. The application accepts plain text data files then formats, hashes, and encrypts data with an encrypted keyset using Tink version 1.7. For more information and examples, see Create and store an encrypted keyset and EncryptedKeysetExample. If you are using a database or CRM system, you may need to take additional steps to encrypt your data.
Prerequisites
- The Google Cloud KMS must have a Key Encryption Key (KEK) with an accessible URI. See Manage Keys | Tink.
- Supply the Google Cloud project credentials using the -c flag with a credentials file. If they are not provided, the application attempts to use gcloud default credentials. See Set up Application Default Credentials.
- The input data CSV file must use these header columns: email, phone, first_name, last_name, zip_code, and country_code.
- The credentials used to run this example must have the IAM role “Cloud KMS CryptoKey Encrypter/Decrypter” for the KEK.
1. Download the example application.
2. Get the URI for the KEK in a format that Tink can parse:
echo "gcp-kms://$(gcloud kms keys describe KEY --keyring=KEY_RING --location=LOCATION --format='value(name)')"
Replace the following:
- KEY: the name of the key.
- KEY_RING: the name of the key ring.
- LOCATION: the location of the key ring.
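Before passing the URI to the application, it can help to confirm that it has the gcp-kms://projects/.../cryptoKeys/... shape. The following is a loose sanity check sketched under assumptions: the project name is made up, is_kek_uri is a hypothetical helper, and the pattern only checks the overall layout, not that the key exists.

```shell
# Sketch: loosely check the shape of a KEK URI before using it.
# The project name below is a made-up value; the other names reuse the
# examples from Step 1.
KEK_URI="gcp-kms://projects/my-project/locations/us-central1/keyRings/customer-match-data-encryption-key-ring/cryptoKeys/customer-match-data-encryption-key"

# Returns 0 when the URI follows the expected layout, 1 otherwise.
# The glob is permissive: it does not validate individual path segments.
is_kek_uri() {
  case "$1" in
    gcp-kms://projects/*/locations/*/keyRings/*/cryptoKeys/*) return 0 ;;
    *) return 1 ;;
  esac
}

if is_kek_uri "${KEK_URI}"; then
  echo "KEK URI looks well-formed"
else
  echo "Unexpected KEK URI format: ${KEK_URI}" >&2
fi
```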
3. Run the application to encrypt your data:
/usr/bin/java -jar tink-example-application.jar \
-i INPUT_FILE -o OUTPUT_FILE \
-u 'gcp-kms://projects/PROJECT/locations/LOCATION/keyRings/KEY_RING/cryptoKeys/KEY'
Replace the following:
- -i: the input CSV file name, including the extension.
- -o: the output CSV file name, including the extension.
- -u: the KEK URI.
Example
See the TinkExampleApplicationArguments class definition in the sample code for a description of the arguments. If there is an error with parsing the arguments, the application will display usage information.
The output file will contain the initial header columns: email, phone, first_name, last_name, zip_code, and country_code; and two new columns: encrypted_dek and kek_uri.
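A quick way to confirm that an output file has the columns described above is to inspect its header row. The sketch below uses a hypothetical has_columns helper and a one-line sample file standing in for real output; it assumes a comma-separated header on the first line and does not depend on column order.

```shell
# Sketch: check that a CSV header row contains every expected column.
# has_columns is a hypothetical helper; it reads only the first line.
has_columns() {
  header="$(head -n 1 "$1" | tr -d '\r')"
  shift
  for col in "$@"; do
    case ",${header}," in
      *",${col},"*) ;;
      *) echo "missing column: ${col}" >&2; return 1 ;;
    esac
  done
}

# A one-line sample file stands in for the application's real output.
SAMPLE_FILE="$(mktemp)"
printf '%s\n' "email,phone,first_name,last_name,zip_code,country_code,encrypted_dek,kek_uri" > "${SAMPLE_FILE}"

if has_columns "${SAMPLE_FILE}" email phone first_name last_name zip_code country_code encrypted_dek kek_uri; then
  echo "all expected columns present"
fi
rm -f "${SAMPLE_FILE}"
```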
Use your encrypted data
- Upload the output file to a supported file-based data source, such as Google Cloud Storage or SFTP.
- Connect to the data source using Data Manager.
  - On the Select data screen, choose the encrypted file that you uploaded.
  - Expand Encryption keys.
  - Select “This data set contains encrypted data”.
  - Enter the WIP provider name, then click Continue.
  - On the Map fields screen, map the two new fields: encrypted_dek and kek_uri.
- See Supported data sources for instructions on connecting to a data source.