Supported add-ons for this feature: Gemini Enterprise, Gemini Education Premium, and AI Security.
The AI classification feature uses artificial intelligence (AI) to automatically label your organization’s sensitive content. After an initial training period, during which the AI model learns your organization's criteria for sensitive content, AI classification can automatically apply labels to both new and existing files in Google Drive.
Here's how to get started using AI classification:
1) Set up training: To get started, you create the classification label that the AI model will automatically apply to files once training is done. You also create the training label, a label that's nearly identical to the classification label.
2) Train the model: During the training period, typically about a week, your designated labelers—users at your organization who can evaluate sensitive files—begin classifying Drive files with the training label. From their examples, your model begins learning how to similarly classify sensitive files.
3) Turn on automatic classification: Once the model is trained (after about a week), you're prompted to turn on automatic classification. You can monitor how many files are classified, and how accurately, on an ongoing basis.
For exact details on each phase, go to the sections below.
Before you begin
- If you’re not familiar with Drive labels, go to Get started as a Drive labels admin for details on how they work and how to create them.
- For best results, create a configuration group for your designated labelers that's separate from the rest of your organization. For instructions, go to Customize service settings with configuration groups.
Set up training
Create the classification label
The classification label is the label the AI model will automatically apply to your sensitive Drive files after the model is trained. The model is trained on, and sets, only one field per label. That AI-set field must be either a badged or an option list field type. For more information about labels, go to Get started as a Drive labels admin.
When used as a classification label, an option list or badged field must meet these requirements:
- Have at least 2 and no more than 7 options
- Belong to a published label
If you have an existing label that meets these requirements, you can use it as a classification label. Otherwise, create a label.
Create the training label
We recommend that you create the training label during label selection (next step), when you can create it automatically. This guarantees the training label will match the classification label in all the required ways.
If you choose to create the training label before label selection:
- Make sure the label meets the required label criteria.
- Identify the training label with the word "training" to make it easier for your trusted labelers to recognize the label and apply it during the training period.
- Add a description field to the training label to further help trusted labelers understand its purpose.
Select labels and enable training
- Sign in to your Google Admin console using your administrator account (does not end in @gmail.com).
- In the Admin console, go to Menu > Security > Access and data control > Label manager.
- In AI classification for Google Drive, click Set up training.
- For Select classification label, click Select Label.
- Select the label you want AI classification to use and the field it will set.
- For Select training label, click Create training label.
This automatically creates a training label with the same attributes as your Classification label.
- To make sure the new label is available to your designated labelers, click Update label permissions. This opens the label in Edit mode in label manager in a separate tab.
Note: You can also set label permissions later, but make sure that only your labelers have access to the training label.
- Click Permissions > Edit, then grant the Can apply labels and set values permission to the configuration group that contains your labelers.
- Click Save and close the label manager tab.
After you select both the classification label and the training label, the Enable training button becomes available.
- Click Enable training.
Important: If you get an error message when you try to enable training, it means your classification label and training label don’t match. Review the label requirements below and make sure your labels meet all requirements, then enable training.
After you enable training, the Data classification page shows your selected Training label and Classification label.
- The Classification label shows Not ready. After training is done, the label status changes to Ready.
- Auto-apply status shows Off for everyone. Once the Classification label status is Ready, you can change the Auto-apply status to On.
Next, your designated labelers need to start applying the Training label to your sensitive files.
Train the model
To successfully train the AI model, your designated labelers should label at least 100 files per option. For example, if your label has 3 options, the training label should be applied to at least 300 files in total. The AI model checks training every 1–2 weeks and shows Ready once it has 100 or more examples for each label option. For guidance on choosing high-quality examples, go to the FAQ below.
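The 100-examples-per-option rule can be expressed as a small readiness check. This is a minimal Python sketch that mirrors only the counting rule stated above; the function names are hypothetical, and the per-option counts are assumed to be copied by hand from the Training files column. The Admin console's own readiness check is not exposed as code.

```python
from typing import Dict, List

# Threshold stated in this article: at least 100 labeled files per option.
MIN_EXAMPLES_PER_OPTION = 100


def minimum_total(num_options: int) -> int:
    """Smallest total number of training files for a label with num_options options."""
    return MIN_EXAMPLES_PER_OPTION * num_options


def training_shortfall(options: List[str], counts: Dict[str, int]) -> Dict[str, int]:
    """How many more training files each option needs; options missing from counts count as 0."""
    return {opt: max(0, MIN_EXAMPLES_PER_OPTION - counts.get(opt, 0)) for opt in options}


def is_ready(options: List[str], counts: Dict[str, int]) -> bool:
    """True once every option has at least 100 labeled examples."""
    return bool(options) and all(n == 0 for n in training_shortfall(options, counts).values())


options = ["Confidential", "Internal", "Public"]
counts = {"Confidential": 120, "Internal": 85}  # hypothetical counts; no Public files yet
print(minimum_total(len(options)))          # 300
print(training_shortfall(options, counts))  # {'Confidential': 0, 'Internal': 15, 'Public': 100}
print(is_ready(options, counts))            # False
```

Treating an option with no labeled files as 0 matters: a label is only ready when every option, including ones labelers haven't used yet, reaches the threshold.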
During the training period, you can track how many files have been labeled and how the model's accuracy is improving.
Note: The total number of training files is limited to 1 million.
To check progress during the training period:
- In your Admin console, go to Security > Data classification.
- Click View model details.
- For Training label, Training files shows the number of files that have been labeled for each option.
- Each label option has a Score that shows the percentage of training examples the model classified correctly after testing itself.
  - Low (below 50%): The model needs better data and isn't ready yet.
  - Medium (50–80%): The model may be ready on a limited basis.
  - High (above 80%): The model is ready to classify files for your organization.
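The Score levels above amount to a simple threshold mapping. This Python sketch is illustrative only: the function name is hypothetical, and treating exactly 50% and exactly 80% as Medium is an assumption based on the "50–80%" range as written.

```python
def score_level(score_percent: float) -> str:
    """Map a per-option Score (0-100) to the Low/Medium/High levels (hypothetical helper)."""
    if not 0 <= score_percent <= 100:
        raise ValueError("score must be a percentage between 0 and 100")
    if score_percent < 50:
        return "Low"      # model needs better data
    if score_percent <= 80:
        return "Medium"   # may be ready on a limited basis (boundaries assumed inclusive)
    return "High"         # ready to classify files


for option, score in {"Confidential": 86.0, "Internal": 64.5, "Public": 41.0}.items():
    print(option, score_level(score))
# Confidential High
# Internal Medium
# Public Low
```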
Turn on the auto-apply of labels
After the AI model is trained to a high level of accuracy, you're ready to choose label options and turn on auto-apply of labels. Follow these steps:
- In your Admin console, go to Security > Data classification.
- In AI Classification, verify that the Classification label shows a status of Ready.
- Click View model details.
- For Classification label, check the boxes for the label options you want to allow the AI model to auto-apply.
- Click Turn on auto-apply.
- Search for and select the organizational unit or group whose members' files you want labels automatically applied to. For example, if you select the group "Finance", you can then select the labels to be configured for Finance.
- Click On - Label is auto-applied.
Options for how the label is applied are listed under the On option.
- Click Save.
- On the Data classification main page, the Auto-apply status for the rule changes to On.
When does AI Classification scan files?
AI Classification scans files at rest at least once for users and shared drives that have auto-apply enabled. This process can take 1-2 weeks after auto-apply has first been enabled.
AI Classification also scans files when they are uploaded or modified. The applied label may change based on content changes to the file.
Monitor AI classification label events in the Drive log
You can get specific details on how AI classification is labeling files by looking at events recorded in the Drive log.
- Go to Security > Data classification.
- In AI classification for Google Drive, click View model details.
- Click View logs.
The Security Investigation Tool opens in a new tab, showing search results for the Drive log for two AI Classification-related events: Label applied and Label field value changed.
- Click the event Description to see additional details, such as:
  - Name and type of the document that was labeled
  - Label field value assigned to the document (for example, Confidential or Restricted, if those are your label options)
Turn off the auto-apply of labels
You can turn off the auto-apply of all labels, or turn off specific options.
- Go to Security > Data classification.
- In AI classification for Google Drive, click View model details.
- For Classification label, uncheck Allow in the Auto-apply column to pause auto-apply for that option.
- To completely pause auto-apply, uncheck all options.
To turn auto-apply completely off for content owned by users in specific organizational units or groups:
- Go to Security > Data classification.
- In AI classification for Google Drive, click View model details.
- Click Manage auto-apply.
- Click an organizational unit or group at left to select it.
- In Manage AI auto-apply, click OFF.
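Auto-apply is controlled by two independent switches: the per-option Allow boxes and the per-organizational-unit or per-group ON/OFF setting. The Python sketch below models that as a hypothetical `AutoApplySettings` class; it is not an Admin console API. It assumes that a label option is auto-applied only when both switches allow it, and that units not explicitly configured are OFF. It doesn't model organizational unit inheritance.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class AutoApplySettings:
    """Hypothetical model of the two auto-apply switches described above."""
    # Options whose Allow box is checked in the Auto-apply column.
    allowed_options: Set[str] = field(default_factory=set)
    # ON/OFF state per organizational unit or group (Manage auto-apply).
    # Units not listed are treated as OFF here; that default is an assumption.
    unit_enabled: Dict[str, bool] = field(default_factory=dict)

    def may_auto_apply(self, option: str, owner_unit: str) -> bool:
        """Auto-apply only if the option is allowed AND auto-apply is ON for the owner's unit."""
        return option in self.allowed_options and self.unit_enabled.get(owner_unit, False)


settings = AutoApplySettings({"Confidential", "Internal"}, {"Finance": True, "Sales": False})
print(settings.may_auto_apply("Confidential", "Finance"))  # True
print(settings.may_auto_apply("Public", "Finance"))        # False: option not allowed
print(settings.may_auto_apply("Confidential", "Sales"))    # False: unit turned OFF

settings.allowed_options.clear()  # unchecking every option pauses auto-apply completely
print(settings.may_auto_apply("Confidential", "Finance"))  # False
```

Turning a switch off only stops new auto-applied labels; files that were already labeled keep their values.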
Reset the model
At some point, you may need to reset the model (for example, to start another test, or because model accuracy isn't improving). Before you reset the model, note the following:
- After you reset the model, the new model must finish training before AI classification can turn on the new classification label and apply it to files.
- Previously applied training labels remain on the files. After resetting the model, you can choose to configure a new model to use the same training label (or a different one).
- Automatically applied labels remain on the files after you reset the model.
- If you choose the same classification label for the new model, the AI classification feature ignores and overwrites the predictions of previous models. In this way, you can use the model reset to "reprocess" your organization's Drive files. This can be useful if you made significant improvements to model quality since your initial deployment.
- Go to Security > Data classification.
- In AI classification for Google Drive, click View model details.
- On the AI model details page, for Actions at right, click Reset model.
The Reset model dialog lists the effects of resetting the model.
- To continue, click Reset model.
AI classification is reset to its initial state. To restart, click Set up training and pick new classification and training labels.
FAQ
What are the requirements for the training and classification labels?
Both the classification label and the training label must meet the following criteria:
- Contain a minimum of 2, and a maximum of 7 options.
- Have their options in the same order in each label. For example, if the classification label has options in this order:
  1. Option 1
  2. Option 2
  3. Option 3
  The training label options can't be ordered as follows:
  1. Option 2
  2. Option 1
  3. Option 3
- Be published.
- Have different access permissions. The training label should be available only to designated labelers who can be trusted to train the model. The classification label can have broader access.
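To show how these criteria combine, here is a Python sketch of a pre-flight check you might run on your own notes about two labels before clicking Enable training. `LabelSpec` and `check_label_pair` are hypothetical, not a Drive Labels API type or call. The sketch assumes that option names must match as well as their order, and it checks only that access differs, not that the training label is the narrower one.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Tuple


@dataclass(frozen=True)
class LabelSpec:
    """Hypothetical, simplified view of a label and its AI-set field (not a real API type)."""
    name: str
    options: Tuple[str, ...]   # in the order shown in label manager
    published: bool
    appliers: FrozenSet[str]   # who has the "Can apply labels and set values" permission


def check_label_pair(classification: LabelSpec, training: LabelSpec) -> List[str]:
    """Return the criteria the pair fails; an empty list means every check passed."""
    problems: List[str] = []
    for label in (classification, training):
        if not 2 <= len(label.options) <= 7:
            problems.append(f"{label.name}: needs 2-7 options, has {len(label.options)}")
        if not label.published:
            problems.append(f"{label.name}: not published")
    # Assumption: option names must match as well as their order.
    if classification.options != training.options:
        problems.append("options are not the same, in the same order")
    if classification.appliers == training.appliers:
        problems.append("labels must have different access permissions")
    return problems


classification = LabelSpec("Sensitivity", ("Option 1", "Option 2", "Option 3"), True, frozenset({"all-staff"}))
training = LabelSpec("Sensitivity training", ("Option 2", "Option 1", "Option 3"), True, frozenset({"labelers"}))
print(check_label_pair(classification, training))  # ['options are not the same, in the same order']
```

The example reproduces the disallowed ordering from the list above, so the only reported problem is the option order.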
For best results in training the model, your trusted labelers should follow these guidelines when choosing training files:
- Ensure that each file has a minimum of approximately 500 text characters.
- Select files that best represent actual content that your users create, share, and use in your organization.
- Select roughly the same number of files per label option, with a minimum of 100 files for each option. This helps the model to gain a comprehensive understanding of your data and improve scores.
- Include a representative variety of files for each option type. For example, don't label 100 resumes as your total set of example files for Top Secret if contracts are also a common Top Secret file type in your organization.
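Some of these guidelines can be checked mechanically before labeling starts. This Python sketch summarizes a candidate training set given as hypothetical `(option, extracted_text)` pairs. It uses `len(text)` as a rough stand-in for "text characters" and treats 500 as a hard cutoff, although the guideline says "approximately". Representativeness and variety still need human judgment and aren't checked.

```python
from collections import Counter
from typing import Dict, List, Tuple

MIN_CHARS = 500       # "approximately 500 text characters" per file (approximate guidance)
MIN_PER_OPTION = 100  # minimum training files per option


def review_training_set(options: List[str], examples: List[Tuple[str, str]]) -> Dict[str, object]:
    """Summarize (option, extracted_text) pairs against the guidelines above (hypothetical input)."""
    if not options:
        raise ValueError("a label needs at least one option to review")
    usable = Counter(opt for opt, text in examples if len(text) >= MIN_CHARS)
    counts = {opt: usable[opt] for opt in options}
    smallest, largest = min(counts.values()), max(counts.values())
    return {
        "too_short": sum(1 for _, text in examples if len(text) < MIN_CHARS),
        "usable_per_option": counts,
        "below_minimum": [opt for opt in options if counts[opt] < MIN_PER_OPTION],
        # 1.0 is perfectly balanced; larger values mean some options are over-represented.
        "imbalance_ratio": largest / smallest if smallest else float("inf"),
    }


examples = [("Confidential", "x" * 600)] * 120 + [("Public", "y" * 600)] * 60 + [("Public", "too short")] * 10
report = review_training_set(["Confidential", "Public"], examples)
print(report)
# {'too_short': 10, 'usable_per_option': {'Confidential': 120, 'Public': 60}, 'below_minimum': ['Public'], 'imbalance_ratio': 2.0}
```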
Files that AI classification has previously labeled retain the applied label and option values even after the option is disabled.
After the model is created and auto-apply is enabled, AI Classification scans and classifies all files at rest from which sufficient text can be extracted. These files are scanned at least once.
AI Classification reprocesses files periodically as content is modified. Content changes may result in a different prediction for a file. When AI Classification has both an old and a new predicted option for a file, it will prefer the option that is higher in the option list. For example, if a field has three options listed in the label manager:
- Confidential
- Internal
- Public
Suppose AI Classification classifies a file as Internal, and the content changes so that the AI Classification model predicts Confidential. In this case, the classification on the file is changed to Confidential. However, if the AI Classification model predicts Public, the classification on the file remains as Internal.
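The "higher option wins" rule can be written as a comparison of positions in the option list. This Python sketch uses a hypothetical `resolve_option` helper and reproduces the Confidential/Internal/Public example; it models only this rule, not the separate handling of values that users have reviewed or modified.

```python
from typing import List, Optional


def resolve_option(options: List[str], current: Optional[str], predicted: str) -> str:
    """Keep whichever option is higher (earlier) in the label's option list."""
    if current is None:          # file not yet classified: take the prediction
        return predicted
    if options.index(current) <= options.index(predicted):
        return current           # existing option is higher or equal: keep it
    return predicted             # new prediction is higher: upgrade


options = ["Confidential", "Internal", "Public"]
print(resolve_option(options, "Internal", "Confidential"))  # Confidential
print(resolve_option(options, "Internal", "Public"))        # Internal
```

Because the comparison depends only on list position, the order of options in label manager effectively defines which classifications can be upgraded but never downgraded.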
AI Classification does not revise auto-applied labels and field values that have been reviewed or modified by users.
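When more than one labeling method sets a value on a file, the methods take precedence in this order, highest first:
- DLP rule without user overwrite
- Manual classification
- DLP rule with user overwrite
- AI Classification
- Default classification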
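Reading the list of labeling methods as a precedence order, highest first, this Python sketch picks the method whose value would win when several apply to the same file. The highest-first reading, the `winning_source` helper, and the decision to ignore unrecognized method names are all assumptions made for illustration.

```python
from typing import Iterable, Optional

# Assumed highest-to-lowest reading of the labeling-method list.
PRECEDENCE = [
    "DLP rule without user overwrite",
    "Manual classification",
    "DLP rule with user overwrite",
    "AI Classification",
    "Default classification",
]


def winning_source(sources: Iterable[str]) -> Optional[str]:
    """Return the highest-precedence method among those that set a value; None if none did."""
    present = set(sources)  # materialize once so iterators are not consumed repeatedly
    for method in PRECEDENCE:
        if method in present:
            return method
    return None             # unrecognized names are ignored


print(winning_source(["AI Classification", "Manual classification"]))  # Manual classification
```

Under this reading, a manual choice outranks AI Classification, which matches the statement that AI Classification doesn't revise values users have reviewed or modified.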
- Any Drive item can be labeled from Drive. Files can also be labeled natively from within the editor.
- AI Classification uses the same indexable text processing as Drive DLP. Any file from which Drive can extract indexable text can be evaluated for AI Classification labels. Because indexable text can't be extracted from every file, AI Classification isn't guaranteed to process every file.
- AI Classification requires that a file meets a minimum text threshold before it makes a classification decision. As a result, some files such as very short documents and images with small amounts of text may not get classified.