Set the GOOGLE_APPLICATION_CREDENTIALS environment variable to the path to
the Service Account key file that you downloaded when you created the
Service Account. For example:
export GOOGLE_APPLICATION_CREDENTIALS=key-file
Give your new Service Account the AutoML Editor IAM role with the following
commands:
replacing YOUR_PROJECT_ID with your GCP project ID.
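As a rough sketch, a role grant like this can be made with gcloud projects add-iam-policy-binding and the roles/automl.editor role. The Service Account name below is a placeholder (an assumption, not a value from this project), and the snippet only prints the command so you can review it before running it yourself.

```shell
# Placeholder values (assumptions): substitute your own project ID and
# the name you chose when creating the Service Account.
PROJECT_ID="YOUR_PROJECT_ID"
SA_NAME="YOUR_SERVICE_ACCOUNT_NAME"

# Service Account emails follow NAME@PROJECT_ID.iam.gserviceaccount.com.
MEMBER="serviceAccount:${SA_NAME}@${PROJECT_ID}.iam.gserviceaccount.com"

# Print the command for review instead of executing it.
echo "gcloud projects add-iam-policy-binding ${PROJECT_ID} --member=${MEMBER} --role=roles/automl.editor"
```

Copy the printed line into your terminal once the project ID and Service Account name are correct.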
Create a Google Cloud Storage bucket to store the documents that you will
use to train your custom model. The bucket name must be in the format:
YOUR_PROJECT_ID-lcm. Run the following command to create a bucket in the
us-central1 region:
replacing YOUR_PROJECT_ID with your GCP project ID.
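A minimal sketch of the bucket step, assuming gsutil mb with its -p (project) and -l (location) flags is what you want. It derives the bucket name from the project ID using the required YOUR_PROJECT_ID-lcm format and prints the command rather than running it.

```shell
# Placeholder project ID (assumption): replace with your GCP project ID.
PROJECT_ID="YOUR_PROJECT_ID"

# The bucket name must be the project ID followed by "-lcm".
BUCKET="gs://${PROJECT_ID}-lcm"

# Print the bucket creation command for review instead of executing it.
echo "gsutil mb -p ${PROJECT_ID} -l us-central1 ${BUCKET}/"
```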
Usage
Run labelcat --help for usage information.
labelcat <command>

Commands:
  labelcat retrieveIssues <repoDataFilePath> <issuesDataFilePath> <label>
      Retrieves issues from a .txt file of gitHub repositories. Options: -a
  labelcat createDataset <datasetName>
      Create a new Google AutoML NL dataset with the specified name. Options: -m
  labelcat importData <issuesDataPath> <datasetId>
      Import the GitHub issues data from Google Cloud Storage bucket into the
      Google AutoML NL dataset by specifying the file's path in the bucket and
      the dataset ID.

Options:
  --version  Show version number  [boolean]
  --help     Show help  [boolean]

Examples:
  labelcat retrieveIssues repoData.txt issuesData.csv 'type: bug' -a 'bug' -a 'bugger'
      Retrieves issues with matching labels from list of repos in repoData.txt
      and saves the resulting information to issuesData.csv.
  labelcat createDataset Data
      Creates a new multilabel dataset with the specified name.
  labelcat importData gs://myproject/mytraindata.csv 1248102981
      Imports the GitHub issues data into the dataset by specifying the file of
      issues data and the dataset ID.
Retrieve Issues
Create a repos.txt file containing a single-column list of the GitHub
repositories from which to collect issue data, one per line, in the format
:owner/:repository.
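For illustration, the snippet below writes a sample repos.txt and checks that every line has the owner/repository shape. The repository names are made-up placeholders, and the pattern check is an assumption about what counts as well-formed, not something labelcat itself is known to enforce.

```shell
# Write a sample repos.txt (this overwrites any existing file of that name).
# The repository names are hypothetical placeholders.
cat > repos.txt <<'EOF'
example-owner/example-repo
another-owner/another-repo
EOF

# Count lines that do NOT look like owner/repository
# (exactly one slash, no whitespace). grep exits non-zero when the
# count is 0, so "|| true" keeps the assignment from failing.
BAD_LINES=$(grep -c -v -E '^[^/[:space:]]+/[^/[:space:]]+$' repos.txt || true)
echo "lines not matching owner/repository: ${BAD_LINES}"
```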
From the project folder, run the retrieveIssues command with the path to the
repository list file, the path where the resulting .csv file should be saved,
the desired issue label, and any optional alternative issue labels (each passed with -a):
Example:
labelcat retrieveIssues repos.txt issues.csv "type: bug" -a "bug"
Upload the resulting .csv file to your Google Cloud Storage bucket:
Example:
gsutil cp issues.csv gs://YOUR_PROJECT_ID-lcm/
replacing YOUR_PROJECT_ID with your GCP project ID.
Create Dataset
From the project folder, run the createDataset command with the name of the
dataset to create.
Example:
labelcat createDataset TestData
List Datasets
Run listDatasets to return a list of all AutoML NL datasets in the Google Cloud Platform project.
Example:
labelcat listDatasets
Import Data
Run importData using the Dataset ID returned by the createDataset command
and the URI to the issue data .csv file.
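As an illustration, the import command could be assembled as follows. This assumes issues.csv was uploaded to the YOUR_PROJECT_ID-lcm bucket, and it reuses the sample dataset ID from the help text as a stand-in. The snippet prints the command rather than running it.

```shell
# Placeholder project ID (assumption): replace with your GCP project ID.
PROJECT_ID="YOUR_PROJECT_ID"

# Sample ID taken from the help text; use the ID createDataset returned.
DATASET_ID="1248102981"

# URI of the uploaded issues file in the project bucket.
ISSUES_URI="gs://${PROJECT_ID}-lcm/issues.csv"

# Print the import command for review instead of executing it.
echo "labelcat importData ${ISSUES_URI} ${DATASET_ID}"
```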