vignettes/automl_tabluar_classification_batch.Rmd
Run the following chunk to install googleCloudVertexAIR and the other R
packages required to complete this tutorial (the chunk first checks whether
each package is already installed and installs only the missing ones):
required_packages <- c("remotes", "googleAuthR")
missing_packages <- required_packages[!(required_packages %in%
  installed.packages()[, "Package"])]
if (length(missing_packages)) install.packages(missing_packages)
# remotes::install_github("justinjm/googleCloudVertexAIR") # run first time
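If you want to confirm the installation before continuing, a quick base-R
check is:

# confirm googleCloudVertexAIR is installed and report its version
packageVersion("googleCloudVertexAIR")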
Create a file called .Renviron in your project's working directory and set
the following environment variables:

- GAR_SERVICE_JSON - path to the service account (JSON) keyfile you downloaded and copied earlier
- GCVA_DEFAULT_PROJECT_ID - string of the GCP project ID you configured earlier
- GCVA_DEFAULT_REGION - region of your GCP resources; one of "us-central1" or "eu"

For example, your .Renviron should look like:
# .Renviron
GAR_SERVICE_JSON="/Users/me/auth/auth.json"
GCVA_DEFAULT_PROJECT_ID="my-project"
GCVA_DEFAULT_REGION="us-central1"
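To double-check that R picks these values up, a quick base-R sanity check
(readRenviron() re-reads the file without restarting R):

readRenviron(".Renviron")  # reload .Renviron in the current session
Sys.getenv(c("GAR_SERVICE_JSON", "GCVA_DEFAULT_PROJECT_ID", "GCVA_DEFAULT_REGION"))
# the keyfile path should point at a readable file
file.exists(Sys.getenv("GAR_SERVICE_JSON"))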
TODO: https://github.com/justinjm/googleCloudVertexAIR/issues/26
library(googleAuthR)
library(googleCloudVertexAIR)

# request the cloud-platform scope, then authenticate with the
# service account keyfile referenced in .Renviron
options(googleAuthR.scopes.selected = "https://www.googleapis.com/auth/cloud-platform")
gar_auth_service(json_file = Sys.getenv("GAR_SERVICE_JSON"))
projectId <- Sys.getenv("GCVA_DEFAULT_PROJECT_ID")
gcva_region_set(region = "us-central1")
## 2024-07-08 12:34:54.313795> Region set to 'us-central1'
gcva_project_set(projectId = projectId)
## 2024-07-08 12:34:54.314338> ProjectId set to 'gc-vertex-ai-r'
## [1] "20240708123454"
datasetDisplayName <- sprintf("california-housing-%s", timestamp)
datasetDisplayName
## [1] "california-housing-20240708123454"
Source dataset:
gs://cloud-samples-data/ai-platform-unified/datasets/tabular/california-housing-tabular-regression.csv
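To inspect the columns before importing, you can preview a few rows of the
CSV directly; this is a quick sketch and assumes the cloud-samples-data
bucket remains publicly readable over HTTPS:

# read only the first rows of the public CSV to see column names and types
preview <- read.csv(paste0(
  "https://storage.googleapis.com/cloud-samples-data/",
  "ai-platform-unified/datasets/tabular/california-housing-tabular-regression.csv"),
  nrows = 5)
str(preview)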
dataset <- gcva_create_tabluar_dataset(
  displayName = datasetDisplayName,
  gcsSource = "gs://cloud-samples-data/ai-platform-unified/datasets/tabular/california-housing-tabular-regression.csv")
## 2024-07-08 12:34:56.992065> Waiting 2 seconds...
dataset
## ==Google Cloud Vertex AI Dataset==
## name: projects/442003009360/locations/us-central1/datasets/5218729686657400832
## displayName: california-housing-20240708123454
## createTime: 2024-07-08 16:34:55
## gcsSource: gs://cloud-samples-data/ai-platform-unified/datasets/tabular/california-housing-tabular-regression.csv
job <- gcva_automl_tabluar_training_job(
  displayName = sprintf("california-housing-%s", timestamp),
  optimizationPredictionType = "regression",
  column_transformations = list(
    list(numeric = list(column_name = "longitude")),
    list(numeric = list(column_name = "latitude")),
    list(numeric = list(column_name = "housing_median_age")),
    list(numeric = list(column_name = "total_rooms")),
    list(numeric = list(column_name = "total_bedrooms")),
    list(numeric = list(column_name = "population")),
    list(numeric = list(column_name = "households")),
    list(numeric = list(column_name = "median_income"))
  )
)
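Because every feature receives the same numeric transformation, the list
above can also be built programmatically; this base-R sketch produces an
identical structure that could be passed as the column_transformations
argument instead:

# build one list(numeric = list(column_name = ...)) entry per feature
num_cols <- c("longitude", "latitude", "housing_median_age", "total_rooms",
              "total_bedrooms", "population", "households", "median_income")
column_transformations <- lapply(
  num_cols,
  function(col) list(numeric = list(column_name = col)))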
model <- gcva_run_job(
  job = job,
  dataset = dataset,
  targetColumn = "median_house_value",
  modelDisplayName = sprintf("model-%s", datasetDisplayName))
model
Next, prepare the inputs for batch prediction. The instances to predict on
come from a BigQuery table, california_housing.source_data, and the
predictions are written back to a BigQuery dataset (batch02 in this
walkthrough); the sketch below defines the two URIs used in the
gcva_batch_predict() call.
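A minimal sketch of those URIs, assuming the source table lives at
PROJECT_ID.california_housing.source_data and that a dataset named batch02
already exists in your project to receive the predictions (substitute your
own names):

# BigQuery table to generate predictions from: bq://project.dataset.table
bq_source_uri <- sprintf("bq://%s.california_housing.source_data", projectId)
# BigQuery destination prefix: bq://project.dataset; the job writes a
# predictions_TIMESTAMP table inside it
bq_destination_prefix <- sprintf("bq://%s.batch02", projectId)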
# hard-code the model resource name for testing purposes (model state = completed):
# model <- Sys.getenv("GCVA_TEST_MODEL_NAME_AUTOML")
batch_prediction_job <- gcva_batch_predict(
  jobDisplayName = sprintf("california-housing-%s", timestamp),
  model = model,
  bigquerySource = bq_source_uri,
  instancesFormat = "bigquery",
  predictionsFormat = "bigquery",
  bigqueryDestinationPrefix = bq_destination_prefix
)
batch_prediction_job
Once the batch prediction job has completed, you can view and use the
predictions. Open the BigQuery console, navigate to the dataset where the
predictions were saved, then modify and run the query below:
SELECT
  predicted_TARGET_COLUMN_NAME.value,
  predicted_TARGET_COLUMN_NAME.lower_bound,
  predicted_TARGET_COLUMN_NAME.upper_bound
FROM BQ_DATASET_NAME.BQ_PREDICTIONS_TABLE_NAME
See more details here: https://cloud.google.com/vertex-ai/docs/tabular-data/classification-regression/get-batch-predictions#retrieve-batch-results
Replace the placeholders with your values; the predictions table created by
the job is named predictions_TIMESTAMP, where TIMESTAMP reflects when the
job ran.
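If you prefer to stay in R, the same query can be run with the bigrquery
package (not otherwise used in this vignette); the dataset and table names
below are placeholders to substitute, while predicted_median_house_value
follows the predicted_TARGET_COLUMN_NAME pattern for this model's target
column:

library(bigrquery)

sql <- sprintf(
  "SELECT
     predicted_median_house_value.value,
     predicted_median_house_value.lower_bound,
     predicted_median_house_value.upper_bound
   FROM `%s.BQ_DATASET_NAME.BQ_PREDICTIONS_TABLE_NAME`",
  projectId)

tb <- bq_project_query(projectId, sql)  # run the query, billed to projectId
predictions <- bq_table_download(tb)    # pull the results into a data frame
head(predictions)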
Finally, clean up by deleting the dataset created in this tutorial:

gcva_delete_dataset(dataset = dataset)
## 2024-07-08 12:34:57.629905> Dataset successfully deleted.