Objective:
A step-by-step guide to configuring Druid to ingest files from Google Cloud Storage (GCS) and to use GCS for deep storage and indexing logs.
Step 1: Log in to the GCP console:
Go to the GCP console page https://console.cloud.google.com/ and log in with your credentials.
Step 2: Create a service account that will access the bucket:
Browse to "IAM & Admin" -> "Service Accounts" -> "Create Service Account", Create a service account, in this example "imply-cs"
At permission section of the "imply-cs", make sure your user name is listed as one of the owners
Click on "Create key"
Download the JSON key file that is created and save it for later use. This is the authentication/authorization credential your Druid hosts will use to access GCP.
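If you prefer the command line, the same service account and key can be created with the gcloud CLI. This is a sketch; <PROJECT_ID> is a placeholder for your project ID:

gcloud iam service-accounts create imply-cs --project=<PROJECT_ID>
gcloud iam service-accounts keys create googleauth.json --iam-account=imply-cs@<PROJECT_ID>.iam.gserviceaccount.com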
Step 3: Apply the bucket access permission to the service account just created:
Browse to your GCS bucket through the Google navigation menu -> Storage -> Browser -> YOUR BUCKET ("cs-bucket1" in this example).
Configure user access permissions on the bucket.
Add the service account as Storage Admin to this bucket.
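Alternatively, the same binding can be applied from the command line with gsutil. A sketch, again with <PROJECT_ID> as a placeholder:

gsutil iam ch serviceAccount:imply-cs@<PROJECT_ID>.iam.gserviceaccount.com:roles/storage.admin gs://cs-bucket1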
Step 4: Configure the environment variable on all Druid hosts
Open SSH sessions to all Druid hosts and save the JSON key file downloaded in step 2 to Imply Druid's conf (./conf) directory on every host. Then, in your bash profile, e.g., '~/.bash_profile' on RHEL/CentOS, create a new environment variable GOOGLE_APPLICATION_CREDENTIALS. Replace [PATH] with the file path of the JSON file that contains your service account key, and [FILE_NAME] with the filename. For example:
[root@ip-172-31-2-115 ec2-user]# cat ~/.bash_profile
# .bash_profile
export GOOGLE_APPLICATION_CREDENTIALS="/imply-<VERSION>/conf/googleauth.json"
[root@ip-172-31-2-115 ec2-user]# cat /imply-<VERSION>/conf/googleauth.json
{
  "type": "service_account",
  "project_id": "<ID>",
  "private_key_id": "<KEY_ID>",
  "private_key": "-----BEGIN PRIVATE KEY-----<KEY>-----END PRIVATE KEY-----\n",
  "client_email": "imply-cs@<ID>.iam.gserviceaccount.com",
  "client_id": "<CLIENT_ID>",
  "auth_uri": "https://accounts.google.com/o/oauth2/auth",
  "token_uri": "https://oauth2.googleapis.com/token",
  "auth_provider_x509_cert_url": "https://www.googleapis.com/oauth2/v1/certs",
  "client_x509_cert_url": "<URL>"
}
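After editing the profile, reload it and confirm the variable is visible to the shell that launches the Druid services:

source ~/.bash_profile
echo $GOOGLE_APPLICATION_CREDENTIALS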
Step 5: Add "druid-google-extensions" to "druid.extensions.loadList" in Druid's "common.runtime.properties".
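For example, if the loadList already contains other extensions (the other entries shown here are illustrative):

druid.extensions.loadList=["druid-histogram", "druid-datasketches", "druid-google-extensions"]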
Step 6: Replace the current deep storage config in "common.runtime.properties" with the following:
druid.storage.type=google
druid.google.bucket=cs-bucket1
druid.google.prefix=druid/segments
Step 7: Replace the current indexing logs config in "common.runtime.properties" with the following:
druid.indexer.logs.type=google
druid.indexer.logs.bucket=cs-bucket1
druid.indexer.logs.prefix=druid/indexing-logs
Step 8: Restart all Druid services.
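How you restart depends on how the cluster is deployed. As a sketch, assuming the Imply on-prem distribution's supervise scripts with the quickstart configuration:

bin/service --down
bin/supervise -c conf/supervise/quickstart.conf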
Step 9: Upload the raw data file "wikipedia-2016-06-27-sampled.json" to the GCS bucket "cs-bucket1".
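For example, using gsutil:

gsutil cp wikipedia-2016-06-27-sampled.json gs://cs-bucket1/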
Step 10: POST the native batch ingestion spec below to start ingestion:
{ "type": "index", "spec": { "dataSchema": { "dataSource": "wikipedia", "parser": { "type": "string", "parseSpec": { "format": "json", "dimensionsSpec": { "dimensions": [ "isRobot", "diffUrl", { "name": "added", "type": "long" }, "channel", "flags", { "name": "delta", "type": "long" }, "isUnpatrolled", "isNew", { "name": "deltaBucket", "type": "long" }, "isMinor", "isAnonymous", { "name": "deleted", "type": "long" }, "namespace", "comment", "page", { "name": "commentLength", "type": "long" }, "user", "countryIsoCode", "regionName", "cityName", "countryName", "regionIsoCode", { "name": "metroCode", "type": "long" } ] }, "timestampSpec": { "column": "timestamp", "format": "iso" } } }, "granularitySpec": { "type": "uniform", "segmentGranularity": "DAY", "rollup": false, "queryGranularity": "none" }, "metricsSpec": [] }, "ioConfig": { "type": "index", "firehose": { "type":"static-google-blobstore", "blobs”: [ { "bucket":"cs-bucket1", "path":"wikipedia-2016-06-27-sampled.json" } ] }, "appendToExisting": false }, "tuningConfig": { "type": "index", "forceExtendableShardSpecs": true, "maxRowsInMemory": 1000000, "reportParseExceptions": false, "maxParseExceptions": 100, "maxSavedParseExceptions": 10 } } }