***Imply Version: 2.7.5 and higher***
With the added support for UNION ALL operations on datasets, Imply can also build a single data cube from multiple datasets.
Before we begin, note a few things about this capability:
- This is a logical operation at query time, so no changes are made to the underlying Druid datasets
- This is a UNION ALL operation, so duplicates between datasets will be preserved
- Any dimensions that are missing from a dataset will be treated as NULL values
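The three points above can be mimicked in plain Python to make the semantics concrete. This is a conceptual sketch only, not how Imply or Druid executes the query. The function `union_all` and the column names and values are hypothetical: rows from every table are concatenated without de-duplication, missing columns become `None` (standing in for NULL), and the input tables are never modified.

```python
from typing import Any, Dict, List, Optional

Row = Dict[str, Any]


def union_all(columns: List[str], *tables: List[Row]) -> List[Dict[str, Optional[Any]]]:
    """Concatenate rows from every table, keeping duplicates (UNION ALL).

    Each output row is projected onto `columns`; a column that a source row
    lacks is filled with None, standing in for NULL. Inputs are not modified.
    """
    result: List[Dict[str, Optional[Any]]] = []
    for table in tables:
        for row in table:
            result.append({col: row.get(col) for col in columns})
    return result


# Hypothetical rows: only the "minute" table has a "date" column.
minute = [{"date": "20090522", "time": "0930", "close": 88.2}]
second = [
    {"time": "093001", "close": 88.2},
    {"time": "093001", "close": 88.2},  # exact duplicate, kept by UNION ALL
]

combined = union_all(["date", "time", "close"], minute, second)
for row in combined:
    print(row)
```

The combined result has three rows: both copies of the duplicate second-level row survive, and each of them has `date` set to `None`.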
To create a data cube from multiple data sources, add the following JSON to the Data Cube Options field in the Advanced section of the Data Cube Properties, where TABLE_A and TABLE_B are the names of the data sources you want to combine.
Getting Started - Loading the Data
For this example, I will be using sample data from https://quantquote.com/historical-stock-data.
First, let's create two datasets. For my example, I am using the Minute and Second resolution sample data provided by QuantQuote. The datasets are available here:
The data format is CSV, with a header that contains the column names.
You'll notice that the Minute-level data contains a date field, while the Second-level data does not.
For these datasets, we will not use a time column or roll-up.
We leave the column settings at their defaults.
We also let Imply automatically create a data cube for each data source, just for comparison purposes.
After a quick review, we go ahead and load the data!
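Because each CSV file starts with a header row, you can check ahead of time which columns will read as NULL for each dataset once the two are combined. The sketch below uses only Python's standard `csv` module. The header contents are hypothetical (QuantQuote's actual column names may differ), and `io.StringIO` stands in for `open(path, newline="")` on the real files.

```python
import csv
import io
from typing import Dict, Set, TextIO


def header_columns(f: TextIO) -> Set[str]:
    """Return the column names from the header row of a CSV file."""
    reader = csv.reader(f)
    return {name.strip() for name in next(reader)}


def null_columns(headers: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """For each dataset, list the columns that exist elsewhere but not in it.

    In a UNION ALL data cube, these columns read as NULL for that dataset's rows.
    """
    all_columns: Set[str] = set().union(*headers.values())
    return {name: all_columns - cols for name, cols in headers.items()}


# Hypothetical headers: only the minute file has a "date" column.
minute_csv = io.StringIO("date,time,open,high,low,close,volume\n")
second_csv = io.StringIO("time,open,high,low,close,volume\n")
headers = {
    "minute": header_columns(minute_csv),
    "second": header_columns(second_csv),
}

gaps = null_columns(headers)
for name, missing in gaps.items():
    print(name, sorted(missing))

# Datasets missing nothing contain every field, so they are natural
# candidates to select first when creating the combined cube.
complete = [name for name, missing in gaps.items() if not missing]
print("complete:", complete)
```

With these sample headers, only the second-level dataset is missing a column (`date`), which matches the difference described above.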
Create the Data Cube
Next, we create a new data cube and initially select the data source that contains all of the fields we want to include in the cube. In this case, we start with the By Minute data source, because it contains the date field that does not exist in the Second-level dataset.
Change the name so that the new cube isn't confused with our existing data cubes.
Jump to the Advanced tab, scroll down to the Data Cube Options field at the bottom, and enter the following JSON.
Click Save, and we're done!
- Spy Minute Trade: 748 events
- Spy Second Trade: 19.15k events
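Since UNION ALL preserves every row, and neither dataset uses roll-up, a combined cube that counts rows the same way should report roughly the sum of these two counts. A quick check, keeping in mind that "19.15k" is a rounded display value, so the total is approximate:

```python
# Event counts shown by the two automatically created cubes.
minute_events = 748
second_events_rounded = 19_150  # displayed as "19.15k", so this is approximate

# UNION ALL keeps all rows (including duplicates), so the counts add up.
expected_total = minute_events + second_events_rounded
print(f"~{expected_total / 1000:.1f}k events")
```

This prints `~19.9k events`; any exact value that rounds to 19.15k gives the same rounded total.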