Data Preview options
Previewing data inside Model nodes
When modelling and transforming data, it is essential to see the output of the transformations you apply to your source tables. Y42 offers several ways to preview your output data, including quick statistical information on the output table.
We are aware that users have different use cases and needs when modelling and exploring data in Y42. That is why we offer three options for previewing your data: sampled preview, full quick preview, and full preview.
The sampled preview uses a sample of 1,000 rows from the input data and lets you model on that sample, always creating an intermediate table in between. This translates into high performance, so you can navigate through the output of your nodes very quickly. A full quick preview, in contrast, takes all the data of each connected input node and executes one full query. At the end of this query, we limit the result to 1,000 rows. Hence, we process all the data but only return the first 1,000 rows. Finally, there is the full preview, also called the full materialized preview. In this case, Y42 creates short-lived intermediate tables along the path, especially at places where the calculation is heavy, such as JOIN or PARTITION. This preview option pre-calculates an intermediate table that is stored in your data warehouse for only 1 hour.
But what does all of that mean for you? Here is a summary of what each preview option implies:
Sampled Preview (1,000-row Input)
- The input is limited to 1,000 rows
- Full functionality is available in the preview:
  - Descriptive statistical information
  - Option to filter data
- Highest performance
- Limitations:
  - Since all transformations are applied to a sample of 1,000 rows, expected values can be missing from the output previews, especially after joins or aggregations.
  - The true row count of the table is not displayed
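
The join limitation above can be reproduced outside Y42. The following is a minimal sketch using Python's built-in sqlite3 module, not Y42's actual implementation. The tables `orders` and `customers` are hypothetical, and a sample size of 3 stands in for the 1,000-row sample so the effect is visible on tiny data.

```python
import sqlite3

SAMPLE_ROWS = 3  # hypothetical stand-in for the 1,000-row input sample

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (order_id INTEGER, customer_id INTEGER);
    CREATE TABLE customers (customer_id INTEGER, name TEXT);
""")
# Order 1 belongs to customer 10, order 2 to customer 9, and so on.
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(i, 11 - i) for i in range(1, 11)],
)
conn.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [(i, f"customer_{i}") for i in range(1, 11)],
)

# Sampled-style preview: each input is limited BEFORE the join.
sampled_rows = conn.execute(
    """
    SELECT o.order_id, c.name
    FROM (SELECT * FROM orders ORDER BY order_id LIMIT ?) AS o
    JOIN (SELECT * FROM customers ORDER BY customer_id LIMIT ?) AS c
      ON o.customer_id = c.customer_id
    """,
    (SAMPLE_ROWS, SAMPLE_ROWS),
).fetchall()

# Same join over the complete inputs.
full_rows = conn.execute(
    """
    SELECT o.order_id, c.name
    FROM orders AS o
    JOIN customers AS c ON o.customer_id = c.customer_id
    """
).fetchall()

print(len(sampled_rows), len(full_rows))
```

In this sketch the sampled orders reference customers 8 to 10, while the sampled customers are 1 to 3, so the sampled join returns no rows even though every order has a matching customer in the full data.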
Quick Preview (Full Input)
- Complete table is used: all data from the input nodes is used
- The output preview displays 1,000 rows (similar to a LIMIT 1000 clause in a SQL query)
- High Performance: Extremely fast since we execute only a single query
- Limitations:
  - No Data Distribution: We cannot show the descriptive statistical information on top of each column for the complete data.
  - Filter: You can only filter in the frontend, based on the 1,000 returned rows.
  - Query Limit: If you have a UI-Model with many JOINs, UNIONs and/or PARTITIONs, even a data warehouse like BigQuery will run into its limit and throw a query limit exception.
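
To make the filter limitation concrete, here is a small sketch in Python with the built-in sqlite3 module. It is an illustration under assumptions, not Y42's internal query: the `events` table and its columns are hypothetical, and a limit of 5 stands in for the 1,000-row output limit.

```python
import sqlite3

PREVIEW_LIMIT = 5  # hypothetical stand-in for the 1,000-row output limit

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (event_id INTEGER, status TEXT)")
# 20 rows in total; only events 19 and 20 have status 'failed'.
conn.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [(i, "failed" if i > 18 else "ok") for i in range(1, 21)],
)

# Quick-preview style: one query over all input, truncated at the end.
preview = conn.execute(
    "SELECT event_id, status FROM events ORDER BY event_id LIMIT ?",
    (PREVIEW_LIMIT,),
).fetchall()

# Frontend-style filter: it only sees the rows that were returned.
failed_in_preview = [row for row in preview if row[1] == "failed"]

# What a filter over the complete table would find.
failed_total = conn.execute(
    "SELECT COUNT(*) FROM events WHERE status = 'failed'"
).fetchone()[0]

print(len(preview), len(failed_in_preview), failed_total)
```

The filter over the returned rows finds nothing, although the complete table contains two matching rows. This is the trade-off for running only one query.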
Materialized Preview (Full Input)
- Complete table is used: all data from the input nodes is used
- The complete output table is displayed in the preview
- No Query Limit: Since intermediate tables are created along the way, we can select from the previously materialized intermediate table and only run the steps between the last materialized table and the current preview table.
- Full Filter: It is possible to filter on the full data because the filter query is executed on top of the materialized table.
- Full Data Distribution: Descriptive statistical information is available for the complete data because the complete table is materialized.
- Limitations:
  - Low Performance: Performance suffers considerably because many intermediate tables have to be built along the way.
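
The idea behind the materialized preview can be sketched as follows, again with Python's built-in sqlite3 module. This is a simplified illustration, not how Y42 builds its intermediate tables: the table names are hypothetical, a TEMP table stands in for the short-lived intermediate table, and the 1-hour expiry is not modelled.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (order_id INTEGER, customer_id INTEGER, amount REAL);
    CREATE TABLE customers (customer_id INTEGER, country TEXT);
""")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(i, i % 4 + 1, 10.0 * i) for i in range(1, 13)],
)
conn.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [(1, "DE"), (2, "DE"), (3, "FR"), (4, "US")],
)

# The heavy step (a JOIN) is computed once and stored.
conn.execute("""
    CREATE TEMP TABLE preview_join AS
    SELECT o.order_id, o.amount, c.country
    FROM orders AS o
    JOIN customers AS c ON o.customer_id = c.customer_id
""")

# Row count and column statistics cover the complete output.
row_count = conn.execute("SELECT COUNT(*) FROM preview_join").fetchone()[0]
min_amount, max_amount, avg_amount = conn.execute(
    "SELECT MIN(amount), MAX(amount), AVG(amount) FROM preview_join"
).fetchone()

# A filter runs against the stored table, not against a truncated result.
german_orders = conn.execute(
    "SELECT order_id FROM preview_join WHERE country = 'DE' ORDER BY order_id"
).fetchall()

print(row_count, min_amount, max_amount, len(german_orders))
```

Later queries read from `preview_join` instead of repeating the join, which is why filters and statistics can cover the full data. The cost is the extra time spent building the table.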