<img src="//bat.bing.com/action/0?ti=5739614&amp;Ver=2" height="0" width="0" style="display:none; visibility: hidden;">
Data Innovation

Data Preparation Using Amazon Quicksight to Deliver Business Solutions

With these data preparation features offered by Amazon QuickSight, the data preparation process becomes faster, more reliable and easy to set up.

Amazon QuickSight is a cloud-powered business intelligence service that helps businesses discover insights through create interactive visualizations and dashboards that help answer data-based questions. QuickSight allows you to connect data from different data sources, and is a competitor of other BI tools in the market such as Tableau, Looker, and Power BI. In this blog post, I will go over a few key data preparation features offered by QuickSight that make it a powerful tool for delivering data-backed business solutions. 

Using SPICE to prepare data

AWS QuickSight offers various features to prepare data, one of the best being SPICE. SPICE stands for Superfast Parallel In-memory Calculation Engine. It provides fast query performance and automatically scales based on your activities. With SPICE, you can protect your underlying data sources and save both time and cost because you don’t need to hit the data source (database or data warehouse) to retrieve the data every time you change something in the analysis. This feature helps to load and update the analysis super-fast.

SPICE can store up to 500 million rows, or 500 GB in size. For updating or refreshing the data in SPICE, you can schedule a refresh by selecting the refresh now option (shown in Figure 1), or by choosing the refresh now option while editing the query (Figure 2).  

Fig 1

(Figure 1)

Fig 2

(Figure 2)

QuickSight also supports incrementally loading new data to the SPICE datasets instead of needing to update the entire dataset. Using Incremental Refresh, you can update the SPICE dataset in a fraction of the time that it takes to perform a full refresh. After you select a refresh option ('Refresh now' or 'Schedule a refresh'), it will ask for refresh types: Full refresh or Incremental Refresh (Figure 3). When you select Incremental Refresh, you need to configure the look back window (example: 1 day, 1 week, 8 hours) in which new rows can be found, including added, modified and deleted rows.  

 Fig 3(Figure 3)  

Let's look at an example to understand this process. Say you have a chat session dataset from January 1st to March 31st consisting of 18,000 records (200 records per day) and today is the 1st of April. You can refresh the data by selecting Incremental Refresh with a look back window of 5 days, and QuickSight will query the dataset asking for the last 5 days of data (i.e., since March 27th), which would give 1,000 records. The next day (April 2nd) it will do the same but querying from March 28th (also 1,000 records). It is the best and fastest way to ingest data every day, as you just have to process 1,000 records a day instead of  the full 18,000 records. With SPICE and the support of Incremental Refresh, QuickSight helps you to create datasets with faster and more reliable refreshes.   


Once you pull the data into QuickSight, you need to verify access to the data because QuickSight users in your organization should only have access to a certain amount of data for security reasons. You can use row level and column level security to enforce security on the data which is integrated in QuickSight (row level and column level security is only offered in the Enterprise edition).  

Row level and column level security help to restrict sensitive data from a specific group of users and who can only view the data that they have permission to in the dashboard. Say, for example, that you have data for multiple clients and wanted to create a dashboard to show the client performance for the quarter and share it with all the clients. Here, you just need one dashboard with client performance metrics because you can make sure clients can view only their data. Another example is perhaps you want to restrict access based on some field, like location. Let us assume you have data which can only be shared in specific regions. In this instance, you would need to enforce permission based on location for that region’s users.   

Let us go through the process of setting up row level and column level security in the dataset you stored in the QuickSight. When you go to the dataset and click on it, it will show an option of column level and row level security as shown in figure 4.   

Fig 4(Figure 4)  

When you select row level security, you will see a pop up asking for a dataset rules permission file or data where you can upload the permission file with proper user access before selecting the row level security. While creating the permission file, make sure it is case sensitive, as they must exactly match the field names and values. The order of the fields does not matter, but you need at least one field that can identify users or groups (the field you use for users or group can be named as per your choice). After you load the permission file into QuickSight, you need to make sure all the rows you want to filter based on the user are accurately visible (shown in Figure 6). The screenshot in figure 5 shows the dataset details. We will use the city and branch fields to restrict user access to data in this example. The screenshot in Figure 6 shows the dataset rules (permission) for row level security. By default, if you don’t add a user in the permission file access would be denied for that user. If you put an empty string next to the user, they will see everything.   

After you select row level security for the dataset, select the appropriate permission file (for this post it’s permission.csv). Once you press 'Apply dataset', row level security will be enforced. For testing, when we login as a danialcharles user, this user would only have access to city Mandalay and branch A and B.  

Fig 5(Figure 5)  


Fig 6 (Figure 6) 

For column level access, it’s a straightforward process when you click on column level security as shown in Figure 4. By default, all the users and groups will have access to the data as shown in Figure 7 below. You need to select the column and add the user in order to restrict who has access to view a specific column. For example, in our current dataset of supermarkets shown in Figure 7, we want to restrict the gender column to just a specific group. For that, we need to select the gender column, and in the next step add the user who we want to have access to that column. Figure 8 shows the gender column restricted to just the business_intelligence group.  

 Fig 7(Figure 7)  

 Fig 8
(Figure 8)  

In the analysis or dashboard, any restricted dimension or measure field will have a lock symbol next to it. While fetching in tables or viewing those values in the table, a restricted user would see that value as “Not Authorized.”   

Calculated Fields 

After fetching and restricting the data, you can build an analysis in Quicksight. Quicksight can transform, filter, and aggregate raw data into meaningful insights to answer business-related questions. But to perform advance analysis, you can use calculated fields. Quicksight offers unique features to create different calculations before you start visualizing or finding insights from it so that you can get more detail about the data. These calculation options include:  

  • Scalar functions – Returns a single value for each row of input data, eg: addition, division, etc.  

  • Aggregation functions – Aggregates the data based on a certain group and returns a single summarized value for that specific group of rows. Eg: sum(), avg(), etc.  

  • Table functions – Works over a certain collection of rows and returns a result in a collection of rows eg: rank(), sumover(), countover().  

  • Level-aware aggregation – A table function that is evaluated before aggregation or after filtering.  

Scalar functions are used when you perform a simple calculation on each row or combine two columns to perform the calculation. For example, when you want to calculate profit and you have a revenue and cost column, we can just use Revenue – Cost to get Profit calculated for each row. Aggregation functions are used for combining or finding an aggregated value based on certain rows (E.g., total profit by each region). Table functions are almost identical analytical or window functions in SQL, just written in a different way. For example, if we want to find a running sum, we use sumover() function and if we want to check the top 10 transactions for each account, we use rankover() to get the ranking of each transaction based on accounts. 

Level-aware aggregation is something Quicksight introduced to meet more analytical requirements. In Quicksight, regular table functions can only be calculated after the aggregation and filter stage, but with Level-aware aggregation (LAA), you can calculate a group of functions before aggregations and filters. The below image (Figure 9) from the AWS website shows how LAA works:  

 order-of-evaluation_  (Figure 9, source) 

  • PRE_FILTER – before applying filters, QuickSight evaluates prefilter calculations then it will apply any filters that are configured on these prefilter calculation fields. E.g., Using PRE_FILTER in countover function like countover({member}, [{state}, {category}], PRE_FILTER) will calculate the count of members in each state by category before applying the filter  
  • PRE_AGG – before computing display-level aggregations, QuickSight performs pre-aggregated calculations, then it will apply any filters configured on these pre_agg calculation fields. This calculation is also performed before top and bottom N filters. E.g., Using PRE_AGG in countover function like countover({member}, [{state}], PRE_AGG) will execute the count of members at the pre-aggregate level   


With these data preparation features offered by Amazon QuickSight, the data preparation process becomes faster and more reliable, easy to set up, with great security functionality to restrict and protect the data and build powerful and complicated data analytics which helps to find meaningful insights from the data and answer business questions.