Pros and Cons of Using Persistent Derived Tables (PDTs) in Looker

Eric Arsenault
4 min readMay 14, 2023

--

Looker is a powerful analytics platform that allows users to quickly and easily explore data, build complex visualizations and dashboards, and uncover insights. Using LookML, the underlying modeling language of Looker, users can define their data models and create custom views for exploring their data. One of the most powerful feature of LookML is Persistent Derived Tables (PDTs), which allow users to store pre-calculated results in database tables, making it easier to query large datasets. In this blog post, we will take a closer look at PDTs in Looker and discuss the pros and cons of using them within your organization’s data stack.

Pros:

The primary advantage of using PDTs is that they allow for much more efficient data retrieval. By persisting results and storing them in a database table, users are able to access the information much faster than they would with a traditional LookML model. There is no need to perform calculations each time a query is made, making individual queries much faster. This speeds up performance significantly and ensures Looker users can gain access to the data they need quickly and efficiently.

PDTs also allow for better scalability of large datasets, making it easier to query data across multiple sources. This is especially useful if you’re dealing with a high volume of data as PDTs make speedier queries possible without sacrificing accuracy or quality.

PDTs can also be setup with custom datagroups, which can help keep costs down. Instead of relying on a manual or timed trigger, Looker will trigger builds when underlying data sources change. Users also don’t need to setup complicated sensors, as each datagroup takes just one SQL query as its input. This means that users can have confidence in their PDTs without needing to check underlying data sources or worry about timing.

Lastly, PDTs are helpful because they are created in the Looker UI but persists in your database and only work in Looker. This means that you can give your BI analyst team access to build tables while still keeping them behind the safeguard of the Looker UI. You can allow them to model data and persist tables without giving them access to production data pipelines. This enables self serve analytics for the team while still maintaining good security for the whole organization.

Cons:

However, there are some drawbacks to using PDTs in Looker. One of the main drawbacks is the cost associated with setting up and maintaining them. Additional storage space is used in the organizations warehouse to write the tables which costs money. They must also be maintained in the case that the underlying data changes or anything breaks, and can cause a good bit of waste when they are broken. When a PDT is broken it will continue to spin up a failed query every time your datagroup cron checker goes off (default is every 5 mins). PTDs can be expensive if you are not prepared to maintain them.

Another downside is that it requires extra effort to maintain the LookML view. If a new column is added or an existing column is modified, the user must manually update the view LookML in Looker. This means that if you are using a large number of columns or tables, or data that frequently changes, it can be quite time consuming to keep up with all the changes. The risk of overlooking something important increases.

Another downside is that PDTs can not be easily queried outside of Looker. Although this was one of the Pros that was mentioned earlier, it also is a Con in some respects. Since the table names are pre-pended with a string of numbers that change each time the table is re-written, PDTs can not be used with any platform other than Looker. This is very limiting; the organization is paying for a table that can only be used by one of the available data assets.

Additionally, relying on this approach may slow down query performance in some cases as PDTs rely on joins between tables. When querying derived tables, Looker must perform a join on each row of data and then filter the results to return the desired output. This can significantly affect query performance if you are dealing with large datasets or complex queries. PDTs also get very slow and almost un-usable when they are built on top of each other in a cascading fashion similar to a DAG. PDTs are a good tool for quick BI modeling but should not be the singular tool for data modeling at an organization.

Lastly, this approach isn’t suitable for all types of data. If your dataset contains time-series data, for example, a materialized view might be more efficient. Materialized views stored in the database and can queried directly by Looker reducing query latency and improving performance without the need for PDTs.

Conclusion:

Overall, persistent derived tables provide valuable benefits in terms of scalability and query speed when used properly, but not without their downsides. PTDs also can be costly and slow if implemented wrong, and can be limiting to the organization as a whole. With careful consideration of the pros and cons, you can decide if using a PDTs is the best option for your business needs.

--

--

Eric Arsenault
Eric Arsenault

Written by Eric Arsenault

Tech Lead | Analytics Engineering

No responses yet