Scaling Reports and Analytics for a SaaS Platform
Reports are a critical part of the Botsplash platform. Since the initial platform release in 2018, we have provided reports that let our customers monitor traffic, identify engagement anomalies and make informed decisions.
On the surface, reports and analytics have always functioned the same way. Behind the scenes, however, we made design and implementation changes every couple of months to keep up with accuracy requirements, data complexity and scale. The audience for our reports is platform users, client admins (management and leadership) and the Botsplash client success team.
These are some of the reports we support and use to improve the platform:
- Standard Dashboard Reports
- Ad-hoc Custom Reports
- Scheduled Reports
- System Performance Reports
- System Anomaly and Detection Reports
- Audit Logs: Visitor Logs and Agent Logs
One approach to reporting is to adopt a large commercial or cloud solution. While this is an option, such an implementation carries significant cost and engineering effort. As a lean team managing a robust enterprise platform, we needed a solution that fit our platform needs and level of effort, and that was easy to manage, flexible and highly reliable.
Below are some of the strategies/solutions that we followed over time:
Best practices and nuggets of knowledge gathered:
- SQL is universal, easy to learn and the friendliest option for data access and manipulation. Use and extend SQL where possible. SQL reports enable you to provide the reports that your business needs.
- Using Python programs and packages for reports or visualization significantly increases engineering effort and deployment complexity, unless your solution or team is already rooted in Python.
- Running reports against transactional databases can be detrimental to your production systems. Identify all large-scope reports and move them to data warehouses or external systems.
- As the data grows, the execution plan of a report may change, causing sudden performance issues. Monitor reports proactively and either implement the necessary optimizations or move them out of the transactional system.
- Materialized snapshots help with report performance, but at scale generating the snapshots themselves eventually becomes a problem (unless they support live updates, similar to ClickHouse materialized views).
- Upgrading the server helps, but not significantly for reports over large data sets, and ever-larger upgrades become impractical for bigger systems.
- Running reports on mirror or backup servers of transactional database systems will not solve performance problems. Long-running queries block the mirroring process and cause bottlenecks, or the reports fail to execute.
- Many open source solutions with easy deployment options are available, so a commercial offering needs a clear justification.
- At scale, do not take a new data warehouse solution for granted. Run a POC and evaluate it yourself before committing to production use.
- Our ElasticSearch POC came too early and did not justify the adoption effort, so we eventually dropped it. Make sure you have a strong, compelling reason to switch to a new platform.
- Extensions such as TimescaleDB are promising and the easiest to implement given our team's background in PostgreSQL, but after closely evaluating ClickHouse and running a POC, it was clear that ClickHouse handles our requirements much more easily.
- ClickHouse and its alternatives each come with their own set of constraints. Understand and evaluate them properly before considering them for your production environments.
- Streaming data to an additional server is a hard problem to solve. Doing it correctly requires careful planning and may take multiple iterations to get the requirements right. Although we operate using queues, it was much easier to rely on batch queuing for accuracy than on real-time streaming.
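To make the batch-over-streaming point concrete, here is a minimal sketch of one common way to copy rows to a reporting store in batches using a watermark. It is not our production pipeline. The table names (`events`, `events_report`, `sync_state`), the columns and both function names are hypothetical, SQLite stands in for both the transactional database and the reporting store, and Python is used only to keep the example self-contained. It also assumes the source table is append-only with monotonically increasing ids, so updates to already-copied rows would not be picked up.

```python
import sqlite3


def make_demo_databases(row_count):
    """Create in-memory stand-ins for the transactional and reporting stores."""
    source = sqlite3.connect(":memory:")
    source.execute(
        "CREATE TABLE events (id INTEGER PRIMARY KEY, visitor TEXT, created_at TEXT)"
    )
    source.executemany(
        "INSERT INTO events (visitor, created_at) VALUES (?, ?)",
        [(f"visitor-{i}", "2024-01-01") for i in range(row_count)],
    )
    source.commit()

    target = sqlite3.connect(":memory:")
    target.execute(
        "CREATE TABLE events_report (id INTEGER PRIMARY KEY, visitor TEXT, created_at TEXT)"
    )
    target.execute("CREATE TABLE sync_state (name TEXT PRIMARY KEY, last_id INTEGER)")
    return source, target


def sync_batch(source, target, batch_size=1000):
    """Copy up to batch_size rows newer than the stored watermark.

    Returns the number of rows copied; 0 means the target has caught up.
    """
    row = target.execute(
        "SELECT last_id FROM sync_state WHERE name = 'events'"
    ).fetchone()
    last_id = row[0] if row else 0

    rows = source.execute(
        "SELECT id, visitor, created_at FROM events WHERE id > ? ORDER BY id LIMIT ?",
        (last_id, batch_size),
    ).fetchall()
    if not rows:
        return 0

    # One transaction: the copied rows and the new watermark commit together,
    # so a failed batch can simply be retried without leaving a gap.
    with target:
        target.executemany(
            "INSERT OR REPLACE INTO events_report (id, visitor, created_at) VALUES (?, ?, ?)",
            rows,
        )
        target.execute(
            "INSERT OR REPLACE INTO sync_state (name, last_id) VALUES ('events', ?)",
            (rows[-1][0],),
        )
    return len(rows)
```

A scheduler or queue worker would call `sync_batch` repeatedly until it returns 0. Because the inserts are keyed by id and the watermark moves in the same transaction, re-running a batch after a failure does not duplicate rows, which is the kind of accuracy guarantee that is harder to get right with real-time streaming.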
Building scalable reports is challenging. The best way to approach such a challenge is to implement for the current scope and requirements, then improve the solution iteratively as the scope changes or data growth patterns become clear. The engineering team plays a pivotal role: it should be on the lookout for solutions that meet the platform's needs and proactively implement them as part of tech debt or system improvement work.
We are always working to improve our reporting capabilities, automation and engagement solutions to make things easier for our clients and their customers. If this interests you, leave us a note. If you would like to work on similar projects, send your resume to firstname.lastname@example.org, and do explore the other technical write-ups from the team.