How to choose a Business Intelligence tool.
Business Intelligence tools are software applications that take in large amounts of data and make it available for analysis purposes.
If you're reading this post, there's a good chance that your business is growing, your current reporting systems are getting stretched to their limits, and you have an endless list of questions that need answering. If this is your first time setting up a Business Intelligence tool at your company, your current reporting systems probably involve a bunch of spreadsheets getting passed around, and a whole lot of importing and exporting CSVs to keep everything up to date. This is a normal phase of business, but also one that is important to move beyond in order to scale.
There are a number of considerations that are worth considering when choosing a Business Intelligence solution. Broadly, you'll need to think about implementation (who will setup and maintain the BI tool) and usage (who will use the BI tool). Breaking this down further, you'll want to think about the following:
Questions to ask before selecting a Business Intelligence tool:
- How many data sources do you have? What are they?
- How large are your datasets? Where is this data stored?
- Who controls the data? Is it sensitive? Who should have access to it?
- How often does the data need to update?
- Who will be setting up the Business Intelligence tool? Are they technical?
- Who will be using the Business Intelligence tool? Are they technical?
- How will the data be presented? And who will see those reports?
- What's your budget?
You can think of these questions as a specifications document. We recommend creating such a document and circulating it to get buy-in from your team. Doing so will save you time and hassle later, and reduce headaches in this procurement process.
How many data sources do you have? What are they?
Create a list of all the places that your company captures, consumes and stores data. Data can have many origins ("data provenance"), but the most common one are event data, transaction data, external API data, and "core".
- Event data - user behavior, like a website visit or button click. Event data can be broken down into having an action, a timestamp, and state. It is usually schemaless (e.g. JSON) and tends to be quite large due to the volume of user actions, and the richness of data that is stored per user interaction.
Google Analytics is the most popular – and near universal – choice for website analytics, and is usually used in conjunction with other analytics tools like Heap Analytics, Full Story, Segment or custom solutions. - Transaction data - when concrete actions are taken by users or team members, we tend to think of this as transaction data. Transaction data is typically structured data and can include things like orders, invoices, payments, sales made, emails sent, candidates recruited, etc.
Common sources of transaction data are sales CRMs (Salesforce, Hubspot), customer support tools (Zendesk, Intercom), accounting and finance applications (Quickbooks, Stripe), project management tools (Asana, Jira), and ecommerce solutions (Shopify, WooCommerce). - External API data - not every company has external API data, and it can be highly contextual to each business. If you are an ecommerce company, you might be tracking your shipping and logistics using UPS or Shipstation data. If you are a travel company you might be consuming weather data from Weather Underground. Or if you are a trading company you might be consuming the Yahoo Finance API.
Whatever the case, it's a good idea to compile a list of external data sources and to determine whether you'll want access to this data in your Business Intelligence tool to answer business questions and to conduct forecasting. It's also a good idea to understand if this is a "want" or a "need", since external API data is inherently data that you don't control, and can be less reliable than internal data that you do control. (If an API changes or breaks, who will fix it?) - Core data is a blanket term we use for internal systems. Most business analysts won't need access to this data, but if questions like "server uptime" are important to your business, then you'll want to be able to access this too. Most of the data you might want, like "page load speed", is sufficiently accessible for most businesses through Google Analytics, so we won't be diving further into core data in this article.
You should now have a list of data sources, "source name", broken out by type: "event data", "transaction data", "external API data", or "core data".
How large are your datasets? Where is this data stored?
As a rule of thumb, event data is considerably larger than transaction data, and external API data can be unpredictable since it's case-by-case. Large data sizes can create engineering issues (how and where to store the data), and also create long lag times when making database queries or loading a report.
Transaction data is the most common data that is needed by people using Business Intelligence tools, this is because it is the closest link to revenue and finance needs of the business. The good news is that if you work at an SMB (small to medium business) then there's a good chance that your transaction data is at a manageable scale and is megabytes or gigabytes in size. For the smaller SMBs, it's even possible that can just export your data files for all your customers and load them up in Excel or Google Sheets in order to build reports.
Mid-market companies (say, 500+ employees) are still usually not exceeding low gigabyte numbers in raw transaction data size. The good news is that for most SMBs and mid-market companies, data size is rarely a concern when analyzing transaction data in a business intelligence tool.
Event data can be problematic, mostly due to its scale. Even relatively small consumer companies can generate tremendous amounts of user data, easily gigabytes in scale and beyond. The bad news is that the more event data you have, the more engineering work that can be required to implement and maintain a BI solution. The good news is that if you have large amounts of event data, your engineers have probably already implemented a storage solution (e.g. Snowflake or AWS S3), so accessing that data in a BI tool will simply be a matter of connecting that data store with your BI tool. If you're only using Google Analytics, but wanting to collect more granular data, then now might be the time to setup a data warehouse as well as a business intelligence tool.
External API data – as we mentioned above, external API data can be unpredictable in size. As a first step, you'll want to determine what (if any) external API data sources that are being used by your business to inform analysis or make decisions. Many businesses don't use any external API data at all, so don't be surprised if the answer turns out to be "we don't have any". If your business does consume 3rd party API data, then the next question for you is determining if this data is needed and relevant for analysis. If so, then you'll want to make sure that it gets piped into your Business Intelligence tool. (Note: this will probably require an engineer.)
Who controls the data? Is any of it sensitive? Who should have access to it?
Data accessibility is an important question that requires responsible and thorough thought. If this is the first time your company has used a Business Intelligence tool, it may also be the first time you're asking hard questions like "should everyone be able to access this data?". If your first answer is "yes, of course!", then you may want to dig a little deeper: What about interns and contractors, should they be able to access all the data from your business? What about people from different business divisions? What about people at different levels of seniority? There will always be a tradeoff between accessibility and security, so it's important to determine where the balance lies.
This gets even more tricky – and important – if your business deals with any kind of sensitive information. Sensitive information might be personally identifiable information (PII) like social security numbers, contact information or physical address – almost every business has some PII on their customers, so it's important that it is secure and handled appropriately. Regulation-sensitive data, such as any HIPPA or financially sensitive data, will require a Business Intelligence solution that is HIPPA compliant, and may require certain security provisions like SOC2 compliance – these are worth investing in although tend to cost more. A more benign example of sensitive information is your employee salary data.
Data control – When it comes to data control, this tends to be split across database admins and application admins. Databases like Postgres, MySQL, Redshift, etc. will require you to liaise to – at a minimum – with the relevant engineers on your team obtain the database credentials. Don't be surprised if you face some scrutiny before these details are handed over, it's their job to make sure that the database is accessed responsibly, and they'll probably want to be involved in the process (if not leading) the BI application setup process.
Application data is usually much simpler to access: someone has admin privileges and you'll need to bring them in on the procurement process and explain what you'll be doing with their data. Salesforce and SaaS tools are examples of "application data".
Who should be able to access data? Now that you've identified all your data sources, determined what's sensitive information, and figured out who controls access to each data source, you now need to determine who should have access to what data inside the Business Intelligence tool.
There are two primary ways of controlling access to sensitive data inside your BI tool: don't import it, or setup access privileges. Not importing sensitive data is a responsible approach, but depending on the size of your company and the intended use, it may not be a reasonable one. It's up to you to determine what needs to be imported to your BI tool, and what can be left out.
Access privileges can be set in some Business Intelligence tools and not others. Usually, the more control you need over bespoke features, the more the BI tool will cost. Unsurprisingly, a popular approach for many small and medium sized businesses (SMBs) is to allow everyone to access everything. Depending on your business, this may be the right approach for you; it's certainly the easiest. Just make sure that as your business scales that you continue to revisit this decision, as you may wish to impose stricter data controls in the future. (All well-managed large companies impose at least some level of information controls across their company.)
How often does the data need to update?
Timeliness of data can vary wildly between each business and each data source, and there tends to be an association between timeliness of data and the engineering work required to keep that data up to date.
A good place to start is by asking this question in terms of business value: "how often does the data need to update"? For most businesses, once a day tends to be more than sufficient, and allows for a daily report to be generated from the previous day's business activities. For other businesses, syncing as frequently as every half hour is important. (Sourcetable syncs data automatically every 15 minutes from most databases and SaaS tools.)
If you need a real-time data streaming solution (maybe you're a stockbroker?), then you're probably looking for a bespoke industry-specific solution that won't be covered by the article.
Who will be setting up the Business Intelligence tool? Are they technical?
Depending on which Business Intelligence tool you use, you may need to dedicate engineering resources to setup and ongoing maintenance. Most enterprise BI tools (Looker, Tableau, Qlik) require dedicated engineering resources to setup, and a some engineering resources for ongoing maintenance (more as your business scales).
Newer BI tools aimed at a self-serve market tend to be much easier to setup and maintain, although they do require some engineering. Mode and Metabase are good examples of these.
Sourcetable is a spreadsheet-based BI solution that doesn't require engineers to setup or maintain. Sourcetable takes only minutes to get started and has a 14-day free trial, so it's worth considering for your analysis and reporting needs.
Lastly, people often forget that Microsoft Excel is the most popular Business Intelligence solution on the planet. It's messy and is hard to keep organized at scale, but it's cheap and – like Sourcetable – doesn't require engineers. For those committed to the Microsoft stack, graduating to Power Pivot is a common path, although it's a little outdated at this point.
Who will be using the Business Intelligence tool? Are they technical?
If cost and engineering resources are your first two concerns, then the usability of your Business Intelligence tool should be your third.
Usability tends to be the biggest ongoing challenge faced by analysts, operators, and anyone seeking to extract insight from data on a regular basis. Generally speaking, the older the Business Intelligence tool, the worse the user experience and the more training required. Newer BI tools like Amplitude, Metabase and Sourcetable have all placed an obvious emphasis on design and usability, and ease-of-use is increasingly becoming the defining criteria for procurement officers when seeking to make their companies more data literate. This is no surprise, because what good is a BI tool if nobody uses it?
"Do you know SQL?" is the fastest way to determine if your business team is sufficiently technical to use the more advanced BI tools. If they don't, then you will either have to train them on SQL, or choose a BI tools that doesn't require them to know SQL.
Unfortunately, an all-too-common pattern is that the business team can't answer their questions due to technical limitations, and so they ask the engineering team to write them some SQL as a "one-off" to answer their questions. This doesn't pass the smell test and leads to constant interruptions for your engineering team, in addition to delays and frustration for everyone involved.
If your business team doesn't already know SQL, or isn't great at it, then we recommend meeting them where they are (instead of where you want them to be), and to use a Business Intelligence solution that allows everyone to dive into the data in a format they already understand: spreadsheets.
If you have also reached this conclusion, then you'll be glad to know that Sourcetable offers a full BI solution with a spreadsheet interface. Users can create models and ask questions from their data using the spreadsheet functions they already understand, and if they need to dive deeper into the data tables, Sourcetable provides a full no-code GUI to assist your team without ever having to write a line of SQL. Sourcetable has a full visualization library to assist when it's time to turn those insights into reports.
How will the data be presented? And who will see those reports?
Visualizations are key for collaboration and communicating information, but which BI tools are best?
If you are an SMB or a mid-market business, you'll probably be emphasizing usability over extensive feature sets required by some analysts. Google Data Studio is your best choice in terms of free tools, although it can be frustrating to use and tends to require a comfort with SQL and BigQuery to get the most out of it. The other tool (aside from Sourcetable) that we would recommend at the affordable end of the spectrum is Chartio. Unfortunately, Chartio has been shut down since being acquired, so it is likely other tools will emerge to fill that gap in the market.
For larger users and those with more sophisticated users, the two BI tools we like for visualizations are Domo – which does a great job focusing on mobile-first dashboards for executives on the go – and Tableau, the industry leader. Both of these will cost more in dollars and engineering input, but you won't be left wanting for dashboards.
One thing to consider is that most BI tools require paid accounts on a per-user basis, so the ability to share reports widely may cost you. One hack to circumvent this and keep things within budget is to determine if everyone who is seeing reports needs to be able to access the underlying data. If not, you can look for tools that allow you to email out PDF reports on a daily, weekly or monthly basis, since those can usually be sent to users who aren't paying members of your BI tool.
What's your budget?
If you have no budget and a non-technical team, the manual approach through Google Sheets or Excel are best options. If your team all knows SQL, then Mode Analytics also offers a free tier, and if you're looking for a self-hosted option and can spare the engineers for setup and maintenance, Metabase is also a good choice. Amplitude deserves a mention for ease of use and graphics support.
If you have some budget (say, $100-$300/m), and want a fully-hosted solution without requiring engineers in the loop for day-to-day operations, Sourcetable and Metabase are both great choices. Metabase will require you to bring your own data warehouse, whereas Sourcetable will provide one. Between the two, the main differentiator is that Sourcetable allows users to conduct more thorough analysis through a spreadsheet interface and the full range of spreadsheet formulas that all business users already know.
At the top end of the budget spectrum, there's no end to the plethora of choices. Looker, Tableau, Qlik and Alteryx are all worthy of consideration, and provide more scrutiny than this article will be able to provide. It is worth nothing that all enterprise-level BI solutions will require solutions engineers to setup, and aren't considered self-serve.
In summary:
There's a lot to think about when choosing a Business Intelligence solution. Hopefully you now have a spreadsheet with a list of questions, with answers next to them for each of the categories listed above to help you make a decision.
Best of luck on your adventures in data, and if you're keen for a free trial of Sourcetable, sign up here.