Data Source Best Practices

This guide is intended to further understanding of how to create, access, and use data sources.

Bad Data Sources vs. Great Data Sources

Bad data sources lead to:

  • Slow dashboard performance. Performance is one of the biggest drivers of dashboard adoption: if a data source is slow, users stop trusting it, even if the data is correct.
  • Confusing field names
  • Duplicate and redundant data
  • Lack of hierarchy and organization
  • Data governance issues

Great data sources provide:

  • Faster performance with optimized structure
  • Clear, intuitive field names
  • Better organization with hierarchies and folders
  • Scalability for future growth. Scalable data sources are future-proof (e.g., dates are not hard coded, and permissions or row-level security are based on groups, not individuals). They are easy to maintain and are designed to grow with your users and your analytics needs.
  • Trust in the data for end users. 

Best Practices

Data source creation

  • Include only necessary fields. If using a pre-built data source, hide unused fields before publishing the workbook.
    • Fewer fields reduce data source size and make visuals more responsive. Complete this step before creating an extract so that Tableau retrieves only the requested columns from the data source.
  • When possible, avoid the use of S4 data.
  • Create with performance in mind.
    • Tableau works best when the data source is narrow (few columns) and not wide (hundreds of columns).
    • Clean, write calculations, and aggregate as close to the data layer as possible, or use a tool such as Tableau Prep Builder. Additional information is available on when to use Tableau Prep Builder vs. Tableau Desktop. FAQs on Tableau Prep are available, and a Tableau Prep Builder training is offered in BuckeyeLearn.
    • When working with calculations:
      • Move row-level calculations out of Tableau, particularly row-level string calculations.
      • Consolidate duplicate logic across calculations.
      • Aggregates
        • MIN/MAX are faster than AVG/ATTR
        • COUNTD is slower than COUNT
        • Use LODs only when needed.
      • Dates
        • Avoid hard coding date references. For example, do not hard code a specific fiscal year; derive it from the date instead.
  • Develop a data source naming convention. For example, in the OTDI project, data source names should start with ‘OTDI –’.
    • This helps Creators and Project Leaders stay organized. Names should be descriptive enough to identify the data’s content.
    • Do not be afraid to use descriptions for data sources. Descriptions should outline the purpose and/or audience for the data source.
  • Document your data source
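
The advice above about not hard coding a fiscal year can be illustrated with a minimal Python sketch, for example in a preparation step that runs before the data reaches Tableau. It assumes a fiscal year that starts on July 1 and is named for the calendar year in which it ends; the function names and the start month are illustrative assumptions, not part of any Tableau API.

```python
from datetime import date
from typing import Optional

# Assumption: the fiscal year starts on July 1 and is named for the
# calendar year in which it ends. Change this constant if yours differs.
FISCAL_YEAR_START_MONTH = 7


def fiscal_year(d: date, start_month: int = FISCAL_YEAR_START_MONTH) -> int:
    """Return the fiscal year that contains the date d."""
    if start_month == 1:
        # Fiscal year matches the calendar year.
        return d.year
    # Dates on or after the start month belong to the next named year.
    return d.year + 1 if d.month >= start_month else d.year


def current_fiscal_year(today: Optional[date] = None) -> int:
    """Fiscal year for today, or for an injected date (useful in tests)."""
    return fiscal_year(today or date.today())
```

Because the fiscal year is derived from the date, a filter such as "current fiscal year" keeps working when the year rolls over, with no edit to the data source.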

Data source access

  • Users must be up to date with all required trainings to access a project. At a minimum, users must be up to date with IDP training. Additional trainings such as FERPA may be required to view data in the project. Dashboards in the [Creator Toolkit] provide information on who has completed training.
  • If the data is sensitive, do not give users the ability to download raw data.
    • Data visualizations are mostly aggregate displays, so personal identifiers are rarely, if ever, needed.
  • Regularly review access to the data source. Suggestions on how to review group membership are available.
  • To further understand permissions, please review our documentation on Tableau Server permissions and best practices.

Data source use

  • Identify stale content, then remove or archive unused data sources. If you are worried that a user may need a report later, create an archive folder within the project to store reports before they are removed from Server. Additional information about project governance best practices is available.
  • Schedule extract refreshes during non-business hours. Extract refreshes compete for the same server resources that are needed for viewing visuals.
  • Leverage data source filters when only a subset of data is needed. For example, if you only need data for one cost center, apply a data source filter at the start for only that cost center. This will improve performance. 
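
As a rough illustration of filtering early, keeping only the needed columns, and aggregating close to the data layer, the sketch below uses Python's built-in sqlite3 module with an in-memory database. The table, column names, and values are hypothetical; in practice the same idea would apply to a query or view in your own database, or to a data source filter in Tableau.

```python
import sqlite3

# Hypothetical table and values, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE transactions (
        cost_center TEXT,
        posted_date TEXT,
        amount      REAL,
        memo        TEXT,  -- not needed by the dashboard
        entered_by  TEXT   -- not needed by the dashboard
    );
    INSERT INTO transactions VALUES
        ('CC100', '2024-01-05', 120.0, 'supplies',  'user_a'),
        ('CC100', '2024-01-20',  80.0, 'travel',    'user_b'),
        ('CC100', '2024-02-03',  50.0, 'supplies',  'user_a'),
        ('CC200', '2024-01-11', 999.0, 'equipment', 'user_c');
""")


def monthly_totals(connection, cost_center):
    """Filter to one cost center, keep three columns, aggregate by month."""
    query = """
        SELECT cost_center,
               substr(posted_date, 1, 7) AS posting_month,
               SUM(amount)               AS total_amount
        FROM transactions
        WHERE cost_center = ?
        GROUP BY cost_center, substr(posted_date, 1, 7)
        ORDER BY posting_month
    """
    return connection.execute(query, (cost_center,)).fetchall()


rows = monthly_totals(conn, "CC100")
for row in rows:
    print(row)
```

Four wide rows become two narrow, pre-aggregated rows for the one cost center of interest, which is the shape of data that tends to be faster to extract and render.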

Understanding Embedded vs. Published Data Sources

When creating a data source for a visualization, Tableau offers two options: embedded and published to Server. 

Embedded data sources

  • Embedded data sources directly tie a data source to the workbook. In other words, the data source is in the workbook.
  • Because the data source is in the workbook, multiple copies of the same data can end up stored and refreshed on Server.
  • Embedded data sources allow for easier customization of a data source without impacting other dashboards.

Published data sources

  • Data sources published to Tableau Server allow multiple reports to share a common data source. For example, data sources in the Enterprise project folder are published directly to Server, allowing analysts across the university to connect to a single source of truth and build reports for their units.
    • To view Enterprise data sources, please see the Enterprise Directory dashboard. It will default to dashboards; select the toggle to view available data sources. Information on permissions needed and how to request them is available in the info icon. Enterprise data sources have been vetted and approved by subject matter experts in the business unit(s).
  • Published data sources reduce the number of copies of a data source stored on Tableau Server. They also reduce the size of workbook files – making them faster to save, publish, and download.
  • Published data sources secure data separately from the workbook.
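
The difference in stored copies and refresh jobs can be made concrete with a back-of-the-envelope sketch. It assumes every workbook uses an extract of the same data, that each embedded copy is refreshed on the same schedule, and that all sizes and counts are hypothetical; it is not a measurement of any real Server.

```python
def extract_footprint(workbooks, extract_size_mb, refreshes_per_day, published):
    """Estimate stored copies, storage, and daily refresh jobs.

    Assumption: an embedded extract is copied once per workbook, while a
    published data source is stored and refreshed once and shared.
    """
    copies = 1 if published else workbooks
    return {
        "copies": copies,
        "storage_mb": copies * extract_size_mb,
        "refresh_jobs_per_day": copies * refreshes_per_day,
    }


# Hypothetical scenario: 10 workbooks, a 500 MB extract, one refresh per day.
embedded = extract_footprint(10, 500, 1, published=False)
shared = extract_footprint(10, 500, 1, published=True)
print("embedded:", embedded)
print("published:", shared)
```

Under these assumptions, the embedded approach stores and refreshes ten copies where the published approach stores and refreshes one.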

If there are any questions, please email data-visualization@osu.edu.

Additional Resources

Important Information

  • Of all the single changes you can make, a change to the data source has the biggest impact on a dashboard’s load time.
  • A change to the data source can also result in substantial rework in a dashboard.
  • It is critical to spend time early on understanding your data structure and considering how your data source should be set up to avoid headaches later.