Data Lake (RAE)

Ohio State’s Reporting and Analytics Environment (RAE) is a centralized, data-lake-style system that provides a place to capture, organize, enhance, and query core information about the university’s business processes.

The RAE holds data needed for both extended and detailed analytics. The data lake also lets university areas store and share data, and lets data analysts bring additional datasets into the system for deep, cross-functional analytics.

What data is in the RAE?

Use the public RAE Object Domain Directory dashboard to learn more about what data is in the RAE. With the dashboard you can see the different domains (HR, Finance, etc.), specific tables, and permissions needed to access the data in the RAE.

Components

Data Storage

Data lakes are often used to consolidate all of an organization’s data in a single, central location, where it can be saved “as is,” without the need to impose a schema (i.e., a formal structure for how the data is organized). Storing data this way helps us avoid lock-in to a proprietary system such as a data warehouse, a concern that has become increasingly important in modern data architectures. Data lakes are also highly durable and low cost, because of their ability to scale and leverage object storage.
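To make the idea of saving data “as is” more concrete, here is a minimal Python sketch of applying a schema only when the data is read. The records, field names, and the read_with_schema helper are hypothetical illustrations, not the RAE’s actual storage format or code.

```python
import json

# Hypothetical raw records, stored exactly as received (no schema enforced on write).
RAW_LINES = [
    '{"emp_id": "101", "dept": "HR", "hired": "2021-03-01"}',
    '{"emp_id": "102", "dept": "Finance"}',
    '{"emp_id": "103", "dept": "HR", "hired": "2019-11-15", "note": "rehire"}',
]


def read_with_schema(lines, schema):
    """Apply a schema at read time: keep only the listed fields and cast them.

    Fields missing from a record come back as None; extra fields are ignored.
    """
    rows = []
    for line in lines:
        record = json.loads(line)
        row = {}
        for field, cast in schema.items():
            value = record.get(field)
            row[field] = cast(value) if value is not None else None
        rows.append(row)
    return rows


# The schema is chosen by the reader, not by the storage layer (assumed fields).
employee_schema = {"emp_id": int, "dept": str, "hired": str}
rows = read_with_schema(RAW_LINES, employee_schema)
```

Because the raw lines are never altered, a different analyst could read the same records with a different schema, for example one that also keeps the note field.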

Data Ingest

A number of tools and capabilities come into play for loading data into a data lake. We’ve designed the system to accept data from other databases, web APIs, and file inputs. Currently, we support both batch (infrequent) and micro-batch (near real time) data feeds from sources across Ohio State, including Workday, SIMS, Salesforce, and PeopleSoft SIS. In the near future, we will also offer support for streaming datasets (real time), as well as unstructured data inputs.

Optimized Data

While many analysts are comfortable working with “as is” data, it is best practice to apply what has been learned about the data to make other analysts’ work much easier. Thus, we work with data analysts and engineers to apply well-designed and vetted transformations that make the data increasingly usable and accessible for analytics needs. As data goes through each progressive transformation, we categorize these “new datasets” using the Medallion Data Architecture nomenclature: Bronze, Silver, and Gold.
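The progression through the Bronze, Silver, and Gold layers can be sketched in a few lines of Python. This is only an illustration under assumed data: the rows, column names, and the to_silver and to_gold functions are hypothetical, and the RAE’s real transformations are not described here.

```python
from collections import Counter

# Bronze: hypothetical rows kept "as is" from a source system.
bronze = [
    {"emp_id": "101", "dept": " hr "},
    {"emp_id": "102", "dept": "Finance"},
    {"emp_id": "102", "dept": "Finance"},  # duplicate row from a repeated load
    {"emp_id": "103", "dept": "HR"},
]


def to_silver(rows):
    """Clean and conform: cast ids, normalize department names, drop duplicates."""
    seen = set()
    silver_rows = []
    for row in rows:
        emp_id = int(row["emp_id"])
        if emp_id in seen:
            continue
        seen.add(emp_id)
        silver_rows.append({"emp_id": emp_id, "dept": row["dept"].strip().upper()})
    return silver_rows


def to_gold(rows):
    """Aggregate to a business view: headcount per department."""
    return dict(Counter(row["dept"] for row in rows))


silver = to_silver(bronze)
gold = to_gold(silver)
```

In this sketch the Silver layer still has one row per employee, while the Gold layer holds a small summary that could be sourced directly into a dashboard.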

 

Data Lake at a Glance

Content

The Enterprise Data Lake team is constantly working with data partners around campus to acquire, store, and make datasets available to analysts. The RAE’s primary focus is to capture data about Ohio State’s key business processes and systems. We do not store information or data related to OSU’s research mission, nor data from any OSU Medical Center systems (e.g., EPIC). To learn more about the data we have currently loaded into the RAE, you can view the RAE Object Domain Directory Tableau dashboard.

Target Audience

The RAE is the storage hub for the datasets that analysts use daily to derive insights about Ohio State’s business operations and systems. The core user of the data lake will be someone who is skilled in:

  • Relational databases 
  • SQL query writing 
  • Working with “as is” or “raw” data, including building your own transformation and join logic 
  • Building insights from the ground up using SQL queries and/or tools like Tableau, Python, R, and SAS 
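As a rough picture of the join logic and ground-up SQL work listed above, the following Python sketch runs a join and aggregation against an in-memory SQLite database. SQLite stands in for Redshift purely so the example is self-contained, and the employee and department tables and their columns are hypothetical, not actual RAE objects.

```python
import sqlite3

# In-memory SQLite database as a stand-in for a real SQL endpoint (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE employee (emp_id INTEGER, dept_code TEXT);
    CREATE TABLE department (dept_code TEXT, dept_name TEXT);
    INSERT INTO employee VALUES (101, 'HR'), (102, 'FIN'), (103, 'HR');
    INSERT INTO department VALUES ('HR', 'Human Resources'), ('FIN', 'Finance');
    """
)

# Analyst-written join logic: resolve department codes to names, then count.
query = """
    SELECT d.dept_name, COUNT(*) AS headcount
    FROM employee AS e
    JOIN department AS d ON d.dept_code = e.dept_code
    GROUP BY d.dept_name
    ORDER BY d.dept_name
"""
result = conn.execute(query).fetchall()
conn.close()
```

The same SELECT statement could be written in any SQL-capable tool; only the connection step would differ.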

As mentioned above, as data is progressively cleansed and aligned to a “final business view,” analysts who are less skilled in the areas above may reference the RAE only to use pre-built Silver or Gold layer data objects, sourcing them into tools like Tableau for dashboard building and reporting.

Technology Set

Today, the RAE data lake is powered by components within OSU Amazon Web Services that are combined into a customized system implementation. The core AWS components utilized by the RAE are: 

  • Redshift 
  • Glue 
  • Lambda 
  • S3 

You can read more about AWS and its analytics tools and capabilities.

 

FAQs

What is the RAE?

The Reporting and Analytics Environment (RAE) is a place to both source and store data. It will provide an environment for data analysts interested in creating reports and performing analytics with cross-functional datasets. The RAE will house historical data, as well as data from other systems that will not be converted to Workday.

How is the RAE related to the implementation of Workday?

Workday’s robust reporting environment helps leverage Ohio State data quickly and easily for operational decisions at all leadership levels. 

Much of the data that now resides in local data marts distributed across campus will be brought together in Workday and the Reporting and Analytics Environment. Teams in different units who rely on these data for their units’ processes will access them from Workday and the Reporting and Analytics Environment. Historical data from local data marts can be included in RAE.

If you aren't sure what step to take next for your reporting needs, check out the Reporting Overview information contained in the Administrative Resources Center (ARC), the repository of how-to information for many OTDI supported tools.

 

What tools can I use to access data in the RAE?

Data in the RAE can be accessed using Tableau Web (using Enterprise Tableau Data Sources), Tableau Desktop, or any SQL-capable desktop tool (such as DBeaver).

How can I connect directly to the Redshift tables available to me?

If you have been granted access to any Redshift tables in the RAE, you can access them with a SQL navigation program. DBeaver is the recommended tool for accessing the RAE, but you can still use your current software. You will need to update the connection settings in your SQL navigation program to be able to connect. Instructions on how to install DBeaver and what settings to use are located in the OTDI Knowledge Base.
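For orientation only, SQL tools that connect over JDBC typically take a connection URL. The Python sketch below builds one in the general form used by the Amazon Redshift JDBC driver. The host and database names are placeholders, the redshift_jdbc_url helper is hypothetical, and 5439 is Redshift’s default port, which may not match the RAE’s configuration; the authoritative settings are the ones in the OTDI Knowledge Base.

```python
def redshift_jdbc_url(host, database, port=5439):
    """Build a JDBC URL in the general form used by the Amazon Redshift JDBC driver.

    5439 is the Redshift default port; a given cluster may use another.
    """
    return f"jdbc:redshift://{host}:{port}/{database}"


# Placeholder values only (assumptions); use the endpoint and database name
# from the OTDI Knowledge Base instructions instead.
url = redshift_jdbc_url("example-cluster.example.edu", "example_db")
```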

I need a contact person for additional help with Workday reporting and historical data.

Contact information for reporting leads and business area representatives is available in the Administrative Resources Center (ARC) to help with your data and Workday reporting needs. Log in to the ARC with your Ohio State credentials.

If you still have questions about getting historical data or reporting help after consulting the ARC, please contact the Service Desk by calling 614-688-4357 (HELP) or emailing servicedesk@osu.edu, and ask that your question be routed to the “Administrative Services Data Warehouse” group.

Who should I contact for RAE technical support?

If you are having technical issues with connecting to or querying the RAE, please contact the IT Service Desk.

 
