Data Lake (RAE)

Ohio State’s Reporting and Analytics Environment (RAE) is a centralized, data lake-style system that provides a place to capture, organize, enhance, and query core information about the university’s business processes.

The RAE holds data needed for both extended and detailed analytics. The data lake also provides capabilities for university areas to store and share data as well as allowing data analysts to bring additional datasets into the system for deep, cross-functional analytics. 

What data is in the RAE?

Use the public RAE Object Domain Directory dashboard to learn more about what data is in the RAE. With the dashboard you can see the different domains (HR, Finance, etc.), specific tables, and permissions needed to access the data in the RAE.

Components

Data Storage

Data lakes are often used to consolidate all of an organization’s data in a single, central location, where it can be saved “as is,” without the need to impose a schema (i.e., a formal structure for how the data is organized). This helps us avoid lock-in to a proprietary system like a data warehouse, a consideration that has become increasingly important in modern data architectures. Data lakes are also highly durable and low cost because they can scale and leverage object storage.
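To illustrate what saving data “as is” and applying a schema only when the data is read can look like, here is a minimal Python sketch. It is not the RAE implementation: the records, the field names, and the read_with_schema helper are hypothetical, and an in-memory list stands in for object storage.

```python
import json

# Hypothetical raw records saved "as is": each line is a JSON document,
# and the documents do not have to share the same fields or types.
raw_lines = [
    '{"id": 1, "name": "Ada", "dept": "Finance"}',
    '{"id": "2", "name": "Grace"}',
    '{"id": 3, "name": "Linus", "dept": "HR", "extra": true}',
]


def read_with_schema(lines, schema):
    """Apply a schema at read time: keep only the listed fields and cast them."""
    rows = []
    for line in lines:
        record = json.loads(line)
        row = {}
        for field, cast in schema.items():
            value = record.get(field)
            # Missing fields become None instead of failing the read.
            row[field] = cast(value) if value is not None else None
        rows.append(row)
    return rows


# The schema is chosen by the reader; nothing enforced it when the data was stored.
schema = {"id": int, "dept": str}
rows = read_with_schema(raw_lines, schema)
for row in rows:
    print(row)
```

A different analyst could read the same stored lines with a different schema, for example one that keeps name instead of dept, without the stored data changing.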

Data Ingest

There are a number of tools and capabilities that come into play for loading data into a data lake. We’ve designed the system to accept data from other databases, web APIs, and file inputs. Currently, we support both batch (infrequent) and micro-batch (near real time) data feeds from sources across Ohio State, including Workday, SIMS, Salesforce, and PeopleSoft SIS. In the near future, we will also offer support for streaming datasets (real time), as well as unstructured data inputs.
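The difference between a batch feed and a micro-batch feed can be sketched in a few lines of Python. This is an illustration only, not how the RAE loads data: the micro_batches and load helpers, the batch size, and the sample records are all hypothetical, and an in-memory list stands in for the target table.

```python
from itertools import islice


def micro_batches(records, batch_size):
    """Yield successive lists of at most batch_size records."""
    iterator = iter(records)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


def load(batch, target):
    """Stand-in for a load step: append the batch to an in-memory target."""
    target.extend(batch)
    return len(batch)


source_records = [{"row": n} for n in range(7)]

# Batch feed: one large load that runs on an infrequent schedule.
batch_target = []
batch_sizes = [load(source_records, batch_target)]

# Micro-batch feed: many small loads, each run shortly after the data arrives.
micro_target = []
micro_sizes = [load(batch, micro_target) for batch in micro_batches(source_records, 3)]

print(batch_sizes)
print(micro_sizes)
```

Both approaches end with the same rows in the target. What changes is how long a new record waits before it becomes available for querying.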

Optimized Data

While many analysts are comfortable working with “as is” data, it is best practice to apply what has been learned about the data to make other analysts’ work much easier. Thus, we work with data analysts and engineers to apply well-designed and vetted transformations that progressively make the data more usable and accessible for analytics needs. As data goes through each transformation, we categorize the resulting “new datasets” using the Medallion Data Architecture nomenclature: Bronze, Silver, and Gold.
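As a rough illustration of progressive transformation, the Python sketch below moves a small dataset through three layers. It assumes the common reading of the Medallion names (Bronze as data “as is,” Silver as cleaned and conformed data, Gold as business-level summaries); the exact rules used in the RAE are not described here, and the sample rows, column names, and to_silver and to_gold helpers are hypothetical.

```python
# Bronze: rows exactly as they arrived from a hypothetical source system.
bronze = [
    {"emp_id": " 101 ", "dept": "finance", "salary": "52000"},
    {"emp_id": "102", "dept": "HR", "salary": "61000"},
    {"emp_id": "102", "dept": "HR", "salary": "61000"},    # duplicate row
    {"emp_id": "103", "dept": "Finance", "salary": None},  # missing value
]


def to_silver(rows):
    """Clean and conform: trim text, normalize case, cast types, drop bad rows."""
    seen = set()
    silver = []
    for row in rows:
        if row["salary"] is None:
            continue  # skip rows that cannot be used downstream
        emp_id = int(row["emp_id"].strip())
        if emp_id in seen:
            continue  # skip duplicates
        seen.add(emp_id)
        silver.append(
            {
                "emp_id": emp_id,
                "dept": row["dept"].strip().upper(),
                "salary": int(row["salary"]),
            }
        )
    return silver


def to_gold(rows):
    """Summarize for reporting: headcount and total salary per department."""
    gold = {}
    for row in rows:
        summary = gold.setdefault(row["dept"], {"headcount": 0, "total_salary": 0})
        summary["headcount"] += 1
        summary["total_salary"] += row["salary"]
    return gold


silver = to_silver(bronze)
gold = to_gold(silver)
print(silver)
print(gold)
```

Each layer is kept as its own dataset, so an analyst who needs the original values can still query Bronze while a dashboard reads only Gold.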

Student Data Coming to the RAE (Spring 2025)

In spring of 2025, student data will be transferred from the Operational Data Store (ODS)/DWHCRPT to the Reporting and Analytics Environment (RAE)/AWS Redshift. Once this transition is complete, the ODS will be retired, and all student reporting data will be stored in the RAE.  

Why is this change important?

This project is an important effort to bring together raw student data and high-quality, curated cumulative datasets built from that raw data into a central, governed source in the RAE. While raw student data is currently available to university analysts in our Oracle data warehouse, many curated datasets created by analytics units are not available university-wide. Migrating raw student data along with these critical datasets into the RAE will unlock the analytics potential of our institutional data by streamlining and democratizing existing data resources.

This project also provides the opportunity to move to a modernized technical architecture that is more sustainable and can better support the increasing need for data and technology. Updates will be made to this page and communications will be shared with affected audiences as we approach the spring go-live date. Answers to Frequently Asked Questions (FAQs) are available below (FAQs last updated 1/3/25).

I’m a former ODS user, what is the RAE?

RAE stands for the Reporting and Analytics Environment. On the backend, data is primarily stored in an AWS Redshift database. This environment has been around for several years and already contains data from Workday HR and Finance, as well as other areas. 

Why are we moving to use the RAE instead of continuing to use the ODS (DWHCRPT)?

This project is an important effort to bring together raw student data and high-quality, curated cumulative datasets built from that raw data into a central, governed source in the Reporting and Analytics Environment. While raw student data is currently available to university analysts in our Oracle data warehouse, many curated datasets created by analytics units are not available across the wider university. Migrating raw student data along with these critical datasets into the RAE will unlock the analytics potential of our institutional data by democratizing existing data resources.

This project also gives us the opportunity to move off software and servers that have extremely limited support due to their age and replace them with a more current architecture that is more sustainable and can better support the increasing need for data and technology.

What is the timeline for moving student data into the RAE?

The current timeline includes a transition phase where the data will be available in both the ODS and RAE. The goal is to allow users more time to make the transition, since there are many different tools used to access the data and those will need to be updated to use the RAE. The ODS is slated to be shut down in late March of 2025, after the transition to the RAE is complete.

What data is moving to the RAE?

The data that is currently in the ODS will be transferred to the RAE. No historical data will be left behind in the ODS. Other data moving to the RAE includes:

  • Raw student data currently loaded to the ODS
  • Student Financial Aid data
  • The snapshot data currently in the Datamart (DWDMOSU)

Use the public RAE Object Domain Directory dashboard to learn more about what data is in the RAE. With the dashboard you can see the different domains (HR, Finance, etc.), specific tables, and permissions needed to access the data in the RAE.

RAE access questions

After January 1, 2025, no new accounts should be requested for the ODS. Any new account that needs access to student data should be requested in the RAE.

If you already have an account in the RAE to access data other than student data, you will need to request permission to have student data added to your account.

If you do not have access to the RAE, you will need to request a new account.

The updated request form with specific information on how to request a new service account will be available when security updates have been made and access to the RAE can be granted. The same approval process used for access to the ODS will be continued in the RAE. The request for access will be approved by the data stewards prior to access being granted.

There will be an additional communication sent regarding access requests in the coming weeks.

What if I use a Service Account to access the ODS?

If you have a Service Account in the ODS, you will either need to have a new account created in the RAE or have student permissions added to your existing Service Account in the RAE. 

The RAE access request form in ServiceNow will prompt you to enter the student domains you need to access. The domain structure in the RAE is different from the permissions in the ODS. You will need to know what permissions to request to complete the ServiceNow form.   

To assist you, the Student ODS to RAE project team has a list of the domains needed for each account. However, we don’t have a list of the active Service Account owners in the ODS. We can provide you with the correct domains for your Service Account if you send an email to the project team at otdi-studentdataintherae_project@osu.edu.  

Action Required for Service Account Owners:

Please provide us with the names of the Service Accounts you are currently using in the ODS. Once you do so, we will email you the domains you will need to request in the RAE to ensure you have the same access after the ODS is retired. Using the information we provide, you will be able to complete the request form in ServiceNow, whether you are consolidating your RAE access or requesting a new account.

Please remember that managers and Data Stewards will still need to approve your request once it is entered into ServiceNow. The approval process is not changing. 

Please email us the name of your Service Account by 01/24/2025, so that the project team can tell you what to request. 

What tools can I use to get data from the RAE database?

You will need to be connected using an appropriate VPN (e.g., Cisco Secure Client-OSU Managed Devices).

Several software programs can be used to access the data, and your choice depends on your use case. We recommend the DBeaver SQL client, but other tools such as Tableau and Microsoft Access can be used as well.

Is there any new student data in the RAE?

In addition to the new curated cumulative datasets, you will also have access to daily historical change tables for most ODS tables and custom objects.

Is the data static or does it get updated throughout the day?

The data in the RAE will be updated once per day. The specific time frames for the daily updates will be similar to the ODS. As more data is loaded into the RAE, a more complete schedule will be released.

Will the connection information to the RAE be different than the ODS?

Yes. The transition to the RAE will require new connection information.

This is the information needed by most users to connect to the RAE.

  • Name: redshift-prd.rae.osu.edu
  • Host: osu-edw-prd-01.cys8zs8qp7jm.us-east-2.redshift.amazonaws.com
  • Port: 5439
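For scripted access, the host and port above can be combined into a connection URL. The Python sketch below is a minimal illustration, not an official connection method. It assumes a client that accepts PostgreSQL-style URLs (Redshift is compatible with PostgreSQL clients, but check what your tool expects). The build_connection_url helper, the user name, the password, and the database name are hypothetical placeholders, since the database name is not listed here.

```python
from urllib.parse import quote

# Host and port are taken from the connection details listed above.
RAE_HOST = "osu-edw-prd-01.cys8zs8qp7jm.us-east-2.redshift.amazonaws.com"
RAE_PORT = 5439


def build_connection_url(user, password, database, host=RAE_HOST, port=RAE_PORT):
    """Build a PostgreSQL-style URL; user and password are percent-encoded."""
    return (
        f"postgresql://{quote(user, safe='')}:{quote(password, safe='')}"
        f"@{host}:{port}/{database}"
    )


# Placeholders only: replace these with your own credentials and the
# database name supplied with your RAE access.
url = build_connection_url("example.1", "p@ss/word", "example_db")
print(url)
```

Percent-encoding matters because characters such as @ and / in a password would otherwise be read as part of the URL structure. In practice, read credentials from a secure store rather than writing them in a script, and connect over the VPN.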

Will there be schema changes?

Yes, there will be schema changes. The linked Schema Changes PDF provides more details. Additionally, recorded instructions on how to use the tool to make the updates can be found in the Administrative Resource Center.

I have technical questions about this change.

Technical questions about moving student data from the ODS to the RAE can be sent to OTDI-StudentDataInTheRAE_Project@osu.edu.

 

 

Data Lake at a Glance

Content

The Enterprise Data Lake team is constantly working with data partners around campus to acquire, store, and make datasets available to analysts. The RAE’s primary focus is to capture data about Ohio State’s key business processes and systems. We do not store information or data related to OSU’s research mission, nor data from any OSU Medical Center systems (e.g., EPIC). To learn more about the data we have currently loaded into the RAE, you can view the RAE Object Domain Directory Tableau dashboard.

Target Audience

The RAE forms the hub for data storage of datasets that analysts use daily to derive insights about Ohio State’s business operations and systems. The core user of the data lake will be someone that is skilled in: 

  • Relational databases 
  • SQL query writing 
  • Working with “as is” or “raw” data, including building your own transformation and join logic 
  • Building insights from the ground up using SQL queries and/or tools like Tableau, Python, R, and SAS 

As we previously mentioned, as data gets progressively cleansed and more aligned to a “final business view,” analysts who are less skilled in the above capabilities may reference the RAE only to use pre-built Silver or Gold layer data objects, sourcing them into tools like Tableau for dashboard building and reporting.

Technology Set

Today, the RAE data lake is powered by components within OSU Amazon Web Services that are combined into a customized system implementation. The core AWS components utilized by the RAE are: 

  • Redshift 
  • Glue 
  • Lambda 
  • S3 

You can read more about AWS and its analytics tools and capabilities.

 

FAQs

What is the RAE?

The Reporting and Analytics Environment (RAE) is a place to both source and store data. It will provide an environment for data analysts interested in creating reports and performing analytics with cross-functional datasets. The RAE will house historical data, as well as data from other systems that will not be converted to Workday.

How is the RAE related to the implementation of Workday?

Workday’s robust reporting environment helps leverage Ohio State data quickly and easily for operational decisions at all leadership levels. 

Much of the data that now resides in local data marts distributed across campus will be brought together in Workday and the Reporting and Analytics Environment. Teams in different units who rely on these data for their units’ processes will access them from Workday and the Reporting and Analytics Environment. Historical data from local data marts can be included in RAE.

If you aren't sure what step to take next for your reporting needs, check out the Reporting Overview information contained in the Administrative Resource Center (ARC), the repository of how-to information for many OTDI-supported tools.

 

What tools can I use to access data in the RAE?

Data in the RAE can be accessed using Tableau Web (using Enterprise Tableau Data Sources), Tableau Desktop or any SQL-capable desktop tool (such as DBeaver).

How can I connect directly to the Redshift tables available to me?

If you have been granted access to any Redshift tables in the RAE, you can access them with a SQL navigation program. DBeaver is the recommended tool for accessing the RAE; however, you can still use your current software. You will need to update the settings in your SQL navigation program to be able to connect. Instructions on how to install DBeaver and what settings to use are located in the OTDI Knowledge Base.

I need a contact person for additional help with Workday reporting and historical data.

Contact information for reporting leads and business area representatives is available in the Administrative Resource Center to help with your data and Workday reporting needs. Log in to the ARC with your Ohio State credentials.

If you still have questions about getting historical data or reporting help after consulting the information in the previous link, please contact the Service Desk by calling 614-688-4357 (HELP) or emailing servicedesk@osu.edu and ask that your question be routed to the “Administrative Services Data Warehouse” group.

Who should I contact for RAE technical support?

If you are having technical issues with connecting to or querying the RAE, please contact the IT Service Desk.

 
