Building a Governance, Risk, and Compliance Strategy with AWS
At the AWS re:Invent conference, industry leaders gathered to discuss one of the most pressing challenges facing organizations today: data governance in the cloud. The panel session, led by Jen Gray, Senior Manager with AWS Security Growth Strategies, brought together experts from academia, government, and the private sector to share insights on building effective data governance strategies for sensitive information.
The Challenge of Modern Data Governance
If you’ve ever raised your hand when asked, “How many of you are responsible for managing and providing oversight on your organization’s data?” — but kept it firmly down when asked if it was easy — you’re not alone. The panel revealed that while data governance responsibilities are widespread, few find it straightforward to implement effectively.
Organizations today face a barrage of new data privacy regulations and protection laws, but often lack clear guidance on implementation strategies. The session at re:Invent aimed to bridge this gap by showcasing a real-world case study: NYU’s Advanced Data Research Facility (ADRF).
The Advanced Data Research Facility: A New Paradigm for Secure Data Sharing
Dr. Julia Lane, a professor at NYU’s Wagner Graduate School of Public Service and a distinguished economist, presented the ADRF as a groundbreaking solution to a critical problem in research and governance: how to share sensitive data securely across agencies.
The ADRF was created in response to a mandate from the Commission on Evidence-Based Policymaking, established in 2016 through bipartisan legislation sponsored by Paul Ryan and Patty Murray. Its mission: to build an infrastructure that enables secure, ethical data sharing among government agencies at all levels.
Dr. Lane illustrated the need with two compelling examples:
- Economic Research Challenge: Twenty years ago, the Census Bureau needed to capture the dynamic interaction between workers and firms using administrative records — specifically, unemployment insurance wage records that contain sensitive personal information like names, social security numbers, and earnings.
- Social Services Tragedy: In Baltimore, when a child dies, agencies responsible for that child’s welfare (education, social services, criminal justice) only share their data after the tragedy. The challenge was creating a system where agencies could collaborate preventatively while protecting privacy.
The scale of the ADRF is impressive — in a relatively short time, it has incorporated 50 confidential data sets from cities, counties, states, and federal agencies, including sensitive information from police departments, corrections facilities, and human services agencies.
The Five Safes: A Framework for Data Governance
At the heart of the ADRF’s governance strategy is the “Five Safes” framework, which addresses multiple dimensions of data security:
- Safe People: Ensuring only properly trained, authorized individuals can access sensitive data
- Safe Projects: Limiting data use to approved, appropriate purposes
- Safe Settings: Creating secure technical environments for data access
- Safe Exports: Preventing re-identification of individuals when results leave the system
- Safe Data: Combining the above elements to ensure overall data security
As Dr. Lane emphasized, “It’s not just enough to bring the data into one place. You actually want multiple people working with the data because people need to be able to learn from it to be able to do their jobs better.”
This approach recognizes that effective data governance isn’t just a technical solution — it’s also a human one, requiring training, clear protocols, and ongoing stewardship.
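To make the framework concrete, the five checks can be pictured as independent gates that must all pass before access or export is allowed. The sketch below is purely illustrative and is not code from the ADRF; the field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class AccessRequest:
    """Hypothetical request evaluated against the Five Safes."""
    researcher_trained: bool    # Safe People: trained, authorized individual
    project_approved: bool      # Safe Projects: approved, appropriate purpose
    secure_environment: bool    # Safe Settings: access only from the secure enclave
    export_reviewed: bool       # Safe Exports: output checked for re-identification risk


def is_safe(request: AccessRequest) -> bool:
    # "Safe Data" emerges only when every other safe holds at the same time;
    # a single failed dimension blocks access or export.
    return all([
        request.researcher_trained,
        request.project_approved,
        request.secure_environment,
        request.export_reviewed,
    ])


if __name__ == "__main__":
    # Fails the gate: the export has not yet been reviewed.
    print(is_safe(AccessRequest(True, True, True, False)))
```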
Building a Secure Cloud Infrastructure for Sensitive Data
Implementing the Five Safes framework required sophisticated cloud architecture. Youssef Elmit, CEO of Earthling Security (an AWS partner), described the challenge of creating an environment that could handle diverse security requirements from multiple agencies while accommodating researchers’ needs.
The solution had to be:
- Compliance-ready for FedRAMP and Census Bureau requirements
- Capable of handling different authorization levels across user groups
- Ready for production on a tight timeline
- Flexible enough to accommodate evolving research needs
Elmit outlined key design principles that guided their approach:
- “Design small, avoid complexity where it doesn’t need to be”
- Separate operational/management traffic from application data
- Plan for isolation of each new data set ingested (see the sketch after this list)
- Leverage pre-approved AWS FedRAMP services and partners
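The per-dataset isolation principle is sketched below, assuming boto3 with configured credentials; the CIDR ranges, tag keys, and names are hypothetical and not drawn from the ADRF's published architecture. Each ingested data set gets its own VPC, with separate subnets so operational/management traffic never shares a network path with application data.

```python
import boto3

ec2 = boto3.client("ec2")  # assumes credentials and region are configured


def provision_dataset_enclave(dataset_id: str, vpc_cidr: str,
                              data_cidr: str, mgmt_cidr: str) -> str:
    """Create an isolated VPC for a newly ingested data set (illustrative only)."""
    vpc_id = ec2.create_vpc(CidrBlock=vpc_cidr)["Vpc"]["VpcId"]
    ec2.create_tags(
        Resources=[vpc_id],
        Tags=[
            {"Key": "dataset", "Value": dataset_id},
            {"Key": "classification", "Value": "confidential"},  # hypothetical tag scheme
        ],
    )
    # Keep application data and operational/management traffic on separate subnets.
    ec2.create_subnet(VpcId=vpc_id, CidrBlock=data_cidr)   # application data
    ec2.create_subnet(VpcId=vpc_id, CidrBlock=mgmt_cidr)   # management/operations
    return vpc_id


if __name__ == "__main__":
    print(provision_dataset_enclave("example-wage-records",
                                    "10.42.0.0/16", "10.42.1.0/24", "10.42.2.0/24"))
```

The same idea can be expressed with separate AWS accounts rather than VPCs; the point is that isolation is planned at ingestion time rather than retrofitted later.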
The AWS Marketplace proved invaluable for finding security tools that fit within budget constraints while meeting compliance requirements. This approach helped address concerns from state agencies nervous about cloud adoption, as everything was documented and transparent.
Overcoming Implementation Challenges
The team faced significant hurdles in bringing the ADRF to reality:
- Tight timeline between design, security implementation, and actual data ingestion
- Integration of security into the development process (“shifting security left”)
- Managing different authorization levels for users based on data sensitivity (a tag-based policy sketch follows this list)
- Working with legal teams unfamiliar with cloud environments
- Balancing automation with manual processes required by the Census Bureau
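One common way to express sensitivity-based authorization levels on AWS is attribute-based access control, where a policy condition requires the user's tag to match the tag on the data. The policy below is a minimal sketch, not the ADRF's actual policy; the bucket name and tag keys are hypothetical.

```python
import json

import boto3

iam = boto3.client("iam")  # assumes credentials are configured

# Hypothetical ABAC policy: a principal may read an object only when its
# "clearance" tag matches the object's "sensitivity" tag.
policy_document = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "s3:GetObject",
            "Resource": "arn:aws:s3:::example-research-enclave/*",
            "Condition": {
                "StringEquals": {
                    "s3:ExistingObjectTag/sensitivity": "${aws:PrincipalTag/clearance}"
                }
            },
        }
    ],
}

iam.create_policy(
    PolicyName="example-sensitivity-matched-read",
    PolicyDocument=json.dumps(policy_document),
)
```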
Despite these challenges, the ADRF was operational quickly: the decision to use FedRAMP was made in September 2016, a working version was available by February 2017, full authorization was achieved by October 2017, and the Census Bureau’s Authority to Operate was granted by February 2018.
During this period, the ADRF trained nearly 300 government employees from 90 different agencies, demonstrating both the demand for such a solution and its viability.
Data Classification: The Foundation of Effective Governance
Tim Anderson, Senior Technical Industry Specialist with AWS Security Growth Strategies, highlighted the critical role of data classification in any governance strategy.
“If you do not have data classification in place, it’s really hard to know what governance mechanisms to enact,” Anderson explained. Without classification, organizations typically go one of two ways: overcompensating (allocating too many resources and stifling innovation) or undercompensating (exposing themselves to unacceptable risk).
AWS provides several resources to help organizations implement effective data classification:
- A comprehensive data classification white paper
- Professional services support
- Technical solutions such as resource tagging, AWS Glue, Amazon Macie, and Amazon Redshift
Additionally, newer services like AWS Security Hub and Control Tower can help automate and visualize governance controls, making them more consistent and effective.
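As a minimal sketch of what classification tagging can look like in practice (the bucket, key, and tag values below are hypothetical and not from the ADRF), an object carries its classification level as a tag that downstream controls and audits can key off, and Security Hub is enabled to centralize findings:

```python
import boto3

s3 = boto3.client("s3")
securityhub = boto3.client("securityhub")

# Record the classification level directly on the object so IAM conditions,
# lifecycle rules, and audits can all reference the same tag.
s3.put_object_tagging(
    Bucket="example-research-enclave",        # hypothetical bucket
    Key="wage-records/2018/extract.csv",      # hypothetical object key
    Tagging={"TagSet": [{"Key": "classification", "Value": "confidential"}]},
)

# Enable Security Hub in this account to aggregate and visualize findings.
securityhub.enable_security_hub()
```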
The Impact: Transforming Public Sector Data Use
The ADRF has shown that it’s possible to unlock the value of administrative data while maintaining rigorous privacy protections. As Dr. Lane noted, “The private sector has figured out how to use data”; if the public sector can learn to do the same in a similarly effective but secure way, the impact could be enormous.
Examples of potential benefits include:
- More responsive emergency services during disasters or terrorist attacks
- Better-targeted social services and resource allocation
- Data-driven policy decisions around issues like the opioid crisis
- Cross-agency collaboration on complex social problems
The ADRF has already been recognized with a GCN Award, and has pushed agencies like the Census Bureau forward in their thinking about cloud-based data sharing.
Looking to the Future: Predictions for Data Governance
The panelists offered fascinating predictions about where data governance is headed:
Dr. Lane anticipates we’ll need to incorporate new data sources like cell phones, social media, and sensor data while developing ethical frameworks for their use.
Youssef Elmit predicts governance will become entirely data-centric, with infrastructure becoming less relevant: “What auditors want to know is do you have adequate controls around that data? Is it properly logged, is it protected, is it encrypted?”
Tim Anderson foresees machine learning playing a greater role in risk assessment and policy automation, particularly in IoT environments where trusted and untrusted data sources will increasingly intermingle.
Conclusion: A Community Approach to Secure Data Sharing
Perhaps the most important aspect of the ADRF is its community-oriented vision. Version 2 of the platform aims to create a rich context for data, similar to Amazon.com or TripAdvisor, where researchers contribute knowledge about datasets as they work with them.
This approach acknowledges that data is dynamic — definitions change, collection methods evolve — and building shared knowledge is essential to using it effectively.
The cloud makes this possible in ways that weren’t previously feasible. Agencies don’t have to send their data to another state agency (which they were often reluctant to do); instead, they can share it in the cloud while maintaining stewardship and visibility into how it’s used.
As organizations in all sectors face increasing pressure to protect sensitive data while extracting its value, the lessons from the ADRF provide a valuable template for balancing innovation with security, compliance with usability, and technical solutions with human factors.
With tools like AWS Control Tower and Security Hub, organizations now have more options than ever to implement robust data governance strategies that can scale with their needs while maintaining the highest levels of security and compliance.