Streaming Data Governance: Three Must-Have Requirements to Support AI/ML Innovation

ABSTRACT: This blog defines the governance requirements that streaming data pipelines must meet to make artificial intelligence/machine learning (AI/ML) initiatives successful.

Sponsored by Striim

Farmers know it’s difficult to control streaming water: break one levee and you flood your neighbor’s field. In a similar fashion, AI adopters struggle to govern streaming data: make one mistake and you risk damaging what you’ve worked so diligently to build. But organizations must overcome such difficulties to succeed with AI/ML initiatives. They must provide trustworthy inputs to their models—or risk generating faulty, even illegal outputs.

This blog, the third in a series, defines the governance requirements that streaming data pipelines must meet to make artificial intelligence / machine learning (AI/ML) initiatives successful. The first blog explained why AI/ML, especially GenAI, needs data streaming and the second examined six criteria to evaluate streaming pipeline tools. In this blog, we explore three primary governance requirements—data quality, data privacy and regulatory compliance—that data streaming users handle more responsibly than other AI/ML adopters. We conclude by recommending ways for data leaders at all organizations to meet these requirements as they implement data streaming for AI/ML.

Why streaming?

First, let’s recap why AI/ML needs data streaming. Many popular use cases, including customer service, document processing and supply chain optimization, depend on instant access to the latest business facts. Streaming data pipelines help meet this requirement by manipulating and delivering real-time events to AI/ML models. They mix and match sequences of events to create a granular, sophisticated view of fast-changing business conditions.

The role of governance

Real-time AI/ML models, from ML recommendation engines to GenAI chatbots, are high-stakes endeavors. Organizations cannot afford to misstate facts, expose personally identifiable information (PII) or run afoul of industry regulations as they make fast decisions. This is especially true for “agentic” applications that take autonomous actions with little or no human oversight. Given the risks, AI adopters must carefully extend their governance programs to address streaming data pipelines and real-time processing.


Three governance requirements stand out in particular: data quality, data privacy and regulatory compliance. Data streaming adopters are more mature in each of these areas, according to a new BARC survey of 124 streaming adopters and 321 overall organizations.

Implementation of Data Governance Controls for AI/ML (% Adopters)

  • 49% of streaming adopters have implemented or even optimized formal data quality controls, compared with 36% of all organizations
  • 63% have implemented data privacy controls, compared with 55%
  • 61% have implemented regulatory compliance controls, compared with 53%

All these numbers should be 100%. The shortfall reflects a fast-changing market in which data teams struggle to keep pace with multiplying data types, sources and models along with evolving regulations. Let’s define each requirement in the context of streaming and explore how data teams can close the gap.

Data quality

Data teams, especially data engineers, must ensure their pipelines deliver accurate, consistent and timely data to support AI/ML model training and inference. They must observe data delivery, validate its quality and trace its lineage from source to target. While data teams have well-established quality tools and processes for tables, they must learn new techniques for ensuring the quality of unstructured objects such as documents, images and video. They must label and classify source objects, then make sure the content maintains its integrity at the target—for example, as embeddings in a vector database.

Such controls require the judgment of data engineers as well as business domain experts. They also require a streaming pipeline tool that helps data engineers or domain experts append and track metadata such as categories, PII identifiers and accuracy scores. Their tools and processes must maintain such metadata for events in a streaming pipeline—and issue real-time alerts of policy infractions.
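To make the idea concrete, here is a minimal Python sketch of metadata tracking with real-time alerts. It is an illustration under stated assumptions, not how any particular streaming tool works: the Event shape, the field names, the completeness-based quality_score (a stand-in for the accuracy scores mentioned above) and the alert callback are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, Iterator


@dataclass
class Event:
    """One record moving through a pipeline (hypothetical shape)."""
    payload: dict
    metadata: dict = field(default_factory=dict)


def tag_event(event: Event, pii_fields: set[str]) -> Event:
    """Append governance metadata: a category, PII identifiers and a score."""
    populated = [k for k, v in event.payload.items() if v not in (None, "")]
    event.metadata["category"] = event.payload.get("type", "unknown")
    event.metadata["pii_fields"] = sorted(pii_fields.intersection(event.payload))
    # Illustrative stand-in only: the share of fields that are populated.
    event.metadata["quality_score"] = len(populated) / max(len(event.payload), 1)
    return event


def govern(
    events: Iterable[Event],
    pii_fields: set[str],
    min_score: float,
    alert: Callable[[str], None],
) -> Iterator[Event]:
    """Tag each event as it arrives and alert when it violates the policy."""
    for event in events:
        tagged = tag_event(event, pii_fields)
        score = tagged.metadata["quality_score"]
        if score < min_score:
            alert(f"quality policy infraction: score {score:.2f} < {min_score}")
        yield tagged
```

Because govern is a generator, each event is tagged and checked as it passes through rather than in a later batch job. In practice the alert callback might post to a monitoring channel, and domain experts would supply the categories and scoring rules.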

Data privacy

As consumers, healthcare patients and employees, we have high expectations for data privacy. Recognizing this, organizations rank it as the highest-priority aspect of Responsible AI in a recent BARC survey of 370 data stakeholders. But organizations struggle to protect confidential data, especially when stakeholders demand real-time interactions. People don’t want to wait while an ecommerce site validates their identity or an emergency room checks their insurance policy. Privacy checks must be bulletproof… but fast.

To maintain governance standards, data teams need role-based access controls that authenticate individuals, from data architects to analysts or application users, and authorize the actions they perform. They should implement controls that identify PII within sources, pipelines or targets, then obfuscate it in real time, for example by masking specific columns or records. Data teams also might consider encrypting data at rest, in flight or both, depending on the security level of the source, pipeline and target. For example, a pipeline that streams credit card transactions across clouds and national borders might need encryption during the transfer.
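The following Python sketch shows one way role-based authorization and column masking might fit together. Everything named here is an assumption for illustration: the roles, the permissions, the PII column names and the keep-the-last-four masking rule. It also covers authorization only and assumes the caller's identity has already been authenticated elsewhere.

```python
# Hypothetical role-to-permission mapping; a real deployment would load this
# from an identity provider or policy store.
ROLE_PERMISSIONS = {
    "data_architect": {"read_raw", "read_masked"},
    "analyst": {"read_masked"},
}

# Assumed names of columns that hold PII.
PII_COLUMNS = {"email", "card_number"}


def mask_value(value: str, visible: int = 4) -> str:
    """Replace all but the last `visible` characters with asterisks."""
    if len(value) <= visible:
        return "*" * len(value)
    return "*" * (len(value) - visible) + value[-visible:]


def mask_record(record: dict) -> dict:
    """Return a copy of the record with PII columns masked."""
    return {
        key: mask_value(str(value)) if key in PII_COLUMNS and value is not None else value
        for key, value in record.items()
    }


def read_record(record: dict, role: str) -> dict:
    """Authorize the role, then return the raw or masked view of the record."""
    permissions = ROLE_PERMISSIONS.get(role, set())
    if "read_raw" in permissions:
        return dict(record)
    if "read_masked" in permissions:
        return mask_record(record)
    raise PermissionError(f"role {role!r} may not read this stream")
```

Masking at read time keeps the check on the fast path: one dictionary lookup and one pass over the record per event. Unknown roles are denied by default rather than granted a masked view.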

Regulatory compliance

AI adopters must work to comply with a range of laws and regulations, from the General Data Protection Regulation (GDPR) and AI Act in the European Union to the California Consumer Privacy Act (CCPA) and Health Insurance Portability and Accountability Act (HIPAA) in the US. Finance, healthcare and other verticals also contend with various sector-specific regulations. Data teams must ensure their compliance policies and processes address streaming data. The stakes run high with these real-time use cases because a regulatory infraction might affect a large population of users before an organization detects and remediates it. Such infractions also often trigger mandatory public announcements that damage organizations’ brands.

The privacy controls described above are just a start on addressing regulatory compliance. Other measures include assignment of accountability, delineation of team responsibilities and, above all, auditing and documentation. Streaming data pipelines can help meet the auditing requirement in particular. They should record all data access and actions by user, time and data object, along with metadata that provides context to auditors and compliance officers. Pipelines also should monitor ongoing compliance with policies and alert data teams of any risks or infractions.

Streaming onward

Successful farmers govern streaming water well: they irrigate their fields in a controlled fashion to cultivate crops without crossing any fence lines. AI adopters can achieve similar benefits with streaming data if they implement proper governance controls. They must safeguard data quality, protect privacy, and maintain regulatory compliance. Organizations that implement the right controls (for example, metadata tracking, role-based access, real-time obfuscation, and audit-ready logging) will provide trustworthy inputs to AI/ML models and avoid legal or reputational damage. By prioritizing governance in their streaming strategies, data teams can unlock AI’s potential while safeguarding their business and customers.

To learn more about their options for managing such pipelines, data and AI leaders can request a demo of Striim’s platform.

Kevin Petrie

Kevin is the VP of Research at BARC US, where he writes and speaks about the intersection of AI, analytics, and data management. For nearly three decades Kevin has deciphered...
