AI Governance in Financial Services: What the Regulations Actually Require
Briefcase AI — February 2026
Boards and CROs are asking the same question right now: what does our AI governance posture actually need to look like?
Most of the answers circulating are vendor-written, framework-heavy, or based on proposed guidance that hasn't been finalized. This post goes to primary sources. What do the major regulatory bodies governing U.S. financial services currently expect from institutions deploying AI — not in theory, but in examination?
The short answer: more than most institutions have built.
The Regulatory Landscape
AI governance in financial services is not governed by a single framework or a single regulator. It is governed by the intersection of existing regulatory obligations — model risk, consumer protection, anti-discrimination, anti-money laundering — applied to AI systems that were not contemplated when those obligations were written.
That intersection is where most institutions have a gap.
The regulators with the most direct examination authority over AI deployments in banking and fintech are the OCC, CFPB, FinCEN, OFAC, FINRA, and the SEC. Each has a distinct mandate. Each is asking different questions. None of them accept "the model decided" as an answer.
OCC: Model Risk Management
The OCC's primary framework for AI governance is the interagency model risk management guidance issued in 2011 — published by the Federal Reserve as SR 11-7 and adopted by the OCC as Bulletin 2011-12. It predates large language models and multi-agent AI systems by more than a decade, but examiners apply it directly to AI deployments today.
SR 11-7 establishes three requirements that map directly to AI systems:
Model validation. Banks must validate models before deployment and on an ongoing basis. For AI systems, this means documenting what the model does, what data it was trained on, and how its outputs are tested for accuracy and consistency. A model that cannot be validated — because its internals are opaque or its outputs are not logged — fails this requirement.
Ongoing monitoring. Banks must monitor model performance continuously and identify when model behavior has changed. For AI systems operating at high velocity — fraud scoring, credit decisioning — this requires automated drift detection, not periodic manual review.
Documentation. Every model decision must be documentable. For adverse credit actions, this means producing a specific, defensible justification at the individual applicant level. Aggregate model performance statistics do not satisfy this requirement.
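The ongoing-monitoring requirement above is commonly automated with a population-drift statistic such as the Population Stability Index (PSI). The sketch below is a minimal stdlib-only illustration; the 0.25 threshold is a widely used industry rule of thumb, not a regulatory figure.

```python
import math

def psi(expected, actual, bins=10):
    """Population Stability Index between a baseline score
    distribution and a recent one. Rule of thumb: PSI > 0.25
    signals significant drift worth investigating."""
    lo = min(min(expected), min(actual))
    hi = max(max(expected), max(actual))
    width = (hi - lo) / bins or 1.0

    def frac(scores, b):
        count = sum(1 for s in scores
                    if lo + b * width <= s < lo + (b + 1) * width
                    or (b == bins - 1 and s == hi))
        return max(count / len(scores), 1e-6)  # avoid log(0) on empty bins

    return sum((frac(actual, b) - frac(expected, b))
               * math.log(frac(actual, b) / frac(expected, b))
               for b in range(bins))

# Identical distributions score near zero; a shifted one trips the alarm.
baseline = [i / 100 for i in range(100)]
shifted = [min(1.0, s + 0.3) for s in baseline]
assert psi(baseline, baseline) < 0.01
assert psi(baseline, shifted) > 0.25
```

In a production fraud or credit pipeline, a check like this would run on a schedule against the live score stream, replacing the periodic manual review the guidance deems insufficient at high velocity.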
The OCC has been explicit in recent examination cycles that AI systems are subject to SR 11-7 in full. Institutions that have applied lighter-touch governance to AI than to traditional statistical models are finding that position is no longer tenable.
CFPB: Adverse Action and Explainability
The CFPB's authority over AI governance flows primarily through the Equal Credit Opportunity Act (ECOA) and the Fair Credit Reporting Act (FCRA). Both predate AI, but the CFPB has been increasingly specific about how they apply.
The key requirement: adverse action notices must state the specific reasons for a credit denial. The CFPB has made clear that "the model declined" is not a specific reason. For AI-driven credit decisions, this means the institution must be able to identify, at the individual applicant level, which features drove the outcome — and express those features in terms a consumer can understand and potentially act on.
This is harder than it sounds. Many AI models produce outputs that are not natively explainable at the feature level. Institutions have addressed this with post-hoc explanation techniques (SHAP values, LIME), but the CFPB has signaled that post-hoc explanations must be accurate representations of what the model actually did — not approximations that may not hold for a specific applicant's decision.
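For a model that is inherently interpretable, per-applicant attributions are exact rather than post-hoc. A minimal sketch using a linear credit score, where each feature's contribution can be computed directly (feature names, coefficients, and baseline values are illustrative, not drawn from any real scorecard):

```python
def adverse_action_reasons(coefs, baseline, applicant, top_n=2):
    """For a linear score, each feature's exact contribution is
    coef * (applicant value - baseline value). The most negative
    contributions become the specific reasons on the notice."""
    contributions = {
        name: coefs[name] * (applicant[name] - baseline[name])
        for name in coefs
    }
    negative = sorted(
        (item for item in contributions.items() if item[1] < 0),
        key=lambda item: item[1],
    )
    return [name for name, _ in negative[:top_n]]

# Illustrative coefficients: a higher score means more creditworthy.
coefs = {"utilization": -2.0, "history_months": 0.05, "inquiries": -0.4}
baseline = {"utilization": 0.3, "history_months": 84, "inquiries": 1}
applicant = {"utilization": 0.9, "history_months": 24, "inquiries": 5}
print(adverse_action_reasons(coefs, baseline, applicant))
# → ['history_months', 'inquiries']
```

For opaque models, the same interface would be backed by SHAP or LIME attributions instead, with the caveat the CFPB has flagged: the attribution must faithfully describe this applicant's decision, not an average over the portfolio.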
The CFPB has also signaled concern about disparate impact in AI credit models. A model that is facially neutral but produces systematically different outcomes across demographic groups can violate ECOA regardless of intent. Identifying disparate impact requires cohort-level analysis — the ability to run every denial across a time period against a consistent model configuration and test for differential outcomes. This is only possible if the institution has maintained a complete, auditable record of every decision and the model version that produced it.
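One common cohort-level test is the adverse impact ratio, checked against the four-fifths rule of thumb used in fair-lending analysis. A sketch, assuming the institution's decision log pairs each outcome with a group label:

```python
def adverse_impact_ratio(decisions):
    """decisions: iterable of (group, approved) pairs drawn from a
    complete decision log. Returns each group's approval rate relative
    to the highest-approving group; ratios below 0.8 (the four-fifths
    rule) flag potential disparate impact for further analysis."""
    totals, approvals = {}, {}
    for group, approved in decisions:
        totals[group] = totals.get(group, 0) + 1
        approvals[group] = approvals.get(group, 0) + int(approved)
    rates = {g: approvals[g] / totals[g] for g in totals}
    best = max(rates.values())
    return {g: rate / best for g, rate in rates.items()}

# Group A approved at 80%, group B at 50%: ratio 0.625 < 0.8.
log = [("A", True)] * 80 + [("A", False)] * 20 \
    + [("B", True)] * 50 + [("B", False)] * 50
ratios = adverse_impact_ratio(log)
flagged = [g for g, r in ratios.items() if r < 0.8]
print(flagged)  # → ['B']
```

The point of the exercise is the precondition, not the arithmetic: the test is only meaningful if every denial in the period can be attributed to a known model version.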
FinCEN: BSA/AML and SAR Defensibility
FinCEN governs Bank Secrecy Act compliance, which includes anti-money laundering programs and suspicious activity reporting. AI is now embedded in both.
The examination question that most AML programs are not prepared to answer: for a specific SAR filing or non-filing decision, what rule configuration and model version produced the alert that triggered the analyst review?
Most AML systems can produce the alert. They cannot produce the exact threshold and rule configuration that was active at the time of the alert — particularly if those thresholds changed at any point in the intervening period. When an examiner is reviewing an AML program, they are looking at historical decisions against current configuration. If the configuration has changed, the historical decisions become difficult to defend.
FinCEN's examination standards require that AML programs be documented, consistent, and auditable. For AI-driven transaction monitoring, this means maintaining version-controlled records of rule configurations at every point in time — not just the current state of the system.
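One way to satisfy that is an append-only configuration history queried by effective time, rather than a single mutable settings table. A minimal sketch (the threshold field is illustrative):

```python
import bisect
from datetime import datetime, timezone

class RuleConfigHistory:
    """Append-only record of AML rule configurations. Instead of
    overwriting thresholds in place, every change is stored with its
    effective time, so the configuration active at any historical
    alert can be retrieved exactly."""
    def __init__(self):
        self._times, self._configs = [], []

    def record(self, effective_at, config):
        self._times.append(effective_at)
        self._configs.append(config)

    def active_at(self, moment):
        # Latest configuration whose effective time is <= moment.
        i = bisect.bisect_right(self._times, moment)
        if i == 0:
            raise LookupError("no configuration on record yet")
        return self._configs[i - 1]

history = RuleConfigHistory()
history.record(datetime(2025, 1, 1, tzinfo=timezone.utc),
               {"structuring_threshold": 10_000})
history.record(datetime(2025, 7, 1, tzinfo=timezone.utc),
               {"structuring_threshold": 9_000})

# What threshold produced an alert reviewed on 2025-03-15?
alert_time = datetime(2025, 3, 15, tzinfo=timezone.utc)
print(history.active_at(alert_time))  # → {'structuring_threshold': 10000}
```

With this shape, the examiner's question — which configuration was active when this alert fired — becomes a single lookup rather than an archaeology project.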
OFAC: Sanctions Screening and List Version Control
OFAC has the most explicit documentation requirements of any regulator on this list. Civil penalties for sanctions violations are assessed per transaction, under a statute of limitations that was extended from five to ten years in 2024. Each blocked or improperly cleared transaction is a separate violation.
The examination question OFAC examiners ask is precise: what watchlist were you running at the time of this screening event?
Not: what watchlist do you run generally. Not: what is your current screening process. What was the exact version of the OFAC SDN list and any consolidated watchlists active at the moment this specific payment was screened?
Most sanctions screening systems log the screening result. They do not log a verifiable version identifier, such as a content hash, of the watchlist that produced it. That gap is the difference between a defensible examination and a civil penalty.
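Closing that gap can be as simple as hashing the exact watchlist content in force and writing the hash into every screening record. A sketch with illustrative list content (a real deployment would hash the raw SDN and consolidated list files as downloaded):

```python
import hashlib
from datetime import datetime, timezone

def screen_payment(payment_id, counterparty, watchlist_bytes, watchlist_names):
    """Screen one payment and log not just the hit/no-hit result but
    the SHA-256 of the exact watchlist content in force, so the list
    version can be proven for this specific event later."""
    list_sha = hashlib.sha256(watchlist_bytes).hexdigest()
    hit = counterparty.upper() in watchlist_names
    return {
        "payment_id": payment_id,
        "screened_at": datetime.now(timezone.utc).isoformat(),
        "watchlist_sha256": list_sha,
        "hit": hit,
    }

# Illustrative list content, not real SDN data.
raw = b"ACME SHELL CORP\nEXAMPLE TRADING LLC\n"
names = {line.decode() for line in raw.splitlines()}
event = screen_payment("PMT-001", "Acme Shell Corp", raw, names)
print(event["hit"], event["watchlist_sha256"][:8])
```

When the list updates, the hash changes, so every historical screening event remains pinned to the exact list version it ran against — which is precisely the examiner's question.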
FINRA: Supervisory Procedures for AI Recommendations
FINRA Rule 3110 requires broker-dealers to establish and maintain supervisory procedures for all activities, including those involving AI. For robo-advisory and algorithmic investment recommendation systems, this means supervisory review of AI-generated recommendations — not just of the model in aggregate, but of individual recommendations when they are subject to complaint or examination.
The practical requirement: when a client claims that an AI-generated investment recommendation caused losses, the firm must be able to reconstruct the exact recommendation — what model version was running, what the client's profile input was at the time, what the firm's investment policy statement said, and what the model's output was. This reconstruction must be available on demand, not assembled from fragmented logs over several days.
FINRA has also been explicit that the supervisory obligation is not satisfied by periodic model reviews. The supervision is of the recommendation, not the model.
SEC: Reg BI and the Best Interest Standard
For broker-dealers, SEC Regulation Best Interest requires that investment recommendations be in the client's best interest at the time the recommendation is made. For AI-driven recommendation systems, this creates a specific documentation obligation: the institution must be able to demonstrate that at the time of a specific recommendation, the model was applying the correct client profile, the correct investment policy, and the correct constraint set.
The SEC's enforcement pattern on Reg BI has followed a consistent trajectory: informal guidance, then examination findings, then enforcement actions against firms that cannot demonstrate compliance at the individual recommendation level.
The SEC has also issued informal guidance — including a February 2026 FAQ update on stablecoin capital treatment — indicating that AI-driven decisions in digital asset operations will be subject to the same supervisory requirements. The informal nature of that guidance does not reduce the examination exposure; it typically precedes it.
The Cross-Cutting Requirement
Across every regulator on this list, the same underlying requirement surfaces in different forms:
The institution must be able to reconstruct any AI decision — the inputs, the model version, the rule configuration, and the output — at any point in time, for any individual transaction or applicant, on demand.
This is not a new requirement invented for AI. It is the application of existing model risk, consumer protection, and anti-discrimination obligations to systems that were not designed with auditability in mind.
Most AI systems in production today were not built to meet it. They log outputs. They do not capture the full decision context — model version, feature inputs, constraint set — in a form that is retrievable at the transaction level.
The gap between what regulators require and what most institutions have built is where examination findings happen.
What Defensible Infrastructure Looks Like
Meeting these requirements is not a documentation exercise. It requires infrastructure that captures the right information at the time of the decision — not reconstructed afterward.
Specifically:
- Every AI decision is linked to the exact model version active at execution
- The input data used at the time of the decision is preserved, not inferred from current data
- Rule configurations and threshold settings are versioned and linked to every decision they produced
- The full decision trace is retrievable at the individual transaction level in seconds, not hours
This infrastructure does not replace the AI system. It sits alongside it — capturing what the AI did, at the moment it did it, in a form that survives examination.
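The bullet points above can be sketched as a write-once trace store keyed by transaction ID; all identifiers and fields here are illustrative, not a description of any particular product:

```python
import hashlib
import json

class DecisionTraceStore:
    """Write-once store of full decision context, keyed by transaction
    ID. Each trace carries a content hash so later tampering, or quiet
    re-derivation from current data, is detectable."""
    def __init__(self):
        self._traces = {}

    def capture(self, txn_id, model_version, inputs, rule_config, output):
        record = {
            "model_version": model_version,
            "inputs": inputs,          # preserved as of decision time
            "rule_config": rule_config,
            "output": output,
        }
        record["sha256"] = hashlib.sha256(
            json.dumps(record, sort_keys=True).encode()
        ).hexdigest()
        if txn_id in self._traces:
            raise ValueError(f"trace for {txn_id} already captured")
        self._traces[txn_id] = record

    def retrieve(self, txn_id):
        # Direct keyed lookup: seconds on demand, not hours of log forensics.
        return self._traces[txn_id]

store = DecisionTraceStore()
store.capture("TXN-42", "credit-v3.1",
              {"income": 52_000, "utilization": 0.4},
              {"min_score": 640}, {"decision": "approve"})
print(store.retrieve("TXN-42")["model_version"])  # → credit-v3.1
```

The design choice that matters is capture at decision time: the inputs and configuration are frozen into the record when the AI acts, so nothing has to be reconstructed from mutable systems later.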
The Timing Question
The consistent pattern across OCC, CFPB, FinCEN, OFAC, FINRA, and SEC is the same: informal guidance first, examination findings second, enforcement third. Institutions that build governance infrastructure before the examination cycle are in a defensible position. Institutions that build it in response to a finding are managing a different, more expensive problem.
The AI governance examination wave in financial services is not coming. It is in progress. The question is not whether your institution's AI decisions will be scrutinized. It is whether you will be able to answer the questions when they are asked.
Briefcase AI builds decision governance infrastructure for regulated AI deployments — capturing immutable decision traces at every agent handoff, returnable in under two seconds.