Cisco releases open-source toolkit for verifying AI model lineage

Enterprises pulling models from Hugging Face and other open repositories rarely keep records of how those models are altered after download, leaving them with little ability to confirm what they are running in production. Cisco’s State of AI Security 2026 report places this gap within a growing pattern of AI-driven operations that connect directly to core business systems, and identifies AI supply chain exposure as a recurring risk.

Cisco has published the Model Provenance Kit, an open-source Python toolkit and command-line interface that determines whether two transformer models share a common origin by examining architecture metadata, tokenizer structure, and the learned weights themselves.

Why model lineage has become difficult to verify

Hugging Face hosts more than 2 million models. Documentation on open repositories can be falsified, metadata can be stripped or edited, and a model card claiming a model was trained from scratch may describe a modified copy of another model. Many repositories provide limited cryptographic assurance regarding model origin, training data, or modification history, and unsanctioned use of external models has expanded the software supply chain beyond traditional package managers. Recent product releases illustrate the layering involved: Cursor’s Composer 2 was partly built on Kimi 2.5, which was developed by a Chinese startup, and similar dependencies run through much of the industry.

Modern model families compound the verification problem because many of them share largely the same architecture. Models from Meta, Alibaba, DeepSeek, and Mistral use the same building blocks, including grouped-query attention, rotary positional embeddings, and root mean square normalization (RMSNorm). A configuration file describes only the architecture; it says nothing about whether the weights were copied from another model or trained independently.

Without provenance information, organizations have limited visibility into poisoned or vulnerable models that may propagate inherited flaws into chatbots, agent applications, and customer-facing tools. Provenance also bears on regulatory exposure. The European Union AI Act requires documentation of training data, characteristics of training methodology, and risk assessments for high-risk systems. The NIST AI Risk Management Framework identifies third-party AI component risks as a governance area. AI components shift constantly across the supply chain while existing security controls assume static assets, creating blind spots that complicate downstream compliance.

Some open-weight models carry restrictive licenses, and a model that turns out to be a derivative of one trained in a jurisdiction subject to export controls can introduce additional legal considerations. Incident response also suffers when a model’s lineage is unknown, since responders cannot determine whether an issue originates in the model itself, a related model, a parent, or a fine-tuning step.

Model Provenance Kit’s command line interface (Source: Cisco)

How the kit works

Model Provenance Kit operates in two stages. Stage 1 performs an architectural screening that compares model configurations and structural metadata before any weights are loaded. Pairs sharing identical architecture specifications are classified as related at this stage, which resolves a large portion of cases.
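As a rough sketch of what a metadata-only screen can look like, the snippet below compares a handful of fields from two Hugging Face-style config dictionaries without touching any weights. The field list, the function name `same_architecture`, and the example values are assumptions made for illustration; the article does not describe the kit's actual screening rules at this level of detail.

```python
# Hypothetical Stage 1-style screen: compare architecture fields only.
# The field list below is an assumption, not the kit's actual rule set.
ARCH_FIELDS = (
    "model_type",
    "hidden_size",
    "num_hidden_layers",
    "num_attention_heads",
    "num_key_value_heads",
    "intermediate_size",
    "vocab_size",
)


def same_architecture(config_a: dict, config_b: dict) -> bool:
    """Return True when every listed architecture field matches."""
    return all(config_a.get(f) == config_b.get(f) for f in ARCH_FIELDS)


# Illustrative configs; the values are invented for this example.
base_cfg = {
    "model_type": "llama",
    "hidden_size": 4096,
    "num_hidden_layers": 32,
    "num_attention_heads": 32,
    "num_key_value_heads": 8,
    "intermediate_size": 14336,
    "vocab_size": 128256,
}
# A derivative usually keeps the architecture and changes other metadata.
finetune_cfg = {**base_cfg, "torch_dtype": "bfloat16"}
# A deeper model differs in at least one architecture field.
deeper_cfg = {**base_cfg, "num_hidden_layers": 40}

print(same_architecture(base_cfg, finetune_cfg))  # True
print(same_architecture(base_cfg, deeper_cfg))    # False
```

Because the check reads only small metadata files, it is cheap to run before deciding whether a weight-level comparison is needed.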

When metadata is ambiguous, the pipeline progresses to Stage 2, which extracts five complementary signals from the model weights:

Embedding Anchor Similarity (EAS) compares the geometric relationships between token embeddings, a structure unique to a training run that survives fine-tuning.
Embedding Norm Distribution (END) analyzes the distribution of embedding magnitudes, which encode word frequency patterns from training.
Norm Layer Fingerprint (NLF) reads the small normalization layers, which remain stable across fine-tuning.
Layer Energy Profile (LEP) compares normalized energy curve distributions across the depth of the network. Different training runs produce different energy distributions even when the architecture is identical.
Weight-Value Cosine (WVC) directly compares weight values between a subsample of corresponding layers. Independently trained models show essentially zero correlation here.
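The intuition behind the last signal can be illustrated with synthetic data. The sketch below is not the kit's implementation: it uses random vectors as stand-ins for flattened weight tensors, models "fine-tuning" as a small perturbation of the base weights, and all names and constants are assumptions chosen for the example.

```python
import math
import random


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


rng = random.Random(0)  # fixed seed so the run is repeatable
dim = 4096

# Stand-in for one flattened weight tensor from a base model.
base = [rng.gauss(0.0, 1.0) for _ in range(dim)]
# "Fine-tuned" copy: the base weights plus a small perturbation.
finetuned = [w + 0.1 * rng.gauss(0.0, 1.0) for w in base]
# "Independent" model: fresh random weights with no shared history.
independent = [rng.gauss(0.0, 1.0) for _ in range(dim)]

sim_finetuned = cosine(base, finetuned)
sim_independent = cosine(base, independent)
print(f"fine-tuned vs base:  {sim_finetuned:.3f}")
print(f"independent vs base: {sim_independent:.3f}")
```

In high dimensions, two unrelated random vectors are nearly orthogonal, so the independent pair lands close to zero while the perturbed copy stays close to one.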

The signals are combined into a single identity score using empirically calibrated weights. When a signal cannot be computed, for example when models have different layer counts, it is excluded and the remaining signals compensate.
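One plausible way to combine signals while tolerating missing ones is a weighted mean that renormalizes over whatever is available. The sketch below assumes that scheme; the weight values are invented placeholders, since the calibrated weights are not given here, and the function name `identity_score` is hypothetical.

```python
from typing import Optional

# Placeholder weights for illustration only; not the kit's calibrated values.
WEIGHTS = {"EAS": 0.30, "END": 0.15, "NLF": 0.20, "LEP": 0.15, "WVC": 0.20}


def identity_score(signals: dict[str, Optional[float]]) -> float:
    """Weighted mean over the signals that could be computed.

    A signal that is None or absent is dropped, and the remaining
    weights are renormalized so that they still sum to 1.
    """
    available = {
        name: value
        for name, value in signals.items()
        if value is not None and name in WEIGHTS
    }
    if not available:
        raise ValueError("no signals available")
    total_weight = sum(WEIGHTS[name] for name in available)
    weighted = sum(WEIGHTS[name] * value for name, value in available.items())
    return weighted / total_weight


# All five signals present.
full = identity_score(
    {"EAS": 0.8, "END": 0.8, "NLF": 0.8, "LEP": 0.8, "WVC": 0.8}
)
# LEP unavailable, e.g. because the layer counts differ.
partial = identity_score(
    {"EAS": 1.0, "END": 1.0, "NLF": 1.0, "LEP": None, "WVC": 0.0}
)
print(round(full, 3), round(partial, 3))
```

Renormalizing keeps the score on the same 0-to-1 scale regardless of how many signals contribute, so a single decision threshold can still be applied.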

Tokenizer signals, including vocabulary overlap analysis and a tokenizer feature vector, are computed for diagnostic purposes and excluded from the provenance score, because many independently trained models share tokenizers. StableLM and Pythia, for example, both use the GPT-NeoX tokenizer and would score as similar despite having no weight lineage, which would generate false positives if tokenizer signals influenced the final score.
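The false-positive risk is easy to see with a simple overlap measure. The sketch below uses Jaccard similarity over token sets, which is an assumption about how vocabulary overlap might be measured rather than the kit's documented method; the tiny vocabularies are invented for the example.

```python
def vocab_jaccard(vocab_a: set[str], vocab_b: set[str]) -> float:
    """Jaccard overlap between two token vocabularies."""
    if not vocab_a and not vocab_b:
        return 0.0
    return len(vocab_a & vocab_b) / len(vocab_a | vocab_b)


# Two independently trained models that reuse the same tokenizer.
shared_tokenizer = {"the", "Ġmodel", "ing", "Ġweights"}
model_a_vocab = set(shared_tokenizer)
model_b_vocab = set(shared_tokenizer)
# A model with a different tokenizer scheme.
other_vocab = {"the", "▁model", "##ing", "▁weights"}

same_tokenizer_overlap = vocab_jaccard(model_a_vocab, model_b_vocab)
different_tokenizer_overlap = vocab_jaccard(model_a_vocab, other_vocab)
print(same_tokenizer_overlap, round(different_tokenizer_overlap, 3))
```

The first pair scores a perfect overlap even though nothing about the weights has been compared, which is why a tokenizer match alone is weak evidence of shared lineage.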

The kit ships with two modes. Compare mode produces a detailed similarity breakdown for any two models drawn from Hugging Face or local checkpoints. Scan mode matches a single model against a database of known fingerprints to surface lineage candidates, treating provenance as a search problem. Cisco has released an initial fingerprint database covering roughly 150 base models across 45 families and 20 publishers, ranging from 135 million to more than 70 billion parameters.
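Treating provenance as a search problem can be sketched as a nearest-neighbor lookup. The example below assumes each fingerprint is a plain numeric vector and that similarity is cosine, both simplifications; the real kit combines several signals, and the database entries, the `scan` function, and the default threshold used here are hypothetical.

```python
import math


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def scan(
    query: list[float],
    database: dict[str, list[float]],
    threshold: float = 0.70,
    top_k: int = 3,
) -> list[tuple[str, float]]:
    """Return the known fingerprints most similar to the query.

    Only entries at or above the threshold are kept, ordered from
    most to least similar and truncated to top_k results.
    """
    scored = [(name, cosine(query, fp)) for name, fp in database.items()]
    hits = [(name, score) for name, score in scored if score >= threshold]
    return sorted(hits, key=lambda item: item[1], reverse=True)[:top_k]


# Invented fingerprints for three unrelated base models.
database = {
    "family-a/base": [0.9, 0.1, 0.0, 0.2],
    "family-b/base": [0.0, 0.8, 0.5, 0.1],
    "family-c/base": [0.1, 0.0, 0.9, 0.4],
}
# Fingerprint of an unknown model that resembles family-a.
query = [0.85, 0.15, 0.05, 0.2]

candidates = scan(query, database)
for name, score in candidates:
    print(f"{name}: {score:.3f}")
```

A linear pass like this is adequate for a database of a few hundred fingerprints; a larger collection would likely call for an index.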

Benchmark results

Cisco evaluated the kit against a 111-pair benchmark composed of 55 similar pairs and 56 dissimilar pairs. The benchmark included aggressive distillation, quantization across formats, cross-organization fine-tuning, LoRA merging, continued pretraining with vocabulary extension, same-tokenizer traps, and independent reproductions of popular architectures. At a 0.70 threshold on a 0-to-1 scale, the kit recorded an F1 score of 0.963, accuracy of 96.4%, precision of 98.1%, and recall of 94.6%.
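The reported figures can be cross-checked with a few lines of arithmetic. The confusion-matrix counts below are inferred rather than published in this article: with 55 similar and 56 dissimilar pairs and four errors in total, three missed similar pairs and one false match is the only split that reproduces the stated precision, and it also reproduces the stated accuracy and F1. Recall under this split is 52/55, or about 94.5%, within rounding of the reported 94.6%.

```python
def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> dict[str, float]:
    """Standard binary classification metrics from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return {
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "accuracy": accuracy,
    }


# Inferred counts (an assumption, not published figures):
# 55 similar pairs    -> 52 detected, 3 missed
# 56 dissimilar pairs -> 55 rejected, 1 false match
metrics = classification_metrics(tp=52, fp=1, fn=3, tn=55)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```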

The kit identified standard derivatives such as fine-tuning, quantization, and alignment with 100% recall, and matched cross-organization derivatives at 100% recall. Same-tokenizer traps were handled at 100% specificity, and independent reproductions such as open_llama and Llama-2 were correctly identified as unrelated.

Four of 111 pairs were misclassified. Each involved an extreme architectural transformation, such as distilling a 12-layer model with 768 hidden dimensions down to 4 layers with halved hidden dimensions, or rebuilding a vocabulary for domain-specific continued pretraining. Cisco describes these as fundamental limits of pairwise weight comparison.

Deployment

The pipeline runs on CPU, and its runtime scales with model size. Architectural matches resolve in milliseconds, and extracted features are cached for reuse across comparisons. The kit works on any transformer model with downloadable weights.
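The benefit of caching extracted features can be shown with a small memoization sketch. It assumes an in-memory cache keyed by model ID, which may differ from how the kit stores features; `extract_features` and `compare` are hypothetical stand-ins, and the "features" are placeholders derived from the ID string rather than from real weights.

```python
from functools import lru_cache

EXTRACTION_CALLS = 0  # counts how often the expensive step actually runs


@lru_cache(maxsize=None)
def extract_features(model_id: str) -> tuple[float, ...]:
    """Stand-in for an expensive feature extraction over model weights."""
    global EXTRACTION_CALLS
    EXTRACTION_CALLS += 1
    # Placeholder feature vector derived from the ID, for illustration only.
    return tuple(float(ord(ch) % 7) for ch in model_id)


def compare(model_a: str, model_b: str) -> float:
    """Toy comparison: fraction of matching positions in the two vectors."""
    feats_a = extract_features(model_a)
    feats_b = extract_features(model_b)
    matches = sum(x == y for x, y in zip(feats_a, feats_b))
    return matches / max(len(feats_a), len(feats_b))


# "org/base" appears in both comparisons but is extracted only once.
compare("org/base", "org/finetune")
compare("org/base", "other/model")
print(EXTRACTION_CALLS)  # 3
```

Two comparisons involve four model references but only three extractions, and the saving grows as one model is compared against many others.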

The repository is on GitHub, and the fingerprint dataset is on Hugging Face.