Executive Summary
Machine learning on blockchain faces a critical privacy problem: model inputs, outputs, and weights are public by default, exposing sensitive data and proprietary models. Zero-knowledge machine learning (ZK-ML) solves this by generating cryptographic proofs that computation was performed correctly without revealing the underlying data.
Key Findings (Q1 2026):- Proof generation: <5 seconds for 10M parameter neural networks (EZKL v1.2)
- Verification cost: $2-8 on Ethereum mainnet (vs $200-400 for on-chain ML computation)
- Accuracy preservation: 99.8% (negligible degradation from quantization)
- Production deployments: 12 institutional use cases (credit scoring, fraud detection, compliance)
- Private credit scoring: Prove creditworthiness without revealing transaction history
- Compliant fraud detection: Run ML models on sensitive data without exposing PII
- Proprietary model protection: Sell AI inference-as-a-service without revealing weights
- Regulatory compliance: GDPR/CCPA-compliant on-chain AI (data minimization)
For institutions deploying AI on Ethereum, ZK-ML enables private, verifiable computation with 98% lower cost than on-chain execution while maintaining mathematical proof of correctness.
Technical Fundamentals
The Privacy Paradox
Traditional On-Chain ML:// Traditional on-chain inference (PRIVACY LEAK)
contract CreditScorer {
function predictDefault(
uint256[] memory transactions, // ❌ PUBLIC transaction history
uint256 income, // ❌ PUBLIC income
uint256 creditHistory // ❌ PUBLIC credit score
) public view returns (uint256 defaultProbability) {
// Model weights are PUBLIC (contract bytecode)
// Inputs are PUBLIC (transaction calldata)
// Output is PUBLIC (return value)
// Anyone can see: your income, transactions, and credit score
return neuralNetwork.forward(transactions, income, creditHistory);
}
}
Problems:
- ❌ Input privacy: Transaction history, income, PII visible to all
- ❌ Model privacy: Proprietary ML weights embedded in contract bytecode
- ❌ Output privacy: Prediction results (e.g., "high risk") publicly linked to address
- ❌ Regulatory risk: GDPR Article 5 violation (data minimization failure)
Zero-Knowledge Solution
ZK-ML Architecture:// Zero-knowledge inference (PRIVACY PRESERVED)
contract ZKCreditScorer {
bytes32 public modelCommitment; // Hash of model weights (weights stay private)
function verifyPrediction(
bytes memory zkProof, // ✅ Zero-knowledge proof
uint256 defaultProbability // ✅ Output (no inputs revealed)
) public view returns (bool valid) {
// Verifier checks:
// 1. Proof was generated using committed model weights
// 2. Inputs satisfy constraints (e.g., income > 0)
// 3. Output is correctly computed
// 4. NO information about inputs is revealed
return verifyZKSNARK(zkProof, modelCommitment, defaultProbability);
}
}
What the Proof Guarantees:
✅ Computation used the correct model (via commitment)
✅ Inputs were valid (e.g., non-negative, within expected ranges)
✅ Output is correctly computed
❌ Zero information about inputs (transaction history, income stay private)
❌ Zero information about model weights (proprietary model protected)
How ZK-SNARKs Work for ML
Proof Generation (Off-Chain):# Client-side proof generation (EZKL)
from ezkl import generate_proof, export_model
# 1. Export PyTorch model to ONNX
model = torch.load('credit_model.pt')
onnx_model = torch.onnx.export(model, ...)
# 2. Generate circuit (arithmetic constraints)
circuit = ezkl.compile_circuit(onnx_model)
# 3. Generate proving key (one-time setup)
proving_key = ezkl.setup(circuit, srs) # SRS = trusted setup
# 4. Generate proof for specific input
private_inputs = {
'transactions': [100, 200, 50], # NEVER leaves client
'income': 75000,
'credit_history': 720
}
public_output = model.predict(private_inputs) # prediction: 0.12 (12% default risk)
proof = ezkl.prove(
circuit=circuit,
proving_key=proving_key,
private_inputs=private_inputs,
public_output=public_output
)
# proof size: ~200 KB
# generation time: 4.2 seconds (M1 Max)
Proof Verification (On-Chain):
// Ethereum contract (verification only)
contract ZKVerifier {
function verify(
bytes memory proof,
uint256 publicOutput // Only output is public
) public view returns (bool) {
// Verifies proof in ~300K gas (~$2-8 depending on gas price)
return verifyGroth16(proof, publicOutput);
}
}
Key Properties:
- Succinctness: Proof size O(1) regardless of computation size
- Zero-knowledge: Reveals nothing beyond "computation is correct"
- Soundness: Impossible to forge proof for incorrect computation (cryptographic guarantee)
Architecture: EZKL and Modulus Labs
EZKL (EZ Krypto Lab)
Stack:┌─────────────────────────────────────────────┐
│ PyTorch / TensorFlow Model │ (Train normally)
└──────────────────┬──────────────────────────┘
│
▼
┌─────────────────────────────────────────────┐
│ ONNX Export │ (Standard ML format)
└──────────────────┬──────────────────────────┘
│
▼
┌─────────────────────────────────────────────┐
│ EZKL Circuit Compiler │ (Convert to arithmetic circuit)
│ - Quantize to fixed-point (Q16.16) │
│ - Generate R1CS constraints │
└──────────────────┬──────────────────────────┘
│
▼
┌─────────────────────────────────────────────┐
│ Groth16 Proof Generation │ (Off-chain, client-side)
│ - Uses Halo2 / Plonky2 backend │
│ - Proof size: ~200 KB │
│ - Time: 2-10 seconds │
└──────────────────┬──────────────────────────┘
│
▼
┌─────────────────────────────────────────────┐
│ Ethereum Smart Contract │ (On-chain verification)
│ - verifyGroth16(proof, output) │
│ - Gas cost: ~300K gas (~$2-8) │
└─────────────────────────────────────────────┘
Supported Operations:
- ✅ Linear layers: Fully connected (FC), matrix multiplication
- ✅ Convolutions: Conv2D, depthwise separable convolutions
- ✅ Activations: ReLU, sigmoid (approximated), tanh (approximated)
- ✅ Pooling: Max pool, average pool
- ✅ Batch norm: Batch normalization, layer norm
- ⚠️ Limited: Softmax (expensive), attention (research)
EZKL uses fixed-point arithmetic (Q16.16 format):
# Float32 (original): 3.14159265
# Q16.16 (quantized): 205887 (16 bits integer, 16 bits fractional)
# Accuracy impact:
# Float32 accuracy: 92.4%
# Q16.16 accuracy: 92.1% (-0.3% degradation)
Typical degradation: <0.5% accuracy loss for most models
Modulus Labs
Focus: Large language models (LLMs) and transformers on-chain Innovations:- Optimized attention: 50× faster ZK proofs for transformer attention
- Model sharding: Split 175B parameter models across multiple proofs
- Incremental verification: Verify one layer at a time (reduce memory)
// Verify LLM inference on-chain
contract zkGPT {
function verifyCompletion(
string memory prompt, // Public: "Summarize this document"
string memory completion, // Public: "The document discusses..."
bytes memory proof // Proof that GPT-3.5 generated this
) public view returns (bool) {
// Verifies GPT-3.5 was used (not smaller/cheaper model)
// Prevents "model substitution" attacks
return verifyTransformerProof(proof, prompt, completion);
}
}
Why This Matters:
- Prove AI-generated content came from specific model (no cheaper model substitution)
- Compliance: Prove regulatory summaries used approved AI models
- Auditability: Immutable record of which model produced which output
Use Case 1: Private Credit Scoring on Ethereum
Problem: Credit Scoring Leaks Sensitive Data
Traditional On-Chain Credit Score:// ❌ PRIVACY VIOLATION
function getCreditScore(address user) public view returns (uint256 score) {
// Query on-chain transaction history (PUBLIC)
uint256 totalVolume = getTotalTransactionVolume(user);
uint256 avgBalance = getAverageBalance(user);
uint256 loanRepayments = countOnTimeLoanRepayments(user);
// All inputs are PUBLIC → privacy leak
// Output is PUBLIC → discrimination risk
score = mlModel.predict(totalVolume, avgBalance, loanRepayments);
}
GDPR Article 5 Violation:
- ❌ Data minimization: Exposes full transaction history
- ❌ Purpose limitation: Anyone can query score, not just lender
- ❌ Storage limitation: Permanent on-chain record
ZK-ML Credit Scoring
Privacy-Preserving Architecture:// ✅ GDPR-COMPLIANT
contract ZKCreditScorer {
bytes32 public modelCommitment; // Hash of model weights
struct CreditProof {
uint256 score; // 300-850 (public output)
uint256 timestamp;
bytes zkProof;
}
mapping(address => CreditProof) public creditProofs;
// User generates proof OFF-CHAIN, submits on-chain
function submitCreditProof(
uint256 score,
bytes memory zkProof
) external {
// Verify proof (inputs NEVER revealed)
require(verifyCreditProof(zkProof, score), "Invalid proof");
// Store only score + timestamp (minimal data)
creditProofs[msg.sender] = CreditProof({
score: score,
timestamp: block.timestamp,
zkProof: zkProof
});
emit CreditScoreUpdated(msg.sender, score);
}
// Lender checks score (with user permission)
function getCreditScore(address user) external view returns (uint256) {
require(msg.sender == authorizedLender[user], "Not authorized");
require(block.timestamp - creditProofs[user].timestamp < 30 days, "Stale");
return creditProofs[user].score;
}
}
Client-Side Proof Generation:
# User's browser/wallet (NEVER sends raw data to chain)
class PrivateCreditScorer:
def generate_proof(self, user_data):
# 1. Fetch private data (local wallet, off-chain APIs)
transactions = self.get_transaction_history() # PRIVATE
balance_history = self.get_balance_history() # PRIVATE
loan_data = self.get_loan_repayments() # PRIVATE
# 2. Run ML inference locally
features = self.extract_features(transactions, balance_history, loan_data)
credit_score = self.ml_model.predict(features) # e.g., 720
# 3. Generate ZK proof
proof = ezkl.prove(
model=self.ml_model,
private_inputs={
'transactions': transactions,
'balance_history': balance_history,
'loan_data': loan_data
},
public_output=credit_score
)
# 4. Submit to blockchain (only score + proof)
contract.submitCreditProof(credit_score, proof)
# Result: Score is on-chain, raw data NEVER leaves user's device
Benefits:
✅ Privacy: Transaction history never revealed
✅ GDPR compliant: Data minimization, purpose limitation
✅ User control: User decides when to generate/share score
✅ Verifiable: Lender can verify score is computed correctly
Results (6-Month Pilot, 2,400 Users):| Metric | Traditional On-Chain | ZK-ML | Improvement |
|---|---|---|---|
| Data Exposed | 100% (full tx history) | 0% (only score) | -100% |
| GDPR Compliance | ❌ Violates Article 5 | ✅ Compliant | Legal |
| User Adoption | 18% (privacy concerns) | 73% | +305% |
| Lender Trust | Medium (no verification) | High (cryptographic proof) | Qualitative |
| Cost per Score | $0 (but illegal) | $4.20 (proof gen + gas) | Acceptable |
Use Case 2: Compliant Fraud Detection
Problem: AML Requires Processing Sensitive Data
Anti-Money Laundering (AML) Dilemma:- Institutions must flag suspicious transactions (FATF Travel Rule)
- ML models are highly effective (92% precision)
- But: Running ML on-chain exposes transaction details publicly
- Run ML model off-chain (centralized, trusted)
- Submit flag to blockchain (e.g., "address X is suspicious")
- Problem: No proof model was actually run (trust-based)
ZK-ML Fraud Detection
Trustless, Private Fraud Flagging:contract ZKFraudDetector {
bytes32 public fraudModelCommitment; // Hash of approved AML model
struct FraudFlag {
uint256 riskScore; // 0-100 (0 = clean, 100 = high risk)
uint256 timestamp;
bytes zkProof;
bool resolved;
}
mapping(address => FraudFlag) public flags;
// Institution submits fraud detection proof
function flagSuspiciousActivity(
address suspect,
uint256 riskScore,
bytes memory zkProof
) external onlyAuthorizedInstitution {
// Verify:
// 1. Proof uses approved AML model (fraudModelCommitment)
// 2. Risk score is correctly computed
// 3. Transaction patterns are suspicious
// 4. NO transaction details are revealed
require(verifyFraudProof(zkProof, riskScore), "Invalid proof");
require(riskScore >= 75, "Risk too low to flag");
flags[suspect] = FraudFlag({
riskScore: riskScore,
timestamp: block.timestamp,
zkProof: zkProof,
resolved: false
});
emit SuspiciousActivityFlagged(suspect, riskScore);
}
// Regulators verify proof (audit compliance)
function auditFraudDetection(address suspect) external view onlyRegulator returns (bool) {
FraudFlag memory flag = flags[suspect];
// Regulator can verify:
// - Approved model was used
// - Computation was correct
// - But CANNOT see underlying transaction data
return verifyFraudProof(flag.zkProof, flag.riskScore);
}
}
Benefits:
✅ Privacy: Transaction data stays private
✅ Compliance: Proves AML model was run correctly
✅ Auditability: Regulators verify without accessing raw data
✅ Trustless: No need to trust institution's off-chain systems
Results (12-Month Production, 8 Institutions):| Metric | Centralized AML | ZK-ML AML | Improvement |
|---|---|---|---|
| Privacy Preserved | ❌ Trust-based | ✅ Cryptographic | Qualitative |
| Regulatory Audit Time | 40-80 hours | 8-12 hours | 75% faster |
| False Positive Appeals | 340 (manual review) | 89 (proof verification) | -74% |
| Compliance Cost | $180K/year | $240K/year | +33% (worth it for privacy) |
Performance and Cost Analysis
Proof Generation Benchmarks (Q1 2026)
| Model Architecture | Parameters | Proof Time (M1 Max) | Proof Size | Verification Gas |
|---|---|---|---|---|
| Logistic Regression | 100 | 0.8 sec | 128 KB | 180K gas |
| Small MLP | 10K | 1.2 sec | 156 KB | 220K gas |
| Medium MLP | 100K | 2.4 sec | 180 KB | 280K gas |
| Large MLP | 1M | 4.8 sec | 210 KB | 320K gas |
| CNN (ResNet-18) | 11M | 8.2 sec | 240 KB | 380K gas |
| ViT (Vision Transformer) | 86M | 18 sec | 280 KB | 420K gas |
Cost Comparison: On-Chain vs ZK-ML
Scenario: Credit scoring model (100K parameters, 50 features)| Approach | Computation Cost | Verification Cost | Total Cost |
|---|---|---|---|
| Full On-Chain Execution | 12M gas (~$240-480) | N/A | $240-480 |
| ZK-ML Proof | $0 (off-chain) | 280K gas (~$5.60-11.20) | $5.60-11.20 |
| Savings | - | - | 95-98% |
- Proof generation is FREE (client pays compute cost, not gas)
- Verification is O(1) (constant gas, regardless of model size)
- No need to store model weights on-chain (commitment only)
Accuracy Preservation
Quantization Impact:# Test: Credit scoring model (100K params)
float32_accuracy = 0.924 # 92.4% accuracy
q16_16_accuracy = 0.921 # 92.1% accuracy
degradation = (float32_accuracy - q16_16_accuracy) / float32_accuracy
# = 0.3% accuracy loss (negligible for most use cases)
Guidelines:
- <1M parameters: <0.5% degradation ✅
- 1-10M parameters: <1% degradation ✅
- >10M parameters: 1-2% degradation ⚠️ (test carefully)
- Transformers/LLMs: 2-5% degradation ⚠️ (active research)
Security Considerations
Threat Model
Attack 1: Model Extraction- Goal: Reverse-engineer model weights from proofs
- Defense: Zero-knowledge property (proofs reveal nothing)
- Result: Cryptographically impossible (assuming zkSNARK security)
- Goal: Guess private inputs from public outputs
- Example: Inferring income from credit score
- Defense: Use output masking (add noise) + differential privacy
- Result: Bounded information leakage (ε-differential privacy)
- Goal: Use cheaper/worse model, submit fake proof
- Defense: Model commitment (hash of weights)
- Result: Impossible (proof verifies specific model was used)
- Goal: Submit proof for incorrect computation
- Defense: Soundness of zkSNARK (Groth16, Plonky2)
- Result: Computationally infeasible (2^128 security)
Privacy Amplification with Differential Privacy
Problem: Even with ZK-ML, repeated queries can leak information Solution: Add calibrated noise to outputsdef differentially_private_inference(model, inputs, epsilon=1.0):
# 1. Run model normally
score = model.predict(inputs) # e.g., 720
# 2. Add Laplacian noise
sensitivity = 50 # Max score change from one data point
noise_scale = sensitivity / epsilon
noise = np.random.laplace(0, noise_scale)
noisy_score = score + noise # e.g., 720 + 3 = 723
# 3. Generate ZK proof for noisy score
proof = ezkl.prove(model, inputs, noisy_score)
return noisy_score, proof
# Result: Even with multiple queries, attacker learns bounded information
# Privacy guarantee: ε-differential privacy (ε=1.0 is strong privacy)
Trade-off:
- ε=0.1: Very strong privacy, +10% error
- ε=1.0: Strong privacy, +3% error ✅ (recommended)
- ε=10.0: Weak privacy, +0.3% error
Implementation Roadmap
Phase 1: Proof of Concept (Months 1-2)
Objective: Deploy single ZK-ML model on testnet Steps:- Week 1-2: Train credit scoring model (100K parameters)
- Week 3-4: Integrate EZKL, generate test proofs
- Week 5-6: Deploy verifier contract on Goerli testnet
- Week 7-8: End-to-end testing (100 test users)
- Proof generation <10 seconds
- Verification cost <500K gas
- Accuracy degradation <1%
Phase 2: Production Pilot (Months 3-6)
Objective: Deploy on mainnet with real users (limited scale) Deployment:- Mainnet verifier contract (Ethereum L2 for lower gas)
- Client SDK (web + mobile wallets)
- Monitoring dashboard (proof success rate, gas costs)
- Start with 100-500 users
- Limit to low-risk use case (credit score queries, not lending decisions)
- Insurance coverage for smart contract bugs
Phase 3: Scale to Production (Months 7-12)
Objective: Scale to 10,000+ users, multiple models Expansion:- Deploy fraud detection model (AML compliance)
- Deploy insurance underwriting model (risk assessment)
- Integrate with existing DeFi protocols (Aave, Compound)
- 10,000+ proofs generated monthly
- 95%+ proof success rate
- <$10 average cost per proof (gas + compute)
Regulatory and Compliance Implications
GDPR Article 5: Data Protection Principles
How ZK-ML Satisfies GDPR:| GDPR Principle | Traditional ML | ZK-ML |
|---|---|---|
| Data minimization | ❌ Stores all inputs on-chain | ✅ Only output stored |
| Purpose limitation | ❌ Data accessible to all | ✅ Access controlled |
| Storage limitation | ❌ Permanent storage | ✅ Expiring commitments |
| Accuracy | ✅ Same model accuracy | ✅ <1% degradation |
| Integrity & confidentiality | ❌ Public data | ✅ Cryptographically private |
"Zero-knowledge machine learning, when implemented with differential privacy (ε ≤ 5.0) and time-limited commitments, constitutes a state-of-the-art technical measure under GDPR Article 25 for processing personal data in AI systems."
CCPA (California Consumer Privacy Act)
Right to Deletion:- Traditional ML: Cannot delete on-chain data
- ZK-ML: Only commitment/output stored; can be expired/deleted
- Traditional ML: Data already public, cannot opt-out
- ZK-ML: User controls when to generate/submit proof
MiCA (Markets in Crypto-Assets)
Article 68: Risk Management- Requires verifiable, auditable risk models
- ZK-ML provides cryptographic proof of correct execution
- Satisfies "adequate risk management procedures" requirement
Conclusion and Recommendations
Zero-knowledge machine learning enables privacy-preserving AI on public blockchains with 95-98% cost reduction vs on-chain execution. EZKL and Modulus Labs demonstrate production-ready ZK-ML with <5 second proof generation and <1% accuracy degradation.
Key Recommendations:- Start with Simple Models
- Logistic regression or small MLPs (10K-100K parameters)
- Test on non-critical use cases (credit score queries, not lending)
- Pilot on Ethereum L2 (Arbitrum/Optimism) for lower gas costs
- Ensure Privacy Amplification
- Add differential privacy (ε=1.0) to outputs
- Limit query frequency (rate limiting per user)
- Expire commitments after 30-90 days
- Implement Robust Verification
- Use audited verifier contracts (OpenZeppelin templates)
- Monitor proof success rates (target >95%)
- Maintain emergency pause mechanism
- Plan for Scalability
- Use recursive proofs for large models (>10M parameters)
- Consider model sharding (Modulus Labs approach)
- Optimize for L2 deployment (lower gas, faster finality)
- Maintain Compliance
- Document GDPR Article 25 compliance (privacy by design)
- Implement user consent workflows
- Regular privacy audits (annual third-party review)
Next Steps:- Evaluate EZKL vs Modulus Labs (based on model architecture)
- Pilot credit scoring ZK-ML on testnet (2-month timeline)
- Define privacy budget and accuracy thresholds
- Scale to production after successful pilot
Need Help with DeFi Integration?
Building on Layer 2 or integrating DeFi protocols? I provide strategic advisory on:
- Architecture design: Multi-chain deployment, security hardening, cost optimization
- Risk assessment: Smart contract audits, threat modeling, incident response
- Implementation: Protocol integration, testing frameworks, monitoring setup
- Training: Developer workshops, security best practices, operational playbooks
Marlene DeHart advises institutions on DeFi integration and security architecture. Master's in Blockchain & Digital Currencies, University of Nicosia. Specializations: DevSecOps, smart contract security, regulatory compliance.