On-Device AI Analysis with Apple Intelligence

How we added AI-powered report analysis to CodeFrog’s Mega Report using Apple’s Foundation Models framework — processing everything on-device with zero data leaving your Mac.

February 2026 · Flutter / Dart / Swift · macOS 26+ · Apple Intelligence
19 test sections analyzed · 0 bytes of data sent to the cloud · 30 max findings per section · ~4096-token context window

The Challenge

CodeFrog’s Mega Report runs 19 test categories in parallel — accessibility, security headers, SEO, HTML validation, broken links, secrets detection, supply chain vulnerabilities, and more. A typical scan generates dozens or hundreds of findings across these categories, each with different severity levels.

Users needed prioritized, actionable guidance: which section should I fix first? What are the highest-impact changes? Without this, developers would often stare at a wall of findings without a clear starting point.

The obvious solution — sending findings to a cloud AI service like OpenAI or Anthropic — was unacceptable. Mega Report scans can contain detected secrets (API keys, tokens), security vulnerability details, internal URLs, and source code metadata. For a security scanning tool, sending this data to a third-party cloud service would undermine the very trust the tool is designed to build.

Why On-Device AI

We evaluated cloud AI services, self-hosted models, and on-device inference. On-device AI via Apple Intelligence was the clear winner for our use case.

The Approach

Architecture: Foundation Models Integration

We integrated Apple’s Foundation Models framework via a Flutter plugin (foundation_models_framework) that bridges Dart to the native Swift API using Pigeon-generated method channels. The plugin provides availability checking, single-prompt requests, and streaming responses.
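The exact Dart surface depends on the plugin version, but the bridge comes down to three calls. A minimal sketch, with hypothetical method names standing in for the Pigeon-generated interface rather than the plugin's actual API:

// Hypothetical Dart-side interface mirroring the Pigeon-generated bridge.
// Method names are illustrative, not the plugin's real API.
abstract class FoundationModelsBridge {
  /// True when Apple Intelligence is enabled and the on-device model is ready.
  Future<bool> checkAvailability();

  /// Sends a system instruction plus a prompt and returns the full reply.
  Future<String> sendPrompt({
    required String instructions,
    required String prompt,
  });

  /// Streams partial text as the on-device model generates it.
  Stream<String> streamPrompt({
    required String instructions,
    required String prompt,
  });
}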

At widget initialization, the app checks whether Apple Intelligence is available on the current device. If unavailable (older macOS, Apple Intelligence disabled, non-Apple hardware), the AI buttons are disabled with a descriptive tooltip explaining why.
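A sketch of that gating, assuming the hypothetical bridge above and a plain StatefulWidget; the real widget, callbacks, and tooltip strings differ:

import 'package:flutter/material.dart';

class AiAnalysisButton extends StatefulWidget {
  const AiAnalysisButton({
    super.key,
    required this.bridge,
    required this.onAnalyze,
  });

  final FoundationModelsBridge bridge; // hypothetical bridge from the sketch above
  final VoidCallback onAnalyze;

  @override
  State<AiAnalysisButton> createState() => _AiAnalysisButtonState();
}

class _AiAnalysisButtonState extends State<AiAnalysisButton> {
  bool _aiAvailable = false;

  @override
  void initState() {
    super.initState();
    // One-time availability probe at widget initialization.
    widget.bridge.checkAvailability().then((ok) {
      if (mounted) setState(() => _aiAvailable = ok);
    });
  }

  @override
  Widget build(BuildContext context) {
    return Tooltip(
      message: _aiAvailable
          ? 'Analyze this report with on-device AI'
          : 'Apple Intelligence is not available on this device',
      child: FilledButton(
        // A null onPressed renders the button as disabled.
        onPressed: _aiAvailable ? widget.onAnalyze : null,
        child: const Text('AI Analysis'),
      ),
    );
  }
}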

Mode 1: Overall Score Improvement Plan

The overall mode analyzes severity counts across all completed sections to generate a prioritized improvement plan. It never sees individual findings — only aggregated counts.

Sections are sorted by a severity weight formula before being sent to the model, ensuring the AI focuses on the most critical areas first:

enum Severity { critical, high, medium, low }

// Sort sections by severity weight (worst first).
int severityWeight(Map<Severity, int> counts) {
  return (counts[Severity.critical] ?? 0) * 1000 +
      (counts[Severity.high] ?? 0) * 100 +
      (counts[Severity.medium] ?? 0) * 10 +
      (counts[Severity.low] ?? 0);
}
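With that weight function, ordering the per-section summaries is a single sort. SectionSummary here is an illustrative type, not CodeFrog's actual model:

// Illustrative container for a section's aggregated severity counts.
class SectionSummary {
  SectionSummary(this.name, this.counts);
  final String name;
  final Map<Severity, int> counts;
}

// Worst sections first, so the most critical areas lead the prompt.
void sortBySeverity(List<SectionSummary> sections) {
  sections.sort(
      (a, b) => severityWeight(b.counts).compareTo(severityWeight(a.counts)));
}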

The prompt includes the current grade letter, the worst-performing section, and exact severity counts per section. The system instruction enforces factual analysis: “Reference the exact section names and severity counts provided. Do not give generic advice.”
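A sketch of how such a prompt might be assembled; the layout and wording are assumptions, and only the quoted instruction text comes from the actual system instruction:

// System instruction quoted from the app; the builder below is illustrative.
const overallInstructions =
    'Reference the exact section names and severity counts provided. '
    'Do not give generic advice.';

// Assumes `sections` is already sorted worst-first and non-empty.
String buildOverallPrompt(String grade, List<SectionSummary> sections) {
  final buffer = StringBuffer()
    ..writeln('Overall grade: $grade')
    ..writeln('Worst-performing section: ${sections.first.name}')
    ..writeln('Severity counts per section:');
  for (final s in sections) {
    buffer.writeln('${s.name}: '
        'critical=${s.counts[Severity.critical] ?? 0}, '
        'high=${s.counts[Severity.high] ?? 0}, '
        'medium=${s.counts[Severity.medium] ?? 0}, '
        'low=${s.counts[Severity.low] ?? 0}');
  }
  buffer.write('Produce a prioritized improvement plan.');
  return buffer.toString();
}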

Mode 2: Section-Specific Fix Suggestions

The section mode analyzes individual findings within a specific test category. Each finding is compressed into a one-liner format that maximizes information density within the token budget:

// Each section type has a specialized formatter
// Accessibility: [HIGH] Color contrast insufficient (12 nodes)
// Security: [HIGH] missing-csp: Add Content-Security-Policy header
// SEO: [MEDIUM] Missing meta description: No meta description found
// HTML: [WARNING] Element “div” not allowed as child of “ul” (line 47)
// Gitleaks: [HIGH] aws-access-key at src/config.js:23
// OSV: [CRITICAL] lodash@4.17.20: GHSA-jf85-cpcp-j695
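A generic version of such a formatter might look like the sketch below; the real implementation has one specialized formatter per section type, and these names are illustrative:

// Illustrative finding model; each real section type carries its own fields.
class Finding {
  Finding(this.severity, this.title, this.detail);
  final Severity severity;
  final String title;
  final String detail;
}

// One dense line per finding: severity tag, short title, key detail.
String formatFinding(Finding f) =>
    '[${f.severity.name.toUpperCase()}] ${f.title}: ${f.detail}';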

Findings are capped at 30 per section. The prompt instructs the model to explain each specific finding, how to fix it, and its severity impact — and to reference only the provided findings, never inventing issues.
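A sketch of the cap and the section prompt, reusing the illustrative formatter above; the instruction wording is paraphrased from the description, not the app's literal prompt:

const maxFindingsPerSection = 30;

// Illustrative section prompt: cap the findings, then one formatted line each.
String buildSectionPrompt(String sectionName, List<Finding> findings) {
  final buffer = StringBuffer()
    ..writeln('Section: $sectionName')
    ..writeln('Findings:');
  for (final f in findings.take(maxFindingsPerSection)) {
    buffer.writeln(formatFinding(f));
  }
  buffer.write('For each finding above, explain the issue, how to fix it, '
      'and its severity impact. Reference only the findings listed.');
  return buffer.toString();
}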

Constraint: Token Budget Management

The on-device model has a ~4096 token context window — significantly smaller than what cloud models offer. We designed the entire prompt strategy around this constraint.
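One simple way to respect such a budget is a coarse character-based estimate plus a reserve for the model's reply. In the sketch below, the ~4 characters-per-token heuristic and the reserve size are assumptions, not measured properties of the on-device tokenizer:

// Rough budget guard; charsPerToken ≈ 4 is a rule-of-thumb assumption.
const contextWindowTokens = 4096;
const reservedForResponse = 1024; // leave room for the generated answer
const charsPerToken = 4;

int estimateTokens(String text) => (text.length / charsPerToken).ceil();

bool fitsBudget(String instructions, String prompt) =>
    estimateTokens(instructions) + estimateTokens(prompt) <=
        contextWindowTokens - reservedForResponse;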

Privacy by Design

Privacy is not a feature we added — it is a constraint we designed around from the start. Here is exactly what the AI model processes:

What the AI Sees

What the AI Never Sees

No Persistence

AI suggestions are held in widget state only. They are not written to the database, not included in report exports, and not cached between sessions. Every click of the button generates a fresh analysis.
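In Flutter terms, that means the suggestion lives only in ephemeral State and dies with the widget. A minimal sketch:

import 'package:flutter/material.dart';

class AnalysisPanel extends StatefulWidget {
  const AnalysisPanel({super.key});

  @override
  State<AnalysisPanel> createState() => _AnalysisPanelState();
}

class _AnalysisPanelState extends State<AnalysisPanel> {
  // Held only in widget state: discarded on dispose, never written to the
  // database, report exports, or a cache.
  String? _suggestion;

  // Called with the model's reply each time the AI button is clicked.
  void onAnalysisResult(String text) {
    if (mounted) setState(() => _suggestion = text);
  }

  @override
  Widget build(BuildContext context) =>
      Text(_suggestion ?? 'Click the AI button to generate a fresh analysis.');
}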

Results

Lessons Learned
