ML Models
BurstPick ships with 22 ML models across 6 categories. Every model runs on-device using Apple Core ML and the Neural Engine, so no photos leave your Mac. Models are either bundled with the app or downloaded the first time you select them.
Image Quality Assessment
Scores each photo on sharpness, noise, exposure, and perceptual clarity. Used to rank photos within burst clusters.
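As a rough illustration of ranking within burst clusters, the sketch below sorts each cluster by a single quality score per photo. The function name, the filenames, and the idea of one combined score per photo are assumptions made for the example, not BurstPick's actual data model.

```python
def rank_clusters(clusters, scores):
    """Sort each burst cluster from highest to lowest quality score.

    `clusters` is a list of lists of photo names (hypothetical identifiers).
    `scores` maps each photo name to one quality score, higher is better.
    """
    return [
        sorted(cluster, key=lambda name: scores[name], reverse=True)
        for cluster in clusters
    ]


if __name__ == "__main__":
    clusters = [["IMG_001", "IMG_002", "IMG_003"], ["IMG_010"]]
    scores = {"IMG_001": 0.61, "IMG_002": 0.87, "IMG_003": 0.42, "IMG_010": 0.70}
    for ranked in rank_clusters(clusters, scores):
        print("keep:", ranked[0], "| others:", ranked[1:])
```

The first entry of each ranked cluster is the candidate to keep; the rest are candidates to cull.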
Heuristic (Laplacian + Luma)
Bundled. Instant scoring using Accelerate vDSP/vImage. Measures sharpness, exposure, noise, and eye closure. Cannot judge composition or semantic quality. Best for fast initial culling passes.
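A common way to estimate sharpness from a Laplacian is to take the variance of the filter response over the luma channel: sharp edges produce large responses, while blurred or flat regions produce values near zero. The sketch below shows that idea in plain Python. It is not BurstPick's Accelerate-based implementation, and the function names and the 4-neighbour kernel are assumptions.

```python
from statistics import fmean, pvariance


def laplacian_variance(luma):
    """Variance of the 4-neighbour Laplacian over interior pixels.

    `luma` is a list of equal-length rows of brightness values.
    Higher values suggest more fine detail (a sharper image).
    """
    height = len(luma)
    width = len(luma[0]) if height else 0
    if height < 3 or width < 3:
        raise ValueError("need at least a 3x3 image")
    responses = [
        luma[y - 1][x] + luma[y + 1][x] + luma[y][x - 1] + luma[y][x + 1]
        - 4 * luma[y][x]
        for y in range(1, height - 1)
        for x in range(1, width - 1)
    ]
    return pvariance(responses)


def mean_luma(luma):
    """Average brightness, a crude exposure indicator."""
    return fmean(value for row in luma for value in row)


if __name__ == "__main__":
    flat = [[128] * 5 for _ in range(5)]
    checker = [[255 if (x + y) % 2 == 0 else 0 for x in range(5)] for y in range(5)]
    print("flat:", laplacian_variance(flat), "checker:", laplacian_variance(checker))
```

A heuristic like this only sees local contrast, which is consistent with the note above that it cannot judge composition or semantic quality.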
TOPIQ NR
No-reference IQA using a ResNet50 backbone. A balanced general-purpose choice for technical quality scores.
MUSIQ (KonIQ)
Multi-scale transformer trained on KonIQ-10k real-world distortions. Strong on natural photos with better perceptual alignment than TOPIQ.
MANIQA
NTIRE 2022 IQA Challenge winner. Multi-dimension attention captures fine perceptual differences. Most accurate model in this category, but also the largest.
NIMA (MobileNet)
Neural Image Assessment trained on AVA (250K aesthetic ratings). Outputs a probability distribution over 10 score classes. Compact MobileNet backbone makes it the fastest aesthetic quality model.
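A 10-class distribution of this kind is usually reduced to a single number by taking its expected value over the score buckets 1 to 10, with the standard deviation indicating how strongly the predicted ratings disagree. The sketch below shows that reduction. The function name is hypothetical, and how BurstPick itself collapses the distribution is an assumption.

```python
import math


def nima_mean_and_std(probs):
    """Return (mean, std) of a distribution over the scores 1..10.

    `probs` holds 10 non-negative values; they are renormalized so small
    rounding errors in the model output do not skew the result.
    """
    total = sum(probs)
    if len(probs) != 10 or total <= 0:
        raise ValueError("expected 10 probabilities with a positive sum")
    normalized = [p / total for p in probs]
    mean = sum(score * p for score, p in enumerate(normalized, start=1))
    variance = sum(p * (score - mean) ** 2
                   for score, p in enumerate(normalized, start=1))
    return mean, math.sqrt(variance)


if __name__ == "__main__":
    # Illustrative distribution, not real model output.
    mean, std = nima_mean_and_std([0, 0, 0.05, 0.1, 0.2, 0.3, 0.2, 0.1, 0.05, 0])
    print(f"mean score {mean:.2f}, spread {std:.2f}")
```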
Aesthetic Scoring
Rates artistic appeal based on composition, color harmony, and visual balance. Trained on large-scale human preference data.
LAION Aesthetic v1
Bundled. Lightweight linear probe on CLIP embeddings, with near-zero overhead if CLIP is already loaded. Trained on LAION aesthetic ratings. Good default aesthetic scorer.
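A linear probe is a single weighted sum over an embedding plus a bias, which is why it adds almost no cost once the CLIP embedding already exists. The sketch below illustrates the arithmetic. The weights, bias, and two-dimensional embedding are made-up placeholders (real CLIP embeddings have hundreds of dimensions), and normalizing the embedding first is an assumption about the preprocessing.

```python
import math


def l2_normalize(vector):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vector]


def linear_probe_score(embedding, weights, bias):
    """Score = dot(weights, normalized embedding) + bias."""
    if len(embedding) != len(weights):
        raise ValueError("embedding and weights must have the same length")
    unit = l2_normalize(embedding)
    return sum(w * x for w, x in zip(weights, unit)) + bias


if __name__ == "__main__":
    # Placeholder values purely for illustration.
    print(linear_probe_score([3.0, 4.0], weights=[1.0, 0.0], bias=5.0))
```

Because the embedding is normalized, scaling the input vector does not change the score.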
ViT-B/16 Aesthetic
Standalone ViT-B/16 fine-tuned on the AVA dataset (250K human aesthetic ratings). More nuanced aesthetic judgment than the LAION probe. Independent of CLIP.
Image Embedding
Turns photos into vectors for similarity clustering. Groups burst sequences and flags duplicates or near-duplicates.
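One simple way to turn embeddings into burst groups is to walk the photos in capture order and start a new group whenever the cosine similarity to the previous frame drops below a threshold. The sketch below assumes that approach. The threshold value, the function names, and the capture-order assumption are illustrative and not taken from BurstPick.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("zero-length embedding")
    return dot / (norm_a * norm_b)


def group_bursts(embeddings, threshold=0.9):
    """Group consecutive photos whose embeddings stay similar.

    `embeddings` must be ordered by capture time. Returns lists of indices.
    """
    groups = []
    for index, embedding in enumerate(embeddings):
        previous = embeddings[groups[-1][-1]] if groups else None
        if previous is not None and cosine_similarity(previous, embedding) >= threshold:
            groups[-1].append(index)
        else:
            groups.append([index])
    return groups


if __name__ == "__main__":
    # Tiny 2-D vectors stand in for real image embeddings.
    frames = [[1.0, 0.0], [0.99, 0.1], [0.0, 1.0], [0.1, 0.99]]
    print(group_bursts(frames))
```

Near-duplicate detection can reuse the same similarity function with a stricter threshold.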
Apple Vision FeaturePrint
Bundled. Built into macOS, so there is no download and it is available instantly. Good general-purpose scene similarity. Best for speed-first workflows.
DINOv2 ViT-S/14
State-of-the-art self-supervised features (Meta, LVD-142M). Excellent visual similarity and scene structure. Recommended balanced choice.
CLIP ViT-B/32
Rich semantic understanding from multimodal training. Groups photos by content meaning. Required by the LAION Aesthetic scorer. Best for diverse libraries.
Face Embedding
Builds face identity vectors for person grouping. Clusters photos by who appears in them.
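Person grouping from identity vectors can be sketched as linking any two faces whose embeddings are similar enough and then taking the connected components of those links. The example below does this with a small union-find structure. The threshold, the function names, and the choice of connected components (rather than whatever clustering BurstPick actually uses) are assumptions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two face embeddings."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("zero-length embedding")
    return dot / (norm_a * norm_b)


def group_faces(embeddings, threshold=0.5):
    """Cluster faces by linking pairs with similarity >= threshold."""
    parent = list(range(len(embeddings)))

    def find(i):
        # Follow parent links to the root, shortening the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(embeddings)):
        for j in range(i + 1, len(embeddings)):
            if cosine_similarity(embeddings[i], embeddings[j]) >= threshold:
                parent[find(j)] = find(i)

    clusters = {}
    for i in range(len(embeddings)):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())


if __name__ == "__main__":
    # Tiny 2-D vectors stand in for real face embeddings.
    faces = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [0.1, 0.9]]
    print(group_faces(faces, threshold=0.8))
```

The pairwise loop is quadratic in the number of faces, which is fine for a sketch but would need an index or batching for large libraries.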
EdgeFace-XS
Fastest option — lightweight 4 MB download. Good face grouping for most photos (LFW 99.73%). Best when speed is the priority.
EdgeFace-S
Good balance of speed and accuracy (LFW 99.82%, IJB-B 94.38%). Small download. Handles varied lighting well. Recommended balanced choice.
AdaFace IR-18
Bundled. Strong on low-quality and challenging face crops via adaptive margin (CVPR 2022). LFW 99.82%. Good mid-tier choice.
AdaFace IR-50
Top-tier accuracy (LFW 99.82%, IJB-B 95.67%). Excels on difficult poses and low-quality crops. Best when face grouping precision is critical.
AuraFace v1
Large ResNet-100 backbone with permissive Apache 2.0 license. Choose mainly for licensing requirements.
GhostFaceNets
State-of-the-art lightweight face recognition model (as of 2025). High accuracy with minimal computational overhead.
Vision Language Model (VLM)
Reads photo content using natural language. Gives scene descriptions and quality reasoning that go beyond numerical scores.
Heuristic Estimate
Bundled. Built-in fallback using heuristic image analysis (sharpness, exposure, noise, faces). No download required. Replace with a real VLM for improved results.
SmolVLM2 256M
Smallest VLM — fastest inference with minimal memory. Basic scene recognition and quality commentary. Best for quick screening on constrained hardware.
SmolVLM2 2.2B
Full-size SmolVLM with strong scene understanding and quality reasoning. More capable but slower than the 256M variant.
FastVLM 0.5B
Apple FastVLM with FastViTHD hybrid encoder. Optimized for on-device speed with solid scene recognition. Recommended balanced VLM choice.
FastVLM 1.5B
Largest FastVLM variant offered and the most capable VLM in the lineup. Deep scene understanding, nuanced quality reasoning, and detailed descriptions. Best when VLM quality is the top priority.
Image Classification
Tags photos with scene and object labels for filtering and organization. Uses Apple's Vision framework.
Apple Vision Classification
Bundled. Built-in macOS image classification using VNClassifyImageRequest. Fast, no download required. Provides scene and object tags for filtering.
