Perceptual Hashing, Explained: PDQ, vpdq, and What Instagram & TikTok Actually Match
By The b10.studio team
Every time a platform tells you a video is a "duplicate," flags a copyright match, or quietly demotes a repost, the same family of technology is underneath: perceptual hashing. If you work in social, ads, or content at scale, it's worth understanding properly — not as a black box, but as a measurable number you can reason about. This is that explainer.
Cryptographic hash vs perceptual hash
Start with the contrast, because it's the whole idea.
A cryptographic hash (SHA-256 and friends) is designed so that the tiniest change to the input — one flipped bit — produces a completely different output. That's perfect for "is this the exact same file?" and useless for "is this the same content?" Re-save a JPEG and its SHA-256 changes entirely, even though the image looks identical.
A perceptual hash is designed for the opposite property: inputs that look or sound similar should produce similar outputs. Small, perceptually minor changes move the hash only a little. That's what lets a platform recognize your video after it's been re-compressed, resized, and trimmed.
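The cryptographic half of that contrast is easy to check with Python's standard hashlib. The sketch below hashes two byte strings that differ in a single bit and counts how many of the 256 output bits changed. The input bytes are a placeholder, not a real image file.

```python
import hashlib

def bit_difference(digest_a: bytes, digest_b: bytes) -> int:
    """Count the bit positions where two equal-length digests differ."""
    return sum(bin(x ^ y).count("1") for x, y in zip(digest_a, digest_b))

original = b"pretend these bytes are an image file"  # placeholder input
tweaked = bytearray(original)
tweaked[0] ^= 0x01  # flip a single bit of the input

digest_a = hashlib.sha256(original).digest()
digest_b = hashlib.sha256(bytes(tweaked)).digest()

changed_bits = bit_difference(digest_a, digest_b)
print(f"{changed_bits} of 256 output bits changed")
```

For a well-designed cryptographic hash, roughly half of the output bits flip, so the two digests carry no hint that the inputs were nearly identical.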
How an image perceptual hash is built (PDQ)
PDQ is Meta's open-source image hashing algorithm — widely used and a good model for the whole family. Conceptually:
- Normalize. Convert the image to luminance (grayscale) and downscale it to a small fixed grid (64×64 for PDQ). Fine detail and color are thrown away on purpose, because they're the things that change between near-duplicates.
- Transform to frequency space. A Discrete Cosine Transform (DCT) separates broad structure (low frequencies) from fine detail (high frequencies). The broad structure is what survives compression and resizing.
- Quantize to bits. The low-frequency coefficients are reduced to a fixed-length binary string. PDQ keeps a 16×16 block of coefficients and sets each bit by comparing its coefficient to the block's median, which yields a 256-bit fingerprint.
The result is a compact bit-string that captures the gist of the image. Two visually similar images land on two similar bit-strings.
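The three steps can be sketched in plain Python. This is a simplified illustration of the DCT-and-median idea, not Meta's PDQ implementation: it assumes the input is already a 64×64 grayscale grid, uses an unscaled DCT-II, and skips the DC term. The names toy_dct_hash, dct_rows, and matmul are made up for this example, and the "images" are synthetic grids of numbers.

```python
import math
import random
import statistics

def dct_rows(n_out, n_in):
    """DCT-II basis rows 1..n_out over n_in samples (row 0, the DC term, is skipped)."""
    return [[math.cos(math.pi * (2 * x + 1) * u / (2 * n_in)) for x in range(n_in)]
            for u in range(1, n_out + 1)]

def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    cols = list(zip(*b))
    return [[sum(p * q for p, q in zip(row, col)) for col in cols] for row in a]

def toy_dct_hash(gray, side=16):
    """Hash a square grayscale grid into side*side bits (a list of 0/1)."""
    basis = dct_rows(side, len(gray))
    basis_t = [list(col) for col in zip(*basis)]
    coeffs = matmul(matmul(basis, gray), basis_t)  # low-frequency block
    flat = [c for row in coeffs for c in row]
    median = statistics.median(flat)
    return [1 if c > median else 0 for c in flat]  # one bit per coefficient

def hamming(bits_a, bits_b):
    """Number of positions where two bit lists differ."""
    return sum(a != b for a, b in zip(bits_a, bits_b))

rng = random.Random(0)
image = [[rng.randrange(200) for _ in range(64)] for _ in range(64)]  # synthetic grid
brighter = [[v + 10 for v in row] for row in image]                    # uniform brightness shift
noisy = [[v + rng.choice((-2, 0, 2)) for v in row] for row in image]   # mild pixel noise
other = [[rng.randrange(200) for _ in range(64)] for _ in range(64)]  # unrelated grid

h = toy_dct_hash(image)
print(len(h), "bits")
print("brighter:", hamming(h, toy_dct_hash(brighter)))
print("noisy:   ", hamming(h, toy_dct_hash(noisy)))
print("other:   ", hamming(h, toy_dct_hash(other)))
```

On this synthetic data the uniform brightness shift should leave the hash unchanged, because a constant offset only alters the DC term that the sketch skips. The mildly noisy copy should differ in only a few bits, and the unrelated grid in roughly half of them.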
Comparing hashes: Hamming distance
You don't compare perceptual hashes for equality — you measure how close they are. The metric is Hamming distance: the number of bit positions where two hashes differ.
- Distance 0 → bit-identical fingerprints.
- A small distance → the platform treats them as the same content (a match).
- A large distance → treated as distinct. Two unrelated images differ in roughly half their bits, about 128 of 256 for PDQ.
Each platform picks a threshold. Below it, you're a duplicate; above it, you're new. The exact number is proprietary and varies by platform and surface, but the model is universal: duplicate detection is "is the Hamming distance below T?" Your entire job, when repurposing, is to push that distance above T while keeping the creative intact.
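The "distance below T" rule is only a few lines of code. The sketch below assumes hashes are passed around as hex strings (64 hex digits for a 256-bit hash). The threshold value and the three sample hashes are invented for illustration and are not any platform's real numbers.

```python
def hamming_distance(hash_a_hex: str, hash_b_hex: str) -> int:
    """Number of differing bits between two equal-length hex-encoded hashes."""
    if len(hash_a_hex) != len(hash_b_hex):
        raise ValueError("hashes must have the same length")
    return bin(int(hash_a_hex, 16) ^ int(hash_b_hex, 16)).count("1")

def is_duplicate(hash_a_hex: str, hash_b_hex: str, threshold: int) -> bool:
    """Apply the 'distance below T' rule."""
    return hamming_distance(hash_a_hex, hash_b_hex) < threshold

T = 32  # hypothetical threshold, for illustration only

source = "f" * 64           # 64 hex digits = 256 bits
near_copy = "f" * 63 + "0"  # last four bits flipped
unrelated = "0f" * 32       # half of all bits flipped

print(hamming_distance(source, near_copy), is_duplicate(source, near_copy, T))  # 4 True
print(hamming_distance(source, unrelated), is_duplicate(source, unrelated, T))  # 128 False
```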
Video: vpdq and the time dimension
Video adds time. vpdq (video PDQ, also from Meta) handles it the obvious way: sample frames over the duration, compute a PDQ hash per sampled frame, and represent the video as the set of those frame hashes.
Matching then becomes "what fraction of frames in clip A have a close match somewhere in clip B?" This is why several naive edits fail to fool it:
- Trimming the ends removes some frames but leaves most matching — the shared middle still scores as a match.
- Re-ordering or speed changes shift frames around in time but don't change what most of them look like, and a set-based comparison doesn't care where in the timeline a frame appears.
- Re-compression moves each frame's hash only slightly, well within threshold.
To genuinely move a video's fingerprint you have to shift the per-frame hashes across the whole timeline — which means distributed, frame-level visual change, not one edit at the boundaries.
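The frame-set model can be sketched as follows. This is a simplified stand-in for vpdq-style matching, not Meta's implementation: random 256-bit integers play the role of per-frame hashes, "re-compression" is simulated by flipping three bits per frame, and match_fraction and FRAME_T are hypothetical names and values chosen for this example.

```python
import random

def hamming(a: int, b: int) -> int:
    """Number of differing bits between two integer-encoded hashes."""
    return bin(a ^ b).count("1")

def match_fraction(query_frames, reference_frames, frame_threshold):
    """Fraction of query frame hashes with a close match anywhere in the reference."""
    if not query_frames:
        return 0.0
    matched = sum(
        1 for q in query_frames
        if any(hamming(q, r) < frame_threshold for r in reference_frames)
    )
    return matched / len(query_frames)

rng = random.Random(1)
original = [rng.getrandbits(256) for _ in range(10)]  # stand-ins for 10 sampled frames

trimmed = original[2:8]                       # both ends cut off
recompressed = [h ^ 0b111 for h in original]  # three bits flipped per frame
reordered = list(reversed(original))          # same frames, different order

FRAME_T = 32  # hypothetical per-frame threshold

print(match_fraction(trimmed, original, FRAME_T))       # 1.0
print(match_fraction(original, trimmed, FRAME_T))       # 0.6
print(match_fraction(recompressed, original, FRAME_T))  # 1.0
print(match_fraction(reordered, original, FRAME_T))     # 1.0
```

Every frame of the trimmed clip still matches the original, and 60% of the original is still found in the trimmed clip. Neither re-compression nor re-ordering lowers the score at all in this toy setup.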
Audio gets fingerprinted too
Visual hashing is only half of it. Audio has its own perceptual fingerprinting (Chromaprint/AcoustID-style), built from a spectrogram of the sound rather than the waveform bytes. Identical audio is a strong, independent duplicate signal, which is why two visually distinct edits that share the exact same audio track can still get linked. A complete repost strategy has to account for both fingerprints.
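Audio fingerprints can be compared with the same bit-counting idea. The sketch below assumes a fingerprint is a list of 32-bit integers, one per short time slice, which is roughly how Chromaprint's raw output is usually represented. The comparison itself (bit error rate at the best time offset) is an illustrative approach rather than AcoustID's actual matcher, and the fingerprints here are random stand-ins.

```python
import random

def bit_error_rate(fp_a, fp_b):
    """Fraction of differing bits across the overlapping 32-bit items."""
    n = min(len(fp_a), len(fp_b))
    if n == 0:
        return 1.0
    diff = sum(bin((a ^ b) & 0xFFFFFFFF).count("1") for a, b in zip(fp_a, fp_b))
    return diff / (32 * n)

def best_alignment(fp_a, fp_b, max_offset=5):
    """Slide fp_b against fp_a and return (lowest bit error rate, offset)."""
    best = (1.0, 0)
    for offset in range(-max_offset, max_offset + 1):
        a = fp_a[offset:] if offset >= 0 else fp_a
        b = fp_b if offset >= 0 else fp_b[-offset:]
        best = min(best, (bit_error_rate(a, b), offset))
    return best

rng = random.Random(2)
track = [rng.getrandbits(32) for _ in range(50)]      # stand-in fingerprint
trimmed_track = track[3:]                             # same audio, intro removed
other_track = [rng.getrandbits(32) for _ in range(50)]

print(best_alignment(track, trimmed_track))  # (0.0, 3)
print(best_alignment(track, other_track))    # error rate near 0.5 at every offset
```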
Why the naive tricks don't work
Run the common "make it look new" tactics against the model and they fall apart predictably:
- "I changed the format / re-encoded it." Re-encoding changes every byte of the file but only slightly perturbs the normalized pixels, so the hash barely moves.
- "I resized it / changed resolution." PDQ downscales first — your resolution is normalized away before hashing.
- "I cropped a little / trimmed the intro." Small crops survive normalization; trims leave most frames matching.
- "I wiped the metadata." The hash is built from pixels and audio. Metadata isn't an input.
All four leave the Hamming distance well below threshold. They feel like meaningful changes because they change the file; they don't change the fingerprint.
What does move the distance
Distributed, perceptually small changes that touch the normalized signal in many places at once:
- Color grade, gamma, and saturation shifts (alter luminance across every frame)
- Fractional geometric transforms — rotation, zoom, warp (shift structure in frequency space)
- Structured noise and grain (perturb the coefficients PDQ quantizes)
- Audio re-timing, pitch/tempo nudges, and EQ (move the acoustic fingerprint)
The craft is calibration: enough to clear the threshold, gentle enough that fidelity — how intact the creative looks — stays high.
Measure it instead of guessing
The good news about all of this being a number is that you can measure it directly. Our Risk Analyzer runs real PDQ on images and vpdq on video, computes the Hamming distance between a source and a variant, and adds an alignment-robust SSIM fidelity score and a Chromaprint audio match, so you can see exactly where a file sits relative to a match threshold before you ever post it.
Perceptual hashing isn't magic and it isn't unbeatable. It's a well-defined distance metric with a threshold. Once you see it that way, "will this repost get caught?" stops being a superstition and becomes something you can check.