For decades, video compression has been a game of smart removal — deleting what the human eye won't miss. From MPEG-2 to H.264 to AV1, every generation has been about the same goal: keep what matters, hide what doesn't.
But we've hit the limit.
Compression Isn't Getting Smarter — It's Getting Softer
The upcoming AV2 codec is a perfect example. On paper, it claims up to 40% better compression than AV1. In reality, most of that "gain" comes from post-filters that blur fine details to please the metrics. Sharp edges, textures, and grain — all softened by denoising filters and restoration passes.
It looks cleaner to an algorithm. It looks softer to a human.
We're no longer improving compression; we're improving the illusion of quality.
From Blocks to Brains: We've Already Mined the Math
Old codecs worked in simple, fixed blocks: 8×8 DCT blocks grouped into 16×16 macroblocks. Now we have adaptive block trees, quarter-pixel motion vectors, and hierarchical P/B-frame pyramids with recursive prediction. The math has become remarkable — but the gains are shrinking.
We've squeezed every drop out of transforms, motion prediction, and entropy coding. Every new codec claims another 10–15% improvement — but at massive computational cost and often at the expense of detail.
The Codec Evolution Timeline
- MPEG-2 (1995): 8×8 blocks, simple DCT, ~6 Mbps for broadcast quality
- H.264 (2003): Adaptive blocks, P/B frames, CABAC entropy coding, ~3 Mbps for same quality
- H.265/HEVC (2013): 64×64 CTUs, 35 intra modes, ~1.5 Mbps for same quality
- AV1 (2018): 128×128 superblocks, compound prediction, ~1 Mbps for same quality
- H.266/VVC (2020) and AV2 (emerging): Neural filters, machine-learning tools, ~0.7 Mbps... but at what visual cost?
At this point, compression isn't about math anymore. It's about psychology.
Axis and the Art of Not Sending Video
Axis Communications saw this years ago. Instead of focusing on better transforms, they focused on not sending data that doesn't matter.
Their motion-based "Zipstream" idea was simple genius: spend full bitrate only when there's real movement. The rest is stillness, and stillness compresses almost for free.
The Zipstream Insight
Traditional codecs treat every second of video as equally important. Zipstream recognizes that surveillance video is 90% static scenes with occasional motion events.
Why send full bitrate to encode a static parking lot?
The innovation isn't better compression — it's contextual awareness. Send high quality when motion is detected, drop to minimal bitrate when nothing changes.
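That contextual logic can be sketched in a few lines. The thresholds and bitrates below are illustrative assumptions, not Zipstream's actual parameters:

```python
# Toy sketch of motion-gated bitrate selection (illustrative numbers,
# not Axis Zipstream's real algorithm or parameters).

def select_bitrate(motion_score: float,
                   idle_bps: int = 100_000,
                   active_bps: int = 3_000_000,
                   threshold: float = 0.05) -> int:
    """Return the target bitrate for the next GOP.

    motion_score: fraction of pixels that changed since the last frame (0..1).
    """
    return active_bps if motion_score >= threshold else idle_bps

# A static parking lot barely changes; a person walking through trips the gate.
assert select_bitrate(0.001) == 100_000   # static scene -> minimal bitrate
assert select_bitrate(0.20) == 3_000_000  # real motion -> full quality
```

A real implementation would smooth the motion score over time to avoid oscillating between the two rates, but the principle is exactly this simple.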
That concept has since spread across the entire industry — everyone from Hikvision to Dahua now calls it "smart codec," but the core idea remains: don't waste bandwidth on static pixels.
This kind of temporal intelligence is the real future. It's not about another codec; it's about deciding when to send video, not just how to compress it.
The Frame Rate Myth
Another industry obsession: frame rate. People assume 60 fps means "better." For live streaming or sports, maybe. For surveillance? Not really.
If you lock your bitrate at 3 Mbps and jump from 25 fps to 60 fps, you cut the bits available per frame by more than half. You get smoother motion, but a worse image in every single frame.
The Frame Rate Reality Check
- Film and television have run at 24–30 fps for a century because human motion perception is largely satisfied in that range
- Video analytics and evidentiary review typically perform well at 20–25 fps
- Doubling frame rate at constant bitrate means half the bits per frame
- Higher frame rate at a fixed bitrate = coarser quantization = worse per-frame quality
- License plate recognition, facial detail, and forensic clarity suffer at high fps/low bitrate
More frames don't mean more clarity — they mean less data per frame.
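The arithmetic is worth making explicit. At a fixed bitrate, the per-frame budget falls linearly with frame rate:

```python
def bits_per_frame(bitrate_bps: int, fps: int) -> int:
    """Average bit budget available to each frame at a constant bitrate."""
    return bitrate_bps // fps

# At a locked 3 Mbps stream:
assert bits_per_frame(3_000_000, 25) == 120_000  # 120 kb per frame
assert bits_per_frame(3_000_000, 60) == 50_000   # 50 kb per frame

# Jumping from 25 to 60 fps leaves each frame less than half its budget.
assert bits_per_frame(3_000_000, 60) / bits_per_frame(3_000_000, 25) < 0.5
```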
Security Cameras: The 15 FPS Advantage
For security cameras with bandwidth considerations, you're often better off using 15 fps instead of 30 fps. Here's why:
Fifteen high-quality frames beat thirty low-quality ones. With an I-frame interval of roughly one per second, you get a clean, fully coded frame at regular intervals. When you need to go back and find a license plate or get a clear picture of a person's face, that lower frame rate with a higher per-frame budget gives you far better forensic data.
At 30 fps with limited bandwidth, you're spreading your bitrate too thin. Motion might look smoother during live playback, but the quality of each individual frame tends to go down. When it matters most — during forensic analysis — you want sharp, detailed frames, not smooth motion.
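A rough way to see this is to split one second's bit budget across its GOP. The 4:1 I-to-P frame size ratio below is an assumed illustrative figure, not a measured constant:

```python
# Sketch: per-frame budgets with one I-frame per second, assuming
# (hypothetically) an I-frame costs about 4x a P-frame at equal quality.

def frame_budgets(bitrate_bps: int, fps: int, i_to_p_ratio: float = 4.0):
    """Split one second's bit budget across 1 I-frame and (fps-1) P-frames."""
    units = i_to_p_ratio + (fps - 1)          # the I-frame counts as 4 P-units
    p_bits = bitrate_bps / units
    return p_bits * i_to_p_ratio, p_bits      # (I-frame bits, P-frame bits)

# Same 1.5 Mbps budget, two frame rates:
i15, p15 = frame_budgets(1_500_000, 15)
i30, p30 = frame_budgets(1_500_000, 30)

# Halving the frame rate nearly doubles the bits behind every frame,
# including the forensic I-frame you scrub back to.
assert i15 / i30 > 1.8 and p15 / p30 > 1.8
```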
This is where these new perception-based filters will really shine. They're specifically designed to account for things that need to be kept: fine details, faces, text, edges. A perception-aware encoder at 15 fps could preserve license plate clarity and facial features far better than a traditional encoder at 30 fps with the same bandwidth budget.
The industry has been chasing higher frame rates for the wrong reasons. For surveillance and forensics, frame quality beats frame rate every single time.
The Next Chapter: Perception-Based Compression
We've mastered compression. The next frontier is perception.
Codecs will start to prioritize faces, license plates, edges, and text — regions the human eye or AI models actually care about. The rest can fade into softness.
Future encoders won't just predict pixels; they'll predict attention.
What Perception-Based Encoding Looks Like
- Region-of-interest encoding: Detected faces get 3× the bitrate of background walls
- Saliency maps: AI determines where the human eye will look and allocates bits accordingly
- Semantic segmentation: License plates, text, and edges preserved; sky and pavement softened
- Contextual adaptation: Parking lot at night uses different priorities than a busy intersection at noon
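A minimal region-of-interest allocator makes the idea concrete. The region names and saliency weights here are invented for illustration:

```python
# Toy region-of-interest allocator: split a frame's bit budget across
# regions in proportion to assumed saliency weights (illustrative values).

def allocate_bits(frame_budget: int, regions: dict[str, float]) -> dict[str, int]:
    """Distribute frame_budget proportionally to each region's weight."""
    total = sum(regions.values())
    return {name: int(frame_budget * w / total) for name, w in regions.items()}

weights = {"face": 3.0, "license_plate": 3.0, "background_wall": 1.0, "sky": 0.5}
alloc = allocate_bits(120_000, weights)

# Faces and plates get 3x the bits of the wall; the sky gets the least.
assert alloc["face"] == 3 * alloc["background_wall"]
assert alloc["sky"] < alloc["background_wall"]
```

In a real encoder the weights would come from a detector or saliency model running ahead of the rate controller, but the allocation step itself is this straightforward.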
That's where true efficiency lies — not in another 5% bitrate savings, but in understanding what matters visually and discarding the rest intelligently.
Why H.264 Still Wins
With all this talk of next-generation codecs, there's an inconvenient truth: H.264 is still the best choice for most real-world deployments.
Not because it's technically superior. But because:
- Universal compatibility: Virtually every device shipped in the past 15+ years can decode H.264 in hardware
- Error resilience: H.264 gracefully handles packet loss on wireless/cellular networks
- Known behavior: 20+ years of field experience means predictable performance
- Low encoding latency: Critical for live surveillance and two-way communication
- Transparent bitrate control: You get what you configure — no surprises
H.265 and AV1 promise better compression, but in typical surveillance scenarios (400-800 Kbps bitrates, wireless links, variable packet loss), the gains evaporate. You end up fighting codec quirks, decoder compatibility, and unpredictable quality.
H.264 just works. And in a world where "smarter" codecs keep getting softer, that reliability is worth more than benchmark scores.
The Backport Opportunity
Here's where it gets interesting: many of these new post-processing filters and perceptual optimizations being developed for AV2 and VVC aren't fundamentally tied to those codecs. They're preprocessing and postprocessing techniques that could theoretically be backported to H.264 and H.265.
We might soon see movement in the open-source encoder libraries that have been relatively stable for years:
The Open Source Encoder Landscape
x264 (libx264) — VideoLAN's H.264 encoder, the gold standard for software encoding. While still maintained with periodic updates, the core algorithm has been largely stable since the mid-2010s as H.264 reached maturity.
x265 (libx265) — MulticoreWare's H.265/HEVC encoder. Development has slowed considerably in recent years as focus shifted to newer codecs, though critical updates still occur.
Both projects could benefit enormously from incorporating modern perception-based optimizations — saliency-aware bit allocation, AI-driven preprocessing, and context-sensitive encoding decisions — without touching the core codec spec.
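One concrete backport path: encoders like x264 already accept per-macroblock quantizer offsets at the library level, so a preprocessing pass could translate a saliency map into an offset map without touching the bitstream format. The linear mapping and the ±6 QP range below are assumptions for illustration:

```python
# Sketch of saliency-aware quantization: map a per-macroblock saliency score
# (0 = ignore, 1 = critical) to a QP offset. Negative offsets spend more bits
# on a block; positive offsets quantize it more coarsely. The +/-6 range and
# the linear mapping are illustrative assumptions, not any encoder's defaults.

def qp_offsets(saliency: list[list[float]], max_delta: float = 6.0):
    """Return a QP-offset map: salient blocks get negative offsets (finer
    quantization), boring blocks positive ones (coarser quantization)."""
    return [[max_delta * (1.0 - 2.0 * s) for s in row] for row in saliency]

# One row of macroblocks: background, background, a detected face, background.
offsets = qp_offsets([[0.0, 0.0, 1.0, 0.0]])[0]

assert offsets[2] == -6.0                   # face: spend bits, lower QP
assert all(o == 6.0 for o in offsets[:2])   # background: save bits, raise QP
```

The decoder never needs to know any of this happened; it just sees a standard H.264 stream whose bits happen to land where they matter.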
The beauty of this approach? You get the perceptual improvements of next-gen codecs while maintaining the universal compatibility and reliability of H.264/H.265. No new decoders required. No compatibility headaches. Just smarter encoding of a proven format.
If the industry is smart, we'll see these techniques trickle down to the encoders everyone actually uses, rather than being locked inside codecs nobody can decode.
WINK's Perspective
At WINK, we don't chase benchmark scores. We care about perceived quality, reliability, and latency.
Our philosophy has always been simple: compression should never compromise visibility. A frame that arrives late or loses detail isn't worth sending at all.
Our Engineering Principles
- Compatibility over cleverness: We use codecs that work everywhere, not just in lab conditions
- Reliability over ratios: A stream that stays connected is better than one that's 10% smaller
- Transparency over magic: Operators should understand what's happening, not trust black-box algorithms
- Field-tested over theoretical: We trust what performs in real networks with real packet loss
The industry has reached the end of mathematical compression. What comes next will be contextual, perceptual, and situational.
We've spent 20 years getting smarter about what to remove.
Now it's time to get smarter about what to keep.
The Future of Video Encoding
The compression wars are over. The winners are:
- Context-aware encoding that knows when motion matters
- Perception-based allocation that prioritizes what humans actually see
- Semantic understanding that treats faces differently than walls
- Reliability and compatibility over theoretical compression ratios
- Temporal intelligence that knows when not to send video at all
The next decade won't be defined by AV2 or H.266. It'll be defined by systems that understand their content and adapt intelligently.
Related Topics
- Why H.264 Is Almost Always The Answer — Technical analysis of codec selection for field deployments
- WINK Forge — Our cloud-based transcoding platform
- WINK Analytics — Perception-based video analytics with 10+ years AI experience