Cloud, P2P, and Low-Latency Video: Why Mobile Surveillance Feels More Real-Time

Why Latency Became the Key Metric in Video Surveillance

Ten years ago, video surveillance was measured by different standards: resolution, field of view, frame rate, archive depth. Latency barely mattered, because users were either watching recordings or a local live stream within a single network. The camera, recorder, and monitor all lived in the same building, on the same switch, in the same physical reality. The internet, if present at all, played a secondary role.

Once video surveillance moved to the cloud, the rules changed. Cameras went online, users moved to mobile devices, and networks became unpredictable. LTE, 5G, public Wi-Fi, corporate proxies, and carrier-grade NAT turned live video delivery into a problem where every second of delay mattered. When a security guard or business owner opens a camera feed in a mobile app, they expect to see what is happening now—not what happened ten seconds ago. At that moment, video surveillance stops being “recording” and becomes an interactive interface to the real world.

This is where classic protocols began to crack. RTSP, designed for local networks, proved ill-suited for the internet and NAT traversal. HLS, perfect for large-scale video streaming, turned out to be too slow for live scenarios. RTMP, once the low-latency standard, lost its browser player when Flash died and never found a place as a playback protocol in the modern web stack. The industry found itself searching for a new balance between speed, resilience, and control.

SRT did not emerge as a revolution, but as an answer to a very specific engineering need. It was not designed for video surveillance, yet surveillance turned out to be one of the domains where SRT’s properties matched real-world requirements almost perfectly. Low latency, UDP-based transport, resilience to packet loss, built-in encryption, and no hard dependency on the browser made SRT a natural fit for mobile apps and operator consoles. To understand why, we first need to understand what SRT actually is.

SRT Without Myths: UDP, Reliability, and Time Control

Secure Reliable Transport is often described as “UDP with brains,” and there is some truth to that. At its core, SRT is built on UDP—a protocol that guarantees neither delivery, nor order, nor integrity of packets, but offers minimal latency and maximum throughput. That is why UDP has been used for decades in real-time systems, from VoIP to video conferencing. Pure UDP, however, is too fragile for the internet, where packet loss and jitter are the norm.

SRT solves this problem differently from TCP. It does not try to deliver every byte at all costs. Instead, it introduces the concept of a time window. The protocol knows how much time it is allowed to spend recovering a lost packet. If a packet is not delivered and acknowledged within the configured latency window, it is considered lost forever, and the video moves on. As a result, the system does not “stall” or accumulate delay the way TCP connections do on poor networks.
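The time-window idea can be illustrated with a small simulation. This is a simplified model, not libsrt's actual implementation: the Packet class and the deliver_within_window function are hypothetical names, and the model only checks whether each packet (or its retransmission) arrives before its deadline.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Packet:
    seq: int
    sent_ms: int                # when the sender first transmitted the packet
    arrived_ms: Optional[int]   # when it (or a retransmission) arrived; None if never


def deliver_within_window(packets, latency_ms):
    """Split packets into (played, dropped) sequence numbers.

    A packet is played only if it arrives no later than
    sent_ms + latency_ms; anything later is skipped for good.
    """
    played, dropped = [], []
    for p in sorted(packets, key=lambda p: p.seq):
        deadline = p.sent_ms + latency_ms
        if p.arrived_ms is not None and p.arrived_ms <= deadline:
            played.append(p.seq)
        else:
            dropped.append(p.seq)
    return played, dropped


packets = [
    Packet(seq=1, sent_ms=0, arrived_ms=40),     # arrives on the first try
    Packet(seq=2, sent_ms=20, arrived_ms=150),   # recovered by a retransmission
    Packet(seq=3, sent_ms=40, arrived_ms=None),  # never arrives
    Packet(seq=4, sent_ms=60, arrived_ms=75),
]
print(deliver_within_window(packets, latency_ms=120))
print(deliver_within_window(packets, latency_ms=200))
```

With a 120 ms window, packet 2 misses its deadline and is skipped; with a 200 ms window the retransmission arrives in time. In neither case does playback wait for packet 3.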

This is the fundamental difference, and the key to understanding why SRT works so well for video surveillance. Video is a stream where continuity matters more than absolute precision. Losing a few packets may cause artifacts, but losing time destroys the very idea of live monitoring. SRT allows developers to define this balance explicitly: increase latency to improve resilience, or reduce it to achieve minimal delay. This control is especially important in mobile networks, where link quality can change from one second to the next.
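One way to make this trade-off concrete is to derive the latency window from the measured round-trip time. The sketch below is only a rule of thumb: a window of a few round trips leaves room for a lost packet to be reported and retransmitted. The function name and the default multiplier of 4 are assumptions chosen for illustration; the 120 ms floor mirrors libsrt's default latency setting.

```python
DEFAULT_SRT_LATENCY_MS = 120  # libsrt's default latency setting


def suggest_latency_ms(rtt_ms, rtt_multiplier=4, floor_ms=DEFAULT_SRT_LATENCY_MS):
    """Suggest a latency window as a multiple of the round-trip time.

    A larger multiplier gives retransmissions more chances to arrive
    (more resilience); a smaller one keeps the picture closer to real time.
    """
    if rtt_ms <= 0 or rtt_multiplier <= 0:
        raise ValueError("rtt_ms and rtt_multiplier must be positive")
    return max(floor_ms, int(round(rtt_ms * rtt_multiplier)))


print(suggest_latency_ms(25))                     # good Wi-Fi: the floor applies
print(suggest_latency_ms(80))                     # typical mobile link
print(suggest_latency_ms(80, rtt_multiplier=6))   # lossy link, favor resilience
```

A mobile client could re-evaluate this value whenever it reconnects, since the round-trip time on LTE or public Wi-Fi can differ widely from one session to the next.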

SRT also has encryption built in. It is not an add-on or a separate TLS layer on top of something else: AES encryption is part of the protocol itself and is switched on by configuring a shared passphrase on both ends. For video surveillance, where video almost always contains personal data, this is critical. In a world where cameras watch streets, offices, entrances, and private homes, sending unencrypted video is simply unacceptable. SRT solves this without complex RTSP overlays or proprietary hacks.
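As a sketch of what enabling encryption looks like from the client side, the helper below builds an SRT URL carrying the passphrase and key length. The query parameter names (mode, passphrase, pbkeylen) follow the convention used by tools such as srt-live-transmit and FFmpeg's libsrt support; the function name, host, and port are hypothetical. The checks reflect SRT's documented limits: a passphrase of 10 to 79 characters and a key length of 16, 24, or 32 bytes.

```python
from urllib.parse import urlencode


def srt_url(host, port, passphrase, pbkeylen=16, mode="caller"):
    """Build an srt:// URL with encryption parameters in the query string."""
    if not 10 <= len(passphrase) <= 79:
        raise ValueError("SRT passphrase must be 10 to 79 characters long")
    if pbkeylen not in (16, 24, 32):
        raise ValueError("pbkeylen must be 16, 24, or 32 (AES-128/192/256)")
    query = urlencode({"mode": mode, "passphrase": passphrase, "pbkeylen": pbkeylen})
    return f"srt://{host}:{port}?{query}"


# Hypothetical relay address, for illustration only.
print(srt_url("relay.example.com", 9000, "correct-horse-battery"))
```

In a real application the passphrase would come from the platform's authenticated API rather than being hard-coded, so that it can be rotated per camera or per session.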

It is also important to understand what SRT is not. It is not a codec, not a container, and not a player. SRT has no idea what H.264 or H.265 is; it just transports bytes. In real-world surveillance systems, SRT is most often used to carry an MPEG-TS stream containing H.264 or H.265 video. This makes it compatible with the existing camera and encoder ecosystem, without requiring radical changes at the video source.
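To show what "just bytes" means in practice, here is a minimal sketch of how a receiver might split one SRT payload into MPEG-TS packets. MPEG-TS packets are 188 bytes long and start with the sync byte 0x47; a common SRT payload size is 1316 bytes, which holds exactly seven of them. The function names are illustrative, and the sample payload is synthetic.

```python
TS_PACKET_SIZE = 188
TS_SYNC_BYTE = 0x47
TS_PACKETS_PER_PAYLOAD = 7  # 7 * 188 = 1316 bytes


def split_ts_packets(payload):
    """Split a payload into 188-byte MPEG-TS packets, checking each sync byte."""
    if len(payload) % TS_PACKET_SIZE != 0:
        raise ValueError("payload length is not a multiple of 188 bytes")
    packets = [payload[i:i + TS_PACKET_SIZE]
               for i in range(0, len(payload), TS_PACKET_SIZE)]
    for index, packet in enumerate(packets):
        if packet[0] != TS_SYNC_BYTE:
            raise ValueError(f"packet {index} does not start with sync byte 0x47")
    return packets


def packet_pid(packet):
    """Extract the 13-bit packet identifier (PID) from the TS header."""
    return ((packet[1] & 0x1F) << 8) | packet[2]


# Synthetic payload: seven TS packets with PID 0x100 and empty bodies.
sample_packet = bytes([TS_SYNC_BYTE, 0x01, 0x00, 0x10]) + bytes(184)
payload = sample_packet * TS_PACKETS_PER_PAYLOAD
packets = split_ts_packets(payload)
print(len(payload), len(packets), hex(packet_pid(packets[0])))
```

Everything above the transport, including which PID carries H.264 or H.265 video, is decided by the MPEG-TS layer and the player, not by SRT.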

Things get even more interesting when SRT meets the concept of P2P—a term that, in video surveillance, has long been more marketing than engineering.

The P2P That Doesn’t Exist: How Cloud Cameras Actually Connect

If you believe marketing brochures, most cloud cameras work via P2P. The camera supposedly connects directly to the user’s phone, bypassing servers, clouds, and intermediaries. It sounds great—but has little to do with reality. In the real internet, cameras are almost always behind NAT, often multiple layers of NAT, while mobile devices sit behind carrier-grade NAT. Under these conditions, direct connections are only possible in a limited number of scenarios and with many caveats.

In practice, cloud video surveillance architectures almost always include a server. Sometimes it is used only for signaling and authentication; sometimes it performs full video relay. In most cases, the system tries to establish a direct connection between the camera and the client, but at the first sign of trouble it falls back to a server relay. The user still sees a “P2P” interface, even though the video is actually flowing through the cloud.
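The "try direct, then fall back to relay" behavior can be sketched as follows. This is a schematic under stated assumptions: try_direct and open_relay are hypothetical callables standing in for a platform's own connection code, and the two-second timeout is an arbitrary illustrative value.

```python
def connect_with_fallback(try_direct, open_relay, direct_timeout_s=2.0):
    """Attempt a direct camera connection; fall back to the cloud relay.

    Returns a (path, connection) tuple so the caller can log which route
    was used, even though the user interface looks the same either way.
    """
    try:
        return "direct", try_direct(direct_timeout_s)
    except OSError:
        # Covers timeouts, refused connections, and unreachable hosts
        # (TimeoutError and ConnectionError are subclasses of OSError).
        return "relay", open_relay()


# Stand-ins for real connection code, for illustration only.
def blocked_by_nat(timeout_s):
    raise TimeoutError("no response from camera")


def relay():
    return "relay-session"


print(connect_with_fallback(blocked_by_nat, relay))
```

Returning the path alongside the connection makes it easy to measure how often the "P2P" route actually succeeds in a given deployment.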

SRT fits this model perfectly. When used with a server intermediary, it does not require complex ICE logic like WebRTC. The camera or edge server publishes a stream via SRT, and clients connect to the server to play it. The connection is almost always initiated by the client (the caller, in SRT terms), which is critical for NAT traversal. The server operates in listener mode, accepting incoming UDP connections. This is a simple, robust scheme that scales well when a single camera has anywhere from one to ten viewers.
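Because a single listener accepts both publishers and viewers, the caller has to say what it wants. SRT's access control guidelines describe a stream ID convention for this: a "#!::" prefix followed by comma-separated key=value pairs, where r names the resource, u the user, and m the mode (request, publish, or bidirectional, with request as the default). The sketch below builds and parses such IDs; whether a particular server follows this convention is an assumption, and the function and camera names are illustrative.

```python
STREAM_ID_PREFIX = "#!::"
VALID_MODES = ("request", "publish", "bidirectional")


def build_stream_id(resource, mode="request", user=None):
    """Build a stream ID such as '#!::r=camera-42,m=request'."""
    if mode not in VALID_MODES:
        raise ValueError(f"mode must be one of {VALID_MODES}")
    fields = {"r": resource, "m": mode}
    if user:
        fields["u"] = user
    return STREAM_ID_PREFIX + ",".join(f"{k}={v}" for k, v in fields.items())


def parse_stream_id(stream_id):
    """Parse a stream ID into a dict, filling in the default mode."""
    if not stream_id.startswith(STREAM_ID_PREFIX):
        raise ValueError("stream ID does not use the '#!::' convention")
    body = stream_id[len(STREAM_ID_PREFIX):]
    fields = dict(item.split("=", 1) for item in body.split(","))
    fields.setdefault("m", "request")
    return fields


print(build_stream_id("camera-42", mode="publish"))        # camera or edge side
print(parse_stream_id("#!::r=camera-42,u=guard"))          # viewer side
```

On the server, the parsed fields are the natural place to hook in authentication and per-camera access checks before any video is sent.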

This approach is not a “cheat” or a compromise. It is a conscious engineering choice. Fully serverless P2P does not scale well in video surveillance, is hard to debug, and is unstable in mass-market deployments. Even a minimal server provides control, security, and centralized access management. In this architecture, SRT becomes the transport layer between server and client, not a magical way to bypass all network constraints.

At this point, it becomes clear why mobile apps consistently beat browsers in terms of latency. The difference lies not only in the protocol, but in the entire delivery model.

Why Mobile Apps Feel “More Live” Than Browsers: Architecture Matters

When a user opens a camera feed in a browser, they are almost always dealing with the HTML5 video element. This element supports a limited set of protocols and formats, the most important of which is HLS. HLS was designed for resilient video delivery over HTTP. It scales extremely well, is easy to cache, and works beautifully with CDNs. But this universality comes at the cost of latency.

HLS splits video into segments that the client downloads over HTTP. The player keeps several segments buffered to smooth out network fluctuations. This means there is always a lag between real time and what the user sees. Even with aggressive tuning, it rarely drops below a few seconds. For movies or live broadcasts, that is fine. For video surveillance, it is a serious problem.
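The lag can be estimated with simple arithmetic. The sketch below is a rough model, not a measurement: it assumes the player starts a fixed number of segments behind the live edge and adds an optional allowance for encoding and upload. The function name and the sample numbers are illustrative; 6-second segments with three segments buffered is a common conventional setup.

```python
def hls_latency_s(segment_s, buffered_segments, encode_and_upload_s=0.0):
    """Estimate glass-to-glass HLS delay in seconds.

    segment_s:            duration of one media segment
    buffered_segments:    how many segments the player stays behind live
    encode_and_upload_s:  extra delay before a segment becomes downloadable
    """
    return segment_s * buffered_segments + encode_and_upload_s


print(hls_latency_s(6, 3))        # conventional setup
print(hls_latency_s(2, 3, 1.0))   # shorter segments plus one second of overhead
print(hls_latency_s(1, 3))        # aggressive tuning
```

Shrinking segments helps, but each reduction also increases the number of HTTP requests and makes playback more sensitive to network jitter.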

Mobile apps live in a completely different world. They are not constrained by the browser stack and can use native video playback libraries. On Android, this is often libVLC or FFmpeg-based players that can work directly with UDP, SRT, and RTSP. These players allow developers to control buffering precisely, define exact latency windows, and choose what to sacrifice—stability or delay.
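As a desktop analogue of this kind of tuning, the sketch below assembles an ffplay command line that trades smoothness for delay. The flags are standard FFmpeg options; the helper name and the URL are hypothetical, and opening an srt:// URL assumes an FFmpeg build with libsrt enabled. A mobile app would set the equivalent options through its player library rather than a command line.

```python
def low_latency_ffplay_args(url):
    """Build an ffplay argument list tuned for minimal buffering."""
    return [
        "ffplay",
        "-fflags", "nobuffer",   # reduce buffering during initial stream analysis
        "-flags", "low_delay",   # ask decoders to favor low delay
        "-framedrop",            # drop late video frames instead of falling behind
        url,
    ]


# Hypothetical relay address, for illustration only.
args = low_latency_ffplay_args("srt://relay.example.com:9000?mode=caller")
print(" ".join(args))
```

The list form can be passed to subprocess.run as-is, which avoids shell quoting problems with the query string in the URL.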

Mobile apps also have more direct access to the operating system’s networking stack. They can adapt better to mobile network quirks, react faster to changes in link quality, and use optimizations that are simply unavailable to browsers. Combined with SRT, this results in a clear latency advantage. In real surveillance systems, camera-to-smartphone delay with SRT often falls in the one-to-two second range—close to the practical limit without complex bidirectional protocols like WebRTC.

This is not because browsers are “bad” or “slow.” They solve a different problem. The browser stack is optimized for security, compatibility, and massive scale, not for low-level network control. Mobile apps, by contrast, can afford to be more specialized and aggressive in their tuning. That is why the surveillance industry increasingly uses different protocols for different clients.

VSaaS Architecture: Two Protocols, One User Experience

Modern VSaaS platforms rarely bet on a single video delivery protocol. Instead, they build layered architectures where each client receives video in the format best suited to its capabilities and constraints. A typical setup includes cameras, a cloud backend, a media layer, and client applications.

Cameras usually continue to deliver video via RTSP. It is a proven, widely supported protocol that works well within local networks and between camera and server. The stream then reaches an edge or cloud server, which handles authentication, access control, connection accounting, and—when needed—video relay. This is where the protocol choice for the client is made.

For mobile apps, the server typically offers SRT or WebRTC. SRT is chosen when simplicity, predictability, and explicit latency control matter. The client connects via SRT, receives a minimally buffered stream, and sees live video almost in real time. For browsers, the server offers HLS, sometimes in a low-latency configuration. This ensures compatibility with virtually any device and allows the system to scale to thousands of users via CDN.
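The per-client choice described here amounts to a small decision function. This is a sketch of the logic in this article, not any specific platform's implementation; the ClientKind enum, the needs_two_way flag, and the function name are illustrative.

```python
from enum import Enum


class ClientKind(Enum):
    MOBILE_APP = "mobile_app"
    OPERATOR_CONSOLE = "operator_console"
    BROWSER = "browser"


def pick_protocol(client, needs_two_way=False):
    """Choose a delivery protocol for one viewing session."""
    if client is ClientKind.BROWSER:
        # Maximum compatibility and CDN scale, at the cost of latency.
        return "hls"
    if needs_two_way:
        # Bidirectional communication, e.g. talking back through the camera.
        return "webrtc"
    # Controlled native clients: simple, predictable, explicit latency control.
    return "srt"


print(pick_protocol(ClientKind.MOBILE_APP))
print(pick_protocol(ClientKind.MOBILE_APP, needs_two_way=True))
print(pick_protocol(ClientKind.BROWSER))
```

Keeping this decision on the server means new client types or protocols can be introduced without changing how cameras publish their streams.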

Crucially, this complexity is invisible to the user. They open a camera in a mobile app or a browser and see video. Differences in protocols, buffers, and latency are hidden inside the architecture. This is what a mature, industrial-grade approach looks like: acknowledge platform limitations and use their strengths, rather than forcing a one-size-fits-all solution.

In this setup, SRT occupies a clearly defined niche. It does not replace HLS—it complements it. It does not try to be universal—it solves a specific problem: low-latency live video delivery to controlled clients. That is why SRT has taken root so well in mobile video surveillance applications.

The Future of Live Video

SRT is sometimes seen as a temporary trend or a niche solution. But viewed in the context of video surveillance evolution, it is a natural step. The industry has moved from local systems to cloud platforms, from monitors to mobile apps, from archives to real-time live interfaces. At each stage, video delivery requirements changed, and SRT turned out to be the tool that best matches current user expectations.

This does not mean SRT will replace all other protocols. HLS will remain the foundation for browsers and mass access. WebRTC will be used where bidirectional communication and ultra-low latency are required at any cost. RTSP will continue to live inside cameras and local networks. But SRT has secured a stable position between these worlds, offering an optimal balance for mobile and operator scenarios.

The key lesson is simple: there is no single “correct” protocol in modern video surveillance. There is architecture, where each protocol is used where it makes the most sense. SRT is not a magic wand and not “true P2P.” It is a reliable transport that, when integrated into a well-designed VSaaS architecture, brings live video as close to real time as public networks realistically allow.

That is why mobile apps will always feel more “live” than browsers, why P2P cameras almost always imply the presence of a cloud, and why SRT today is seen not as an experiment, but as a practical, working tool of the modern video surveillance industry.