
Browser-Based Remote Desktop on Windows and Linux

Christopher 5 min read
remote-desktop, webcodecs, wayland, cross-platform

Remote desktop in an RMM tool gets demoed in the first ten seconds of every sales call. The basic bar: "I clicked a button and now I can see the screen." Most products clear that bar. The interesting question is what happens after, when the technician is actually using it for an hour while diagnosing a problem.

The Linux side of ET Ducky's remote desktop landed in stages. The Windows side started as a Microsoft RDP relay and has since moved to a GPU-assisted DXGI Desktop Duplication capture path through a signed helper process. As of May 2026, both operating systems are first-class, both share the same browser-side viewer, and Windows additionally captures the secure desktop so technicians can see and click through UAC prompts without anyone touching the target machine.

Four capture backends, one viewer

The viewer is one piece of code in the dashboard. The agent picks a capture backend appropriate to what the host can do, and the viewer adapts.

| OS | Capture backend | When it is used |
| --- | --- | --- |
| Windows (user desktop) | DXGI Desktop Duplication via signed helper | Default. The agent service launches a separately-signed helper process into the interactive user's session. The helper uses DXGI Desktop Duplication for GPU-assisted capture and ships frames over the same authenticated WebSocket the dashboard uses. Hardware-accelerated, low CPU overhead. |
| Windows (secure desktop) | GDI BitBlt against Winlogon | Engages automatically while a UAC prompt, Ctrl+Alt+Del screen, or lock screen is on display. The helper switches its thread to the Winlogon desktop, captures via GDI BitBlt, and overlays the cursor manually. Slower than DXGI but works deterministically across GPU drivers and lets the operator interact with UAC from the viewer. |
| Linux (Wayland) | xdg-desktop-portal RemoteDesktop + PipeWire | Modern Wayland sessions on GNOME, KDE Plasma, or any compositor that implements the portal interface. PipeWire gives the agent a stream of frames from the compositor's own buffers. Input injection rides the same portal interface. |
| Linux (X11) | x11vnc + noVNC | Older Linux desktops or hosts that have not migrated to Wayland. The agent spawns x11vnc against the logged-in user's $XAUTHORITY, resolved at runtime via loginctl, and bridges the RFB stream to the dashboard. |

The agent picks the backend at session start. The dashboard does not need to know or care; the viewer treats whatever arrives the same way.
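The selection described above can be sketched as a single decision function. This is only an illustration: the names HostInfo, CaptureBackend, and pickBackend are hypothetical, and how the real agent probes the host (session type, portal availability) is assumed rather than taken from the product.

```typescript
// Hypothetical types; not the agent's real data model.
type CaptureBackend =
  | "dxgi-desktop-duplication"
  | "gdi-bitblt-winlogon"
  | "portal-pipewire"
  | "x11vnc";

interface HostInfo {
  os: "windows" | "linux";
  // Windows only: true while UAC, Ctrl+Alt+Del, or the lock screen is showing.
  secureDesktopActive?: boolean;
  // Linux only: assumed to come from the logged-in user's session metadata.
  sessionType?: "wayland" | "x11";
  // Linux only: whether the compositor exposes the RemoteDesktop portal.
  hasRemoteDesktopPortal?: boolean;
}

// Returns null when no backend in the table applies to the host.
function pickBackend(host: HostInfo): CaptureBackend | null {
  if (host.os === "windows") {
    return host.secureDesktopActive
      ? "gdi-bitblt-winlogon"
      : "dxgi-desktop-duplication";
  }
  if (host.sessionType === "wayland") {
    return host.hasRemoteDesktopPortal ? "portal-pipewire" : null;
  }
  if (host.sessionType === "x11") {
    return "x11vnc";
  }
  return null;
}
```

Keeping the decision in one pure function is a design choice for the sketch: it makes the mapping from host facts to backend easy to test without a real desktop session.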

H.264 over WebCodecs

JPEG-frame remote-desktop tools make poor use of the hardware available on the host. A Ryzen workstation with VA-API encoding, an Intel chip with QuickSync, or an NVIDIA card with NVENC has dedicated silicon for H.264 encoding, yet these tools spend host CPU encoding JPEG frames instead.

The agent now uses that silicon when it is there. The Linux helper picks among vah264enc (VA-API), x264enc (software), or JPEG fallback based on what the host actually offers. The Windows helper streams WebP today with H.264 via Media Foundation queued as a follow-up; the wire format and browser viewer already support H.264 on either OS, so the Windows encoder switch is a backend change with no impact on the operator's experience. The browser decodes via the WebCodecs VideoDecoder API on the H.264 path and a SkiaSharp-decoded canvas paint on the WebP path; either way the viewer treats the result identically.
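The Linux encoder preference reads naturally as an ordered lookup. The sketch below assumes the helper already has a set of available GStreamer element names and a flag saying whether the viewer can decode H.264; how either is obtained is not covered by this post, and pickEncoder is a hypothetical name.

```typescript
type Encoder = "vah264enc" | "x264enc" | "jpeg";

// Preference order from the text: VA-API hardware first, then software x264.
const H264_ENCODERS: readonly Encoder[] = ["vah264enc", "x264enc"];

function pickEncoder(
  available: ReadonlySet<string>, // assumed: element names the host offers
  viewerSupportsH264: boolean,    // assumed: reported by the viewer at session start
): Encoder {
  if (viewerSupportsH264) {
    for (const encoder of H264_ENCODERS) {
      if (available.has(encoder)) {
        return encoder;
      }
    }
  }
  // Nothing better is usable, so fall back to JPEG frames.
  return "jpeg";
}
```

For example, `pickEncoder(new Set(["x264enc"]), true)` selects software x264, while the same host with a viewer that lacks WebCodecs falls back to JPEG.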

The practical result for the host is bounded capture and encode load. On a workstation with VA-API, encoding cost moves off the CPU to dedicated silicon. On a workstation without hardware encoding, software encoding runs in the helper process under the agent's resource limits (cgroup memory and CPU ceilings on Linux, job-object limits on Windows), so the encoder cannot crowd out the host workload. On a host where neither H.264 path is acceptable, JPEG remains available as a fallback.

For the operator, the practical result is a usable session at typical desk-work activity even on a slow link, because H.264 with hardware acceleration is the same compression technology that lets video conferencing work at 1080p over residential connections.

Browser support and the JPEG fallback

WebCodecs is a recent browser API. It works on Chromium-based browsers (Chrome, Edge, Brave) at version 94 or later, on Firefox 130 or later (where it is now stable), and on Safari 16.4 or later. For browsers without WebCodecs support, the agent automatically falls back to JPEG on session start. The viewer's render path is the same canvas in either case, and the operator does not see a different UI.
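A viewer can detect WebCodecs support with feature detection rather than version sniffing. The sketch below is one way to do that, not necessarily what the ET Ducky viewer does: viewerSupportsH264 and VideoDecoderLike are hypothetical names, and the codec string avc1.42E01E (H.264 Constrained Baseline, level 3.0) is an assumed example. VideoDecoder.isConfigSupported is the standard WebCodecs static method; it is typed through a minimal local interface here so the function can also run outside a browser.

```typescript
// Minimal local shape of the static WebCodecs API used below.
interface DecoderSupport {
  supported?: boolean;
}
interface VideoDecoderLike {
  isConfigSupported(config: { codec: string }): Promise<DecoderSupport>;
}

async function viewerSupportsH264(
  scope: { VideoDecoder?: VideoDecoderLike }, // pass globalThis in a browser
  codec: string = "avc1.42E01E",              // assumed example codec string
): Promise<boolean> {
  const decoder = scope.VideoDecoder;
  if (!decoder || typeof decoder.isConfigSupported !== "function") {
    return false; // no WebCodecs at all: the session should use JPEG
  }
  try {
    const result = await decoder.isConfigSupported({ codec });
    return result.supported === true;
  } catch {
    return false; // treat an invalid or rejected config as unsupported
  }
}
```

The result would then be reported to the agent at session start so it can choose between H.264 and JPEG.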

The agent announces a codec capability on session establishment, and the viewer subscribes to the type=6 codec-announce message before the first frame arrives. If the announce says JPEG, the viewer never engages WebCodecs; if it says H.264, the viewer configures a VideoDecoder with the SPS/PPS from the announce and starts decoding type=1 NALU frames as they arrive.
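The announce-then-decode ordering can be sketched as a small router in front of the decoder. Only the message type numbers (6 and 1) come from this post; every field name in AgentMessage is hypothetical, the wire layout is not shown, and DecoderLike is a stand-in for a thin wrapper around the browser's VideoDecoder (which would build an EncodedVideoChunk from each chunk object). The sketch also assumes the SPS/PPS arrive in a form suitable for the decoder's description field.

```typescript
// Hypothetical, already-parsed message shapes.
type AgentMessage =
  | { type: 6; codec: "jpeg" }
  | { type: 6; codec: "h264"; codecString: string; description?: Uint8Array }
  | { type: 1; data: Uint8Array; timestampUs: number; keyFrame: boolean };

interface DecoderConfig {
  codec: string;
  description?: Uint8Array;
}
interface ChunkInit {
  type: "key" | "delta";
  timestamp: number;
  data: Uint8Array;
}
interface DecoderLike {
  configure(config: DecoderConfig): void;
  decode(chunk: ChunkInit): void;
}

type Outcome = "configured" | "jpeg" | "decoded" | "ignored";

class CodecRouter {
  private mode: "unknown" | "jpeg" | "h264" = "unknown";
  private needKeyFrame = true;
  private readonly decoder: DecoderLike;

  constructor(decoder: DecoderLike) {
    this.decoder = decoder;
  }

  handle(msg: AgentMessage): Outcome {
    if (msg.type === 6) {
      if (msg.codec === "jpeg") {
        this.mode = "jpeg"; // WebCodecs is never engaged in JPEG mode
        return "jpeg";
      }
      const config: DecoderConfig = { codec: msg.codecString };
      if (msg.description) {
        config.description = msg.description;
      }
      this.decoder.configure(config);
      this.mode = "h264";
      this.needKeyFrame = true; // WebCodecs requires a key chunk after configure
      return "configured";
    }
    // type=1 frame: only reaches the decoder after an H.264 announce.
    if (this.mode !== "h264") {
      return "ignored";
    }
    if (this.needKeyFrame && !msg.keyFrame) {
      return "ignored";
    }
    this.needKeyFrame = false;
    this.decoder.decode({
      type: msg.keyFrame ? "key" : "delta",
      timestamp: msg.timestampUs,
      data: msg.data,
    });
    return "decoded";
  }
}
```

In JPEG mode the router reports frames as ignored because they would be painted by a separate path that this sketch does not cover.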

Windows secure desktop: UAC visibility

The single hardest screen-capture problem in Windows remote-control software is the Windows secure desktop. When a UAC prompt fires (or a user presses Ctrl+Alt+Del, or the workstation locks), Windows switches the active desktop to Winlogon, a separate desktop object that is isolated from the user's interactive desktop. Most RMM tools blank the viewer for the duration: capture is bound to the user's desktop, the user's desktop is no longer active, and no frames flow. The technician sees a frozen screen and has to call someone at the keyboard.

The ET Ducky helper handles this with a second capture path. The helper polls the input desktop each iteration. When it detects Winlogon, the capture thread switches its desktop binding to Winlogon via SetThreadDesktop, captures the screen via GDI BitBlt (DXGI Desktop Duplication does not bind cleanly to the secure desktop), overlays the cursor with GetCursorInfo+DrawIconEx, and ships those frames the same way it would ship user-desktop frames. When the secure desktop dismisses, the helper switches the thread binding back, DXGI re-initializes on the user's desktop, and the operator sees a clean transition with no missing frames.
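The switching logic above amounts to a two-state machine driven by the name of the current input desktop. The sketch below models only that decision; it makes no Win32 calls, and the names CaptureState and onInputDesktopPolled are hypothetical. It assumes the polled name is "Winlogon" while the secure desktop is active, which on Windows would typically be read with OpenInputDesktop and GetUserObjectInformation.

```typescript
type CapturePath = "dxgi" | "gdi-bitblt";
type Transition = "stay" | "enter-secure-desktop" | "leave-secure-desktop";

interface CaptureState {
  boundDesktop: string; // desktop the capture thread is currently bound to
  path: CapturePath;
}

function isSecureDesktop(name: string): boolean {
  return name.toLowerCase() === "winlogon";
}

// Called once per capture iteration with the polled input desktop name.
function onInputDesktopPolled(
  state: CaptureState,
  inputDesktop: string,
): { next: CaptureState; transition: Transition } {
  const wasSecure = state.path === "gdi-bitblt";
  const nowSecure = isSecureDesktop(inputDesktop);

  if (nowSecure && !wasSecure) {
    // Real helper: SetThreadDesktop to Winlogon, then capture with GDI BitBlt.
    return {
      next: { boundDesktop: inputDesktop, path: "gdi-bitblt" },
      transition: "enter-secure-desktop",
    };
  }
  if (!nowSecure && wasSecure) {
    // Real helper: switch the binding back and re-initialize DXGI duplication.
    return {
      next: { boundDesktop: inputDesktop, path: "dxgi" },
      transition: "leave-secure-desktop",
    };
  }
  return { next: state, transition: "stay" };
}
```

Returning the transition separately from the next state lets the caller run the expensive work (rebinding the thread, re-creating the duplication session) only when the desktop actually changes.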

Input flows the same way. The helper's input injection thread re-attaches to the current input desktop before each batch of SendInput calls, so clicks and keystrokes from the dashboard land on the UAC dialog when UAC is active and on the user's desktop otherwise. The technician can click "Yes" on a UAC prompt from a thousand miles away without having to coordinate with anyone at the target machine.

The mechanism that makes this work is the helper's uiAccess=true manifest. Windows enforces three preconditions before honoring that flag: the binary must be Authenticode-signed by a CA-trusted publisher, it must reside in a Windows-defined secure location like Program Files, and the launching process must hold SeTcbPrivilege. The agent service ships an explicit privilege whitelist via sc privs, the helper is signed by ET Ducky LLC's EV code-signing certificate, and the helper is cached under C:\Program Files\ETDucky\Agent\RdpHelper\<version>\. With all three in place, the helper can capture and inject across the User Interface Privilege Isolation boundary; without all three, Windows refuses to honor the flag and the helper falls back to user-desktop-only behavior.

That precondition chain is also the security property: the helper's UIPI bypass is narrow. It does not grant the helper additional file system, registry, network, or token privileges. The helper runs at the interactive user's integrity level. An attacker who replaced the binary in the cache would fail Authenticode validation at WinVerifyTrust time, and Windows would refuse to honor the manifest. The signed-helper distribution path is documented in more detail on the security page.

No VPN, no firewall rules

The agent makes outbound HTTPS connections to etducky.com for everything: enrollment, heartbeat, event upload, live-session AI, and remote desktop. The remote-desktop session is a WebSocket upgraded from that same outbound connection. The agent does not open any inbound ports, does not require port-forwarding, and does not need a VPN tunnel.

For the operator, the consequence is that connecting to a host behind a corporate NAT, a residential router, or a cloud security group works without firewall changes. For the host, the consequence is that the agent's network footprint is one outbound TLS connection to one well-known host.

For the SOC reviewing the agent's deployment, that property is what makes the remote-desktop feature reviewable. The agent does not punch holes in the network. It rides on top of the same connection it was already using to ship telemetry.

Resource bounds matter, especially during sessions

Remote desktop is the agent's most expensive operation. Capturing every frame, encoding it, and shipping it is more work than the heartbeat or the event-batch path. The cgroup limits on the Linux agent and the equivalent job-object limits on Windows are sized so that this is fine: a session can run for an hour at full quality without crowding out the workloads on the host.

The encoding side is bounded by the codec choice. Hardware encoding has near-zero CPU cost. Software encoding under x264enc tunes itself to the available CPU budget; if the cgroup is at 50% of one core, x264 will produce lower-bitrate output rather than spike past the limit. The frame-pump side is bounded by the helper process's own buffers; it drops frames before backpressuring the kernel.
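The "drop frames before backpressuring" behavior can be illustrated with a fixed-capacity queue that discards the oldest frame when full. This is a generic sketch of the technique under assumed names (BoundedFrameQueue is hypothetical), not the helper's actual buffer implementation.

```typescript
class BoundedFrameQueue<T> {
  private readonly frames: T[] = [];
  private readonly capacity: number;
  private droppedCount = 0;

  constructor(capacity: number) {
    if (!Number.isInteger(capacity) || capacity < 1) {
      throw new RangeError("capacity must be a positive integer");
    }
    this.capacity = capacity;
  }

  // Never blocks the producer: when full, the oldest frame is discarded.
  push(frame: T): void {
    if (this.frames.length >= this.capacity) {
      this.frames.shift();
      this.droppedCount += 1;
    }
    this.frames.push(frame);
  }

  // Returns the oldest queued frame, or undefined when empty.
  shift(): T | undefined {
    return this.frames.shift();
  }

  get length(): number {
    return this.frames.length;
  }

  get dropped(): number {
    return this.droppedCount;
  }
}
```

With JPEG frames any frame can be dropped independently. With H.264, dropping a frame breaks the reference chain for later delta frames, so a real sender would presumably need to request a fresh keyframe after a drop.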

What this means in practice: the technician can leave a remote desktop session open for as long as the diagnosis takes, and the agent will not start using more host resources than the operator agreed to when they set the limits. The "I left a remote session open and the host got slow" failure mode is engineered out.

Where this fits in the broader product

Remote desktop in ET Ducky is one of three live-session interaction modes alongside the AI question/answer flow and the shell + file-transfer pane. The three are designed to share a session: open a remote desktop, ask the AI a question about what the user is seeing, run a shell command in the session's other tab, and the audit trail captures all three under one session id with operator attribution.

That is the part that is hard to demo in ten seconds and that matters during the second hour of the diagnosis. The remote desktop is not a separate product bolted on. It is one tab in the live session, sharing the same authentication, the same audit log, and the same operator-elevation flow as everything else the agent does.
