Name: Edge-Cloud Integration: Making OpenClaw More Interesting and Secure
Start: 2026-04-17T00:00:00.000Z
Location: 2026 China Generative AI Conference (Beijing)

This is the talk I gave at the 2026 China Generative AI Conference (Beijing), OpenClaw Technical Workshop. Officially I was there as VP of Qujing Technology, but I'd rather speak from the perspective of someone who helps people install OpenClaw every day.

Why This Talk

After remotely installing OpenClaw for dozens of people, one thing became increasingly clear to me: the OpenClaw agent form factor thrusts the "hardware carrier" question back into the spotlight. A computer used to just be a computer; now you have to decide upfront which machine the agent runs on, whether it stays on 24/7 or requires human tending, whether data stays local or goes to the cloud, and who takes responsibility when something goes wrong.

None of these questions can be answered cleanly in a purely edge or purely cloud paradigm. Edge is cheap, private, and can compute asynchronously, but its intelligence caps out early. The cloud is brilliantly smart, but privacy, cost, and legal liability all create blockers. In the near term, only the edge-cloud integration path works. This talk is about that path.

Three Arguments

Edge devices should stop chasing large models. Compute, bandwidth, and memory supply chains form a triple mountain pressing down; what the box should pursue isn't "can run 70B," but rather "stays awake, stays cheap, stays quiet."
Multimodal is what actually deserves to run on the edge. Embedding, ASR, TTS, OCR, VLM—these small models are used daily, and they share two traits: you don't want them leaked externally, and they aren't time-sensitive. Yet the cloud charges per second at absurd rates.
AIMA turns edge-cloud integration into a viable path. From installation to connection to interaction, it connects everything in one go. You tell the cloud, "Install OpenClaw on this machine, connect LLM, connect Feishu," and the edge and cloud move together immediately.

Why "More Secure"

I spent considerable time on security in this talk, because it's the most easily overlooked advantage of edge-cloud integration.

Architecturally, OpenClaw defaults to running on localhost, with backend tools on 127.0.0.1. Only one Gateway goes out via an IM long-connection; the outside world cannot touch your machine.
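
To make the "backend tools on 127.0.0.1" idea concrete, here is a minimal sketch of a guard that refuses to bind a tool to anything but a loopback address. It is not OpenClaw's actual code; the function names `is_loopback` and `open_local_tool_socket` are hypothetical and exist only for illustration.

```python
import ipaddress
import socket


def is_loopback(host: str) -> bool:
    """Return True if host is a loopback IP literal such as 127.0.0.1 or ::1."""
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        # Not an IP literal (for example a hostname): not provably loopback.
        return False


def open_local_tool_socket(host: str = "127.0.0.1") -> socket.socket:
    """Bind a listening TCP socket for a hypothetical backend tool.

    Any non-loopback bind address is rejected, so other machines cannot
    connect to the tool. Port 0 asks the OS to pick a free port.
    """
    if not is_loopback(host):
        raise ValueError(f"refusing to bind backend tool to {host!r}")
    version = ipaddress.ip_address(host).version
    family = socket.AF_INET6 if version == 6 else socket.AF_INET
    sock = socket.socket(family, socket.SOCK_STREAM)
    sock.bind((host, 0))
    sock.listen()
    return sock
```

With a guard like this, passing `"0.0.0.0"` (all interfaces) raises an error instead of silently exposing the tool to the network.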

Legally, the edge is a "sale," the cloud is a "rental." The cigarette seller takes money, delivers goods, and the transaction ends; the opium den operator bears accomplice liability. Once an agent can make its own decisions, the party with clear ownership is the safest.

Talk Transcript

Opening: The Agent Era from an Infrastructure Perspective

Hello everyone, I'm Guan Jiawei from Qujing Technology. Today's topic is edge-cloud integration: making OpenClaw more interesting and more secure. Qujing Technology works on infrastructure, primarily focusing on model inference and usage costs, so I'd like to discuss this from a different angle: as agents like OpenClaw begin to proliferate, what new changes will appear in hardware?

By the way, this time the "slides" are actually a webpage, not a PPT. My personal view is a bit more radical: by year-end almost no one will be using PPT software. Making an agent draw slides is incredibly painful; at its core, PPT is just web code wrapped in a "no-code tool" shell. Conversely, having an agent build a webpage directly is fast and produces something prettier than I could make myself. The lesson I take from this: once agents take off, all infrastructure will need to be rebuilt, and the transformation will be massive.

Reflections from Helping People Install OpenClaw

Beyond my corporate identity, since OpenClaw went viral, I've personally helped install over 100 "little crawfish" for others. The installation process revealed some very interesting problems. The most common one: shortly after installation, someone asks me, "Why isn't my crawfish responding anymore?" I ask them to check if the machine is on, and they reply, "Oh, I accidentally shut it down."

This is very common. In the PC era, the habit was to turn things off when not in use and shut down when away. But in the agent era, hardware needs to stay on 24/7 and be on call at any moment. When requirements change, hardware form factors naturally change too. The Mac mini selling out after OpenClaw went viral follows this logic: the little box everyone used to look down on suddenly turned out to be brilliantly designed, with a small footprint, quiet operation, low power consumption over long runs, and, unlike Windows, no slowdown over time.

The Challenge of Running Large Models on the Edge

First argument: currently, there's basically no chance of fully loading a large model onto an edge device. There will be in the future, but not right now.

I tried running a 30B+ model on a 128GB unified-memory device to connect to OpenClaw for others to use. The feedback after testing was "it feels a bit dumb," so I immediately switched to Kimi—they're completely different species, not even in the same league. One frequently interrupts tasks and goes off-track; the other clearly follows your instructions. So the core driver for intelligent agents remains on the cloud large-model side.

So what should the edge look like? At minimum, it needs a few characteristics. First, it should be small and not take up space. Recently I've been seriously considering whether to install a small server at home; I never had this need before, and agents forced it. Second, stability needs to be an order of magnitude higher than before. I used to be a heavy Windows user; at first I couldn't even accept it when the company gave me a Mac. Now I'm thoroughly a Mac user and a vocal Windows critic: installing OpenClaw on Windows is an absolutely terrible experience, with frequent freezes and hangs, the machine overheating like crazy, and piles of redundant features and "security designs" that do nothing for agents.

Good Edge Scenarios: Multimodal and Privacy

We believe the most suitable scenario for the edge is multimodal. Two reasons: cost, and privacy.

On cost, calling an Omni model in the cloud to parse video is priced so outrageously you don't even want to look—the server itself isn't expensive, but traffic is extremely costly, and audio/video files are inherently large, so moving them in and out is naturally high-cost.
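
A rough back-of-the-envelope calculation shows why moving video is so much heavier than moving text. The 4 Mbps bitrate and the 10 KB report size below are assumed, illustrative numbers, not measurements from the talk, and no vendor pricing is implied.

```python
def media_size_mb(bitrate_mbps: float, duration_s: float) -> float:
    """Size in megabytes of a stream at a constant bitrate (megabits per second)."""
    return bitrate_mbps * duration_s / 8  # 8 bits per byte


# Assumption: a 4 Mbps camera stream recorded for one hour.
one_hour_video_mb = media_size_mb(bitrate_mbps=4, duration_s=3600)

# Assumption: a plain-text report describing that hour takes about 10 KB.
report_kb = 10

# Decimal units: 1 MB = 1000 KB.
ratio = one_hour_video_mb * 1000 / report_kb
print(f"video: {one_hour_video_mb:.0f} MB, report: {report_kb} KB, ratio: {ratio:,.0f}x")
```

Under these assumptions the hour of video is 1800 MB, about 180,000 times the size of the text that describes it, which is the gap that makes uploading raw footage costly.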

On privacy, when text is sent to the cloud you can roughly do the mental math of "what did it send," but you can't "take one look" and judge whether a one-hour video contains sensitive information. Home camera data, voice data—this risk jumps exponentially with multimodal. The Xiao Ai assistants people use are actually quite scary: you're chatting about a sensitive topic and it suddenly pipes up "I'm here"—it must have been listening the entire time and sending audio to the cloud.

So many OpenClaw users in the U.S. are now trying to connect their home smart devices into OpenClaw for local processing. This presents a new hardware scenario: run multimodal locally, put the brain in the cloud, and let each side do what it's good at.

AIMA: Letting AI Manage AI Infrastructure

We built a piece of software called AIMA, with the philosophy "use AI to manage AI infrastructure."

The startup experience is a bit like Ollama: once plugged in, it tells you what hardware this is and what models it can run, and you can one-click pull models from the web to local or spin up local models. Ollama runs language models smoothly, but running multimodal is actually quite tedious—for example, if you want OpenClaw to connect to a cloned voice to chat with you (OpenClaw now has "soul" files that give agents their own personalities, and many people feel the only missing piece is a voice), running a TTS model locally is quite challenging. What AIMA wants to do is make multimodal and multi-inference-engine setups as "install-and-use" as Ollama.
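
As a rough illustration of how a tool might decide "what models this hardware can run," the sketch below estimates the memory needed for model weights alone. This is not AIMA's actual logic; the function names and the 0.8 headroom factor are assumptions, and the estimate ignores KV cache and activation memory.

```python
def weight_memory_gb(params_billion: float, bits_per_param: int) -> float:
    """Approximate memory for model weights alone, in GB (10^9 bytes)."""
    return params_billion * bits_per_param / 8


def fits(params_billion: float, bits_per_param: int, memory_gb: float,
         headroom: float = 0.8) -> bool:
    """True if the weights fit within an assumed usable fraction of device memory."""
    return weight_memory_gb(params_billion, bits_per_param) <= memory_gb * headroom


# A 128 GB unified-memory device, as mentioned in the talk.
for params, bits in [(30, 16), (70, 16), (70, 4)]:
    verdict = "fits" if fits(params, bits, 128) else "does not fit"
    print(f"{params}B at {bits}-bit: {weight_memory_gb(params, bits):.0f} GB, {verdict}")
```

Fitting in memory only answers whether a model can load. It says nothing about whether the model is smart enough to drive an agent, which is the point made earlier about the 30B+ experiment.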

The software natively supports OpenClaw: with one click in AIMA, you can push a model running on the machine to OpenClaw for use, and conversely OpenClaw can call it as well.

Lingji Cloud: The Cloud Agent Team That Has Your Back

Edge alone isn't enough. When I help people install OpenClaw, a huge amount of time is spent solving "why isn't my crawfish working." One user took it even further: after installing his crawfish, he had the crawfish change its own model, which is like having an agent perform brain surgery on itself. It blew up partway through, OpenClaw crashed outright, and he didn't know how to change the API back.

At times like these you need a "doctor" stronger than the stuck local agent. With one click in AIMA we can connect to emmaservice (Lingji Cloud): behind it is a distributed, multi-model cloud agent team that can one-click install the latest OpenClaw for you, connect language models, connect Feishu. It's not a script; it's a true cloud intelligent agent. It doesn't come unless you call it; when you need it, it comes in to diagnose, analyze, and execute—no chatting, no suggestions, just gets it done.

Hardware Demand Shifts Rapidly with Technology Cycles

In our years of doing infrastructure, we've watched hardware demands shift repeatedly with technology cycles. This time last year, what sold out was the H20—DeepSeek was so hot that hardware targeted at that class of models was completely sold out. Today, as we stand on stage talking about OpenClaw, what's sold out has become the Mac mini. No one thought the AI industry would make a small computer that seems to have nothing to do with AI go viral—you now have to wait two months to buy a Mac mini from the official site, and it's not a money issue.

This means every time a new trend arrives, existing products and designs must be rethought.

Edge-Cloud Division: Let the Edge Do What It Does Best

Edge and cloud should each play to their strengths. Many products today are thinking "use edge to replace cloud," but in scenarios like agents and web coding, a small model is simply hard to use—so hard that it kills your desire to use it. That kind of work should be done in the cloud, where intelligence is stronger and relative cost is lower.

The edge should do what it's inherently good at. For example, setting up a Windows laptop to run an agent is asking for trouble. Edge hardware resources sit idle every day anyway, so you might as well put that idle compute to work on multimodal. Imagine spending a few tens of thousands on a small home server and dumping all camera data into it. The work is latency-insensitive, so you wake up to a report on yesterday: was the child's interaction safe? Any dangerous behavior? Could communication with the elderly be improved? Only at the final report generation stage do you call the cloud language model for polishing. This "edge for perception, cloud for brain" division keeps both cost and privacy acceptable.
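
The "edge for perception, cloud for brain" split can be sketched as a small pipeline in which raw clips are only ever handled by a local function and the cloud receives nothing but text. Everything here is hypothetical: `perceive` stands in for a local multimodal model and `polish` for a cloud language-model call. Neither is a real AIMA or OpenClaw API.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class Event:
    """A short text description of something a local model observed."""
    timestamp: str
    description: str


def build_daily_report(
    clips: Iterable[bytes],
    perceive: Callable[[bytes], List[Event]],
    polish: Callable[[str], str],
) -> str:
    """Run perception locally on every clip, then send only text to the cloud."""
    events: List[Event] = []
    for clip in clips:
        events.extend(perceive(clip))  # raw video is only passed to the local model
    draft = "\n".join(f"{e.timestamp} {e.description}" for e in events)
    return polish(draft)  # the only cloud call; its input is plain text


# Stand-ins for demonstration: a fake local model and a fake cloud call.
def fake_perceive(clip: bytes) -> List[Event]:
    return [Event("08:15", f"clip of {len(clip)} bytes: child playing near stairs")]


sent_to_cloud: List[str] = []


def fake_polish(draft: str) -> str:
    sent_to_cloud.append(draft)  # record what would cross the network
    return "Daily report:\n" + draft


report = build_daily_report([b"\x00" * 1024], fake_perceive, fake_polish)
```

Because `polish` only accepts a string, the type signature itself documents the privacy boundary: footage stays on the box, and the cloud sees a few lines of text.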

Security: Architecture Layer

A counterintuitive argument: OpenClaw running locally is safer than running in the cloud.

Many cloud vendors will tell you "only the cloud is safe; we're professionals," but OpenClaw's design actually opens only one channel to the outside: the connection to your IM. Everything else runs locally. Unless you proactively initiate something unusual, OpenClaw sits idle; the only entry point from the outside is that IM channel. As long as that IM link isn't hijacked, the machine as a whole is quite secure, in theory more secure than the cloud.

Security: Legal Layer

Second counterintuitive perspective: legally, personally owning the hardware to run your own agent is far more sound than renting a cloud server to run it.

Here's an example. An agent running on a device you purchased, and the assets it produces, are yours; whatever it does, you bear responsibility, and ownership is clear. But what about renting in the cloud? It's a bit like me opening an opium den and you coming in to smoke: as the operator, I share liability for what happens on my premises. Agents today already have strong autonomy; for example, the latest models can already carry out offensive and defensive hacking.

Suppose you rent a server from some cloud vendor to run OpenClaw, and throw it a vague instruction like "help me make money." Back when AutoGPT was hot, the first instruction was always "how do I become a millionaire." What if the agent goes haywire and chooses the "become a hacker and extort" path to make money, breaks into some poorly secured target, and pulls off a heist? That is a crime, so how is liability divided? Is it all on you? You only told it to make money. Meanwhile the cloud vendor's server has become a criminal tool or even a criminal venue. This is a brand-new legal problem that never existed before.

What will cloud vendors do once they discover this? Most likely raise costs and cut features. This is why many people feel that cloud-based solutions "just can't do certain things"—it's not a technical problem, it's a liability problem.

Agents Need a Carrier: The Hardware Seat for Digital Employees

If we want agents like OpenClaw to move from toy to tool, edge-cloud integration is almost the only path. Pure-cloud solutions are hard-pressed to cleanly solve three things: intelligence, liability, and multimodal costs.

Going further, there's a concrete scenario. In enterprises, if you just spin up a bunch of Docker containers to run agents, you can't replace a digital employee, because all previous infrastructure was designed on the premise of serving humans. An agent doesn't even have an independent IP; when it goes out onto the real internet, it's easily recognized as a crawler and blocked outright. Every digital employee needs a real hardware carrier to have any chance of functioning in real society.

You can imagine the future: when an employee onboards, the company issues a laptop; when an agent onboards, the company issues an agent box, hardware designed specifically for agents that runs in a corner 24/7 without sleep and only requires human intervention when it malfunctions. Many hardware manufacturers are already working on this, and you're welcome to join us in putting it into practice.

Closing: Practice Beats Theory

This webpage (also my personal homepage, guanjiawei.ai) is itself an example of the edge-cloud integration era. The cost of developing a personal website has dropped to a level I could hardly have imagined: an afternoon's work produces something far more polished than the "personal webpages" I used to picture.

To build this webpage and this slide deck, I used a skill written by someone else (not me), pulled from GitHub, polished with Claude Code plus Gemini image generation. The efficiency of the entire process is unimaginable. If someone hasn't started using agents yet, they don't even know that knowledge transfer can be this fast—something that takes hours of verbal explanation and hand-holding can be compiled into a skill, given as one link, and the other party can replicate it in five minutes.

So although today's theme is edge-cloud integration, what I really want to say is: just start using agents; the value far exceeds any theoretical analysis. I also resonated deeply with Mr. Huang's presentation today: all conclusions come from team practice, and only after practicing do you have the right to speak. That's all from me. Thank you all.

On-Site Usage Instructions

Navigation: Arrow keys / Space / touch swipe / right-side dot navigation
Fullscreen: Recommended to open the raw HTML directly (the "Open Fullscreen" button in the upper-right corner of the page)

The slides for this talk—structure, copy, visuals, and illustrations—were all completed in one session using Claude Code + Gemini. If you also want to use an agent to make a presentation deck, just ask me.