Exposing Local LLMs to the Web: LM Studio + Caddy
If you're using LM Studio, you already know how easy it is to download and run models like Llama 3, Mistral, or Qwen on your local hardware. It's fantastic for local chatting, but what if you want to use that model to power a web app, a Discord bot, or a live website chatbot?
By default, LM Studio's built-in Local Server only listens on localhost (127.0.0.1) for security reasons. Exposing it directly to the internet is dangerous. Instead, the correct way to do this is by using a reverse proxy.
Here is how I securely expose LM Studio to the web using Caddy on Windows, which handles HTTPS and routing automatically. This is the exact setup powering the chatbot on Rocky's Labs.
1. Start the LM Studio Local Server
First, open LM Studio, load your preferred model (I recommend a fast, quantized 2B-7B parameter model for web tasks, like Qwen 2.5), and go to the Local Server tab.
- Ensure the port is set to 1234 (or note down your custom port).
- Turn on CORS if your web frontend will be making requests directly to it from a browser.
- Click Start Server.
Test that it works locally by opening a command prompt and running:
curl http://localhost:1234/v1/models
You should see a JSON response listing the loaded model.
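The same check can be scripted. This is a minimal sketch, assuming Node 18+ (for the built-in fetch) and that the server answers with the OpenAI-style list shape, { data: [{ id }] }. The names listModelIds and fetchImpl are hypothetical, introduced here so the function can be exercised without a running server.

```javascript
// Hypothetical helper: returns the ids of the models the local server reports.
// Assumes the OpenAI-style response shape: { data: [{ id: "..." }, ...] }.
async function listModelIds(baseUrl = "http://localhost:1234", fetchImpl = fetch) {
  const response = await fetchImpl(`${baseUrl}/v1/models`);
  if (!response.ok) {
    throw new Error(`Server responded with HTTP ${response.status}`);
  }
  const body = await response.json();
  // Fall back to an empty list if the server returns no "data" field
  return (body.data ?? []).map((model) => model.id);
}
```

With the server running, listModelIds().then(console.log) should print the identifier of the loaded model.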
2. Install and Configure Caddy
Caddy is an incredibly simple reverse proxy. Download the Windows executable from their website, and place caddy.exe in a permanent folder (e.g., C:\caddy).
Create a file named Caddyfile (no extension) in that same folder. Here is the configuration to map a public URL path to your local LM Studio instance:
# C:\caddy\Caddyfile
yourdomain.com {
    # Serve your frontend files (if any)
    root * C:\path\to\your\website
    file_server

    # Reverse proxy /chat-api/ to LM Studio
    handle_path /chat-api/* {
        reverse_proxy localhost:1234
    }
}
Note: handle_path automatically strips the /chat-api prefix before forwarding the request to localhost:1234. So a request to https://yourdomain.com/chat-api/v1/chat/completions reaches LM Studio as http://localhost:1234/v1/chat/completions on your machine.
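To make the path rewrite concrete, here is a small illustration of the mapping. It only mimics the effect on the request path; Caddy does this internally, and stripPrefix is a hypothetical name, not part of Caddy or LM Studio.

```javascript
// Illustrative only: mimics what handle_path /chat-api/* does to the request path.
function stripPrefix(path, prefix = "/chat-api") {
  // The matcher /chat-api/* only applies to paths under the prefix
  if (!path.startsWith(prefix + "/")) {
    return path; // not matched, so the path is left untouched
  }
  return path.slice(prefix.length);
}

console.log(stripPrefix("/chat-api/v1/chat/completions")); // "/v1/chat/completions"
console.log(stripPrefix("/index.html")); // "/index.html"
```

Requests that do not start with /chat-api/ never enter the handle_path block, which is why the rest of the site is still served by file_server.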
3. Run Caddy
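Open an Administrator PowerShell, navigate to your Caddy folder, and run the command below. PowerShell needs the .\ prefix to run an executable from the current folder unless that folder is on your PATH:
.\caddy.exe run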
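Caddy will immediately contact a public certificate authority (Let's Encrypt by default), provision a TLS certificate for your domain, and start proxying traffic. For this to succeed, your domain's DNS record must point to your public IP address, and ports 80 and 443 must be forwarded on your router to your machine's IP address.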
4. The Frontend Request
Now, your frontend JavaScript can make standard OpenAI-compatible fetch requests to your own domain. You don't need to deal with API keys or CORS preflight issues because the API is hosted on the same origin as your website.
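// Run this inside an async function or a module script (it uses await)
const response = await fetch('/chat-api/v1/chat/completions', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    model: "local-model", // LM Studio ignores this and uses the loaded model
    messages: [
      { role: "system", content: "You are a helpful assistant." },
      { role: "user", content: "Hello!" }
    ],
    temperature: 0.7
  })
});

// The reply text is in the first choice of the OpenAI-style response
const data = await response.json();
console.log(data.choices[0].message.content);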
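For a chatbot that remembers earlier turns, the request can be wrapped in a small helper that carries the conversation forward. This is a sketch rather than the exact code behind the Rocky's Labs chatbot: askLocalModel and fetchImpl are hypothetical names, and it assumes the OpenAI-style response shape (choices[0].message) and a page served from the same origin as /chat-api/.

```javascript
// Hypothetical wrapper around the proxied endpoint. The running conversation
// is passed in as `history` so each request carries the earlier turns.
async function askLocalModel(history, userText, fetchImpl = fetch) {
  const messages = [...history, { role: "user", content: userText }];
  const response = await fetchImpl("/chat-api/v1/chat/completions", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ model: "local-model", messages, temperature: 0.7 }),
  });
  if (!response.ok) {
    throw new Error(`Chat request failed with HTTP ${response.status}`);
  }
  const data = await response.json();
  const reply = data.choices[0].message; // { role: "assistant", content: "..." }
  // Return the updated history so the caller can pass it to the next call
  return { reply: reply.content, history: [...messages, reply] };
}
```

A caller would start with a history containing only the system message, then feed the returned history back in on each new user message.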
5. Security Considerations
Because you are exposing an endpoint to the open web, anyone can hit it if they find the URL. A few things you should consider:
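- Block indexing: Add Disallow: /chat-api/ to your robots.txt so search engines don't index your endpoint. This only deters well-behaved crawlers; it does not restrict access.
- Rate limiting: If you get malicious traffic, you can add rate limiting in the Caddyfile. The stock Caddy download does not ship a rate-limit directive, so this requires a Caddy build that includes a rate-limiting plugin.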
- Basic Auth: You can configure Caddy to require an Authorization header or basic auth before proxying the request to LM Studio if this is a private tool.
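If you do put basic auth in front of the endpoint, a non-browser client such as a Discord bot has to send the matching Authorization header. The sketch below assumes Node 18+ and the standard HTTP Basic scheme ("Basic " followed by base64 of username:password). basicAuthHeader, chatWithAuth, and the credentials are hypothetical placeholders, and yourdomain.com is the example domain from the Caddyfile.

```javascript
// Hypothetical helper: builds an HTTP Basic Authorization header value,
// i.e. "Basic " + base64("username:password").
function basicAuthHeader(username, password) {
  const token = Buffer.from(`${username}:${password}`, "utf8").toString("base64");
  return `Basic ${token}`;
}

// Sketch of a server-side call through the proxy. The credentials are
// placeholders and must match whatever is configured in Caddy.
async function chatWithAuth(messages, username, password, fetchImpl = fetch) {
  const response = await fetchImpl("https://yourdomain.com/chat-api/v1/chat/completions", {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: basicAuthHeader(username, password),
    },
    body: JSON.stringify({ model: "local-model", messages }),
  });
  if (!response.ok) {
    throw new Error(`Request failed with HTTP ${response.status}`);
  }
  return response.json();
}
```

Keep credentials like these in server-side code only. Anything embedded in frontend JavaScript is visible to every visitor.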
By using Caddy, you get a production-ready API powered entirely by your local GPU, with zero cloud hosting costs.
— Rakesh Ganesan
Rocky's Labs · 27 May 2026