Exposing Local LLMs to the Web: Running LM Studio as a Production API

If you're using LM Studio, you already know how easy it is to download and run models like Llama 3, Mistral, or Qwen on your local hardware. It's fantastic for local chatting, but what if you want to use that model to power a web app, a Discord bot, or a live website chatbot?

By default, LM Studio's built-in Local Server only listens on localhost (127.0.0.1) for security reasons. Exposing that port directly to the internet is risky: the server speaks plain HTTP, so traffic would travel unencrypted. The safer approach is to put a reverse proxy in front of it.

Here is how I securely expose LM Studio to the web using Caddy on Windows, which handles HTTPS and routing automatically. This is the exact setup powering the chatbot on Rocky's Labs.

1. Start the LM Studio Local Server

First, open LM Studio, load your preferred model (I recommend a fast, quantized 2B-7B parameter model for web tasks, like Qwen 2.5), and go to the Local Server tab.

Ensure the port is set to 1234 (or note down your custom port).
Turn on CORS only if your web frontend will call the server directly from a browser on a different origin. The same-origin proxy setup described in this post does not need it.
Click Start Server.

Test that it works locally by opening a command prompt and running:

curl http://localhost:1234/v1/models

You should see a JSON response listing the loaded model.
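The same check can be scripted. The following is a minimal sketch, assuming Node 18+ (for the built-in fetch) and the OpenAI-style response shape, where the models sit in a data array and each has an id field. The helper names extractModelIds and listModels are hypothetical, and the default base URL assumes the port chosen above.

```javascript
// Pull the model ids out of an OpenAI-style /v1/models payload.
// Returns an empty array if the payload does not have the expected shape.
function extractModelIds(payload) {
  if (!payload || !Array.isArray(payload.data)) return [];
  return payload.data.map((model) => model.id);
}

// Ask the local server which models it reports.
// baseUrl is an assumption; change it if you picked a custom port.
// fetchImpl is injectable so the function can be exercised without a server.
async function listModels(baseUrl = 'http://localhost:1234', fetchImpl = fetch) {
  const response = await fetchImpl(`${baseUrl}/v1/models`);
  if (!response.ok) {
    throw new Error(`LM Studio returned HTTP ${response.status}`);
  }
  return extractModelIds(await response.json());
}
```

Calling listModels().then(console.log) from a Node script should print the ids of whatever models the server currently reports. An empty array or a connection error suggests the server is not running on that port.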

2. Install and Configure Caddy

Caddy is an incredibly simple reverse proxy. Download the Windows executable from their website, and place caddy.exe in a permanent folder (e.g., C:\caddy).

Create a file named Caddyfile (no extension) in that same folder. Here is the configuration to map a public URL path to your local LM Studio instance:

# C:\caddy\Caddyfile

yourdomain.com {
    # Serve your frontend files (if any)
    root * C:\path\to\your\website
    file_server

    # Reverse proxy /chat-api/ to LM Studio
    handle_path /chat-api/* {
        reverse_proxy localhost:1234
    }
}

Note: handle_path automatically strips the matched /chat-api prefix before sending the request to localhost:1234. So a request to https://yourdomain.com/chat-api/v1/chat/completions becomes http://localhost:1234/v1/chat/completions on your machine.
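To make the path rewriting concrete, here is a small illustrative model of it in JavaScript. This is not Caddy's implementation, only a sketch of the mapping for this one route. The function name stripChatApiPrefix is hypothetical.

```javascript
const PREFIX = '/chat-api';

// Return the path the upstream server would receive for a given request path,
// or null if the request does not fall under the /chat-api/* route at all.
function stripChatApiPrefix(requestPath) {
  if (!requestPath.startsWith(`${PREFIX}/`)) return null;
  // Drop only the prefix itself; the slash that follows it is kept.
  return requestPath.slice(PREFIX.length);
}

console.log(stripChatApiPrefix('/chat-api/v1/chat/completions')); // '/v1/chat/completions'
console.log(stripChatApiPrefix('/index.html'));                   // null
```

Requests that return null here are the ones that would fall through to the static file server instead of reaching LM Studio.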

3. Run Caddy

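Open an Administrator PowerShell, navigate to your Caddy folder, and run the command below. The .\ prefix is needed because PowerShell does not run programs from the current folder by bare name unless that folder is on your PATH.

.\caddy.exe run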

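Caddy will contact a certificate authority (Let's Encrypt by default), obtain a TLS certificate for your domain, and start proxying traffic. For issuance to succeed, your domain's DNS record must point to your public IP address, and ports 80 and 443 must be forwarded on your router to your machine's IP address.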

4. The Frontend Request

Now, your frontend JavaScript can make standard OpenAI-compatible fetch requests to your own domain. You don't need to deal with API keys or CORS preflight issues because the API is hosted on the same origin as your website.

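// Run this inside an async function or a <script type="module">,
// since it uses await at the top level.
const response = await fetch('/chat-api/v1/chat/completions', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    model: "local-model", // LM Studio ignores this and uses the loaded model
    messages: [
      { role: "system", content: "You are a helpful assistant." },
      { role: "user", content: "Hello!" }
    ],
    temperature: 0.7
  })
});

// The reply text is in the first choice of the OpenAI-style response.
const data = await response.json();
console.log(data.choices[0].message.content);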
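In a real page you will probably want this wrapped in a reusable function that also handles failures, such as the model being unloaded or the proxy returning an error. The following is a minimal sketch under the same assumptions as the request above (the /chat-api/ route and the OpenAI-style response shape). The names extractReply and askModel are hypothetical.

```javascript
// Extract the assistant's text from an OpenAI-style chat completion payload.
// Throws if the payload does not contain a first choice with string content.
function extractReply(data) {
  const choice = data && Array.isArray(data.choices) ? data.choices[0] : undefined;
  const content = choice && choice.message ? choice.message.content : undefined;
  if (typeof content !== 'string') {
    throw new Error('Unexpected response shape from chat API');
  }
  return content;
}

// Send one user message and resolve with the model's reply text.
// fetchImpl is injectable so the function can be tested without a server.
async function askModel(userText, fetchImpl = fetch) {
  const response = await fetchImpl('/chat-api/v1/chat/completions', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({
      model: 'local-model',
      messages: [
        { role: 'system', content: 'You are a helpful assistant.' },
        { role: 'user', content: userText }
      ],
      temperature: 0.7
    })
  });
  if (!response.ok) {
    throw new Error(`Chat API returned HTTP ${response.status}`);
  }
  return extractReply(await response.json());
}
```

A chat widget could then call askModel(inputText) and show the rejected error message to the user instead of failing silently.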

5. Security Considerations

Because you are exposing an endpoint to the open web, anyone can hit it if they find the URL. A few things you should consider:

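Block indexing: Add a Disallow: /chat-api/ rule (under a User-agent: * line) to your robots.txt so well-behaved search engines don't crawl your endpoint. This is not access control: robots.txt is advisory and publicly readable, so it does nothing against malicious clients.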
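Rate limiting: If you get malicious traffic, you can rate-limit requests in Caddy. The standard Caddy build does not ship a rate-limiting directive, so this requires a custom build that includes a rate-limiting plugin; the rules themselves then live in the Caddyfile.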
Basic Auth: You can configure Caddy to require an Authorization header or basic auth before proxying the request to LM Studio if this is a private tool.
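If you do put basic auth in front of the endpoint, a client has to send the matching Authorization header. The following is a minimal sketch of building it, assuming ASCII credentials and a runtime with a global btoa (browsers and Node 16+). The helper names and the example credentials are hypothetical. This only suits scripts and private tools, since credentials embedded in a public page are visible to every visitor.

```javascript
// Build the value of an HTTP Basic Authorization header.
// Assumes ASCII credentials; btoa throws on characters outside Latin-1.
function basicAuthHeader(username, password) {
  return `Basic ${btoa(`${username}:${password}`)}`;
}

// Return a copy of fetch options with the Authorization header added,
// leaving any existing headers in place.
function withBasicAuth(options, username, password) {
  return {
    ...options,
    headers: {
      ...(options.headers || {}),
      Authorization: basicAuthHeader(username, password)
    }
  };
}

console.log(basicAuthHeader('user', 'pass')); // 'Basic dXNlcjpwYXNz'
```

The options object returned by withBasicAuth can be passed as the second argument to fetch in place of the plain options shown in section 4.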

By using Caddy, you get a production-ready API powered entirely by your local GPU, with zero cloud hosting costs.

— Rakesh Ganesan
Rocky's Labs · 27 May 2026