# Confidential LLM Inference

The future of LLM inference is in the cloud. Confidential LLM inference allows anyone, whether an AI agent, a human, or an edge device, to use a remote LLM without exposing the prompt or the generated output to the machines that run the model.

The `libcofhe` library provides a set of APIs for performing encrypted inference with these primitives, and builds on them to offer higher-level APIs for common machine learning tasks. It also includes tools for converting existing machine learning models into encrypted models, as well as Python bindings.

First, the model must be preprocessed, that is, converted into a format that CoFHE can use. This involves encrypting the model parameters and recording which layers are encrypted, along with other metadata required for encrypted inference. The following snippet shows how to preprocess a model:

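```python
import cofhe
from transformers import AutoModelForCausalLM  # assumption: the model comes from Hugging Face

# Initialize CoFHE from a configuration file
cofhe.init("path/to/config.json")

# Any Hugging Face model; GPT-2 is used here as an example
model = AutoModelForCausalLM.from_pretrained("gpt2")

config = {
    "model": model,
    # Indices of the layers to encrypt (GPT-2 has 12 transformer blocks, 0-11)
    "layers": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11],
}
cofhefy_model = cofhe.cofhefy(config)

# Save the cofhe-fied model so it can later be loaded with cofhe.load_model
cofhe.save_model(cofhefy_model, "/path/to/save")
```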
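Conceptually, the preprocessing step amounts to encrypting the parameters of the selected layers and recording which layers were encrypted. The sketch below models only that bookkeeping in plain Python. The `preprocess` helper, the manifest fields, and the stand-in `encrypt_fn` are all hypothetical; they illustrate the idea and do not reflect how `cofhe.cofhefy` works or what `cofhe.save_model` writes to disk.

```python
import json
from typing import Callable


def preprocess(
    layers: dict[int, list[float]],
    encrypt_layers: list[int],
    encrypt_fn: Callable[[float], object],
) -> tuple[dict[int, list[object]], str]:
    """Hypothetical helper: encrypt the selected layers and build a JSON manifest."""
    unknown = set(encrypt_layers) - set(layers)
    if unknown:
        raise ValueError(f"unknown layer indices: {sorted(unknown)}")

    params: dict[int, list[object]] = {}
    for idx, weights in layers.items():
        if idx in encrypt_layers:
            params[idx] = [encrypt_fn(w) for w in weights]  # encrypted layer
        else:
            params[idx] = list(weights)  # layer left in plaintext

    # Metadata needed later to know which layers hold ciphertexts
    manifest = json.dumps({
        "num_layers": len(layers),
        "encrypted_layers": sorted(encrypt_layers),
        "plaintext_layers": sorted(set(layers) - set(encrypt_layers)),
    })
    return params, manifest


# Usage with a stand-in "encryption" that only tags values (NOT real encryption)
toy_layers = {0: [0.1, 0.2], 1: [0.3], 2: [0.4, 0.5]}
params, manifest = preprocess(toy_layers, [0, 2], lambda w: ("enc", w))
print(manifest)
# → {"num_layers": 3, "encrypted_layers": [0, 2], "plaintext_layers": [1]}
```

Keeping the list of encrypted layers in the metadata lets the inference side decide, per layer, whether it is operating on ciphertexts or plaintext.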

The following code snippet demonstrates how to perform encrypted inference using CoFHE:

```python
import cofhe

# Initialize the CoFHE library from a configuration file.
# The configuration file contains the parameters required to initialize the network,
# e.g. homomorphic encryption and secure multi-party computation parameters and the details of the other nodes.
cofhe.init("path/to/config.json")
# Load a model that has already been converted with cofhe.cofhefy (cofhe-fied)
model = cofhe.load_model("/path/to/model")
prompt = input("Enter a prompt: ")
# Encrypt the input data
encrypted_input = cofhe.encrypt(prompt)
# Perform inference on the encrypted data
encrypted_output = model(encrypted_input)
# Decrypt the output
output = cofhe.decrypt(encrypted_output)
print(output)
```

In this example, we first initialize the CoFHE library from a configuration file (see the API reference for its syntax). We then load a model that was preprocessed with CoFHE, encrypt the prompt with `cofhe.encrypt`, and run inference directly on the encrypted input. Finally, we decrypt the result with `cofhe.decrypt` and print it.
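To build intuition for what "inference on encrypted data" means, here is a toy sketch of one linear layer evaluated on ciphertexts. It uses Paillier, an additively homomorphic scheme, purely as a stand-in: this is not CoFHE's scheme or API, the key sizes are far too small to be secure, and, unlike CoFHE, which also encrypts model parameters during preprocessing, the weights here stay in plaintext on the compute node. Inputs are assumed to be already quantized to integers.

```python
import math
import secrets

# Toy Paillier keys: insecurely small, chosen only for readability
P, Q = 104729, 1299709            # the 10,000th and 100,000th primes
N = P * Q
N2 = N * N
G = N + 1                         # standard simplified generator
LAM = math.lcm(P - 1, Q - 1)      # Carmichael function of N
MU = pow(LAM, -1, N)              # valid shortcut because G = N + 1


def encrypt(m: int) -> int:
    """Client side: encrypt a signed integer (negatives are mapped into Z_N)."""
    while True:
        r = secrets.randbelow(N - 1) + 1
        if math.gcd(r, N) == 1:
            break
    return (pow(G, m % N, N2) * pow(r, N, N2)) % N2


def decrypt(c: int) -> int:
    """Client side: decrypt and map the result back to a signed integer."""
    m = ((pow(c, LAM, N2) - 1) // N) * MU % N
    return m - N if m > N // 2 else m


def add(c1: int, c2: int) -> int:
    """Enc(a) * Enc(b) decrypts to a + b."""
    return (c1 * c2) % N2


def scale(c: int, k: int) -> int:
    """Enc(a) ** k decrypts to k * a (k is a plaintext integer)."""
    return pow(c, k % N, N2)


def encrypted_linear(enc_x: list[int], weights: list[list[int]], bias: list[int]) -> list[int]:
    """Compute node: y = W x + b on ciphertexts, never seeing x or y."""
    out = []
    for row, b in zip(weights, bias):
        acc = encrypt(b)
        for c, w in zip(enc_x, row):
            acc = add(acc, scale(c, w))
        out.append(acc)
    return out


# User machine: encrypt the input and send only ciphertexts
x = [3, -1, 4]
enc_x = [encrypt(v) for v in x]

# Compute node: apply an integer-weight layer to the ciphertexts
W = [[2, 0, -1], [1, 5, 3]]
b = [10, -7]
enc_y = encrypted_linear(enc_x, W, b)

# User machine: decrypt the result
y = [decrypt(c) for c in enc_y]
print(y)  # → [12, 3]
```

An additively homomorphic scheme like this one can only add ciphertexts and multiply them by plaintext constants, so it cannot evaluate nonlinear layers such as softmax on its own; those need additional machinery, such as the secure multi-party computation that CoFHE's configuration refers to.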

Here are some samples of encrypted inference in action with the GPT-2 model. They were produced with the C++ CoFHE library directly rather than with the exact Python code above:

<figure><img src="/files/j4lP0zvVUGt0msyaKuSK" alt=""><figcaption><p>User Machine</p></figcaption></figure>

<figure><img src="/files/9CbeftF0bTN8BQ91s9sw" alt=""><figcaption><p>Compute Node</p></figcaption></figure>


