Confidential LLM Inference

The future of LLM inference is in the cloud. Confidential LLM inference allows any client, whether an AI agent, a human being, or an edge device, to use a remote LLM without leaking its data.

libcofhe provides a set of APIs for performing encrypted inference using these primitives, and builds on them to offer higher-level APIs for common machine learning tasks. It also ships tools for converting existing machine learning models into encrypted models, and Python bindings are provided for the libcofhe library.

First we need to preprocess the model by converting it into a format that CoFHE can use. This step encrypts the model parameters and records which layers are encrypted, along with other metadata required for encrypted inference. The following code snippet demonstrates how to preprocess a model:

import cofhe
from transformers import AutoModelForCausalLM

# Initialize the CoFHE library using a configuration file
cofhe.init("path/to/config.json")

# Load a Hugging Face model to encrypt (GPT-2 here, matching the samples below)
model = AutoModelForCausalLM.from_pretrained("gpt2")

# "layers" lists the indices of the layers whose parameters will be encrypted
config = {
    "model": model,
    "layers": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11],
}
cofhefy_model = cofhe.cofhefy(config)

# Save the cofhe-fied model together with its encryption metadata
cofhe.save_model(cofhefy_model, "/path/to/save")

The following code snippet demonstrates how to perform encrypted inference using CoFHE:

import cofhe

# Initialize the CoFHE library using a configuration file
# The configuration file contains the parameters required for network initialization,
# for example the homomorphic encryption parameters, secure multi-party computation
# parameters, details of the other nodes, etc.
cofhe.init("path/to/config.json")
# Load a pre-trained model; it must already have been cofhe-fied
model = cofhe.load_model("/path/to/model")
prompt = input("Enter a prompt: ")
# Encrypt the input data
encrypted_input = cofhe.encrypt(prompt)
# Perform inference on the encrypted data
encrypted_output = model(encrypted_input)
# Decrypt the output
output = cofhe.decrypt(encrypted_output)
print(output)

In this example, we first initialize the CoFHE library using a configuration file; see the API reference for details of its syntax. We then load a pre-trained model that has already been encrypted with CoFHE and encrypt the input prompt with the cofhe.encrypt function. The model runs inference directly on the encrypted input, and finally we decrypt the result with cofhe.decrypt and print it.
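The exact layout of the configuration file is defined in the API reference; the sketch below only illustrates the kind of information it holds, and every key name in it is an assumption rather than the actual schema:

import json

# Illustrative sketch only: these key names are assumptions, not the real CoFHE schema.
config = {
    "encryption": {                      # hypothetical homomorphic-encryption parameters
        "scheme": "CKKS",
        "poly_modulus_degree": 8192,
        "scale_bits": 40,
    },
    "mpc": {                             # hypothetical secure multi-party computation settings
        "parties": 3,
        "threshold": 2,
    },
    "nodes": [                           # hypothetical details of the other network nodes
        {"role": "setup", "address": "10.0.0.1", "port": 4455},
        {"role": "compute", "address": "10.0.0.2", "port": 4456},
    ],
}

# Write the configuration to the path passed to cofhe.init
with open("path/to/config.json", "w") as f:
    json.dump(config, f, indent=2)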

Here are some samples of encrypted inference in action for the GPT-2 model (not produced by this exact code; they use the C++ CoFHE library directly):

Compute Node (sample output screenshot)
