Confidential LLM Inference
The future of LLM inference is in the cloud. Confidential LLM inference allows any client, whether an AI agent, a human, or an edge device, to use a remote LLM without leaking any data.
The libcofhe library provides a set of APIs for performing encrypted inference using these primitives, and builds on them to offer higher-level APIs for common machine learning tasks. There are also tools for converting existing machine learning models into encrypted models, as well as Python bindings for the libcofhe library.
First, we need to preprocess the model by converting it into a format that CoFHE can use. This involves encrypting the model parameters and recording which layers are encrypted, along with other metadata required for encrypted inference. The following code snippet demonstrates how to preprocess a model:
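The original snippet is not present in this page, so the sketch below illustrates the preprocessing steps just described under stated assumptions: a toy multiplicative mask stands in for real FHE encryption, and the function names (`encrypt_tensor`, `preprocess_model`) are hypothetical, not the actual libcofhe API.

```python
# Illustrative sketch only. The toy multiplicative mask below is NOT real
# FHE; it merely shows the shape of the preprocessing step: encrypt each
# layer's parameters and record which layers were encrypted as metadata.
SECRET_KEY = 7.0  # toy secret; real CoFHE uses full FHE key material


def encrypt_tensor(values, key=SECRET_KEY):
    """'Encrypt' a flat list of parameters with a toy multiplicative mask."""
    return [v * key for v in values]


def preprocess_model(layers):
    """Convert plaintext layers into an 'encrypted model' plus metadata.

    `layers` maps layer names to flat parameter lists. The metadata records
    which layers were encrypted, mirroring the bookkeeping described above.
    """
    encrypted = {}
    metadata = {"encrypted_layers": []}
    for name, params in layers.items():
        encrypted[name] = encrypt_tensor(params)
        metadata["encrypted_layers"].append(name)
    return encrypted, metadata


plain_model = {"wte": [0.1, -0.2], "lm_head": [0.5, 1.5]}
enc_model, meta = preprocess_model(plain_model)
print(meta["encrypted_layers"])  # ['wte', 'lm_head']
```

In a real deployment the mask would be replaced by an FHE scheme, but the output shape is the same: an encrypted parameter store plus metadata that tells the inference runtime which layers need ciphertext arithmetic.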
The following code snippet demonstrates how to perform encrypted inference using CoFHE:
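The referenced snippet is also missing from this page, so here is a hedged, self-contained sketch of the workflow the next paragraph walks through: initialize, load an encrypted model, encrypt the input, run inference on ciphertexts, and decrypt the result. The toy multiplicative mask is not real FHE, and every name here (`encrypt`, `decrypt`, `EncryptedLinear`) is a stand-in rather than the actual `cofhe` API; consult the API reference for the real calls.

```python
# Hedged sketch of the encrypted-inference workflow. A toy multiplicative
# mask plays the role of FHE: multiplying two masked values composes the
# masks, so a product of ciphertexts decrypts at "depth" 2.
KEY = 3.0  # toy secret key


def encrypt(x):
    """Stand-in for cofhe.encrypt: apply the toy mask."""
    return x * KEY


def decrypt(c, depth=1):
    """Stand-in for cofhe.decrypt: undo `depth` composed masks."""
    return c / (KEY ** depth)


class EncryptedLinear:
    """A linear layer whose weights were encrypted at preprocessing time."""

    def __init__(self, plain_weights):
        self.enc_w = [encrypt(w) for w in plain_weights]

    def forward(self, enc_x):
        # Ciphertext-by-ciphertext products: masks compose, so the
        # accumulated sum decrypts at depth 2.
        return sum(w * x for w, x in zip(self.enc_w, enc_x))


model = EncryptedLinear([0.5, -1.0])          # "load" the encrypted model
enc_input = [encrypt(v) for v in [2.0, 4.0]]  # encrypt the input data
enc_out = model.forward(enc_input)            # inference entirely on ciphertexts
result = decrypt(enc_out, depth=2)            # 0.5*2.0 + (-1.0)*4.0 = -3.0
print(result)
```

The key property the sketch preserves is that the server only ever touches ciphertexts; decryption happens with the client's key after inference.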
In this example, we first initialize the CoFHE library using a configuration file (see the API reference for the configuration syntax). We then load a pre-trained model that has been encrypted with CoFHE, encrypt the input data using the cofhe.encrypt function, and run inference on the encrypted data. Finally, we decrypt the output with the cofhe.decrypt function and print the result.
Here are some samples of encrypted inference in action for the GPT-2 model (not produced by this exact code; they use the C++ CoFHE library directly):