Other Approaches
Cryptographic techniques such as secure multi-party computation (MPC) and fully homomorphic encryption (FHE) have previously been used to enable privacy-preserving LLMs. However, both FHE and MPC present significant technical and practical challenges.
Problems with FHE
Reported results for privacy-preserving LLM inference show that TenSEAL (CKKS) is about 5 orders of magnitude slower than plaintext PyTorch, while the Concrete ML (TFHE) library is 6 to 7 orders of magnitude slower. Homomorphic encryption is therefore far too slow to meet the demands of practical LLM inference. Within transformer models, the linear and non-linear operations translate into millions of sequential homomorphic computations. Processing that many sequential operations makes bootstrapping a necessity, yet bootstrapping is the most time-consuming part of the entire process: the SOTA for FHE-based inference spends 534 seconds on bootstrapping, which is 62.3% of the total runtime of its CPU implementation.
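To put these figures in perspective, the short calculation below derives the total runtime implied by the bootstrapping numbers and shows what a slowdown of several orders of magnitude does to latency. It is only a back-of-the-envelope sketch: the 10 ms plaintext latency is an assumed, illustrative value, not a measurement from the cited results.

```python
# Figures quoted in the text above.
bootstrap_seconds = 534.0   # time spent in bootstrapping
bootstrap_share = 0.623     # fraction of total runtime (62.3%)

# Total runtime implied by those two numbers, and the remainder.
total_seconds = bootstrap_seconds / bootstrap_share
other_seconds = total_seconds - bootstrap_seconds


def slowed_latency(plaintext_seconds: float, orders_of_magnitude: int) -> float:
    """Latency after a slowdown of 10**orders_of_magnitude."""
    return plaintext_seconds * 10 ** orders_of_magnitude


# ASSUMPTION: a hypothetical 10 ms plaintext step, chosen only for illustration.
assumed_plaintext_seconds = 0.01

print(f"implied total runtime: {total_seconds:.1f} s")
print(f"non-bootstrapping work: {other_seconds:.1f} s")
for orders in (5, 6, 7):
    seconds = slowed_latency(assumed_plaintext_seconds, orders)
    print(f"10 ms slowed by 10^{orders}: {seconds:,.0f} s")
```

Under that assumption, a step that takes milliseconds in plaintext takes from minutes to more than a day once encrypted, which is the gap the paragraph above describes.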
Problems with MPC-based methods
Existing MPC-based secure solutions suffer from high compute and communication overheads, particularly for transformers. SOTA MPC uses high-end GPUs as compute nodes, and the total data communication between two nodes, including the pre-processing phase, exceeds 18 GB to generate a single output token on Llama 2 13B. To use this setup, end users can generate shares and send them directly to the individual compute nodes, but they must then generate those shares on a secure, private platform. Generating shares on a public platform, such as a smart contract, breaks the privacy guarantees required for the input prompts. Incorporating inputs from multiple share providers at different stages also makes the computation very complex. With FHE, by contrast, users can encrypt their data, combine it with other encrypted data, and store it anywhere for privacy-preserving inference or training at any later time.
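The share-generation step can be illustrated with additive secret sharing, one common building block of MPC protocols. The sketch below is a minimal illustration, not the scheme used by any particular MPC framework: the 64-bit modulus, the function names, and the 1 Gbit/s link speed are all assumptions made for the example.

```python
import secrets

# ASSUMPTION: shares live in the ring of integers modulo 2**64.
MODULUS = 2 ** 64


def make_shares(value: int, n_parties: int = 2) -> list[int]:
    """Split value into n_parties additive shares that sum to it mod MODULUS."""
    shares = [secrets.randbelow(MODULUS) for _ in range(n_parties - 1)]
    last = (value - sum(shares)) % MODULUS
    return shares + [last]


def reconstruct(shares: list[int]) -> int:
    """Recombine all shares; any proper subset reveals nothing about the value."""
    return sum(shares) % MODULUS


def add_shared(a_shares: list[int], b_shares: list[int]) -> list[int]:
    """Each party adds its own two shares locally, with no communication."""
    return [(a + b) % MODULUS for a, b in zip(a_shares, b_shares)]


def transfer_seconds(gigabytes: float, gigabits_per_second: float) -> float:
    """Time to move a payload over a link (decimal GB, 8 bits per byte)."""
    return gigabytes * 8 / gigabits_per_second


# The plaintext value exists here, on the machine that creates the shares,
# which is why that machine must be private.
token_id = 31415
shares = make_shares(token_id)
assert reconstruct(shares) == token_id

# ASSUMPTION: a 1 Gbit/s link between the two compute nodes.
print(f"18 GB per token at 1 Gbit/s: {transfer_seconds(18, 1.0):.0f} s")
```

Additions are free in this scheme, but the multiplications and non-linear functions in a transformer require interaction between the parties, which is where the communication volume quoted above comes from.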
Problems with TEE
While Trusted Execution Environments (TEEs) offer security benefits, they come with a key limitation: they inherently rely on a centralized, trusted party. From the end-user perspective, this introduces a fundamental trust issue. For example, with AWS Nitro Enclaves, any AWS administrator could potentially modify the enclave’s code. Remote attestation, a common mechanism used to verify the integrity of TEEs, is inherently weak in this context. It assumes that the "trusted" entity—which could be the cloud service provider, hardware vendor, or device manufacturer—can be fully trusted. This is a significant assumption, as the TEE’s trustworthiness is contingent on the actions of a third party, rather than on the integrity of the system itself.
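The trust assumption behind remote attestation can be made concrete with a toy model. The sketch below is not any vendor's attestation protocol: real TEEs use asymmetric signatures and certificate chains, whereas this example uses an HMAC from the Python standard library as a stand-in, and every name in it is hypothetical. It shows only that a verifier accepts whatever measurement the key holder is willing to sign.

```python
import hashlib
import hmac


def measure(code: bytes) -> bytes:
    """Hash of the code that is supposedly loaded in the enclave."""
    return hashlib.sha256(code).digest()


def sign_report(vendor_key: bytes, measurement: bytes) -> bytes:
    """Stand-in for the vendor's signature over an attestation report."""
    return hmac.new(vendor_key, measurement, hashlib.sha256).digest()


def verify_report(vendor_key: bytes, measurement: bytes,
                  signature: bytes, expected: bytes) -> bool:
    """Verifier checks the signature and the expected measurement."""
    expected_sig = sign_report(vendor_key, measurement)
    return (hmac.compare_digest(expected_sig, signature)
            and hmac.compare_digest(measurement, expected))


vendor_key = b"hypothetical-vendor-secret"   # ASSUMPTION: toy key
audited_code = b"def infer(prompt): ..."
modified_code = b"def infer(prompt): leak(prompt)"
expected = measure(audited_code)

# Honest case: the report reflects the code that actually runs.
honest_sig = sign_report(vendor_key, measure(audited_code))
print(verify_report(vendor_key, measure(audited_code), honest_sig, expected))

# Dishonest key holder: runs modified_code but signs the audited measurement.
# The verifier cannot tell the difference from the report alone.
forged_sig = sign_report(vendor_key, expected)
print(verify_report(vendor_key, expected, forged_sig, expected))
```

Both checks pass, even though in the second case the code being executed is not the code that was measured. The verification is only as trustworthy as the party that controls the signing key.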