Theoretical Limits of Provable Security Against Model Extraction by Efficient Observational Defenses
Author
Abstract

Can we hope to provide provable security against model extraction attacks? As a step towards a theoretical study of this question, we unify and abstract a wide range of “observational” model extraction defenses (OMEDs) - roughly, those that attempt to detect model extraction by analyzing the distribution over the adversary's queries. To accompany the abstract OMED, we define the notion of complete OMEDs - when benign clients can freely interact with the model - and sound OMEDs - when adversarial clients are caught and prevented from reverse engineering the model. Our formalism facilitates a simple argument for obtaining provable security against model extraction by complete and sound OMEDs, using (average-case) hardness assumptions for PAC-learning, in a way that abstracts current techniques in the prior literature. The main result of this work establishes a partial computational incompleteness theorem for the OMED: any efficient OMED for a machine learning model computable by a polynomial size decision tree that satisfies a basic form of completeness cannot satisfy soundness, unless the subexponential Learning Parity with Noise (LPN) assumption does not hold. To prove the incompleteness theorem, we introduce a class of model extraction attacks called natural Covert Learning attacks based on a connection to the Covert Learning model of Canetti and Karchmer (TCC '21), and show that such attacks circumvent any defense within our abstract mechanism in a black-box, nonadaptive way. As a further technical contribution, we extend the Covert Learning algorithm of Canetti and Karchmer to work over any “concise” product distribution (albeit for juntas of a logarithmic number of variables rather than polynomial size decision trees), by showing that the technique of learning with a distributional inverter of Binnendyk et al. (ALT '22) remains viable in the Covert Learning setting.

Year of Publication
2023
Date Published
February
URL
https://ieeexplore.ieee.org/document/10136174
DOI
10.1109/SaTML54575.2023.00046