Theoretical Limits of Provable Security Against Model Extraction by Efficient Observational Defenses
Author
Abstract
Can we hope to provide provable security against model extraction attacks? As a step towards a theoretical study of this question, we unify and abstract a wide range of “observational” model extraction defenses (OMEDs) - roughly, those that attempt to detect model extraction by analyzing the distribution over the adversary's queries. To accompany the abstract OMED, we define the notions of complete OMEDs (benign clients can freely interact with the model) and sound OMEDs (adversarial clients are caught and prevented from reverse engineering the model). Our formalism facilitates a simple argument for obtaining provable security against model extraction by complete and sound OMEDs, using (average-case) hardness assumptions for PAC learning, in a way that abstracts the current techniques in the prior literature. The main result of this work establishes a partial computational incompleteness theorem for the OMED: assuming the subexponential hardness of Learning Parity with Noise (LPN), no efficient OMED for a machine learning model computable by a polynomial-size decision tree can satisfy both a basic form of completeness and soundness. To prove the incompleteness theorem, we introduce a class of model extraction attacks called natural Covert Learning attacks, based on a connection to the Covert Learning model of Canetti and Karchmer (TCC '21), and show that such attacks circumvent any defense within our abstract mechanism in a black-box, nonadaptive way. As a further technical contribution, we extend the Covert Learning algorithm of Canetti and Karchmer to work over any “concise” product distribution (albeit for juntas of a logarithmic number of variables rather than polynomial-size decision trees), by showing that the technique of learning with a distributional inverter of Binnendyk et al. (ALT '22) remains viable in the Covert Learning setting.
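To make the abstract's notions concrete, here is a rough formal sketch; the precise definitions live in the paper, and all notation below (the defense \(\mathcal{D}\), model \(f\), benign query distribution \(Q\), extracted hypothesis \(h\), and error parameters \(\varepsilon\), \(\delta\)) is our own illustrative shorthand rather than the paper's formalism.

```latex
% Illustrative sketch only (notation ours, not the paper's).
% An OMED \mathcal{D} observes a client's queries x_1,\dots,x_q to a model f
% and either keeps serving the client (accept) or cuts it off (reject).

% Completeness: benign clients, whose queries are drawn from the allowed
% distribution Q, are almost always served:
\[
  \Pr_{x_1,\dots,x_q \sim Q}\bigl[\mathcal{D}(x_1,\dots,x_q) = \mathsf{accept}\bigr]
  \;\ge\; 1 - \varepsilon .
\]

% Soundness: any efficient client whose interaction suffices to extract the
% model, i.e., to output a hypothesis h with
% \Pr_{x \sim Q}[h(x) \neq f(x)] \le \varepsilon, is almost always caught:
\[
  \Pr\bigl[\mathcal{D}(x_1,\dots,x_q) = \mathsf{reject}\bigr] \;\ge\; 1 - \delta .
\]

% The incompleteness theorem is conditional on subexponential LPN: for a
% secret s \in \{0,1\}^n and noise rate \eta, samples of the form
% (a, \langle a, s \rangle \oplus e), with a uniform and e \sim \mathrm{Ber}(\eta),
% are indistinguishable from uniform bits by adversaries running in time
% 2^{O(n^{c})} for some constant c < 1.
```

On this reading, the paper's main result says the two displayed conditions cannot hold simultaneously for an efficient \(\mathcal{D}\) defending a polynomial-size decision tree, because a natural Covert Learning attack produces queries that look like benign ones drawn from \(Q\) while still extracting \(f\).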
Year of Publication
2023
Date Published
February
URL
https://ieeexplore.ieee.org/document/10136174
DOI
10.1109/SaTML54575.2023.00046