Towards Physical, Imperceptible Adversarial Attacks via Adversarial Programs
Adversarial examples were originally defined as imperceptible perturbations that cause a deep neural network to misclassify. However, the majority of imperceptible perturbation attacks require perturbing a large number of pixels across the image and are thus hard to execute in the physical world. Existing physical attacks rely on physical objects, such as patches/stickers or 3D-printed objects. Producing adversarial patches is arguably easier than 3D printing, but these attacks incur highly visible perturbations. This raises the question: is it possible to generate adversarial examples with imperceptible patches? In this work, we consider adversarial multi-patch attacks, where the goal is to compute a targeted attack consisting of up to K patches with minimal L2 distortion. Each patch is associated with dimensions, position, and perturbation parameters. We leverage ideas from program synthesis and numerical optimization to search in this large, discrete space and obtain attacks that are competitive with the C&W attack but have at least 3x and up to 10x fewer perturbed pixels. We evaluate our approach on MNIST, Fashion-MNIST, CIFAR-10, and ImageNet and obtain a success rate of at least 92% and up to 100% with at most ten patches. For MNIST, Fashion-MNIST, and CIFAR-10, the average L2 distortion exceeds that of the C&W attack by at most 1.2.
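The patch parameterization described above (dimensions, position, and perturbation for each patch) can be sketched as follows. This is a minimal illustration of the attack representation, not the authors' implementation; the `apply_patches` helper and the tuple layout `(row, col, height, width, delta)` are assumptions for exposition.

```python
import numpy as np

def apply_patches(image, patches):
    """Apply up to K patch perturbations to an image.

    Each patch is a tuple (row, col, height, width, delta), where delta is
    an additive perturbation of shape (height, width, channels).
    Hypothetical parameterization, for illustration only.
    """
    adv = image.copy()
    for (r, c, h, w, delta) in patches:
        adv[r:r + h, c:c + w] += delta
    # Keep pixel intensities in the valid [0, 1] range.
    return np.clip(adv, 0.0, 1.0)

def l2_distortion(image, adv):
    """L2 norm of the perturbation, the quantity the attack minimizes."""
    return float(np.linalg.norm((adv - image).ravel()))

def perturbed_pixels(image, adv):
    """Number of pixels changed -- the sparsity measure of the abstract."""
    return int(np.count_nonzero(np.any(adv != image, axis=-1)))

# Example: two small patches on a 28x28 grayscale image (MNIST-sized).
img = 0.5 * np.ones((28, 28, 1))
patches = [
    (2, 2, 4, 4, 0.1 * np.ones((4, 4, 1))),
    (20, 20, 3, 3, -0.05 * np.ones((3, 3, 1))),
]
adv = apply_patches(img, patches)
```

Because only the pixels covered by the patches change, the perturbed-pixel count here is 4x4 + 3x3 = 25, far below a dense attack that may touch every pixel.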
*MSc student under the supervision of Prof. Dana Drachsler Cohen.