Research Team Status

  1. Names of researchers and positions
    1. Michael W. Mahoney (Research Scientist)
    2. N. Benjamin Erichson (Research Scientist)
    3. Serge Egelman (Research Scientist)
    4. John Cava (PhD student)
    5. Zhipeng Wei (incoming Postdoc)

  • Any new collaborations with other universities/researchers?
    • Ongoing collaboration with Dartmouth College.

Project Goals

  • What is the current project goal?
    This quarter focused on the following two goals: (P1) improving robustness by leveraging diffusion models for data augmentation (led by N. Benjamin Erichson); and (P2) developing metrics for characterizing the interplay between sharpness and diversity within deep ensembles (led by Michael W. Mahoney).
     
  • How does the current goal factor into the long-term goal of the project?
    • Both of these goals are aligned with our long-term goals of improving model robustness and developing AI safety metrics. 

Accomplishments

  • P1. 
    • Background: Advances in data augmentation have greatly improved the robustness of computer vision models to corrupted inputs and unseen out-of-distribution scenarios. This matters increasingly as computer vision systems are deployed in the real world, for example in automated vehicles. However, little attention has been paid to the fact that some mistakes made by computer vision models are worse than others - namely "mistake severity". To measure mistake severity, a hierarchical tree is constructed that defines the distance between classes such as person, dog, and tree. This makes it possible to quantify how badly a model misclassifies objects, which is important to assess for models deployed in high-cost settings such as automated vehicles or drones, where misclassifying a person as a dog is less severe than misclassifying a person as a tree. Currently, mistake severity is used to assess models that incorporate hierarchical distance as a loss function, training models to minimize mistake severity. However, hierarchical distance could be used not only as a metric, but also as a way to guide data augmentation.
    • Proposed approach: We investigated diffusion models as a tool for generating synthetic data that improves model robustness. The hypothesis was that it would be beneficial to use hierarchical distance to precisely generate synthetic data that minimizes mean mistake severity, by focusing on the classes of objects that are misclassified most severely. This is in contrast to using arbitrary data augmentations.
    • Outcome: This approach only marginally improved mistake severity. This is surprising, since we generated examples that should help the model better understand the mistakes it makes. We suspect that the current quality of diffusion models is simply not good enough to provide useful examples that enrich the data space for the task of minimizing mistake severity. Other objectives, such as adversarial robustness, have improved with diffusion-based augmentation, but there the aim is only to wash out the model's sensitivity to high-frequency features, whereas in our case the model must remain sensitive to fine-scale details. We will revisit this problem with the next generation of diffusion models.
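The mistake-severity metric described above can be sketched in a few lines. The tiny three-leaf hierarchy and the edge-count distance below are illustrative assumptions for exposition only; the hierarchies used in practice (e.g., derived from WordNet for ImageNet) are far larger.

```python
# Minimal sketch of mistake severity over a hand-made class hierarchy.
# PARENT maps each node to its parent; leaves are the model's classes.
PARENT = {
    "person": "animate",
    "dog": "animate",
    "tree": "inanimate",
    "animate": "entity",
    "inanimate": "entity",
    "entity": None,  # root
}

def ancestors(node):
    """Return the path from a node up to the root, inclusive."""
    path = []
    while node is not None:
        path.append(node)
        node = PARENT[node]
    return path

def tree_distance(a, b):
    """Number of edges on the path between two classes in the hierarchy."""
    pa, pb = ancestors(a), ancestors(b)
    common = set(pa) & set(pb)
    # The lowest common ancestor is the first shared node on the upward path.
    lca = next(n for n in pa if n in common)
    return pa.index(lca) + pb.index(lca)

def mean_mistake_severity(true_labels, predictions):
    """Average tree distance, computed over misclassified examples only."""
    mistakes = [tree_distance(t, p)
                for t, p in zip(true_labels, predictions) if t != p]
    return sum(mistakes) / len(mistakes) if mistakes else 0.0
```

Under this toy hierarchy, confusing a person with a dog (siblings under "animate") costs 2 edges, while confusing a person with a tree crosses the animate/inanimate split and costs 4 - capturing the intuition that the latter mistake is more severe.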

       
  • P2: 
    • Background: Recent studies on deep ensembles have identified the sharpness of the local minima of individual learners and the diversity of the ensemble members as key factors in improving test-time performance. Building on this, our study investigates the interplay between sharpness and diversity within deep ensembles, illustrating their crucial role in robust generalization to both in-distribution (ID) and out-of-distribution (OOD) data. 
    • Outcome: We discover a trade-off between sharpness and diversity: minimizing the sharpness of the loss landscape tends to diminish the diversity of individual members within the ensemble, adversely affecting the ensemble’s improvement. The trade-off is justified through our theoretical analysis and verified empirically through extensive experiments. To address the issue of reduced diversity, we introduce SharpBalance, a novel training approach that balances sharpness and diversity within ensembles. Theoretically, we show that our training strategy achieves a better sharpness-diversity trade-off. Empirically, we conducted comprehensive evaluations on various datasets (CIFAR-10, CIFAR-100, TinyImageNet) and showed that SharpBalance not only effectively improves the sharpness-diversity trade-off, but also significantly improves ensemble performance in ID and OOD scenarios.
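The two quantities in this trade-off can be illustrated with simple proxy definitions. The sketch below uses pairwise prediction disagreement for diversity and worst-case loss increase within a small perturbation radius for sharpness; these are common illustrative choices, not the exact formulations from our analysis or from SharpBalance.

```python
import numpy as np

rng = np.random.default_rng(0)

def diversity(member_preds):
    """Mean pairwise disagreement between ensemble members' predicted labels.

    member_preds: array of shape (n_members, n_examples) of class labels.
    """
    m = len(member_preds)
    pairs = [(i, j) for i in range(m) for j in range(i + 1, m)]
    return float(np.mean([np.mean(member_preds[i] != member_preds[j])
                          for i, j in pairs]))

def sharpness(loss_fn, weights, radius=0.05, n_trials=100):
    """Estimate sharpness as the largest loss increase over random
    perturbations of norm `radius` around the current weights."""
    base = loss_fn(weights)
    worst = 0.0
    for _ in range(n_trials):
        d = rng.normal(size=weights.shape)
        d *= radius / np.linalg.norm(d)  # project onto the radius sphere
        worst = max(worst, loss_fn(weights + d) - base)
    return worst

# Toy losses: quadratic bowls whose curvature controls sharpness.
sharp_loss = lambda w: 100.0 * np.sum(w ** 2)  # high curvature -> sharp minimum
flat_loss = lambda w: 0.01 * np.sum(w ** 2)    # low curvature  -> flat minimum
```

With these definitions, `sharpness(sharp_loss, np.zeros(10))` exceeds `sharpness(flat_loss, np.zeros(10))`, and the trade-off we study corresponds to the empirical observation that driving sharpness down for each member tends to drive the ensemble's diversity down as well.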
       
  • Impact of research
    • P2 resulted in a paper that is currently under review at NeurIPS.