1 | Debadeepta Dey

What Makes Convolutional Models Great on Long Sequence Modeling?

Convolutional models have been widely used in multiple domains. However, most existing models only use local convolution, making the model unable to handle long-range dependency efficiently. Attention overcomes this problem by aggregating global …

AutoDistil: Few-shot Task-agnostic Neural Architecture Search for Distilling Large Language Models

Knowledge distillation (KD) methods compress large models into smaller students with manually-designed student architectures given pre-specified computational cost. This requires several trials to find a viable student, and further repeating the …

LiteTransformerSearch: Training-free Neural Architecture Search for Efficient Language Models

The Transformer architecture is ubiquitously used as the building block of large-scale autoregressive language models. However, finding architectures with the optimal trade-off between task performance (perplexity) and hardware constraints like peak …

A Recipe for Creating Multimodal Aligned Datasets for Sequential Tasks

Metareasoning in Modular Software Systems: On-the-Fly Configuration using Reinforcement Learning with Rich Contextual Representations

Assemblies of modular subsystems are being pressed into service to perform sensing, reasoning, and decision making in high-stakes, time-critical tasks in such areas as transportation, healthcare, and industrial automation. We address the opportunity …

Efficient forward architecture search

We propose a neural architecture search (NAS) algorithm, Petridish, to iteratively add shortcut connections to existing network layers. The added shortcut connections effectively perform gradient boosting on the augmented layers. The proposed …

Anytime neural networks via joint optimization of auxiliary losses

This work considers the trade-off between accuracy and test-time computational cost of deep neural networks (DNNs) via mph{anytime} predictions from auxiliary predictions. Specifically, we optimize auxiliary losses jointly in an mph{adaptive} …

Overcoming blind spots in the real world: Leveraging complementary abilities for joint execution

Simulators are being increasingly used to train agents before deploying them in real-world environments. While training in simulation provides a cost-effective way to learn, poorly modeled aspects of the simulator can lead to costly mistakes, or …

Vision-based Navigation with Language-based Assistance via Imitation Learning with Indirect Intervention

We present Vision-based Navigation with Languagebased Assistance (VNLA), a grounded vision-language task where an agent with visual perception is guided via language to find objects in photorealistic indoor environments. The task emulates a …