
ACDC: A Structured Efficient Linear Layer
The linear layer is one of the most pervasive modules in deep learning r...

Kaleidoscope: An Efficient, Learnable Representation For All Structured Linear Maps
Modern neural network architectures use structured linear transformation...

Learning Fast Algorithms for Linear Transforms Using Butterfly Factorizations
Fast linear transforms are ubiquitous in machine learning, including the...

A Deterministic Sparse FFT for Functions with Structured Fourier Sparsity
In this paper a deterministic sparse Fourier transform algorithm is pres...

On the Expressive Power of Deep Fully Circulant Neural Networks
In this paper, we study deep fully circulant neural networks, that is de...

ButterflyNet: Optimal Function Representation Based on Convolutional Neural Networks
Deep networks, especially Convolutional Neural Networks (CNNs), have bee...

Deep Learning of Constrained Autoencoders for Enhanced Understanding of Data
Unsupervised feature extractors are known to perform an efficient and di...
Sparse Linear Networks with a Fixed Butterfly Structure: Theory and Practice
The fast Fourier transform, wavelets, and other well-known transforms in signal processing have a structured representation as a product of sparse matrices, referred to as butterfly structures. Recent research has used such structured linear networks, along with randomness, as preconditioners to improve the computational performance of large-scale linear algebraic operations. Given the advent of deep learning and AI and the computational efficiency of such structured matrices, it is natural to study sparse linear deep networks in which the locations of the nonzero weights are predetermined by the butterfly structure. This work studies, both theoretically and empirically, the feasibility of training such networks in different scenarios. Unlike convolutional neural networks, which are structured sparse networks designed to recognize local patterns in lattices representing a spatial or temporal structure, the butterfly architecture used in this work can replace any dense linear operator with a gadget consisting of a sequence of logarithmically many (in the network width) sparse layers, containing a near-linear total number of weights. This improves on the quadratic number of weights required in a standard dense layer, with little compromise in the expressiveness of the resulting operator. We show in a collection of empirical experiments that our proposed architecture not only produces results that match and often outperform existing architectures, but also offers faster training and prediction in deployment. This empirical phenomenon is observed in a wide variety of experiments that we report, including supervised prediction on NLP and vision data as well as unsupervised representation learning using autoencoders. Preliminary theoretical results presented in the paper explain why training speed and outcome are not compromised by our proposed approach.
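The weight count described in the abstract can be illustrated with a minimal NumPy sketch. It assumes the standard FFT-style butterfly pattern for a width that is a power of two, with random weights standing in for trained ones; the abstract does not give the paper's exact construction, and the names `butterfly_layer`, `butterfly_network`, and `apply_network` are hypothetical. Each layer is stored as a dense array only to keep the example readable.

```python
import numpy as np

def butterfly_layer(n, stride, rng):
    """One n x n butterfly layer: row i is nonzero only at columns i and i XOR stride."""
    W = np.zeros((n, n))
    for i in range(n):
        W[i, i] = rng.standard_normal()
        W[i, i ^ stride] = rng.standard_normal()
    return W

def butterfly_network(n, seed=0):
    """log2(n) butterfly layers with strides 1, 2, 4, ..., n/2 (assumes n is a power of two)."""
    assert n > 0 and n & (n - 1) == 0, "n must be a power of two"
    rng = np.random.default_rng(seed)
    depth = n.bit_length() - 1  # log2(n)
    return [butterfly_layer(n, 1 << k, rng) for k in range(depth)]

def apply_network(layers, x):
    """Apply the layers in sequence, first layer first."""
    for W in layers:
        x = W @ x
    return x

n = 16
layers = butterfly_network(n)

# Total nonzero weights: 2 * n * log2(n), compared with n * n for a dense layer.
nonzeros = sum(int(np.count_nonzero(W)) for W in layers)
print("layers:", len(layers), "butterfly weights:", nonzeros, "dense weights:", n * n)

# The product of the sparse layers is a dense n x n operator:
# every input reaches every output through exactly one path.
product = np.eye(n)
for W in layers:
    product = W @ product
print("nonzero entries in product:", int(np.count_nonzero(product)))
```

In this sketch the network stores 2n log2(n) weights, yet its product has no structurally zero entries. This illustrates how a sequence of sparse layers can stand in for a dense operator. It does not show that every dense matrix is exactly representable this way.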