
High-Low Frequency Detectors


This article is part of the Circuits thread, an experimental format collecting invited short articles and critical commentary delving into the inner workings of neural networks.



Introduction

Some of the neurons in vision models are features that we aren’t particularly surprised to find. Curve detectors, for example, are a pretty natural feature for a vision system to have. In fact, they had already been discovered in the animal visual cortex. It’s easy to imagine how curve detectors are built up from earlier edge detectors, and it’s easy to guess why curve detection might be useful to the rest of the neural network.

High-low frequency detectors, on the other hand, seem more surprising. They are not a feature that we would have expected a priori to find. Yet, when systematically characterizing the early layers of InceptionV1, we found a full fifteen neurons of mixed3a that appear to detect a high frequency pattern on one side, and a low frequency pattern on the other.

One worry we might have about the circuits approach to studying neural networks is that we might only be able to understand a limited set of highly-intuitive features.

High-low frequency detectors demonstrate that it’s possible to understand at least somewhat unintuitive features.

How can we be sure that “high-low frequency detectors” are actually detecting directional transitions from low to high spatial frequency?
We will rely on three methods: feature visualizations, dataset examples, and tuning curves over synthetic stimuli.

Later on in the article, we also dive into the mechanistic details of how these detectors are both implemented and used. Understanding the algorithm that implements them will further confirm that they detect high to low frequency transitions.

A feature visualization is a synthetic input optimized to elicit maximal activation of a single, specific neuron. Feature visualizations are constructed starting from random noise, so every pixel that differs from random noise is there because it caused the neuron to activate more strongly. This establishes a causal link: the behavior shown in the feature visualization is behavior that causes the neuron to fire.
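The optimization loop behind feature visualization can be sketched in a few lines. The sketch below is a deliberate toy: the "neuron" is a fixed linear filter with an analytic gradient, standing in for a real unit such as one in InceptionV1's mixed3a, where the gradient would instead come from backpropagation through the network.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for a real neuron: a fixed linear filter. In a real network,
# activation() would be a unit deep inside the model and grad() would be
# computed by backpropagation.
kernel = rng.normal(size=(8, 8))

def activation(img):
    return float(np.sum(kernel * img))

def grad(img):
    # For a linear neuron, the gradient w.r.t. the input is just the kernel.
    return kernel

# Feature visualization: start from random noise, then ascend the gradient
# of the neuron's activation with respect to the pixels.
img = rng.normal(scale=0.01, size=(8, 8))
before = activation(img)
for _ in range(100):
    img += 0.1 * grad(img)
after = activation(img)
print(after > before)  # True: the optimized image drives the neuron harder
```

Every change gradient ascent makes to the image is made *because* it increases the activation, which is what justifies reading feature visualizations causally.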

Figure 1: Feature visualizations of a variety of high-low frequency detectors from InceptionV1's mixed3a layer.

From their feature visualizations, we observe that all of these high-low frequency detectors share these same
characteristics:

  • Detection of adjacent high and low frequencies. The detectors respond to high frequency on one side, and low frequency on the other side.
  • Rotational equivariance.
    The detectors are rotationally equivariant: each unit detects a high-low frequency change along a particular angle, with different units spanning the full 360º of possible orientations.
    We will see this in more detail when we construct a tuning curve with synthetic examples, and also when we look at the weights implementing these detectors.

We can use a diversity term in our feature visualizations to jointly optimize for the activation of a neuron while encouraging different activation patterns in a batch of visualizations.

We are thus reasonably confident that if high-low frequency detectors were also sensitive to other patterns, we would see signs of them in these feature visualizations. Instead, the frequency contrast remains an invariant aspect of all these visualizations. (Although other patterns form along the boundary, these are likely outside the neuron’s effective receptive field.)
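A minimal sketch of such a joint objective, assuming we penalize the pairwise cosine similarity of the visualizations themselves (in the actual work, the penalty is applied to hidden-layer activation patterns rather than raw pixels):

```python
import numpy as np

def cosine(a, b):
    a, b = a.ravel(), b.ravel()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def diversity_objective(activations, batch, lam=1.0):
    """Joint objective: total neuron activation across the batch, minus a
    penalty on how similar the visualizations are to one another."""
    penalty = sum(cosine(batch[i], batch[j])
                  for i in range(len(batch))
                  for j in range(i + 1, len(batch)))
    return sum(activations) - lam * penalty

# Two identical images are maximally similar, so they are penalized more
# heavily than two orthogonal ones at the same activation level.
x = np.array([[1.0, 0.0], [0.0, 0.0]])
y = np.array([[0.0, 0.0], [0.0, 1.0]])
same = diversity_objective([1.0, 1.0], [x, x])
diff = diversity_objective([1.0, 1.0], [x, y])
print(diff > same)  # True
```

Maximizing this objective pushes the batch to find *different* ways of activating the neuron, which is why a shared invariant across the batch (here, the frequency contrast) is good evidence about what the neuron really detects.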

Figure 1-2: Feature visualizations of high-low frequency detector mixed3a:136 from InceptionV1, optimized with a diversity objective. You can learn more about feature visualization and the diversity objective in the original Feature Visualization article.

We generate dataset examples by sampling from a natural data distribution (in this case, the training set) and selecting the images that cause the neurons to maximally activate.

Checking against these examples helps ensure we’re not misreading the feature visualizations.
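Assuming the model has already produced one spatial activation map per image, selecting dataset examples reduces to a top-k over per-image peaks. The function below is a hypothetical sketch of that selection step (the stand-in activation maps are random; in practice they would come from running the network over the training set):

```python
import heapq
import numpy as np

rng = np.random.default_rng(0)

def top_dataset_examples(activation_maps, k=3):
    """Given one spatial activation map per image, return the indices of
    the k images whose peak activation, argmaxed over spatial locations,
    is highest."""
    peaks = [(float(m.max()), i) for i, m in enumerate(activation_maps)]
    return [i for _, i in heapq.nlargest(k, peaks)]

# Stand-in for running the model over a dataset: random activation maps,
# with image 7 given an artificially strong peak.
maps = [rng.normal(size=(14, 14)) for _ in range(20)]
maps[7][3, 4] = 100.0
print(top_dataset_examples(maps)[0])  # 7
```

Cropping each selected image around the argmax location then yields the kind of patches shown in the figure below.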

Figure 2: Crops taken from ImageNet where mixed3a:136 activated maximally, argmaxed over spatial locations.

A wide range of real-world situations can cause high-low frequency detectors to fire. Oftentimes it’s a highly-textured, in-focus foreground object against a blurry background — for example, the foreground might be the microphone’s latticework, the hummingbird’s tiny head feathers, or the small rubber dots on the Lenovo ThinkPad pointing stick — but not always: we also observe that it fires for the MP3 player’s brushed metal finish against its shiny screen, or the text of a watermark.

In all cases, we see one area with high frequency and another area with low frequency. Although these detectors often fire at an object boundary, they can also fire where the frequency changes without an object boundary. High-low frequency detectors are therefore not the same as boundary detectors.

Tuning curves show us how a neuron’s response changes with respect to a parameter.

They are a standard method in neuroscience, and we’ve found them very helpful for studying artificial neural networks as well. For example, we used them to demonstrate how the response of curve detectors changes with respect to orientation.

Similarly, we can use tuning curves to show how high-low frequency detectors respond.

To construct such a curve, we’ll need a set of synthetic stimuli which cause high-low frequency detectors to fire.

We generate images with a high-frequency pattern on one side and a low-frequency pattern on the other. Since we’re interested in orientation, we’ll rotate this pattern to create a 1D family of stimuli:

The first axis of variation of our synthetic stimuli is orientation.

But what frequency should we use for each side? How steep does the difference in frequency need to be?
To explore this, we’ll add a second dimension varying the ratio between the two frequencies:

The second axis of variation of our synthetic stimuli is the frequency ratio.

(Adding a second dimension will also help us see whether the results for the first dimension are robust.)
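One way to generate this two-parameter stimulus family, assuming sine gratings whose frequency jumps across an oriented boundary (the actual stimuli may differ in texture and detail):

```python
import numpy as np

def high_low_stimulus(size=64, angle_deg=0.0, freq_ratio=4.0, base_freq=2.0):
    """A grating whose frequency jumps from base_freq to
    base_freq * freq_ratio across a boundary oriented at angle_deg."""
    ys, xs = np.mgrid[0:size, 0:size] / size - 0.5
    theta = np.radians(angle_deg)
    # Signed distance along the axis perpendicular to the boundary.
    d = xs * np.cos(theta) + ys * np.sin(theta)
    low = np.sin(2 * np.pi * base_freq * d)
    high = np.sin(2 * np.pi * base_freq * freq_ratio * d)
    return np.where(d < 0, low, high)

# First axis of variation: orientation. Second axis: frequency ratio.
stimuli = [[high_low_stimulus(angle_deg=a, freq_ratio=r)
            for r in (2.0, 4.0, 8.0)]
           for a in range(0, 360, 30)]
```

Feeding each stimulus through the network and recording a detector's activation as a function of orientation then gives the tuning curve.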

Now that we have these two dimensions, we sample the synthetic stimuli and plot each neuron’s responses to them:

Each high-low frequency detector exhibits a clear preference for a limited range of orientations.

As we previously found with curve detectors, high-low frequency detectors are rotationally equivariant: each one selects for a given orientation, and together they span the full 360º space.


How are high-low frequency detectors built up from lower-level neurons?

One could imagine many different circuits which could implement this behavior. To give just one example, it seems like there are at least two different ways that the oriented nature of these units could form.

  • Equivariant→Equivariant Hypothesis. The first possibility is that the previous layer already has precursor features which detect oriented transitions from high frequency to low frequency. The extreme version of this hypothesis would be that the high-low frequency detector is just an identity passthrough of some lower layer neuron. A more moderate version would be something like what we see with curve detectors, where early curve detectors become refined into the larger and more sophisticated late curve detectors. Another example would be how edge detection is built up from simple Gabor filters which were already oriented.

    We call this Equivariant→Equivariant because the equivariance over orientation was already there in the previous layer.

  • Invariant→Equivariant Hypothesis. Alternatively, previous layers might not have anything like high-low frequency detectors. Instead, the orientation might come from spatial arrangements in the neuron’s weights that govern where it is excited by low-frequency and high-frequency features.

To resolve this question — and more generally, to understand how these detectors are implemented — we can look at the weights.

Let’s look at a single detector. Glancing at the weights from conv2d2 to mixed3a:110, we see that most of them can be roughly divided into two categories: those that excite on the left and inhibit on the right, and those that do the opposite.

Figure 4: Six neurons from conv2d2 contributing weights to mixed3a:110.


The same also holds for each of the other high-low frequency detectors, but, of course, with different spatial patterns on the weights, implementing the different orientations. (As an aside: the 1-2-1 pattern on each column of weights is curiously reminiscent of the structure of the Sobel filter.)

Surprisingly, across all high-low frequency detectors, the two clusters of input neurons we get are the same two clusters! One cluster appears to detect textures with a generally high frequency, and the other textures with a generally low frequency.
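The clustering observation can be sketched as follows. Everything here is illustrative: the weight shapes, the toy weight values, and the left/right scoring rule are hypothetical stand-ins for the real conv2d2 → mixed3a weights.

```python
import numpy as np

def left_right_score(w):
    """Positive if a 2D spatial weight pattern excites the left half and
    inhibits the right half; negative for the opposite arrangement."""
    half = w.shape[1] // 2
    return float(w[:, :half].mean() - w[:, half:].mean())

def cluster_inputs(weights):
    """weights: dict of input_unit -> 2D spatial weights into one detector.
    The sign of the left/right score splits the inputs into two clusters."""
    left = [u for u, w in weights.items() if left_right_score(w) > 0]
    right = [u for u, w in weights.items() if left_right_score(w) <= 0]
    return left, right

# Toy weights: units 0-2 excite-left/inhibit-right, units 3-5 the opposite,
# mimicking the two categories seen in the figure above.
w_pos = np.hstack([np.ones((5, 2)), np.zeros((5, 1)), -np.ones((5, 2))])
weights = {u: (w_pos if u < 3 else -w_pos) for u in range(6)}
left, right = cluster_inputs(weights)
print(left, right)  # [0, 1, 2] [3, 4, 5]
```

Running such a split for every detector and finding that the *same* two sets of input units appear each time is what suggests the inputs themselves form generic high-frequency and low-frequency texture families.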