Congress Visualizations & Vote Clustering

What is this?

This is my attempt to use data visualization and machine learning to find patterns in our legislators' voting behavior. I'd like to explore who votes similarly to whom, and who the outliers are. Ultimately, I'd like to see this, or a similar project, used to do the kind of visual reporting on our governance that we're used to seeing applied to sports teams on ESPN.

What is here?

Overview of Senate and Assembly of State of California
Recent bills, updated April 15, 2018
Overview of Senate and House of Federal Government, USA
Recent bills, updated April 15, 2018
These are my reports on what I've gathered and turned into a presentation so far. I've been working with USA federal data and with data on California legislators.

How to serve this page and the links

Run python serve.py from this directory.

The big list of people and bills

These are two large tables, one for the Senate and one for the House (warning: they're huge, and the cooler stuff is below). People are listed down the left side and bills across the top. Each cell is shaded:

White, if they voted yes, and the bill passed
Light grey, if they voted yes, but the bill failed
50% grey if they didn't vote.
Dark grey, if they voted no, and the bill passed
Black if they voted no, and the bill failed
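As a sketch of how a cell's shade could be chosen, assuming each vote arrives as one of the source strings ("Yes", "No", "Not Voting") and the outcome as a pass/fail boolean. The function names and the exact grey values are hypothetical:

```python
# Hypothetical helpers: grey levels are guesses that keep the five shades ordered.
def cell_grey(vote: str, passed: bool) -> int:
    """Return an 8-bit grey level (0 = black, 255 = white) for one table cell."""
    if vote == "Yes":
        return 255 if passed else 191   # white / light grey
    if vote == "No":
        return 64 if passed else 0      # dark grey / black
    return 128                          # anything else ("Not Voting"): 50% grey


def cell_colour(vote: str, passed: bool) -> str:
    """The same shade as a CSS hex string, usable as a D3 fill value."""
    g = cell_grey(vote, passed)
    return f"#{g:02x}{g:02x}{g:02x}"
```

For example, `cell_colour("Yes", True)` gives `"#ffffff"` and `cell_colour("No", False)` gives `"#000000"`.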

There are different approaches to recording this data. Here are a few I can think of:

It comes from the source as the strings "Yes", "No", and "Not Voting". That won't work with most of our algorithms, although for the big table it isn't so bad, since D3 can handle strings. Still, we'll probably convert it so we can keep the same data files throughout the process.
The values in the data file are 1 = voted yes, 0 = voted no. Each bill gets a second row holding the 'success' of each person's vote: 1 = the voter's vote matched the outcome, 0 = the voter's vote was opposite to the outcome.
The value for a vote ranges from 0.0 to 1.0:
- 0.0 = no, and bill failed
- 0.25 = no, and bill passed
- 0.50 = no vote
- 0.75 = yes, and bill failed
- 1.0 = yes, and bill passed
What would be the advantage of this? The values are constrained to the range 0.0 to 1.0, which is good for ML, and they line up with the grey levels of the big table (0.0 = black, 1.0 = white).
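A minimal sketch of the second and third options, assuming the same "Yes"/"No"/"Not Voting" strings and a boolean for whether the bill passed. The helper names are hypothetical, and the second option doesn't say what to store when someone didn't vote, so this sketch uses `None` there as an assumption:

```python
from typing import Optional, Tuple


# Option 2 (hypothetical helper): a vote row plus a "success" row per bill.
def two_rows(vote: str, passed: bool) -> Tuple[Optional[int], Optional[int]]:
    """Return (voted_yes, success); (None, None) when no vote was cast (assumed)."""
    if vote not in ("Yes", "No"):
        return (None, None)
    voted_yes = 1 if vote == "Yes" else 0
    success = 1 if voted_yes == int(passed) else 0   # vote matched the outcome
    return (voted_yes, success)


# Option 3 (hypothetical helper): one number in [0.0, 1.0] per vote.
def single_value(vote: str, passed: bool) -> float:
    """Map a vote and outcome onto 0.0, 0.25, 0.5, 0.75 or 1.0."""
    if vote == "Yes":
        return 1.0 if passed else 0.75
    if vote == "No":
        return 0.25 if passed else 0.0
    return 0.5                                        # didn't vote
```

A whole bill column then becomes `[single_value(v, passed) for v in votes]`, ready for PCA or an autoencoder.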

Map of the country with the representatives overlaid as a grid

The goal here is to summarize the 400-ish members of Congress visually, by approximating their locations on a grid and showing each one's vote (and its success) by color. There's room on screen for the title and summary of the bill, and for how the voting went.
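One way to do the "approximate location" step, assuming each member has a rough (longitude, latitude) for their district or state: scale the coordinates into the grid, then give each member the nearest free cell. This greedy placement is a hypothetical sketch, not necessarily how the map here was built:

```python
from typing import Dict, Tuple


# Hypothetical helper: greedy nearest-free-cell placement on a cols x rows grid.
def place_on_grid(points: Dict[str, Tuple[float, float]],
                  cols: int, rows: int) -> Dict[str, Tuple[int, int]]:
    """Snap each member (name -> (lon, lat)) to the nearest free (col, row) cell."""
    if len(points) > cols * rows:
        raise ValueError("grid is too small for everyone")
    lons = [lon for lon, _ in points.values()]
    lats = [lat for _, lat in points.values()]
    lon0, lat0, lat1 = min(lons), min(lats), max(lats)
    lon_span = (max(lons) - lon0) or 1.0       # avoid dividing by zero
    lat_span = (lat1 - lat0) or 1.0
    free = [(c, r) for c in range(cols) for r in range(rows)]
    placed = {}
    for name, (lon, lat) in points.items():
        # Ideal (fractional) grid position; row 0 is the north edge.
        x = (lon - lon0) / lon_span * (cols - 1)
        y = (lat1 - lat) / lat_span * (rows - 1)
        cell = min(free, key=lambda c: (c[0] - x) ** 2 + (c[1] - y) ** 2)
        free.remove(cell)                         # first come, first served
        placed[name] = cell
    return placed
```

Placement depends on input order, so feeding members in a fixed order (for example, sorted by state) keeps the layout repeatable.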

Another view is similar, but color-coded by cluster, highlighting which clusters tended to vote for the bill.
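To highlight which clusters backed a bill, one option is to compute, per cluster, the share of members who voted Yes among those who voted at all. This assumes cluster labels have already been assigned (by k-means or any other method); the function is a hypothetical sketch:

```python
from collections import defaultdict
from typing import Dict, Hashable


# Hypothetical helper: per-cluster Yes share for a single bill.
def cluster_support(cluster_of: Dict[str, Hashable],
                    votes: Dict[str, str]) -> Dict[Hashable, float]:
    """Share of Yes votes per cluster, counting only members who voted."""
    yes: Dict[Hashable, int] = defaultdict(int)
    cast: Dict[Hashable, int] = defaultdict(int)
    for member, label in cluster_of.items():
        vote = votes.get(member, "Not Voting")   # missing = didn't vote
        if vote in ("Yes", "No"):
            cast[label] += 1
            yes[label] += int(vote == "Yes")
    # Clusters where nobody voted are left out rather than divided by zero.
    return {label: yes[label] / cast[label] for label in cast}
```

A cluster with a share near 1.0 (or 0.0) voted as a bloc; values near 0.5 mean the cluster split on that bill.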

Clustering with PCA

PCA (principal component analysis) is a way to perform dimensionality reduction: take a data set with many dimensions and shrink the number of dimensions, while keeping the original relationships between the points as intact as possible. It does this by finding the directions along which the data varies the most and projecting every point onto the first few of them.

In this case, each legislator (person) is a data point, and each bill is one dimension (their vote on it), so the number of dimensions is the number of bills in the data set. Since there are a lot of bills, this is a high-dimensional space. The goal was to reduce that space to two dimensions, which we can then see on a graph.
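A minimal PCA sketch with NumPy, assuming the votes are already in a numeric (legislators × bills) array, for example the 0.0 to 1.0 encoding described earlier. It centres each bill's column and uses the SVD, whose right singular vectors are the principal directions. The function name is hypothetical:

```python
import numpy as np


# Hypothetical helper: PCA down to two dimensions via the SVD.
def pca_2d(votes: np.ndarray) -> np.ndarray:
    """Project legislators (rows) onto the first two principal components."""
    votes = np.asarray(votes, dtype=float)
    centred = votes - votes.mean(axis=0)          # PCA works on mean-centred columns
    # Rows of vt are the principal directions, ordered by variance explained.
    _, _, vt = np.linalg.svd(centred, full_matrices=False)
    return centred @ vt[:2].T                      # shape: (n_legislators, 2)
```

Each row of the result is one legislator's (x, y) position for the scatter plot. The sign of each axis is arbitrary, so a plot may come out mirrored from one run or data set to the next.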

This is the cool stuff

These plots of the Senate and House are the results of using PCA this way. What we're looking at is: who voted similarly to each other? The closer two people are, the more similarly they voted. You can see Republicans clustered in one area and Democrats in another. You can also see each party's main cluster and the outliers, those who don't always vote with the crowd.
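To put a number on "outlier", one simple, hypothetical approach is to measure each member's distance from the median position of their own party in the 2-D plot, and flag those far beyond the party's typical distance. The threshold rule (mean plus `z` standard deviations) is an assumption, not something the plots above use:

```python
import numpy as np


# Hypothetical helper: flag members far from their own party's centre.
def party_outliers(coords, parties, z=2.0):
    """Return sorted indices of members unusually far from their party's median.

    coords: (n, 2) positions, e.g. PCA output; parties: length-n labels.
    """
    coords = np.asarray(coords, dtype=float)
    parties = np.asarray(parties)
    flagged = []
    for party in np.unique(parties):
        idx = np.flatnonzero(parties == party)
        centre = np.median(coords[idx], axis=0)          # robust party centre
        dist = np.linalg.norm(coords[idx] - centre, axis=1)
        cutoff = dist.mean() + z * dist.std()            # assumed threshold rule
        flagged.extend(idx[dist > cutoff].tolist())
    return sorted(flagged)
```

The median is used for the centre so that the outliers themselves don't drag it toward them.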

California: Assembly and Senate.

All bio data is from Wikipedia. I rushed it, and you will see some mistaken identities if you look too closely.

Autoencoder

An autoencoder is a variation of a neural network that we use in an unusual way: we train it to produce an output equal to its input. That isn't very interesting in itself, except that the layers start at n nodes (one per input value), squeeze down to a much smaller number, and then widen back out to n output nodes. So inside the network, it is learning to represent the input with far fewer numbers than the input originally took. This gives us dimensionality reduction, which is very useful for graphing high-dimensional data sets on two axes.
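A minimal NumPy sketch of the idea, assuming votes encoded as numbers in [0, 1]. It uses a single tanh bottleneck layer of width `k` (k = 2 for plotting) and a sigmoid output trained with cross-entropy by plain gradient descent. The networks described here have more layers, so treat this hypothetical function as an illustration rather than the settings used for these plots:

```python
import numpy as np


# Hypothetical helper: a one-bottleneck autoencoder, n -> k -> n.
def train_autoencoder(X, k=2, epochs=2000, lr=0.05, seed=0):
    """Train on X (values in [0, 1]); return (encode, per-epoch losses)."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    m, n = X.shape
    W1 = rng.normal(0.0, 0.1, (n, k)); b1 = np.zeros(k)   # encoder: n -> k
    W2 = rng.normal(0.0, 0.1, (k, n)); b2 = np.zeros(n)   # decoder: k -> n
    losses = []
    for _ in range(epochs):
        # Forward pass: squeeze to k numbers, then try to rebuild the input.
        H = np.tanh(X @ W1 + b1)
        Y = 1.0 / (1.0 + np.exp(-(H @ W2 + b2)))
        Yc = np.clip(Y, 1e-9, 1 - 1e-9)                   # avoid log(0)
        losses.append(-np.mean(np.sum(X * np.log(Yc) + (1 - X) * np.log(1 - Yc), axis=1)))
        # Backward pass: sigmoid + cross-entropy gives (Y - X) at the output.
        dZ2 = (Y - X) / m
        dW2, db2 = H.T @ dZ2, dZ2.sum(axis=0)
        dZ1 = (dZ2 @ W2.T) * (1 - H ** 2)                 # back through tanh
        dW1, db1 = X.T @ dZ1, dZ1.sum(axis=0)
        W1 -= lr * dW1; b1 -= lr * db1
        W2 -= lr * dW2; b2 -= lr * db2

    def encode(data):
        """The k-dimensional code for each row: the points we would plot."""
        return np.tanh(np.asarray(data, dtype=float) @ W1 + b1)

    return encode, losses
```

After training, `encode(votes)` returns an (n_legislators, 2) array that can be scattered just like PCA output.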

I've used an autoencoder to display the voting behavior of the Senate and the House.

While finding the autoencoder settings, I ended up creating these sets of graphs for the House and Senate. They show my attempts at getting the right parameters for the autoencoder. Since it's a neural network of anywhere from 3 to n layers, and it's unclear what the best arrangement is, I tried lots of variations. Each run reports a "fit" score, but what I was really looking for was an arrangement of my data points (congresspeople) that led to insight about their behavior. What you'll see here are small graphs of each of my tries. This is the result of a few rounds of guess-and-check, as I removed layers that didn't work and added variations of ones that did. There are surely more robust ways of doing this, but my simple version got decent results.
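The guess-and-check loop can be sketched as a tiny search over layer layouts: score every layout, keep the best few, and try small variations of them (drop a layer, add one, or resize one). Everything here is hypothetical: `score` stands in for training an autoencoder and judging the result, lower is assumed to be better, and the bottleneck is fixed at 2 units so results can always be plotted:

```python
import random
from typing import Callable, Iterable, List, Tuple

Layout = Tuple[int, ...]   # hypothetical: encoder widths, e.g. (128, 32); decoder mirrors them


def full_network(layout: Layout, inputs: int) -> Tuple[int, ...]:
    """Widths of every layer: inputs -> encoder -> 2-unit bottleneck -> decoder -> inputs."""
    return (inputs, *layout, 2, *reversed(layout), inputs)


def vary(layout: Layout, rng: random.Random) -> Layout:
    """Make one small change: drop a layer, add a layer, or resize one."""
    layers = list(layout)
    move = rng.choice(["drop", "add", "resize"])
    if move == "drop" and len(layers) > 1:
        layers.pop(rng.randrange(len(layers)))
    elif move == "add":
        layers.insert(rng.randrange(len(layers) + 1), rng.choice([8, 16, 64, 128]))
    else:  # resize (also used when "drop" would leave no layers)
        i = rng.randrange(len(layers))
        layers[i] = max(2, layers[i] * rng.choice([1, 2]) // rng.choice([1, 2]))
    return tuple(layers)


def refine(start: Iterable[Layout], score: Callable[[Layout], float],
           rounds: int = 3, keep: int = 4, children: int = 3,
           seed: int = 0) -> List[Tuple[Layout, float]]:
    """Guess-and-check: keep the best layouts each round and try variations."""
    rng = random.Random(seed)
    scores = {}
    pool = {tuple(layout) for layout in start}

    def evaluate(layouts):
        for layout in sorted(layouts):              # sorted: repeatable order
            if layout not in scores:
                scores[layout] = score(layout)      # e.g. train + measure loss

    for _ in range(rounds):
        evaluate(pool)
        best = sorted(pool, key=lambda l: (scores[l], l))[:keep]
        pool = set(best) | {vary(b, rng) for b in best for _ in range(children)}
    evaluate(pool)
    return sorted(scores.items(), key=lambda kv: (kv[1], kv[0]))[:keep]
```

In practice each `score` call is a full training run, so the caching in `scores` matters: a layout that reappears is never retrained.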

phowell@gavilan.edu