- The list of candidate Twitter users was generated by aggregating all the users followed by these accounts: dailykos, thinkprogress, HuffingtonPost, voxdotcom, nytimes, washingtonpost, politico, USATODAY, StephensWSJ, WSJ, arthurbrooks, EWErickson, nypost, BreitbartNews, RealAlexJones. Adding other news outlets (whether left-leaning or right-leaning), or even removing some of these, did not noticeably change the shape and size of the end result.
- Out of these users, only accounts having a follower count between 20,000 and 300,000 were considered.
- Users were then compared pairwise using the Jaccard index of their follower sets: the number of followers two accounts have in common, divided by the number of distinct accounts following either of them. A higher score means the two accounts share a large fraction of their combined followers.
- To reduce noise, only the top 1% of scores were kept.
- To remove outliers, accounts that had scores (among the top 1%) with fewer than 15 other accounts were removed.
- The remaining users and their scores were plotted with a force-directed graph algorithm: users linked by a score attract each other in proportion to that score, and this attraction works against a baseline repulsive force present between all users. The result is the following graph.
- Three main groups emerge. The two on the right are the ones which ended up being studied. The one on the left is mainly composed of House Representatives and Senators. The two groups on the right were roughly selected using a lasso tool and then re-clustered using the same force-directed layout and filters described above.
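
The scoring and filtering steps above can be sketched roughly as follows. This is a minimal illustration, not the code actually used: the function names (`jaccard`, `pairwise_scores`, `keep_top_fraction`, `drop_low_degree`) and the toy `followers` data are hypothetical, and it assumes that "top 1%" is taken over all pairwise scores and that the 15-account filter is applied in a single pass.

```python
from collections import Counter
from itertools import combinations
import math


def jaccard(a: set, b: set) -> float:
    """Size of the intersection divided by size of the union (0.0 if both are empty)."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def pairwise_scores(followers: dict) -> dict:
    """Score every unordered pair of accounts by the overlap of their follower sets."""
    return {
        (u, v): jaccard(followers[u], followers[v])
        for u, v in combinations(sorted(followers), 2)
    }


def keep_top_fraction(scores: dict, fraction: float = 0.01) -> dict:
    """Keep the highest-scoring `fraction` of pairs (assumption: at least one pair)."""
    if not scores:
        return {}
    k = max(1, math.ceil(len(scores) * fraction))
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return dict(ranked[:k])


def drop_low_degree(edges: dict, min_links: int = 15) -> dict:
    """Drop accounts linked to fewer than `min_links` others (assumption: single pass)."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return {
        (u, v): score
        for (u, v), score in edges.items()
        if degree[u] >= min_links and degree[v] >= min_links
    }


# Toy data: account name -> set of follower ids (hypothetical values).
followers = {"a": {1, 2, 3}, "b": {2, 3, 4}, "c": {9}}
scores = pairwise_scores(followers)
top = keep_top_fraction(scores, fraction=0.01)
kept = drop_low_degree(top, min_links=1)  # the text above uses 15
```

The edges left in `kept`, weighted by their scores, are what would then be handed to a force-directed layout.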