Privacy has always been a concern on the internet. A new wave of privacy networks emerged in 2002, when the Tor Project was released to the public. The core principle of Tor, popularly known as the onion routing protocol, was developed by the United States Naval Research Laboratory in the mid-1990s and was further developed by the Defense Advanced Research Projects Agency (DARPA). The project, which started as an attempt to create a secure communication network for U.S. intelligence, was soon released as a general-purpose anonymity network. Such anonymity networks are run with the help of volunteers who provide the physical infrastructure, while the software fills the gaps using encryption algorithms.
Fundamentally, the volunteers together with the encryption algorithms are the network. Once a user is part of such a network, the user's identity and activity are hidden from observers. Users remain anonymous over the network provided they follow a few steps and rules. As of December 2017, there were more than 3 million Tor users, according to the Tor Project’s website. Today, the anonymous web is used by people of all kinds. While some simply want to make sure nobody can spy on them, others use it to buy and sell things. It thus functions as a censorship-resistant peer-to-peer network.
In this thesis, we propose a novel approach to identifying traffic without sacrificing the privacy of Tor nodes or clients. We recorded traffic on our own Tor exit and middle nodes and used it to train Decision Tree classifiers that identify and differentiate between types of traffic. Our classifiers can accurately differentiate between regular internet traffic and Tor traffic, and they can also be combined for more detailed classification. These classifiers can be used to selectively drop traffic on a Tor node, giving more control to the users while providing scope for censorship.
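To make the classification idea concrete, the following is a minimal sketch of a decision tree that separates Tor flows from regular internet flows and then applies a simple drop policy. It is an illustration under stated assumptions, not the implementation used in the thesis: the two features (mean packet size and mean inter-arrival time), the synthetic data generator, and the names `synthetic_flow`, `build_tree`, `predict`, and `should_drop` are all hypothetical, and the tree is a small hand-written Gini-based learner rather than any particular library.

```python
import random

def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)

def best_split(rows, labels):
    """Return the (feature, threshold) that lowers weighted Gini the most, or None."""
    best, best_score = None, gini(labels)
    for f in range(len(rows[0])):
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(rows)
            if score < best_score:
                best, best_score = (f, t), score
    return best

def build_tree(rows, labels, depth=0, max_depth=3):
    """Grow a tree of nested (feature, threshold, low_branch, high_branch) tuples."""
    split = best_split(rows, labels) if depth < max_depth else None
    if split is None:
        return int(2 * sum(labels) >= len(labels))  # leaf: majority label
    f, t = split
    lo = [(r, y) for r, y in zip(rows, labels) if r[f] <= t]
    hi = [(r, y) for r, y in zip(rows, labels) if r[f] > t]
    return (f, t,
            build_tree([r for r, _ in lo], [y for _, y in lo], depth + 1, max_depth),
            build_tree([r for r, _ in hi], [y for _, y in hi], depth + 1, max_depth))

def predict(tree, row):
    """Walk the tree until a leaf label (0 = regular, 1 = Tor) is reached."""
    while isinstance(tree, tuple):
        f, t, lo, hi = tree
        tree = lo if row[f] <= t else hi
    return tree

def synthetic_flow(is_tor, rng):
    """ASSUMPTION: made-up features (mean packet bytes, mean inter-arrival ms)."""
    if is_tor:
        return (max(1.0, rng.gauss(560, 30)), max(1.0, rng.gauss(60, 15)))
    return (max(1.0, rng.gauss(900, 250)), max(1.0, rng.gauss(25, 15)))

def should_drop(tree, row, blocked_label):
    """Hypothetical node policy: drop a flow whose predicted class is blocked."""
    return predict(tree, row) == blocked_label

# Synthetic data only; a real study would use features from recorded traffic.
rng = random.Random(0)
labels = [i % 2 for i in range(600)]
rows = [synthetic_flow(y, rng) for y in labels]
tree = build_tree(rows[:400], labels[:400])
hits = sum(predict(tree, r) == y for r, y in zip(rows[400:], labels[400:]))
accuracy = hits / 200
print(f"held-out accuracy on synthetic flows: {accuracy:.2f}")
```

Under the same assumptions, several such trees (for example, one per traffic type) could be evaluated in sequence to obtain a more detailed label before `should_drop` is consulted. The accuracy printed here reflects only the synthetic generator and says nothing about real traffic.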
Library of Congress Subject Headings
Routing protocols (Computer network protocols); Computer networks--Security measures; Telecommunication--Traffic--Classification; Machine learning
Telecommunications Engineering Technology (MS)
Department, Program, or Center
Electrical, Computer and Telecommunications Engineering Technology (CET)
William P. Johnson
Palsambkar, Siddharth, "Internet and Tor Traffic Classification Using Machine Learning" (2019). Thesis. Rochester Institute of Technology. Accessed from
RIT – Main Campus