Visualising 802.11 Probe Requests - Part 1

Posted 4 years ago - 22 min read

The following project was carried out for my University of Birmingham Computer Science M.Sc

The Visualisation and analysis of device footprints through 802.11 probe request frames

Project Area: Networking, Privacy, MAC Randomisation, Tracking, Visualisation
Technologies: Mongo, Kali, Raspberry Pi, Javascript, Python & Flask, D3.js
Mark: Distinction 79%

Download PDF

Abstract

This project is an investigation into 802.11 probe request frames and the traceability concerns held with the data they transmit. Probe requests are a management frame within the Institute of Electrical and Electronics Engineers (IEEE) 802.11 WLAN protocol, linked with network discovery. While Wi-Fi hardware on a device is enabled and an internet connection has not been established, probe requests are periodically broadcast. Probe requests contain the sender’s MAC address and potentially the name of a previously authenticated network (SSID) from the searching devices memory, both as plain text. Consequently devices are vulnerable to tracking. Current locations can be determined through the presence of a broadcasted MAC address to a receiver and previously visited locations can, in theory, be determined based on the SSIDs contained within the frames. To address this issue, operating system patches have been written to randomise the senders MAC address broadcast within probe requests, in order to make device tracking less trivial. However, adoption of these fixes has been limited, and the success varied across device manufacturers. To produce a suite to analyse the current state of this problem a dataset of network discovery packets needed to be collected. Data collection was achieved using a network sniffer built around a Raspberry Pi 3 and a monitor mode capable network adapter. The packet capture script deployed on the sniffer has been written for *nix systems and was executed in conjunction with Aircrack-ng tools. The output datasets were scrubbed and aggregated using the Pandas Python library and then passed to a local MongoDB collection. Lastly a web portal was produced to visualise the tracking concerns, which runs on Pythons Flask web framework. The interactive GUI and data representations within the portal were built in JavaScript, making use of D3.js (Data Driven Documents), Crossfilter.js and DC.js. Over 200,000 relevant packets were passively collected across Birmingham over the course of this project. Within the dataset, around 85% of directed probe requests showed no form of MAC address randomisation and when the SSIDs contained within these packets were cross-referenced with open source wireless network databases, the ability to determine the related devices prior whereabouts was extremely accurate.

Introduction

Overview

According to research carried out by Cisco [1], by 2021 it is projected that there will be around 8.4 Billion Wi-Fi enabled mobile devices and that “Wi-Fi traffic from both mobile devices and Wi-Fi-only devices will account for 49 percent of total IP traffic by 2020”. Each of these devices will be manufactured with a unique networking identifier; a media access control address (MAC address) which is stored in read only memory and cannot be altered (burnt in address). When the 802.11 WLAN protocol was developed in 1997, it adopted the same packet routing logic as wired networks and therefore utilised these hardware addresses. Burnt in MAC addresses were to be broadcast as plain text by devices during active network discovery, in probe request frames, to establish internet connections via network access points. Before the mass adoption of smartphones and other networked devices which stay on our person throughout the day, this quirk of the protocol may have seemed somewhat benign. However, according to Edward Snowdon in his 2014 interview with James Bamford (Wired Magazine) [2], the security services are making use of this and have the “ability to map the movement of everyone in a city by monitoring their MAC address, a unique identifier emitted by every cell phone, computer, and other electronic devices”. Many smartphone owners don’t manually turn off their devices Wi-Fi hardware when leaving the range of an authenticated network. As a result in public areas probe requests are periodically broadcast and device MAC addresses, and potentially the names of their previously trusted networks (SSIDs) are exposed. In 2015 due in part to the privacy concerns brought about by the mass surveillance revelations, mobile operating system (OS) developers released patches in an attempt to fix the issue of broadcasting these unique burnt in addresses, to limit tracking. Software patches were rolled out in Android 6.0 [3] and iOS 8 [4] that would allow the OS to use MAC randomisation in probe requests, where a locally assigned and ever changing MAC address would be broadcast instead. However, Android’s patch was advisory, and its implementation by the wide variety of handset manufacturers was limited. Travis Mayberry of the US Naval Academy states [5] “most Android phones simply do not have this technology enabled, despite the fact that they are running new versions of the operating system that should allow for it” and furthermore “there are many weaknesses in the way randomisation is implemented that make it easy to circumvent. This leaves people with a false sense of security because they think this technology is protecting them from tracking when it is not.” Over 20 years have passed since the inception of the protocol, and in that time the general difficulty in reading wireless network traffic and the expense of the equipment required has plummeted, allowing anyone with the motivations to monitor personal data in a similar way to how the security services do so, and use this data for political or commercial gain. The insights that can be gleaned from discovered devices can be extraordinarily powerful, especially when further steps are taken to correlate these unique addresses to their owners.

Aims & Objectives

This project aims to visualise the information found in the frame body of probe requests and determine the success in which this broadcasted information can be can be used to fingerprint devices and their previously visited locations. To investigate the tracking concerns of the protocol a network sniffing tool was built and a script written to detect and store the current state of probe requests, from observations around Birmingham. Both directed and undirected probe requests were to be collected along with the related management frames of access point beacons and probe responses. To further improve the dataset, decimal degrees coordinates (DD) from a GPS receiver were to be appended to indicate packet collection locations. After data collection, the second phase of the project was data aggregation and its visualisation in an interactive web portal. An analysis could then take place based on the packets and their encapsulated data. From probe requests, the following can be ascertained; The broadcasting devices manufacturer by parsing the MAC address through the organizationally unique identifier (OUI) database. If a directed probe request is transmitted, the targeted endpoint of communication denoted by the SSID will be encapsulated within the frame. These SSIDs can be passed to an open source access point (AP) database such as WiGLE to geolocate networks with the same SSID. For public or free APs, distinct geolocation will prove more difficult as there are likely to be many locations with the same SSID however the commonality in their names can be used to further build a picture of social spots and tastes such as restaurant chains, bars, first class lounges or public spaces. Cross-referencing collected access point beacons with the probe request SSIDs should help to determine with some certainty a matching public network within the physical range of data collection. For private networks such as a home and work networks with more uncommon or unique SSIDs, building a picture of a devices previous location should be more straightforward. The phases of data collection, visualisation through the web portal and overall analysis of probe requests will be covered in further detail throughout this report.

Produce a packet collection tool to observe the current state of network traffic using low cost, off the shelf hardware components.
Visualise the current state of vulnerability of devices broadcasting SSIDs in probe requests to a BIA MAC address.
Contrast and compare manufacturers and their effectiveness to address this problem. Which manufacturers have deployed MAC randomisation?
Test claims that a device can be geolocated through previously connected SSIDs using public Wi-Fi Databases (WiGLE)

Background

Mobile Device Tracking

The ability to track individuals by leveraging radio waves transmitted by their devices, such as smartphones is controversial but becoming increasingly common. The ability to passively monitor individuals has both political and commercial applications. During the Snowdon leaks of 2014, it was ascertained that the NSA could track smartphone locations using MAC addresses and that they were able to use programs such as Co-Traveler to deduce social connections between people [6]. Within policing it has been discovered IMSI catchers (aka. stingrays) have been deployed to monitor protests and activity around government buildings such as the Houses of Parliament, through logging International Mobile Subscriber Identity (IMSI) numbers [7]. Early commercial tracking examples include smart bins in the UK which used Wi-Fi to log the movements of people, with the intention of gaining insights into shopping behavioural patterns [8]. This tracking was made possible through un-associated Wi-Fi enabled devices transmitting probe requests through the 802.11 protocol. Hardware manufacturers such as Apple [9] and Google [10] are currently scaling up the production of ‘beacon’ devices for the commercial sector which monitor Bluetooth transmissions for tracking customer metrics such as loyalty, the rate of return, time spent in-store. Recently it has been found that Google has been using this beacon data to improve the quality of its location services provided by the Android operating system, and even when an Android user has turned Bluetooth off within their device settings, their devices have still been transmitting Bluetooth for communication with these beacons in the background without their knowledge [11].

Network Discovery & 802.11 Management Frames

This project will focus on network management frames within the IEEE 802.11 protocol which are responsible for network discovery. More importantly, the three packet types of probe requests, probe responses, and access point beacons. Network discovery is the process in which a device searches for a wireless network and attains parameters, such as network SSID, channel, supported transfer rate and encryption type.

Probe Requests & Probe Responces (Active Scanning)

When a device’s Wi-Fi Radio is enabled, and an internet connection is not established, the device will periodically broadcast probe requests on a given channel. The device then waits to receive responses on the same channel from APs within range, triggering a connection process with networks which it has previously connected. The response with the highest RSSI will be chosen if multiple access points with the same SSID respond. If a known network is not found within the responses, a new channel is selected, and the process repeated. Over time devices connect to many wireless networks, and the history of these connections are stored within the memory of the device, indefinitely. These retained SSIDs are cycled through and broadcast as the SSID within probe request frames. Probe requests containing the SSID of a trusted network are referred to as a directed. Undirected probe requests differ in that they have a null SSID value (wildcard SSID). A Directed probe request is effectively asking for a network by its name “Where’s Adam?” whereas an undirected probe request is broadcasting into the void for any response “Where’s everyone?”. Devices use a mix of the two to ensure maximum network discovery.

APs listen for directed probe requests, and if they receive a directed probe request which contains the same SSID as their own, then the AP will broadcast a probe response including the enquiring devices broadcast MAC address in the packet header. The probe response frame provides the required information that the device needs to trigger the authentication process on the wireless network. If an AP receives an undirected probe request, then the AP will still respond with a probe response in the format of an AP beacon.

Access Point Beacons (Passive Scanning)

The same data contained within a probe response frame is broadcast roughly every 100ms as an access point beacon, with the exception of a destination address, which is the in the packet header. Instead ff:ff:ff:ff:ff:ff:ff is used to indicate that this packet can be read by any device. This enables clients to connect passively. These frames are used to populate the list of available networks within a devices connection menu. There’s a more significant time interval between AP beacons when compared to the probe response cycle of between 3-20ms [12]. Portable devices battery management schemes prefer the latter as the radio receiver can be on for a shorter period, therefore, using less power. These frames will be collected within the data collection phase of the project to cross-reference locally discovered networks with those contained within directed probe requests.

Probe Request Frame Body

A breakdown of a probe request frame can be seen in Fig 3. For this project, I will be extracting SSIDs to build a picture of where a device has been. In the last and largest block in the probe request frame vendors may optionally transmit additional information they deem useful, although this information isn’t vital to form a connection and can be skipped over by most AP hardware. Through each iteration of the 802.11 standards, the volume of data this block has been growing. In research carried out by the US Naval Academy, it was found that iOS 10 devices transmit identifiers within this block that could compromise privacy. “Inexplicably the addition of an Apple vendor specific IE was added to all transmitted probe requests. This ID made identification of iOS 10 Apple devices trivial regardless of the use of MAC address randomisation.” This IE value “never changes across devices” and therefore can be used to uniquely identify a device, as with a burnt in MAC address. Due to Apple iOS not being open source, it is difficult to find more information on the utility of this IE value. This ID may be useful for determining device counts for Apple devices on the latest version of iOS using MAC Randomisation.

Packet Sniffing Tools

Packet sniffing tools are used by networking engineers, penetration testers and within the field of ethical hacking. Their primarily application is for analysing network traffic and can be used to log packets sent across a wired or wireless connection.

Kali Linux Raspberry Pi & Aircrack-NG

As portability would be a necessity for this project running a packet sniffing tool on a laptop seemed inappropriate. A compact single board computer such as the Raspberry Pi offered a solution to this as it could be driven by a 2 AMP, high capacity battery pack and run a wide variety of Linux distros. From my research, the most suitable Linux distro for the task was Kali Linux, a build designed for penetration testers which came with preloaded with a series of tools for network monitoring such as Wireshark and Aircrack-ng which would allow modification of settings for the Wi-Fi hardware used by the device. The two Raspberry Pi models that appeared suitable for the task were the Raspberry Pi 3 B+ or the Raspberry Pi Zero W. Upon further study I found the onboard Wi-Fi module on each board was incapable of being set into monitor mode and lacked sufficient range; therefore I would need an external Wi-Fi adapter. As GPS data was to be appended to the dataset, the device would require three USB interfaces on the device, one for power, Wi-Fi and GPS. Ruling out the Raspberry Pi Zero W with only a single interface after power. A GPS module running off soldered GPIO headers on a Raspberry Pi Zero W could be used; however, this would remove the option to run a TFT screen to the device, which was desired as it would eliminate the need to SSH into the Pi in order to execute the packet collection script, further ruling out this device. Other opportunities were explored such rooting an old Nexus 5 phone, but flexibility would have been lost with this option in hardware compatibility, and only one OTG USB interface would be available with no GPIO.

Monitor Mode (Airmon-NG)

802.11 packet headers contain the senders, and the intended recipients MAC address. Wi-Fi adapters only parse packets to the OS if the packet headers receipt address is a public broadcast address (ff:ff:ff:ff:ff:ff used in access point beacons or undirected probe responses) or it matches its own MAC address (either BIA or Randomised). Monitor mode circumvents this, parsing all traffic regardless of the recipient address within the packet header to the OS. This mode can be engaged with Airmon-ng a program in the Aircrack-ng suite, provided that the Wi-Fi hardware used is running on one of the following chipsets [17].

Atheros AR9271
Ralink RT3070
Ralink RT3572
Realtek 8187L
Realtek RTL8812AU

When monitor mode is running on a network adapter, the ability to send packets is halted, turning the device purely into a passive listening device. Monitor mode limits the ability for collected data to be submitted to a database live unless a second network adapter is running out of monitor mode (station mode).

Radio Channel Switching (Airodump-NG)

The predominant IEEE 802.11 b, g and n standards operate on 2.4GHz (2G) and the newer 802.11 ac on 5GHz (5G) frequencies. Wi-Fi traffic transmitted on the 2.4GHz frequency band is comprised of 14 channels 5Mhz apart. Channels 1-13 are available for public use. Channel 14 is an unlicensed frequency in the UK and broadcasting on this frequency is illegal, so for this project, it can be ignored. Broadcasting Wi-Fi requires 20- 22MHz of bandwidth to operate without interference. The generally accepted convention is to use channels 1, 6 and 11 on consumer devices. This convention can be seen below in the 802.11 b standards with 22mhz channel bandwidth requirements.

This convention is demonstrated but also revealed that not all hardware necessarily abides by it. Scanning packet transmission on all 13 channels is therefore essential to ensure a complete depiction of the problem is established and that devices and access points are not excluded.

To enable backwards compatibility with devices 5 GHz APs are often equipped with two radios, one transmitting on each frequency. I will, therefore, assume for this project that producing a scanner operating within the 2.4 GHz range will be acceptable. Channel switching the 2.4 GHz frequency can be achieved running the airodump-ng program from the aircrack-ng suite in a separate terminal window. Channels will be selected for 200ms and then incremented using this tool.

Ran using terminal command: airodump-ng [monitor mode adapter ie. wlan0mon]

Mac Addresses

MAC addresses are unique 48-Bit Global Identifiers (EUI-48) represented as six blocks of two hyphen- separated hex values (ex. A1-B2-C3-D4-E5-F6) these are assigned to networking hardware during manufacturing. These values cannot be changed as they are hardcoded in read-only memory. They may, however, be overridden or spoofed in the devices OS. Their role is to act as the endpoint of communication in the physical layer of the internet protocol. IPv6 has introduced mac address of 64-Bit length (EUI-64) to increase the finite number of address spaces offered by 48-Bit addresses and to future-proof the technology. The current address space currently stands at 281,474,976,710,656.

The first three blocks (octets) of a MAC address are responsible for identifying the manufacturer of the wireless hardware. These namespaces are purchased from the IEEE and are referred to as the Organisationally Unique Identifier (OUI). OUI’s are public record and can be checked against OUI databases [13] to determine manufacturers from a plain hex address. Within the first octet of the OUI is a control bit located at the 7th bit. This bit determines if the MAC address is using a namespace purchased off the IEEE or if the OS has locally assigned the MAC address see Fig 6. The 7th bit is vital for identifying randomised addresses that have been locally administered by the OS to disguise the devices official BIA.

MAC Randomisation

To prevent third parties from tracking devices through their burnt in MAC address, OS developers have released software patches to implement a fix called MAC randomisation. Gruteser [14] first proposed this idea; his suggestion was to use temporary MAC addresses for probe requests to improve user privacy. The implications of this would be that BIA MAC addresses would no longer be broadcast, and a new throwaway MAC address would be used for each probe burst, and if that address hadn’t been used in establishing a connection, it would then be forgotten. However, no IEEE specification on MAC randomisation has currently been established, resulting in OS developers implementing their unique solutions to the problem with varying degrees of success.

Apple brought MAC randomisation to its devices from iOS 8 onwards. In iOS 8, randomised MAC addresses are used when a device is un-associated with an AP and when the device is locked although Apple are yet to bring this to MacOS. Apple will generate locally administered MAC addresses within their own and all unlicensed OUI name spaces. Google introduced MAC randomisation to Android Marshmallow 6.0, but device manufacturers can only implement it if their chosen networking hardware’s chipsets support it. As Android runs on a far more varied hardware base, this has been very difficult to enforce. In the soon to be released Android Pie 8.0 Google have announced a new experimental feature in the developer tools of the OS called ‘Connected MAC Randomisation’ which when running will generate a distinct MAC address for each network you have an authenticated connection. At this time, however, it is unclear whether this feature will address the privacy issues the previous implementation had yet to fix. Google is also still calling this an experimental feature, and with ‘Connected MAC Randomisation’ being situated in the developer’s tools section of the OS it will not likely be an out of the box feature of the OS until a later release.

DA:A1:19 MAC OUI Prefix

If a device is implementing randomisation it is still not permitted to randomise into the reserved name spaces of other OUI registered companies. Google Inc. owns the DA:A1:19 OUI prefix and this is of importance when looking for MAC randomisation. MAC addresses starting with this prefix will be running Googles Android OS and will be implementing a randomisation patch. Devices which send probe requests starting DA:A1:19 will randomise only the last NIC portion of their address when sending subsequent probes. This is unlike Apple’s approach where all Apple devices running randomisation use the Apple namespace and all unregistered spaces making them slightly harder to pin down. In research carried out by The US Naval Academy in March 2017, when cross referencing the DA:A1:19 prefix with device WPS attributes they were able to determine Huawei, Sony, Blackberry, HTC, Google and LG smartphones were using this prefix for MAC Randomisation. The methods used to identify these manufactures required active network deauthorisation and therefore will not be used in this project. It is worth looking out for this address and understanding that OUI checks may under report Android manufacturers due to this umbrella MAC address.

Wardriving Wi-Fi

Wardriving is the name given to act of discovering and indexing wireless networks by individuals, usually from a moving car via a smartphone or laptop [15]. Individuals who take part in wardriving use devices with Wi-Fi and GPS to log the information contained within access point beacon frames, such as SSIDs, along with coordinates. Discovered networks are uploaded to an open source database, which all participants can view. Examples of these projects are WiGLE, openBmap and Geomena, with WiGLE by far being the most active with around 471 million Wi-Fi networks indexed. WiGLE allows individuals to participate in wardriving through an Android app, and directly upload data collected by the phones onboard hardware.

Device Model Classification Using Generated Signatures

Using MAC address OUI prefixes gives the ability to determine the manufacturer of non-randomised MAC addresses however the ability to further segment this data into device models requires identifiers to be transmitted within the frames that are unique to the device. Based on researched demonstrated by Denton Gentry at this year’s DEF CON 25, this can be achieved through producing a database of signatures, built off the presence and absence of data transmitted in the vendor specific portion of the probe request frame [16]. Furthermore newer devices actually transmit more information in this portion that older devices so it may actually be getting easier not more difficult to differentiate phones models using this method over time. This would however be more challenging to implement in this project as it is not as simple as a company check and would require significant work in producing a signature database of all of the devices on the market and ensuring a correct unique signature has been built that works across multiple devices and not just the one being observing. This research is in it’s infancy at this point in time said database doesn’t exist. String signatures would be generated from supported channel rates, vendor specific information (such as the Apple IE number) and up to three further OUIs within the vendor portion which will be related to the Wi-Fi radio hardware manufacturers such as Broadcom.