Nonchalant Guidance

Added on: Monday, 12 June, 2023 | Updated on: Monday, 21 October, 2024

GSoC 2023 Blog 2

Hello there! This is the second in a series of blogposts detailing the work I will be doing for the Tor Project as part of Google Summer of Code 2023.

Here I will detail the work I did over the past week, the challenges I faced and the outcomes of that work.

Brief Intro on My Project

The project I was selected for is titled “Arti API exploration to build example tools”.

Confused? Let’s break it down.

Arti is the Rust rewrite of the Tor software that allows you to make TCP connections through the Tor Network, run Tor relays and bridges, etc. You can read more about it in its repository.
Arti was built with the goal that other developers should be able to use it to build Tor-powered software, and that thinking is incorporated into its design.

So, it exports some crates that other developers can use in their Rust projects, and has some documentation on them, including some basic demonstration code snippets that you can follow along with at home.

However, Arti is a fairly bleeding-edge project. It hit version 1.0.0 not too long ago, and due to the breakneck speed of development, APIs are not set in stone. Another developer could potentially run into a lot of breakage.
In this project, I will be creating certain sample programs in Rust using Arti’s APIs.

My goal will be to build my sample programs and document any difficulties that come up.

Maybe certain APIs are hard to use, or undocumented, or certain operations cause Arti to fail (exposing a bug). All these issues will be brought to the notice of the Arti team and fixes can be discussed and implemented.

In this way, the project can get valuable feedback from an outsider who doesn’t have much knowledge of the codebase and the way the Arti proxy does things.

Fixing DNS client

In the last post, I detailed how I got a functioning DNS packet to be crafted with the help of Wireshark and a teaching resource I found.

This time, I wanted to understand the meanings behind the “magic numbers” I was sending. For this, Ian Jackson suggested I read RFC 1035, all the way from 1987, which lays down the DNS protocol in its entirety.

RFCs can be a bit scary since they need to define protocols very precisely, edge cases and all. However, RFC 1035 was very readable. The scope of this project means that I don’t have to build a full-fledged DNS resolver and can get away with limiting myself a bit (e.g. to looking up A records only). Still, the RFC explained that in DNS over TCP, the message is preceded by 2 bytes which store the length of the DNS message (payload and headers).

So, in the last example, what was actually happening was that 0x00 0x33 stored the info that “this DNS packet will be of 51 bytes”, and by padding the entire message so that it was 53 bytes in length (51 bytes for the message + 2 bytes to store the length), I was making this statement true. This is why changing these two bytes, or not padding to 53 bytes, invalidated my packet in Wireshark.
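The length prefix described above can be sketched in a few lines of Rust. This is a minimal illustration, not the code from the DNS client itself; the function names frame_for_tcp and unframe_from_tcp are made up for this example.

```rust
/// Prefix a raw DNS message with its length as a big-endian u16,
/// which is how DNS over TCP frames messages (RFC 1035, section 4.2.2).
/// Returns None if the message does not fit in the 2-byte length field.
fn frame_for_tcp(message: &[u8]) -> Option<Vec<u8>> {
    if message.len() > u16::MAX as usize {
        return None;
    }
    let len = message.len() as u16;
    let mut framed = Vec::with_capacity(message.len() + 2);
    framed.extend_from_slice(&len.to_be_bytes());
    framed.extend_from_slice(message);
    Some(framed)
}

/// Read the 2-byte length prefix and return exactly that many bytes of
/// DNS message. Returns None if the buffer is shorter than it claims.
fn unframe_from_tcp(framed: &[u8]) -> Option<&[u8]> {
    let prefix = framed.get(..2)?;
    let declared = u16::from_be_bytes([prefix[0], prefix[1]]) as usize;
    framed.get(2..2 + declared)
}
```

For a 51-byte message, frame_for_tcp produces a 53-byte buffer that starts with 0x00 0x33, which matches the numbers seen in Wireshark.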

Some other things I fixed were:

There were some severe discrepancies in the types of certain attributes, like the fact that NAME in Response was not being stored as a Vec<u8> or that the TTL was not an i32. These were fixed to comply with the standard.
Some Response parsing code was added so that more fields are stored correctly, along with checks on these values so that we get a correct response code, store all the IP addresses that have been sent to us, and reconstruct the domain name from the Response bytes.
We can now take a command line argument specifying which hostname we wish to look up.
The Display trait was implemented so that we can print the header and response to stdout nicely, with most fields in hexadecimal representation.
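To give an idea of what the response parsing involves, here is a minimal sketch of two of those steps: pulling the response code out of the header flags and rebuilding a dotted domain name from length-prefixed labels. The function names are hypothetical and this is not the project's actual code. In particular, real responses often use compression pointers for NAME, which this sketch rejects instead of following.

```rust
/// Extract the 4-bit response code (RCODE) from the 16-bit flags field
/// of a DNS header. 0 means no error, 3 means the name does not exist.
fn rcode(flags: u16) -> u8 {
    (flags & 0x000F) as u8
}

/// Rebuild a dotted domain name from length-prefixed labels, starting at
/// index `start` of the packet. On success, returns the name and the index
/// just past the terminating zero byte.
/// Assumption: no compression pointers; any length byte above 63 gives None.
fn decode_name(packet: &[u8], start: usize) -> Option<(String, usize)> {
    let mut labels: Vec<String> = Vec::new();
    let mut pos = start;
    loop {
        let len = *packet.get(pos)? as usize;
        pos += 1;
        if len == 0 {
            break; // a zero-length label ends the name
        }
        if len > 63 {
            return None; // compression pointer or reserved label type
        }
        let label = packet.get(pos..pos + len)?;
        labels.push(String::from_utf8_lossy(label).into_owned());
        pos += len;
    }
    Some((labels.join("."), pos))
}
```

Returning the position after the name lets the caller continue parsing the fields that follow it (type, class, TTL and so on) without recomputing offsets.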

After this, I just did some minor cleanup work before switching to the connection checker to make some progress there.

Printing Current Circuit Info

One feature of the connection checker I’d proposed is the ability to see the current circuit being used by Arti.

Here, circuit refers to the three hops that your traffic will take through the Tor Network before being sent to your destination. You can view this in Tor Browser by clicking on the padlock next to the address bar. It shows you the IP addresses of the nodes and the country in which they reside. If you are connected using a bridge, it shows that instead as well as the specific pluggable transport you may be using.

I asked about how we could implement this feature using Arti, and here is where I learned something new: Tor Browser simplifies things.

Tor creates multiple circuits when it initializes, both to meet the need for multiple circuits later on and because not every stream can be routed over one circuit.
Here, a stream is a DataStream as exported by arti-client (i.e. analogous to TcpStream).
In browsers, one page usually loads in resources from other domains or different files on the same server. This means different DataStreams for different resources and thus different circuits.
So, if example.com loads in cdn.example.com/image.jpg, for example, both these requests may occur over different circuits entirely, and so the Tor Browser circuit info is misleading since it won’t depict all these circuits being used to render the page.

Now, Arti can give you the connection info for a particular DataStream, but it requires the experimental-api crate feature. After some finagling with Cargo.toml, I got the required API running using the experimental crate feature instead (perhaps because some other API that I indirectly called was using it?).

After that, it was just a matter of making a dummy request and printing the circuit that was used for that.

Contributing to Arti

During all of this, I was also trying to set up Snowflake connections in the connection checker as well as the Arti proxy. For the proxy, this is done by configuring arti.toml, which lives in ~/.config/arti.

A sample arti.toml is provided in the Arti repo for reference. It also contains instructions on how to configure a bridge.

However, in my rush to configure Arti, I skipped over some of these instructions and ended up creating an invalid config.

You see, in the TOML, you’re supposed to configure bridges by writing the following:

# Copied this and simplified from Arti's sample config, the below bridge is a dummy one and won't work
[bridges]
bridges = [ 
     "obfs4 bridge.example.net:80 $0bac39417268b69b9f514e7f63fa6fba1a788958 ed25519:dGhpcyBpcyBbpmNyZWRpYmx5IHNpbGx5ISEhISEhISA iat-mode=1",
]

[[bridges.transports]]

# Which pluggable transports does this binary provide?
protocols = ["obfs4"]

# Path to the binary to be run.
path = "/usr/bin/obfsproxy"

Note: I try to explain what all this means as best I can, but if some of the jargon doesn’t seem clear to you, I recommend reading the Tor Project’s guide on censorship circumvention for more details.

The [bridges] section holds the bridge descriptor, which is basically that long line of text which contains what type of bridge it is, the IP:port and some authentication info so that we can be sure we actually are connected to the bridge and not some impostor.

The [[bridges.transports]] section holds additional info about pluggable transports, which are the different protocols that Tor can use to connect to a bridge and, in the process, defeat censors which block Tor connections. There are various bridge types, including obfs4 (like the one given above) and Snowflake.

We can see in this section we have to give info on where the pluggable transport binary is located and what type of pluggable transport this is. This is because pluggable transports aren’t a part of the Tor binary, or Arti. They are maintained and distributed separately. For instance, the CLI version of Snowflake is written in Go.

So now comes the explanation of what I did wrong:

I didn’t configure [[bridges.transports]] at all, because I didn’t even see it. I just pasted a Snowflake bridge into the [bridges] section.
I also ignored the fact that Arti doesn’t have any code that actually implements connecting to bridges. The tor-ptmgr crate only calls external pluggable transports and provides a Rust interface to manage them.
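The first mistake boils down to a bridge line naming a pluggable transport that no [[bridges.transports]] entry provides. Here is a minimal sketch of how such a mismatch could be detected. It is not Arti's actual code or config parser: the function names are hypothetical, and the rule that a line starting with an address (something containing a colon) is a direct bridge is a simplifying assumption.

```rust
/// Return the pluggable transport a bridge line asks for, if any.
/// Assumption: a line whose first token contains ':' starts with an
/// address and describes a direct bridge that needs no transport.
fn transport_of(bridge_line: &str) -> Option<&str> {
    let first = bridge_line.split_whitespace().next()?;
    if first.contains(':') {
        None
    } else {
        Some(first)
    }
}

/// List every transport that some bridge line requests but that is not in
/// `provided_protocols` (the combined `protocols` values of all
/// [[bridges.transports]] entries). Each missing name is reported once.
fn missing_transports<'a>(
    bridge_lines: &[&'a str],
    provided_protocols: &[&str],
) -> Vec<&'a str> {
    let mut missing = Vec::new();
    for &line in bridge_lines {
        if let Some(name) = transport_of(line) {
            if !provided_protocols.contains(&name) && !missing.contains(&name) {
                missing.push(name);
            }
        }
    }
    missing
}
```

With a Snowflake bridge line and only obfs4 listed under protocols, missing_transports would report snowflake, which is exactly the situation my config was in.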

Thanks to help from trinity and Diziet, who helped me identify these issues, I got Arti to connect to a pluggable transport.

However, I am actually glad I made these mistakes, since they highlighted what someone else who is new to Arti might do as well. In particular, some weaknesses in the Arti docs (and one in the code) were brought to our attention, and some issues and corresponding MRs were created to rectify them.

These were:

In addition, I opened the following MR, which tries to warn the user when pluggable transports aren’t set up correctly. (This MR is in a draft stage at the time of writing and may have changed by the time you click on it): arti!1229

Also, while writing example code for Snowflake bridges, trinity spotted a bug in Arti.

Conclusion

Overall, this was a good week. I learned a lot about DNS, read an RFC for the first time and even helped contribute to Arti, which involved using a lot of info! messages and grep to understand the flow of control in Arti.
