Hello there! This is the second in a series of blogposts detailing the work I will be doing for the Tor Project as part of Google Summer of Code 2023.
Here I will detail the work I did this past week, the challenges I faced, and the outcomes of that work.
The project I was selected for is titled “Arti API exploration to build example tools”.
Confused? Let’s break it down.
Arti is the Rust rewrite of the Tor software that allows you to make TCP connections through the Tor Network, run Tor relays and bridges, etc. You can read more about it in its repository.
Arti was built with the goal that other developers should be able to use it to build Tor-powered software, and that thinking is incorporated into its design.
So, it exports some crates that other developers can use in their Rust projects, and has some documentation on them, including some basic demonstration code snippets that you can follow along with at home.
However, Arti is a fairly bleeding-edge project. It hit version 1.0.0 not too long ago, and due to the breakneck speed of development, APIs are not set in stone. Another developer could potentially run into a lot of breakage.
In this project, I will be creating certain sample programs in Rust using Arti’s APIs.
My goal will be to build my sample programs and document any difficulties that come up.
In this way, the project can get valuable feedback from an outsider who doesn’t have much knowledge of the codebase and the way the Arti proxy does things.
In the last post, I detailed how I got a functioning DNS packet to be crafted with the help of Wireshark and a teaching resource I found.
This time, I wanted to understand the meanings behind the “magic numbers” I was sending. For this, Ian Jackson suggested I read RFC 1035, all the way from 1987, which lays down the DNS protocol in its entirety.
RFCs can be a bit scary since they need to define protocols very precisely, edge cases and all. However, RFC 1035 was very readable. The scope of this project means that I don’t have to build a full-fledged DNS resolver and can get away with limiting myself a bit (e.g. to looking up A records only). Still, the RFC explained that in DNS over TCP, the entire message is preceded by 2 bytes which store the length of the DNS message (payload and headers).
So, in the last example, what was actually happening was that 0x00 0x33 stored the info that “this DNS packet will be 51 bytes long”, and by padding the entire message so that it was 53 bytes in length (51 bytes for the message + 2 bytes to store the length), I was making this statement true. That is why any change in these two bytes, or not padding to 53 bytes, was invalidating my packet in Wireshark.
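To make the framing concrete, here is a minimal sketch in Rust of building an A record query and prefixing it with its length for DNS over TCP. The helper names (encode_name, build_a_query, frame_for_tcp) are made up for illustration and are not taken from my actual tool. The sketch uses only the standard library and skips validation such as the 63-byte label limit.

```rust
use std::convert::TryFrom;

/// Encode a hostname as DNS labels: each label is preceded by its length,
/// and the name ends with a zero byte (RFC 1035, section 3.1).
/// Assumption: every label is at most 63 bytes; this is not checked here.
fn encode_name(host: &str) -> Vec<u8> {
    let mut out = Vec::new();
    for label in host.trim_end_matches('.').split('.') {
        out.push(label.len() as u8);
        out.extend_from_slice(label.as_bytes());
    }
    out.push(0); // root label terminates the name
    out
}

/// Build a query asking for the A record of `host` (hypothetical helper).
fn build_a_query(id: u16, host: &str) -> Vec<u8> {
    let mut msg = Vec::new();
    msg.extend_from_slice(&id.to_be_bytes()); // ID
    msg.extend_from_slice(&0x0100u16.to_be_bytes()); // flags: recursion desired
    msg.extend_from_slice(&1u16.to_be_bytes()); // QDCOUNT = 1 question
    msg.extend_from_slice(&[0; 6]); // ANCOUNT, NSCOUNT, ARCOUNT = 0
    msg.extend_from_slice(&encode_name(host)); // QNAME
    msg.extend_from_slice(&1u16.to_be_bytes()); // QTYPE = A
    msg.extend_from_slice(&1u16.to_be_bytes()); // QCLASS = IN
    msg
}

/// Prefix a DNS message with its 2-byte big-endian length, as required
/// for DNS over TCP (RFC 1035, section 4.2.2). The length does not count
/// the two prefix bytes themselves.
fn frame_for_tcp(msg: &[u8]) -> Vec<u8> {
    let len = u16::try_from(msg.len()).expect("DNS message too long for TCP framing");
    let mut framed = Vec::with_capacity(msg.len() + 2);
    framed.extend_from_slice(&len.to_be_bytes());
    framed.extend_from_slice(msg);
    framed
}
```

For example, framing build_a_query(0xabcd, "example.com") gives a 29-byte message preceded by 0x00 0x1d, in the same way that 0x00 0x33 announced a 51-byte message in my packet.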
Some other things I fixed were:
There were some severe discrepancies in the types of certain attributes, like the fact that NAME in Response was not being stored as a Vec<u8> or that the TTL was not an i32. These were also addressed to comply with standards.
Some Response parsing code was added so that more fields were stored correctly, along with some checks on these values so that we get a correct response code, or to store all the IP addresses that have been sent to us, or to reconstruct the domain name from the Response bytes.
We can now take a command line argument for the hostname we wish to look up.
The Display trait was implemented so that we can print the header and response to stdout nicely, with most fields in hexadecimal representation.
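As a rough illustration of the parsing and printing described above, here is a minimal sketch using only the standard library. The Header struct, its methods and read_name are hypothetical names chosen for this example, not the types in my tool, and read_name deliberately refuses compressed names instead of following pointers.

```rust
use std::fmt;

/// The fixed 12-byte DNS header (RFC 1035, section 4.1.1).
#[derive(Debug, PartialEq)]
struct Header {
    id: u16,
    flags: u16,
    qdcount: u16,
    ancount: u16,
    nscount: u16,
    arcount: u16,
}

impl Header {
    /// Parse the header from the start of a DNS message
    /// (after the 2-byte TCP length prefix has been stripped).
    fn from_bytes(buf: &[u8]) -> Option<Header> {
        if buf.len() < 12 {
            return None;
        }
        // Every header field is a big-endian 16-bit word.
        let word = |i: usize| u16::from_be_bytes([buf[i], buf[i + 1]]);
        Some(Header {
            id: word(0),
            flags: word(2),
            qdcount: word(4),
            ancount: word(6),
            nscount: word(8),
            arcount: word(10),
        })
    }

    /// The response code lives in the low 4 bits of the flags; 0 means no error.
    fn rcode(&self) -> u8 {
        (self.flags & 0x000f) as u8
    }
}

impl fmt::Display for Header {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(
            f,
            "ID: {:#06x}, flags: {:#06x}, questions: {}, answers: {}",
            self.id, self.flags, self.qdcount, self.ancount
        )
    }
}

/// Rebuild a dotted domain name from length-prefixed labels starting at `pos`.
/// Sketch only: compression pointers (RFC 1035, section 4.1.4) are rejected.
fn read_name(buf: &[u8], mut pos: usize) -> Option<String> {
    let mut labels = Vec::new();
    loop {
        let len = *buf.get(pos)? as usize;
        if len == 0 {
            break; // root label: end of name
        }
        if len & 0xc0 != 0 {
            return None; // pointer or reserved label type
        }
        let label = buf.get(pos + 1..pos + 1 + len)?;
        labels.push(String::from_utf8_lossy(label).into_owned());
        pos += 1 + len;
    }
    Some(labels.join("."))
}
```

Returning Option rather than panicking keeps truncated or malformed responses from crashing the program, which matters when the bytes come from the network.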
After this, I just did some minor cleanup work before switching to the connection checker to make some progress there.
One feature of the connection checker I’d proposed is the ability to see the current circuit being used by Arti.
Here, circuit refers to the three hops that your traffic will take through the Tor Network before being sent to your destination. You can view this in Tor Browser by clicking on the padlock next to the address bar. It shows you the IP addresses of the nodes and the country in which they reside. If you are connected using a bridge, it shows that instead as well as the specific pluggable transport you may be using.
I asked about how we could implement this feature using Arti, and here is where I learned something new: Tor Browser simplifies.
Tor creates multiple circuits when it initializes, both to help meet the need of multiple circuits later on down the line and because not every stream can be routed over one circuit.
Here, a stream is a DataStream as exported by arti-client (i.e. analogous to TcpStream).
In browsers, one page usually loads in resources from other domains or different files on the same server. This means different DataStreams for different resources and thus different circuits.
So, if example.com loads in cdn.example.com/image.jpg, for example, both these requests may occur over different circuits entirely, and so the Tor Browser circuit info is misleading since it won’t depict all these circuits being used to render the page.
Now, Arti can give you the connection info for a particular DataStream, but it will require the experimental-api crate feature. After some finagling with Cargo.toml, I got the required API running using the experimental crate feature instead (perhaps because some other API that I indirectly called was using it?).
After that, it was just a matter of making a dummy request and printing the circuit that was used for that.
During all of this, I was also trying to set up Snowflake connections in the connection checker as well as the Arti proxy. For the proxy, this is done by configuring arti.toml, which lives in ~/.config/arti.
A sample arti.toml is provided in the Arti repo for reference. It also contained instructions on how to configure a bridge.
However, in my rush to configure Arti, I skipped over some of these instructions and created an invalid config.
You see, in the TOML, you’re supposed to configure bridges by writing the following:
# Copied this and simplified from Arti's sample config, the below bridge is a dummy one and won't work
[bridges]
bridges = [
"obfs4 bridge.example.net:80 $0bac39417268b69b9f514e7f63fa6fba1a788958 ed25519:dGhpcyBpcyBbpmNyZWRpYmx5IHNpbGx5ISEhISEhISA iat-mode=1",
]
[[bridges.transports]]
# Which pluggable transports does this binary provide?
protocols = ["obfs4"]
# Path to the binary to be run.
path = "/usr/bin/obfsproxy"
Note: I try to explain what all this means as best I can, but if some of the jargon doesn’t seem clear to you, I recommend reading the Tor Project’s guide on censorship circumvention for more details.
The [bridges] section holds the bridge descriptor, which is basically that long line of text which contains what type of bridge it is, the IP:port and some authentication info so that we can be sure we are actually connected to the bridge and not some impostor.
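To show how such a line breaks down into its parts, here is a small sketch that splits a bridge line into the fields just described. It is purely illustrative and is not how Arti parses bridge lines: BridgeLine and split_bridge_line are hypothetical names, and the sketch assumes that an address always contains a colon while a transport name never does.

```rust
/// The pieces of a bridge line, borrowed from the original string.
#[derive(Debug, PartialEq)]
struct BridgeLine<'a> {
    /// Pluggable transport name such as "obfs4"; None for a direct bridge.
    transport: Option<&'a str>,
    /// The host:port (or IP:port) to connect to.
    address: &'a str,
    /// Everything else: fingerprints, keys and transport settings.
    rest: Vec<&'a str>,
}

/// Split a bridge line on whitespace into transport, address and the rest.
/// Assumption: an address contains ':' and a transport name does not.
fn split_bridge_line(line: &str) -> Option<BridgeLine<'_>> {
    let mut fields = line.split_whitespace();
    let first = fields.next()?;
    let (transport, address) = if first.contains(':') {
        (None, first) // no transport given, the line starts with the address
    } else {
        (Some(first), fields.next()?)
    };
    Some(BridgeLine {
        transport,
        address,
        rest: fields.collect(),
    })
}
```

Applied to the dummy line from the config above, this would report obfs4 as the transport and bridge.example.net:80 as the address, with the fingerprint, the ed25519 identity and iat-mode=1 left in rest.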
The [[bridges.transports]] section houses additional info about pluggable transports, which are the different protocols that Tor can use to connect to a bridge and in the process defeat censors which block Tor connections. There are various bridge types, including obfs4 (like the one given above) and Snowflake.
We can see in this section we have to give info on where the pluggable transport binary is located and what type of pluggable transport this is. This is because pluggable transports aren’t a part of the Tor binary, or Arti. They are maintained and distributed separately. For instance, the CLI version of Snowflake is written in Go.
So now comes the explanation of what I did wrong:
I didn’t configure [[bridges.transports]] at all because I didn’t even see it; I just pasted a Snowflake bridge into the [bridges] section.
I also ignored the fact that Arti doesn’t have any code that actually implements the pluggable transport protocols itself; the tor-ptmgr crate just launches external pluggable transport binaries and provides a Rust interface to manage them.
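The first mistake is the kind of thing a simple sanity check could catch. Below is a hypothetical sketch of such a check, using only the standard library: it compares the transport named at the start of each bridge line against the protocols listed under [[bridges.transports]]. The function missing_transports is my own invention for this post and is not code from Arti; it also assumes that a line starting with an address (something containing a colon) is a direct bridge that needs no transport.

```rust
/// Return the transports that bridge lines ask for but that no configured
/// pluggable transport binary provides. Each name is reported once.
fn missing_transports<'a>(bridge_lines: &[&'a str], protocols: &[&str]) -> Vec<&'a str> {
    let mut missing = Vec::new();
    for &line in bridge_lines {
        // The first whitespace-separated field is either a transport name
        // or, for a direct bridge, the address itself.
        let first = match line.split_whitespace().next() {
            Some(field) => field,
            None => continue, // blank line, nothing to check
        };
        if first.contains(':') {
            continue; // assumption: this is an address, so no transport is needed
        }
        let provided = protocols.iter().any(|p| *p == first);
        if !provided && !missing.contains(&first) {
            missing.push(first);
        }
    }
    missing
}
```

With my broken config, where a Snowflake bridge was pasted in and no transports were configured, a check like this would return snowflake, which could then be turned into a warning for the user.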
Thanks to trinity and Diziet, who helped me identify these issues, I got Arti to connect to a pluggable transport.
However, I am actually glad I made these mistakes, since it highlighted what someone else who is new to Arti may have done as well. In particular, some weaknesses in the Arti docs (and one in the code) were brought to our attention and some issues and corresponding MRs were created to rectify these.
These were:
In addition, I opened the following MR, which tries to warn the user when pluggable transports aren’t set up correctly. (This MR is in a draft stage at the time of writing and may have changed by the time you click on it): arti!1229
Also, while writing example code for Snowflake bridges, trinity spotted a bug in Arti.
Overall, this was a good week. I learned a lot about DNS, read an RFC for the first time and even helped contribute to Arti, which involved using a lot of info! messages and grep to understand the flow of control in Arti.
This website was made using Markdown, Pandoc, and a custom program to automatically add headers and footers (including this one) to any document that’s published here.
Copyright © 2024 Saksham Mittal. All rights reserved. Unless otherwise stated, all content on this website is licensed under the CC BY-SA 4.0 International License