The year is 2019. If you somehow manage to teleport to Mars now, you will find a multipurpose spacecraft orbiting Mars in nearly circular orbits lasting about 2 hours, at altitudes ranging from 250 km to 316 km. This spacecraft, aptly named the Mars Reconnaissance Orbiter (MRO), has been transmitting a torrent of data captured by its state-of-the-art instruments all the way back to Earth ever since September 2006. The total data received, 361 terabits, is more than that of all other interplanetary missions combined, past and present. Furthermore, the MRO travels at a speed of 3.4 kilometers per second and has orbited Mars 60,000 times!
Had you asked me about this marvelous contraption a month ago, I wouldn’t have had the faintest idea what you were talking about. However, 3 weeks ago, my understanding of the Martian surface underwent a complete overhaul, owing to the fact that I was selected to be part of Omdena’s Global AI Collaborative Challenge, working with a 50-member team to automate the identification of landing sites and anomalies on the surface of Mars.
As you have probably surmised, we decided to apply machine learning to solve this problem, but before we could begin, we realized that we required large amounts of data (images) of the Martian surface. In order to fetch this data, we needed to understand where it could be found and how it could be retrieved. Thus, we found ourselves wading through specification documents and user manuals of meticulously constructed machines working in tandem with complex software systems, attempting to understand more about the MRO and all its instruments.
If you are in a hurry and only wish to know how these images can be retrieved using a few lines of code (in Python) so that you can focus on building and tuning neural networks to analyse the surface of Mars, then feel free to skip ahead to the “All-in-one Python Package” section. However, if you are curious to know more about how images from Mars reach Earth and the problems that were fixed after downloading those images, then do read on.
Table of Contents
- High Resolution Imaging Science Experiment (HiRISE)
- Deep Space Network (DSN)
- NASA Planetary Data System (PDS)
- Geosciences Node’s Orbital Data Explorer (ODE)
- ODE Web Interface
- ODE REST Interface
- Humungous JPEG2000 Files
- Slicing off chunks
- Finding a Scalable Solution
- Black Margins and Computer Vision Tricks
- Numpy Dataset File
- Task Summary
- All-in-one Python Package
High Resolution Imaging Science Experiment (HiRISE)
The MRO has been providing crucial support for rover and stationary lander missions by capturing different types of images of the Martian surface so that scientists on Earth can manually evaluate each image to identify potential landing sites. Out of the 3 cameras, 2 spectrometers and 1 radar included on the orbiter, we decided to focus on the data captured by the High Resolution Imaging Science Experiment (HiRISE) camera, managed by the University of Arizona, Tucson. The beauty of this camera is that it is capable of achieving resolutions of 1 microradian, or 0.3 meter (equivalent to the size of a dining table) at a height of 300 km. The observations have covered about 2.4 percent of the Martian surface, an area equivalent to two Alaskas, with many locations imaged repeatedly.
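The 0.3 meter figure follows from the small-angle approximation, in which the ground size of one resolution element is roughly the angular resolution multiplied by the distance to the surface. A quick check, using only the two numbers quoted above (the function name is just for illustration):

```python
# Small-angle approximation: ground size = angle (in radians) * distance.
ANGULAR_RESOLUTION_RAD = 1e-6   # 1 microradian, as quoted above
ALTITUDE_M = 300_000            # 300 km expressed in meters


def ground_resolution_m(angle_rad: float, altitude_m: float) -> float:
    """Approximate ground size covered by one resolution element."""
    return angle_rad * altitude_m


print(f"{ground_resolution_m(ANGULAR_RESOLUTION_RAD, ALTITUDE_M):.2f} m")
```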
Deep Space Network (DSN)
The MRO relays all this data to Earth by communicating with NASA’s Deep Space Network, a worldwide network of U.S. spacecraft communication facilities located in the United States (California), Spain (Madrid), and Australia (Canberra) that supports NASA’s interplanetary spacecraft missions.
NASA Planetary Data System (PDS)
Now, where does this data go after it reaches Earth? Well, I’m glad you asked. Data from each of these instruments is stored in remote nodes of NASA’s Planetary Data System (PDS), a distributed data system that NASA uses to archive data collected by Solar System missions. The archive is designed to withstand the test of time, so that future generations of scientists can access, understand and use preexisting planetary data.
Geosciences Node’s Orbital Data Explorer (ODE)
It would be too cumbersome and inefficient to search through nodes scattered across multiple regions just to find an image taken at a particular latitude and longitude. Therefore, another system called the Geosciences Node’s Orbital Data Explorer (ODE) loads PDS archive data from local and remote nodes into a central database, so that it can augment the existing PDS search and retrieval interface by providing advanced search, retrieve and order tools, integrated analysis and visualization tools in a scalable manner.
ODE Web Interface
The ODE Website provides a user-friendly search interface to the Orbital Data Explorer, which allows users to search for images based on different criteria, such as the instrument type, latitude, longitude, etc.
It even supports Map Search, which is equivalent to how we use Google Maps to search for particular places (I hadn’t known that names had already been bestowed upon several regions of Mars!)
ODE REST Interface
However, the web interface isn’t particularly useful when we wish to search for and download large quantities of images (more than 1000) from various regions of Mars. And this is where the ODE REST Interface comes into the picture. It provides programmatic access via HTTP to the same images that are available through ODE’s website. A sample response is as shown below:
Humungous JPEG2000 Files
If you look closely, you will notice URLs to image files instead of the actual images themselves. Besides that, the extension of the image file is not the usual JPG or PNG format that we have grown accustomed to. Instead, it is a strange format called JP2.
JPEG 2000 (JP2) is an image compression standard and coding system. It was created by the Joint Photographic Experts Group committee in the year 2000 with the intention of superseding their original JPEG standard (created in 1992) and is more efficient in compressing high-resolution images. It also offers the possibility of dividing the image into smaller parts to be coded independently and improvement in noise resilience. Overall, JPEG 2000 is a more flexible file format.
Since we are dealing with high-resolution images of the order of thousands of pixels, we have to rely on this advanced JP2 format which is computationally expensive to process and takes huge amounts of memory to store. No wonder this format hasn’t become ubiquitous like the JPEG format in small-scale devices even in 2019.
Slicing off chunks
How on ( ̶E̶a̶r̶t̶h̶) Mars would I build a convolutional neural network that processes thousands of these JP2 files, with each image having a resolution of about 25000 x 40000 pixels and hope for it to complete in my lifetime? Well, we tried fixing this issue by slicing smaller, equally sized chunks (of about 1024 x 1024 pixels) from this gargantuan image and saving each chunk in JPG format so that a CNN could even consider looking at it.
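To get a feel for the numbers, here is a back-of-the-envelope sketch. It assumes a single-band image with one byte per pixel, which is an assumption made purely for illustration; the helper name is hypothetical as well.

```python
import math


def chunk_grid(width: int, height: int, chunk: int = 1024):
    """Columns and rows of chunks needed to cover the image.

    Chunks along the right and bottom edges may be only partially filled.
    """
    return math.ceil(width / chunk), math.ceil(height / chunk)


WIDTH, HEIGHT = 25_000, 40_000          # approximate image size quoted above
cols, rows = chunk_grid(WIDTH, HEIGHT)
pixels = WIDTH * HEIGHT

print(f"{cols} x {rows} = {cols * rows} chunks per image")
# Assumption: 1 byte per pixel, so pixels == bytes once decompressed.
print(f"about {pixels / 1e9:.1f} GB uncompressed per image")
```

Under these assumptions a single image yields on the order of a thousand chunks and occupies about a gigabyte once decoded, which is why loading whole images quickly becomes impractical.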
Finding a Scalable Solution
Our troubles had only just begun, for slicing a JP2 file into chunks proved to be a painfully slow process: it took about 1 hour to process a single image, provided the Python program did not crash before that. This was in no way scalable. So, another method of slicing was discovered which didn’t require the entire JP2 file to be loaded into memory. It entailed making use of the rasterio package to fetch only parts of the JP2 image using a sliding-window technique.
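A minimal sketch of the sliding-window idea is shown below. It is not the exact code used in the project: the function names are hypothetical, it reads only the first band, and it drops the partial strips at the right and bottom edges so that every chunk has the same size. The rasterio import sits inside the reader so that the window arithmetic can be tried without rasterio installed.

```python
from typing import Iterator, Tuple


def sliding_windows(
    width: int, height: int, size: int = 1024
) -> Iterator[Tuple[int, int, int, int]]:
    """Yield (col_off, row_off, width, height) for non-overlapping full-size tiles."""
    for row_off in range(0, height - size + 1, size):
        for col_off in range(0, width - size + 1, size):
            yield col_off, row_off, size, size


def read_chunks(jp2_path: str, size: int = 1024):
    """Lazily yield one chunk at a time from the first band of a JP2 file."""
    import rasterio
    from rasterio.windows import Window

    with rasterio.open(jp2_path) as src:
        for col_off, row_off, w, h in sliding_windows(src.width, src.height, size):
            # Only the requested window is returned as an array, so the
            # full image never has to sit in memory at once.
            yield src.read(1, window=Window(col_off, row_off, w, h))
```

Because `read_chunks` is a generator, each chunk can be saved or filtered as soon as it is read, and memory use stays roughly constant regardless of the image size.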
Black Margins and Computer Vision Tricks
But this wasn’t the end. Now that we had all the chunks, we browsed through them manually only to discover that multiple images were completely black or had irregular black margins. In fact, around 50% of the chunks were black. On digging deeper, we realized that this was because the image in the JP2 file was map-projected, which resulted in it being laid out along the diagonal of a rectangular region, with black margins all around.
To combat this problem, a few computer vision tricks such as contours and thresholds were applied to identify such black margins, and then to rotate and crop them out. Apart from that, all chunks that were completely black were discarded.
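The contour-and-rotate approach depends on an image-processing library. As a simpler stand-in, the sketch below uses plain NumPy thresholding to detect all-black chunks and to crop a chunk to the bounding box of its non-black pixels. It does not handle rotation and does not preserve the original chunk size, and the threshold value and function names are illustrative assumptions rather than the project’s actual settings.

```python
import numpy as np


def black_fraction(chunk: np.ndarray, threshold: int = 0) -> float:
    """Fraction of pixels at or below the threshold (treated as black)."""
    return float(np.mean(chunk <= threshold))


def crop_black_margins(chunk: np.ndarray, threshold: int = 0):
    """Crop a 2-D chunk to the bounding box of its non-black pixels.

    Returns None when the chunk is entirely black, so the caller can discard it.
    """
    mask = chunk > threshold
    if not mask.any():
        return None
    rows = np.flatnonzero(mask.any(axis=1))   # rows containing any content
    cols = np.flatnonzero(mask.any(axis=0))   # columns containing any content
    return chunk[rows[0]:rows[-1] + 1, cols[0]:cols[-1] + 1]


# Synthetic 8x8 chunk with a bright 3x4 patch surrounded by black margins.
demo = np.zeros((8, 8), dtype=np.uint8)
demo[2:5, 3:7] = 200
print(black_fraction(demo), crop_black_margins(demo).shape)
```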
Numpy Dataset File
Finally, although the chunks were being saved separately on disk, we also wished to save them as a collection of NumPy arrays that could be easily imported as a dataset before training a machine learning model, similar to how libraries like Keras provide sample datasets. This was achieved by aggregating all chunks and saving them in a single NPZ file.
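As a rough illustration of the aggregation step, the sketch below stacks equally sized chunks into one array and writes it with NumPy’s compressed NPZ writer. The helper names, the `chunks` key and the random stand-in data are assumptions made for the example.

```python
import os
import tempfile

import numpy as np


def save_chunks(chunks, path):
    """Stack equally sized chunks and write them to a compressed .npz file."""
    data = np.stack(chunks)                  # shape: (n_chunks, height, width)
    np.savez_compressed(path, chunks=data)


def load_chunks(path):
    """Load the stacked chunk array back from disk."""
    with np.load(path) as archive:
        return archive["chunks"]


# Tiny demo with random stand-in chunks instead of real Mars imagery.
rng = np.random.default_rng(0)
demo_chunks = [rng.integers(0, 256, size=(4, 4), dtype=np.uint8) for _ in range(3)]
npz_path = os.path.join(tempfile.mkdtemp(), "mars_chunks.npz")
save_chunks(demo_chunks, npz_path)
print(load_chunks(npz_path).shape)
```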
Well, that was quite a mouthful! Let me just summarize the steps we’ve covered so far:
- Create HTTP Query String to fetch images from the ODE REST Interface
- Parse the Query Response XML and extract JP2 URLs
- Download all JP2 images using the URLs obtained in the previous step
- Slice each JP2 image into smaller, equally-sized chunks
- Identify black margins in each chunk
- Remove black margins by rotating or cropping them out while retaining the original resolution of that chunk OR discarding the entire chunk if that is not possible
- (Optional) Aggregate all chunks and save them in a single NPZ file
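The first two steps above can be sketched with the standard library alone. Treat everything specific here as an assumption to be checked against the ODE REST manual: the endpoint address, the parameter names and the shape of the XML are illustrative, and the sample response is invented for the demo. To avoid depending on particular tag names, the parser simply collects any element whose text looks like a link to a JP2 file.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Assumed endpoint; verify against the ODE REST documentation.
ODE_REST_URL = "https://oderest.rsl.wustl.edu/live2"


def build_query_url(params: dict, base_url: str = ODE_REST_URL) -> str:
    """Step 1: encode the query parameters into a GET URL."""
    return f"{base_url}?{urlencode(params)}"


def extract_jp2_urls(xml_text: str) -> list:
    """Step 2: collect every element whose text looks like a JP2 link."""
    root = ET.fromstring(xml_text)
    urls = []
    for element in root.iter():
        text = (element.text or "").strip()
        if text.lower().startswith("http") and text.lower().endswith(".jp2"):
            urls.append(text)
    return urls


# Hypothetical parameters and response, for illustration only.
example_params = {"query": "product", "target": "mars", "ihid": "MRO",
                  "iid": "HIRISE", "output": "XML"}
example_xml = """
<Results>
  <Product>
    <File><URL>https://example.org/data/IMAGE_A.JP2</URL></File>
    <File><URL>https://example.org/data/IMAGE_A.LBL</URL></File>
  </Product>
</Results>
"""

print(build_query_url(example_params))
print(extract_jp2_urls(example_xml))
```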
Wouldn’t it be wonderful if all these steps were performed by just installing a package and running a few commands?
All-in-one Python Package
Well, I am extremely pleased to announce that it is indeed possible to perform all these steps using a small Python package we have developed, named “mars-ode-data-access”, available at https://github.com/samiriff/mars-ode-data-access .
To install this project, just run the following commands:
pip install git+https://github.com/usgs/geomag-algorithms.git
pip install git+https://github.com/samiriff/mars-ode-data-access.git
It supports multiple query parameters and chunk settings, and internally makes use of some of the functionality provided by the following packages:
In order to use the mars-ode-data-access package in your project, you will have to import the relevant classes:
Identify and define the query parameters required to fetch the JP2 images you require from the Orbital Data Explorer. For instance, the following snippet demonstrates how data from the Phoenix Mars Landing Site can be fetched:
Next, we need to define some settings for each chunk that each downloaded JP2 image will be sliced into. A sample is shown below:
Finally, all we need to do is actually fetch the results and process them to create chunks. This can be done by just running the following piece of code:
And voila! Just sit back and relax while all the heavy-lifting is done for you behind the scenes (It might take about 6 minutes to process each JP2 image, depending on the resolution).
If you wish to play around with these APIs and see how you can save these chunks in NPZ format, you can use this Colaboratory notebook — https://colab.research.google.com/drive/1c-j-DBLksxuvDUHZSdSDqBP87Ua1ET2N
This feat couldn’t have been achieved without the collaborative learning environment that the Omdena AI Challenge provided, with expert advice and guidance from Rudradeb Mitra, Patricio Becerra, Daniel Angerhausen and Michael Burkhardt. I would also like to thank the entire team working on this project, with special shout outs to Conrad Wilkinson Schwarz, Sebastian Laverde, Sumbul Zehra, Murli Sivashanmugam, Marta Didych, Aman Dalmia and Indresh Bhattacharya for all their contributions to this task.
All in all, it has proved to be a great and enlightening learning experience for me, and I hope I was able to give you a satisfactory glimpse of the knowledge I grasped during the past 4 weeks, dear Reader! So, the next time you spot the Red Planet in the night sky, don’t just think of it as a tiny speck in the vast wilderness of space. Remember, it is a magnificent data source which anyone around the Earth can use to train neural networks and learn more about the mysteries that lie on and beneath its surface, thereby improving the human race’s chances of setting foot on Mars in the not too distant future…
A Journey from Martian Orbiters to Terrestrial Neural Networks was originally published in Towards Data Science on Medium, where people are continuing the conversation by highlighting and responding to this story.