Data Analysis, Machine Learning & High-Energy Gamma-Ray Astronomy
Gamma-ray astronomy studies the most energetic part of the electromagnetic spectrum and correspondingly requires large telescope systems and state-of-the-art analysis techniques.
The Cherenkov Telescope Array (CTA) is the next-generation gamma-ray observatory and will play an important role in driving this research forward, marking the beginning of a new era of gamma-ray astronomy.
CTA will exceed current experiments in many respects: with more than 100 telescopes of three sizes at two sites, equipped with state-of-the-art technology, it will provide a new view of the sky at energies of up to 300 TeV. It measures Cherenkov radiation emitted by extensive air showers in the Earth’s atmosphere, which are induced by gamma rays and protons.
This indirect measurement requires advanced analysis techniques to derive the energy, direction and particle type of the incident primary. Big data and data mining are increasingly becoming an integral part of astroparticle physics, and thus also of CTA. Immense amounts of data need to be processed with modern techniques from machine learning, statistics and computer science, which demands close collaboration between these fields.
Various projects investigate CTA’s performance using different analysis methods. A further aspect of these studies is the performance at the highest gamma-ray energies (>10 TeV), motivated by the search for the accelerators of the highest-energy cosmic rays in our Galaxy, and the improvement of the telescopes' energy resolution for dark matter searches.
Example projects include the following:
- Signal Extraction Studies: One of the first steps in the data analysis pipeline is the extraction of the signal from a pixel’s waveform, characterised by its charge and arrival time. Simple algorithms consider each pixel independently, while more advanced algorithms also take into account information from other pixels. In this way, the probability of extracting spurious signals, e.g. photons from the night sky background, can be reduced. The aim of this project is to study these algorithms by implementing them, adjusting their settings, and comparing them. Their performance can be studied e.g. as a function of the level of night sky background.
- Image Cleaning Studies: Another early step in the data analysis pipeline is the cleaning of the camera image, which removes pixels dominated by the night sky background. Simple algorithms select pixels whose number of photons exceeds a specific threshold. More advanced algorithms also take into account the arrival time of the photons. The aim of this project is to study these more advanced algorithms by implementing them, adjusting their settings, and comparing them. Their performance can be studied e.g. as a function of the level of night sky background or towards the highest energies.
- Combination of Multiple Image Cleaning Algorithms: The cleaning of the camera image is a sensitive task, and different subsequent analysis steps might require different image cleaning algorithms or settings. The aim of this project is to apply multiple image cleaning algorithms and derive a large set of image parameters, which can be further combined into higher-level parameters. These parameters are well suited as input for machine learning algorithms and have great potential to optimise subsequent analysis steps.
- Optimising the Data Analysis Chain for Specific Criteria: The data analysis methods are often tuned for high overall performance. For specific analyses, other criteria might apply, such as the best possible angular resolution or reaching the highest energies. The aim of this project is to tune the data analysis methods for such specific criteria; the optimisation of machine learning methods will be especially important.
- Study of Truncated Shower Images: Depending on the properties of the shower and the observation, the shower image might not be fully contained in the camera. These events, which mostly have the highest energies, are often discarded. This project aims to develop parameters and methods, for example based on machine learning, to reconstruct these events.
- Studies of the Night Sky Background: The data analysis pipeline is tuned for a nominal level of night sky background. The aim of this project is to study how the standard settings of the pipeline perform under increased levels of night sky background and how the settings can be tuned without changing the default analysis algorithms.
- Studies of Probabilistic Random Forests: Compared to standard Random Forests, probabilistic Random Forests can also take into account the uncertainties of the input features and the label. The aim of this project is to study the application of this new approach to gamma-ray data, e.g. to determine the particle type of the incident primary. Alternatively, probabilistic Random Forests can be investigated by applying them to the search for Active Galactic Nuclei in catalogues of gamma-ray sources.
- Measuring the Lateral Distribution of Air Showers with CTA: The high resolution and large field of view of CTA may allow the study of the lateral distribution of the electromagnetic component of energetic air showers. Currently, the observation of cosmic rays relies on high-energy hadronic interaction models. However, it has been shown that these models fail to reproduce the observations (for example, more muons are observed than the models predict). In this project we will simulate air showers and CTA detectors in order to evaluate the possibility of measuring the electromagnetic lateral distribution of the showers. If successful, we will compare the measured lateral distributions with the corresponding model predictions. The aim is to shed some light on the modelling of very-high-energy cosmic-ray air showers. This project overlaps with the Pierre Auger Observatory.
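
To make the signal extraction project in the list above more concrete, the following minimal sketch contrasts a per-pixel extractor with one that uses neighbouring pixels to position the integration window. It is an illustration under stated assumptions, not the CTA pipeline's implementation: the function names, the window parameters `width` and `shift`, the boolean `neighbors` matrix and the toy waveforms are all hypothetical, the waveforms are assumed to be baseline-subtracted, and the arrival time is approximated by the index of the peak sample.

```python
import numpy as np


def _window_sum(waveforms, peak, width, shift):
    """Sum `width` samples per pixel, starting `shift` samples before `peak`."""
    n_samples = waveforms.shape[1]
    # Keep the integration window inside the readout range.
    start = np.clip(peak - shift, 0, n_samples - width)
    idx = start[:, None] + np.arange(width)[None, :]
    return np.take_along_axis(waveforms, idx, axis=1).sum(axis=1)


def local_peak_extraction(waveforms, width=5, shift=2):
    """Simple approach: each pixel uses the peak of its own waveform."""
    peak = waveforms.argmax(axis=1)
    return _window_sum(waveforms, peak, width, shift), peak


def neighbor_peak_extraction(waveforms, neighbors, width=5, shift=2):
    """Neighbour-based approach: the window is placed at the peak of the
    summed waveform of the neighbouring pixels (the pixel itself is excluded
    if the diagonal of `neighbors` is False)."""
    summed = neighbors.astype(float) @ waveforms
    peak = summed.argmax(axis=1)
    return _window_sum(waveforms, peak, width, shift), peak


# Toy data (assumption): 3 mutually neighbouring pixels, 20 samples each.
waveforms = np.zeros((3, 20))
waveforms[[0, 2], 10] = 10.0   # shower signal in pixels 0 and 2
waveforms[1, 10] = 1.0         # faint shower signal in pixel 1
waveforms[1, 3] = 2.0          # larger background fluctuation in pixel 1
neighbors = ~np.eye(3, dtype=bool)

charge_local, time_local = local_peak_extraction(waveforms)
charge_neigh, time_neigh = neighbor_peak_extraction(waveforms, neighbors)
print("local:    ", charge_local, time_local)
print("neighbour:", charge_neigh, time_neigh)
```

In this toy case the per-pixel extractor locks onto the background fluctuation in pixel 1, while the neighbour-based extractor integrates around the time at which the adjacent pixels see the shower. A real study would compare such extractors on simulated events at different night sky background levels.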
Students will gain practical experience with programming languages such as Python and with the Linux/Unix operating system, and will have access to high-performance computers and machine learning techniques.
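
As an illustration of the kind of Python code involved, the image cleaning step described in the project list could be sketched as follows. This is a minimal sketch and not the algorithm used by CTA: the function names, the threshold of 5 photons, the 2 ns coincidence window and the four-pixel toy camera are assumptions chosen only to show the difference between a pure threshold cut and a cut that also uses arrival times.

```python
import numpy as np


def threshold_cleaning(charge, threshold=5.0):
    """Simple approach: keep pixels whose charge reaches the threshold."""
    return charge >= threshold


def time_constrained_cleaning(charge, arrival_time, neighbors,
                              threshold=5.0, max_dt=2.0):
    """Keep pixels above threshold that have at least one neighbour which is
    also above threshold and arrives within `max_dt` of the pixel."""
    above = charge >= threshold
    # dt[i, j] is the arrival-time difference between pixels i and j.
    dt = np.abs(arrival_time[:, None] - arrival_time[None, :])
    coincident = neighbors & above[None, :] & (dt <= max_dt)
    return above & coincident.any(axis=1)


# Toy camera (assumption): 4 pixels in a row, each adjacent to the next.
neighbors = np.zeros((4, 4), dtype=bool)
for i in range(3):
    neighbors[i, i + 1] = neighbors[i + 1, i] = True

charge = np.array([8.0, 9.0, 6.0, 1.0])          # photons per pixel
arrival_time = np.array([10.0, 11.0, 30.0, 12.0])  # ns

print(threshold_cleaning(charge))
print(time_constrained_cleaning(charge, arrival_time, neighbors))
```

Pixel 2 passes the pure threshold cut but is rejected by the time-constrained version, because its only bright neighbour arrives 19 ns earlier; this is the typical signature of an isolated night sky background fluctuation.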