A new era in synthetic data generation in aviation is on the horizon with the launch of the SynthAIr project. As Massimiliano Ruocco, coordinator, explains, the project is a response to the scarcity of relevant data for aviation and the inherent limitations of AI models in handling diverse datasets.
What is the rationale of the SynthAIr project and what are its main objectives?
The SynthAIr project aims to enhance the automation of air traffic management (ATM) systems through innovative AI-based methods for synthetic data generation. Its main objectives include overcoming challenges such as data access, scarcity, privacy issues, and bias in data, and leveraging synthetic data to improve efficiency, robustness, and resilience in AI adoption. The project seeks to develop high-fidelity, diverse, privacy-preserving synthetic data generation methods and to validate their impact through operational use cases such as turnaround time, flight delay, and passenger flow prediction.
What is synthetic data and why is it needed? Why is it so important in automation?
Synthetic data is artificially generated data that mirrors real-world data, offering a powerful, privacy-compliant tool for training AI systems. Essential in automation, it enriches data diversity, bypasses privacy issues, and accelerates AI development.
What risks are there in using synthetic data and how can they be mitigated?
The risks associated with synthetic data often mirror its benefits. I believe the most relevant are those related to 1) bias and privacy issues and 2) validation challenges and overfitting. In the first case, synthetic data can inadvertently replicate biases from the original dataset, leading to skewed AI models. It may also risk leaking sensitive information. To tackle this, privacy-preserving techniques and rigorous validation checks are crucial to ensure that synthetic data does not contain sensitive information or inherited biases. In the second case, ensuring the accuracy and reliability of synthetic data is challenging, as it may not fully capture real-world complexity. Overfitting is another concern: models can become excessively tailored to synthetic data and then perform poorly on real data. Employing robust validation strategies can help mitigate these risks.
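To make the validation point concrete, one common check is to train a model on synthetic data and then test it on held-out real data. The Python sketch below is illustrative only and is not the SynthAIr method: the delay values, the per-class Gaussian generator, the threshold classifier, and all function names are hypothetical assumptions chosen for this example.

```python
import random
import statistics


def make_real_delays(n, rng):
    """Hypothetical 'real' data: (delay_minutes, label) pairs, label 1 = long delay."""
    rows = []
    for _ in range(n):
        label = rng.randint(0, 1)
        mean = 10.0 if label == 0 else 40.0  # assumed class means, not real figures
        rows.append((rng.gauss(mean, 8.0), label))
    return rows


def fit_class_gaussians(rows):
    """Fit a mean and standard deviation per class (a stand-in for a real generator)."""
    params = {}
    for label in (0, 1):
        values = [x for x, y in rows if y == label]
        params[label] = (statistics.mean(values), statistics.stdev(values))
    return params


def sample_synthetic(params, n, rng):
    """Draw synthetic (delay_minutes, label) pairs from the fitted Gaussians."""
    rows = []
    for _ in range(n):
        label = rng.randint(0, 1)
        mean, std = params[label]
        rows.append((rng.gauss(mean, std), label))
    return rows


def train_threshold(rows):
    """One-dimensional nearest-centroid classifier: the midpoint of the class means.

    Assumes class 1 has the larger mean, which holds for this toy data.
    """
    mean0 = statistics.mean(x for x, y in rows if y == 0)
    mean1 = statistics.mean(x for x, y in rows if y == 1)
    return (mean0 + mean1) / 2.0


def accuracy(threshold, rows):
    """Share of rows where 'delay above threshold' matches the label."""
    correct = sum((x > threshold) == (y == 1) for x, y in rows)
    return correct / len(rows)


rng = random.Random(0)  # fixed seed so the run is repeatable
real_train = make_real_delays(500, rng)
real_test = make_real_delays(500, rng)  # held-out real data, never used for fitting
synthetic = sample_synthetic(fit_class_gaussians(real_train), 500, rng)

acc_real = accuracy(train_threshold(real_train), real_test)
acc_synth = accuracy(train_threshold(synthetic), real_test)
print(f"trained on real, tested on real:      {acc_real:.3f}")
print(f"trained on synthetic, tested on real: {acc_synth:.3f}")
```

If the model trained on synthetic data scores much lower on real data than the model trained on real data, that gap suggests the synthetic data is missing real-world structure or that the model has overfitted to artefacts of the generator.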
Are there other industries that are reliant on synthetic data?
Yes. First of all, some reports indicate that, because of data access issues and the growing need for data in the age of AI, synthetic data could soon outweigh real data in training AI models. Various industries increasingly rely on synthetic data, particularly where data privacy is a concern or where there is a need for better data quality. In healthcare, for instance, patient data is heavily regulated, making synthetic data an ideal alternative for training AI models. In finance and banking, synthetic data is used to develop better fraud detection systems. Another area is urban development and fighting poverty. Recently, a plan by the city of New York used synthetic population data to explore and understand poverty rates at a more granular level. This method allows for more targeted social programmes by identifying specific areas with higher poverty levels, all while maintaining individual privacy.
How is this project building on the work of previous SESAR innovation projects?
SynthAIr aligns with the trajectory set by previous exploratory research projects in the area of AI, while also introducing novel elements. In addition to drawing inspiration from previous SESAR innovation projects, SynthAIr actively seeks to establish connections and foster collaborations with ongoing SESAR projects. This strategic alignment aims to identify synergies and maximise the impact of SynthAIr’s outcomes. By engaging with current projects and stakeholders within the SESAR 3 framework, SynthAIr not only builds upon the legacy of past research but also ensures its findings are complementary and integral to the evolving landscape of ATM innovation. This approach enhances the potential of SynthAIr to contribute significantly to the broader objectives of the Digital European Sky programme, particularly in utilising AI for higher levels of automation in ATM systems.