Engineer on Machine Learning for Ocean’s Data Quality Control – Ifremer
OPEN POSITION FOR A 1-YEAR FIXED-TERM CONTRACT Engineer on Machine Learning for Ocean’s Data Quality Control
Laboratory for Ocean Physics and Satellite Remote Sensing Ifremer – Brest
Thousands of ocean measurements of temperature and salinity are collected around the globe every day. Controlling the quality of these data is labor-intensive because automatic control procedures still produce many false alarms that only a human expert can detect. This is because quality control procedures have not yet benefited from recent advances in machine learning methods, such as deep learning, that efficiently predict simple targets from complex multi-dimensional features. This engineer position is about developing such methods for oceanography.
The ocean measurements this position will be focusing on are produced by an international network of more than 3000 profiling floats (autonomous probes freely drifting in the deep ocean) that measure temperature and salinity from 2,000 metres to the surface: the Argo network (http://www.argo.ucsd.edu). Argo data are used in real time for operational forecasting and, after a careful scientific quality control, for climate change research and monitoring. Argo is the first-ever global, in-situ ocean-observing network in the history of oceanography, providing an essential complement to satellite systems. Argo delivers critical data (especially over the vertical dimension of the oceans) for climate monitoring and ocean forecasting models. Argo data are essential for climate change research.
Data qualification is one of the main challenges of the observational network. The Argo dataset, which is freely distributed to the scientific community, must have the precision required for climate change studies. To guarantee this precision, the data are checked in delayed mode to detect and correct possible sensor biases or drifts. These checks are conducted by human operators with the required scientific expertise and are based on a suite of automatic quality control procedures. To date, about 1.1 million Argo profiles (64% of the dataset) have been quality controlled in delayed mode, i.e. for research purposes. This historical work provides a unique and large training dataset that should allow for the development of new supervised algorithms to predict the research quality flag of new data. As control procedures do not evolve significantly and mostly rely on comparisons with a climatology, the goal of this project is to develop machine learning methods that predict the quality flag of Argo data, in both real time and delayed mode.
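As a rough illustration of the supervised approach described above, the following Python sketch learns a single threshold on the distance between a measurement and a climatology from samples that already carry a quality flag. Everything in it is an assumption made for illustration: the data are synthetic, the flags are reduced to two values (1 for good, 4 for bad), the function names are hypothetical, and a one-feature threshold is only a toy stand-in for the methods the project would actually evaluate.

```python
import random

GOOD, BAD = 1, 4  # two-class simplification of quality flags (assumption)


def make_synthetic_samples(n, seed=0):
    """Return (anomaly, flag) pairs of synthetic data.

    'anomaly' stands for |measurement - climatology|, expressed in
    climatological standard deviations. The distributions are invented.
    """
    rng = random.Random(seed)
    samples = []
    for _ in range(n):
        if rng.random() < 0.8:
            samples.append((abs(rng.gauss(0.0, 1.0)), GOOD))
        else:
            samples.append((abs(rng.gauss(6.0, 1.0)), BAD))
    return samples


def fit_threshold(samples):
    """Pick the anomaly threshold with the fewest training errors."""
    candidates = sorted(anomaly for anomaly, _ in samples)
    best_threshold, best_errors = candidates[0], len(samples) + 1
    for t in candidates:
        errors = sum((BAD if a > t else GOOD) != flag for a, flag in samples)
        if errors < best_errors:
            best_threshold, best_errors = t, errors
    return best_threshold


def predict(threshold, anomaly):
    """Flag a measurement as BAD when it lies too far from the climatology."""
    return BAD if anomaly > threshold else GOOD


# "Historical" flagged data used for training, and a separate held-out set.
train = make_synthetic_samples(400, seed=0)
held_out = make_synthetic_samples(200, seed=1)

threshold = fit_threshold(train)
accuracy = sum(predict(threshold, a) == f for a, f in held_out) / len(held_out)
print(f"learned threshold: {threshold:.2f}, held-out accuracy: {accuracy:.2%}")
```

The point of the sketch is only the workflow: flags assigned in the past serve as labels, a model is fitted to them, and its predictions are checked on data it has not seen.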
This position is part of the MOCCA project (Monitoring the Oceans and Climate Change with Argo), a European initiative carried by the European Research Infrastructure Consortium EURO-ARGO. The goal of the project is to progress towards the Euro-Argo objectives in monitoring the oceans through procuring and deploying Argo floats as well as ensuring the collection, analysis, management, processing and dissemination of the data. These data are made freely available as a European contribution to the international Argo programme. Within the MOCCA project, the Laboratory for Ocean Physics and Satellite Remote Sensing (LOPS, http://www.umr-lops.fr/en) is in charge of the quality control of the Argo data in the North Atlantic and of the development of new quality control procedures.
MAIN MISSION Within the LOPS at IFREMER, the recruited person will be in charge of developing a new procedure to quality control Argo data. This procedure will be based on machine learning methods and on the historical dataset of measurement quality flags, which provides the training dataset for supervised learning methods. The recruited person will ultimately transfer the procedure to the Coriolis data center. The recruited person will work in close collaboration with a “Big Data” engineer at Coriolis who will be in charge of setting up a software environment for big data mining and machine learning (e.g. TensorFlow, Spark ML).
ACTIVITIES The recruited person will be in charge of:
- Collaborating with the Coriolis “Big Data” engineer to assemble, format and distribute training datasets
- Testing and evaluating existing predictive methods (e.g. ensemble of random forests, neural networks, CNN)
- Developing a new procedure for the prediction of Argo data quality flags, for real time and delayed mode
- Transferring the procedure to Coriolis and to the European and international partners
- Documenting and disseminating the results of this procedure (reports, codes, oral presentations)
To carry out these activities, the recruited person will have to acquire basic knowledge of the existing Argo data flow and quality control methods.
- Master's degree, thesis or engineering school diploma in data science
- Desired experience: 0 to 2 years
- Interest in oceanographic science and data
- Programming languages: Python and possibly MATLAB
- Ability to read, write and communicate in English
- Rigor and autonomy
EXPECTED STARTING DATE AND DURATION As soon as possible, 1-year position
CONTACT G. Maze / firstname.lastname@example.org / @mazeguillaume Laboratoire d’Océanographie Physique et Spatiale – UMR6523 CNRS/IFREMER/IRD/UBO Ifremer – Centre de Brest ZI de la Pointe du Diable, CS10070, F-29280 Plouzané