Data & HPC

CMM Data & HPC

Director: Jaime San Martín

Researchers: Jorge Amaya, Axel Osses, Daniel Remenik, Felipe Tobar
Scientists: Ginés Guerrero, Francisco Förster, Juan Carlos Maureira, Andrew Hart
Engineers: Eduardo Cabrera, Ricardo Contreras, Eugenio Guerra, Camilo Iturra, Engels Jugo, Nancy Lacourly, Fermín Molina, Jorge Prado, Gonzalo Ríos, Gustavo Soto, Paula Uribe

Big data is commonly understood as a concept covering several dimensions of the management and use of data and information. These include at least capture and search; transfer, storage, sharing and security; and modeling, analysis and visualization of large and complex data sets. At our Center we work in all of these areas. Although our projects use so-called Big data techniques, at CMM we pay particular attention to the mathematical modeling of data.

National Laboratory on HPC. Since the early 90s we have invested in-house resources to provide ourselves with suitable computational capacity. This capacity has improved tremendously in recent years with the creation of the National Laboratory on High Performance Computing (NLHPC). In association with many universities and research centers in Chile, the NLHPC gives the Chilean research community access to HPC capacity for solving compute- and data-intensive scientific problems. During these 6 years our team and our main cluster, Leftraru, have served more than 500 users from very diverse scientific areas and geographic zones. Leftraru is now fully utilized, and we plan to upgrade it with more computing nodes and with other improvements requested by our users (high-memory nodes and GPUs). Thanks to the NLHPC we implemented a photonic network in Santiago between CMM, REUNA, PUC and USACH, which plays a key role in some of our applications. Recently, we installed a robotic storage system for backup purposes.

Big Data: main areas at CMM. The main areas at CMM where we use Big data techniques are the following (percentages refer to the relative size among CMM applications):

  1. Large-scale Engineering problems (70%), which include the analysis, modeling and simulation of complex problems such as block caving and fragmentation in mining, scheduling problems in mining and other industries, economy on networks, marketing analytics, smart grids, telecommunications and transport. The main tools we use on these problems are: Combinatorial Optimization, Multi-criteria Optimization, Stochastic modeling, Online Modeling and Data science.
  2. Bioinformatics (20%) at CMM focuses on developing mathematical strategies to integrate heterogeneous biological data into interaction networks, and on methods for the assembly of complex genomes, including those of salmon, grapes, potatoes and humans. The main tools used are: Information theory, Metagenome assembly, Data science and classification, Probabilistic and Statistical modeling, Search engines, Algorithmic and complexity theory.
  3. Astroinformatics (10%) at CMM is carried out by a multidisciplinary team working on relevant challenges in Astronomy, where the analysis of large data sets is at the heart of the problems. Our team is made up of astronomers, mathematicians, computer scientists and engineers. The main tools used are: Classification techniques, Machine learning, Random Forest, Unsupervised learning, Image processing, Time series analysis, Pipeline construction and sophisticated programming techniques.

Some featured projects: