Big Data & Data Privacy
Large volumes of data are continuously generated and collected through social interaction applications and by sensors attached to individuals and equipment. The volume of this data and the speed at which it is generated create challenges for storage and large-scale processing. LSBD not only develops data management techniques, particularly for cloud computing infrastructure, but also proposes algorithms for distributed and/or parallel data processing based on open-source platforms such as Hadoop and Spark.
Data about individuals, whether collected via sensors, retrieved from social networks, or published by government organizations, may reveal sensitive information about them, either on its own or when combined with other data sources. LSBD researches and develops data anonymization techniques that enable data collection and publication while still ensuring a certain level of privacy for the individuals the data describes. LSBD researchers also study modern data privacy strategies such as differential privacy.
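As an illustration of the differential privacy idea mentioned above, the following is a minimal sketch of the Laplace mechanism applied to a counting query. It is not LSBD's implementation; the function names, the `ages` data, and the choice of `epsilon` are all hypothetical and chosen only for the example.

```python
import random


def laplace_noise(scale, rng):
    """Draw one sample from Laplace(0, scale).

    The difference of two independent exponential variables with
    rate 1/scale follows a Laplace distribution with that scale.
    """
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(records, predicate, epsilon, rng=None):
    """Return a noisy count of the records that satisfy predicate.

    A counting query has sensitivity 1: adding or removing one person
    changes the true count by at most 1, so the noise scale is 1/epsilon.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    rng = rng or random.Random()
    true_count = sum(1 for r in records if predicate(r))
    return true_count + laplace_noise(1.0 / epsilon, rng)


# Hypothetical data set: ages of eight individuals.
ages = [23, 35, 41, 29, 52, 67, 38, 45]
noisy = private_count(ages, lambda a: a >= 40, epsilon=0.5, rng=random.Random(0))
```

A smaller `epsilon` adds more noise and gives stronger privacy, at the cost of a less accurate published count.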
Database adaptability is defined as the ability of the database management system (DBMS) to make incremental decisions about reorganizing the components of the database, based on characteristics such as the attributes accessed, the data retrieved, and the queries executed, without impairing its capacity to answer queries.
Adaptability can be applied to various components of the database. In storage, data can be organized to maximize the utility of the data brought into the higher levels of the memory hierarchy. Indexes can be built incrementally and on demand, without interrupting database service for their construction. Other components can be adapted in similar ways.
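One well-known way to build an index incrementally and on demand is database cracking, in which each range query partially reorganizes a column as a side effect. The sketch below illustrates that general idea only; it is an assumption that this resembles the techniques studied at LSBD, and the class and method names are hypothetical.

```python
import bisect


class CrackedColumn:
    """A column that is partitioned a little more by every range query."""

    def __init__(self, values):
        self.data = list(values)  # private copy, reorganized in place
        self.pivots = []          # sorted pivot values cracked on so far
        self.positions = []       # positions[i]: first index holding a value >= pivots[i]

    def _crack(self, pivot):
        """Partition the column around pivot and return the split position."""
        i = bisect.bisect_left(self.pivots, pivot)
        if i < len(self.pivots) and self.pivots[i] == pivot:
            return self.positions[i]  # already cracked on this value
        # Only the piece that can contain the pivot has to be touched.
        lo = self.positions[i - 1] if i > 0 else 0
        hi = self.positions[i] if i < len(self.pivots) else len(self.data)
        piece = self.data[lo:hi]
        smaller = [v for v in piece if v < pivot]
        larger = [v for v in piece if v >= pivot]
        self.data[lo:hi] = smaller + larger
        pos = lo + len(smaller)
        self.pivots.insert(i, pivot)
        self.positions.insert(i, pos)
        return pos

    def range_query(self, low, high):
        """Return all values v with low <= v < high (in no particular order)."""
        start = self._crack(low)
        end = self._crack(high)
        return self.data[start:end]


col = CrackedColumn([13, 4, 55, 9, 27, 1, 42, 8])
first = col.range_query(5, 30)   # cracks the column on 5 and 30
second = col.range_query(1, 10)  # refines only the pieces it needs
```

No index is built up front: the first query pays for one partitioning pass, and later queries touch progressively smaller pieces of the column.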
Adaptive databases are especially useful in situations where the workload is unknown and data must be processed quickly, as in data mining scenarios, where new data arrives daily and must be processed by the next day. Such a context leaves no time to build complete indexes before any query can be answered.
Currently, the LSBD Adaptivity Cell is directing its efforts towards adaptive storage, adaptive indexes, adaptability to new hardware, and detection of patterns in workloads.
Cloud computing has become one of the key terms in the Information Technology (IT) industry. The cloud is a metaphor for the Internet, or for the communication infrastructure among architectural components, and represents an abstraction that hides the complexity of the infrastructure supporting the execution of applications. Each part of this infrastructure is provided as a service, and these services are usually hosted in data centers, using shared hardware for computing and storage.
Currently, this paradigm is widely used to provide scalable services in a transparent way. Some of the services offered by cloud providers are:
- Software as a Service (SaaS), which provides software systems with specific purposes that are made available to users over the Internet;
- Platform as a Service (PaaS), which provides operating systems, programming languages, and development environments for applications;
- Infrastructure as a Service (IaaS), which makes it easier and more affordable to provide resources such as servers, network, storage, and other critical computing resources to build an application deployment environment.
ML + Predictive
Machine Learning is a field that combines statistical, probabilistic, computational, and algorithmic aspects. It learns iteratively from data and discovers hidden information that can be used to reveal new knowledge or to create intelligent applications. LSBD has a strong record of scientific work in this area, as well as collaborations with industry partners in which ML techniques are applied to their problems.
Both supervised and unsupervised techniques are researched by the LSBD ML research group, which is composed of more than six researchers. Techniques such as Clustering, Forecasting, Distribution Estimation, Support Vector Machines, Logistic Regression, Neural Networks, Recurrent Neural Networks, Decision Trees, and JSSP, among others, are part of the research team’s expertise and can be applied to solve complex problems.
Algorithms are designed to diagnose computer component failures. They analyze data about hardware components collected from many computers in use. This data holds historical information about the state of equipment components, recorded whenever diagnostic programs are run on the computers.
Statistical analysis of this volume of data may reveal the characteristics of equipment that fails frequently, or allow the identification of association relationships among failures. In both cases, the results facilitate automated maintenance through software when possible, or generate alerts so that the user can perform maintenance by replacing components whose failure is predicted to be imminent.
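To make the idea of association relationships among failures concrete, the sketch below computes the support and confidence of pairwise rules ("when component A fails, component B also fails") from diagnostic reports. It is an illustrative assumption about how such an analysis could look, not LSBD's algorithm; the report format, component names, and threshold are hypothetical.

```python
from collections import Counter
from itertools import combinations


def failure_associations(reports, min_support=0.2):
    """Find pairs of components that tend to fail together.

    reports: list of sets, each holding the names of the components
             flagged as failed in one diagnostic run.
    Returns tuples (antecedent, consequent, support, confidence),
    ordered by decreasing confidence.
    """
    n = len(reports)
    single = Counter()
    pair = Counter()
    for failed in reports:
        single.update(failed)
        pair.update(combinations(sorted(failed), 2))

    rules = []
    for (a, b), count in pair.items():
        support = count / n  # fraction of reports in which both fail
        if support < min_support:
            continue
        rules.append((a, b, support, count / single[a]))  # rule a -> b
        rules.append((b, a, support, count / single[b]))  # rule b -> a
    return sorted(rules, key=lambda r: (-r[3], r[0], r[1]))


# Hypothetical diagnostic history from five runs.
reports = [
    {"disk", "fan"},
    {"disk", "fan"},
    {"disk"},
    {"memory"},
    {"fan", "disk", "memory"},
]
rules = failure_associations(reports, min_support=0.4)
```

In this toy history, a rule with high confidence could be used to raise an alert for the second component as soon as the first one is reported as failing.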
LSBD has developed device drivers for the UEFI platform, including services related to storage and data transfer. Examples of such services include reading and writing file systems other than FAT, which is the one supported by the platform standard, and protocols for accessing remote and mobile file systems. LSBD has also conducted research in software quality and testing, and has pioneered software diagnostic applications in the context of UEFI firmware.
UEFI stands for Unified Extensible Firmware Interface. It is the firmware component responsible for initializing the main devices of a computer and presenting its services to the software installed on it. Despite this definition, UEFI firmware can be used for several tasks besides booting the system.
This is because such firmware provides a collection of functionalities for developing applications and drivers in the pre-operating-system execution environment, such as TCP/IP transfers, file system access, remote boot, and even application protocols such as HTTP, all accessible from a high-level programming language. As a result, the UEFI platform has become increasingly attractive for building diagnostic, security, and monitoring systems that rely more on direct access to hardware and less on operating systems.
LSBD researchers have proposed using 2D and 3D reflection seismic interpretation tools together with well-log data to expand knowledge of the structural and stratigraphic framework of the Ceará Basin, on the Brazilian Equatorial Margin. The seismic data is interpreted using Schlumberger’s Petrel software.