Big Data
30% of the data we have access to is generated in real time. Are we over-informed? Is there more data available than we can consume? I don't think so, and I explain why below.
➡️ Today we live in the Big Data era. Without data, artificial intelligence could not have experienced the acceleration we have witnessed. In fact, many of the algorithms and artificial intelligence techniques were developed decades ago, but only now are they bearing fruit, because they can finally leverage Big Data.
📈 The growth of data has been exponential, with 90% of all data created in the last two years. It is estimated that next year we will reach 175 zettabytes. To put that in perspective, imagine representing the Digital Universe with 128 GB tablets like the ones we have at home: 175 zettabytes would be equivalent to about 25 stacks of those tablets, each stack as tall as the distance from the Earth to the Moon, which is 384,400 km.
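As a back-of-the-envelope check, the comparison holds up; here is a minimal sketch in Python, assuming a tablet roughly 7 mm thick (that thickness is my assumption, not part of the original figure):

```python
# Back-of-the-envelope check of the tablet analogy (tablet thickness is an assumption).
total_bytes = 175e21          # 175 zettabytes
tablet_bytes = 128e9          # 128 GB per tablet
tablet_thickness_m = 0.007    # ~7 mm per tablet (assumed)
earth_moon_km = 384_400

n_tablets = total_bytes / tablet_bytes
stack_km = n_tablets * tablet_thickness_m / 1000
print(f"{n_tablets:.2e} tablets, stack of {stack_km:,.0f} km")
print(f"≈ {stack_km / earth_moon_km:.0f} Earth-Moon distances")
```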
📖 So, how can we consume all this information? What works best for me is following some guidelines:
1️⃣ Extracting data that has value: keep in mind that only an estimated 0.5% of all data is currently being analyzed. We have a long way to go, but data quality also matters: the data has to be truthful. In the financial world it is also important that the data is point-in-time; that is, when backtesting our models we should use the data that was actually available at each moment in time (for example, inflation or GDP series are published with a delay and are often revised afterwards; that revised series did not exist in the past, so I could not have made a decision based on it). See the point-in-time sketch after this list.
2️⃣ Expanding the spectrum of data: we tend to use only structured data (Excel-style tables, so to speak), but only about 20% of data is structured; the rest is either semi-structured (HTML) or unstructured (social media posts, satellite images, etc.). It is true that in many cases structured data is what works, but drawing on, for example, the sentiment expressed on social networks is usually very useful and helps you anticipate trends; see the sentiment sketch below.
3️⃣ Using new tools: there is a tendency to use linear models as prediction tools (linear regression, for example), but the reality is that the relationships in financial data are usually not linear. This is not to say that linear models are useless; in many cases they are easier to interpret, but when the relationships are not linear it is important to use models that do not force linearity on them. A comparison sketch follows below.
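To illustrate the point-in-time idea from 1️⃣, here is a minimal sketch; the column names and the toy GDP figures are my own, purely for illustration. Given a table of releases and revisions, we keep only what was actually published on or before the backtest date:

```python
import pandas as pd

# Toy vintage table: each row is a published value of a series, with its release date.
# (Column names and figures are illustrative, not real data.)
vintages = pd.DataFrame({
    "reference_period": ["2023Q4", "2023Q4", "2024Q1"],
    "release_date":     pd.to_datetime(["2024-01-30", "2024-03-28", "2024-04-30"]),
    "gdp_growth":       [0.3, 0.4, 0.2],   # first estimate, revision, first estimate
})

def as_of(df: pd.DataFrame, date: str) -> pd.DataFrame:
    """Return, for each reference period, the latest value released on or before `date`."""
    known = df[df["release_date"] <= pd.Timestamp(date)]
    return (known.sort_values("release_date")
                 .groupby("reference_period", as_index=False)
                 .last())

# On 2024-02-15 only the first estimate of 2023Q4 (0.3) had been published,
# so that is the value a backtest run "as of" that date should use.
print(as_of(vintages, "2024-02-15"))
```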
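For 2️⃣, this is a minimal sketch of turning unstructured text into a usable signal. I assume the open-source VADER analyzer (the `vaderSentiment` package) and a few made-up posts, just to show the shape of the pipeline:

```python
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

# Hypothetical social media posts about a company (made up for illustration).
posts = [
    "Great earnings, the outlook for next quarter looks strong",
    "Supply chain problems again, this could hurt margins badly",
    "Neutral guidance, nothing new in the call",
]

analyzer = SentimentIntensityAnalyzer()
# 'compound' is VADER's normalized score in [-1, 1]; averaging it gives a crude daily signal.
scores = [analyzer.polarity_scores(p)["compound"] for p in posts]
daily_sentiment = sum(scores) / len(scores)
print(f"average sentiment: {daily_sentiment:+.2f}")
```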
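And for 3️⃣, a sketch of why a flexible model can beat a linear one when the true relationship is non-linear. The data is synthetic and the random forest is just one possible choice of non-linear model, not a recommendation from the original text:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score

# Synthetic data with a deliberately non-linear relationship (illustrative only).
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(1000, 1))
y = np.sin(2 * X[:, 0]) + 0.1 * rng.normal(size=1000)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# The linear model scores close to zero here, while the forest captures the curve.
for model in (LinearRegression(), RandomForestRegressor(random_state=0)):
    model.fit(X_train, y_train)
    print(type(model).__name__, f"R2 = {r2_score(y_test, model.predict(X_test)):.2f}")
```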
✅ By way of conclusion, the Big Data era has been a great milestone and has allowed many of the artificial intelligence techniques envisioned decades ago to become reality today. Knowing how to make use of all this data is important, but you must follow a method to avoid falling into the "garbage in, garbage out" trap: no matter how good our model is, if the input data is not of sufficient quality, the output will not be useful.