Online Data Collection

We have all heard the saying that "one man's trash is another man's treasure". The modern version of this saying is data exhaust: the data generated as a byproduct of people's online actions and choices. It consists of the various files generated by web browsers and their plug-ins, such as cookies, log files, temporary internet files and .sol files (Flash cookies). The main question is how it can be useful. The first wave of quant data fit easily into ordinary database tables and covered specific areas such as estimates, prices and fundamentals. The second wave came as a transition from numerical data towards unstructured text. That was when text mining became popular and widely available, and some modern technologies now let their users analyze millions of documents and articles in a short period of time. The third wave is data exhaust: mining a largely unexplored body of data to get a potential edge. This can also be very helpful in online marketing.

The following are some things you should keep in mind about data exhaust. First, always ask for anonymized data, because most people are concerned about how their data is used. Quants are not interested in what individuals are doing; they want to learn about behavior at the macro or company level. Anonymous data is in the best interest of everyone involved.
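To make the first point concrete, here is a minimal sketch of how per-user records could be reduced to company-level figures before they ever reach a quant. Everything in it is an assumption for illustration: the record fields (`user_id`, `company`, `amount`), the `SECRET_KEY` placeholder, and the helper names `pseudonymize` and `aggregate_by_company` are hypothetical and do not come from any particular data vendor.

```python
import hashlib
import hmac
from collections import defaultdict

# Hypothetical secret held by the data owner, never shared with the data buyer.
SECRET_KEY = b"replace-with-a-secret-kept-by-the-data-owner"


def pseudonymize(user_id, key=SECRET_KEY):
    """Replace a raw identifier with a keyed hash (HMAC-SHA256)."""
    return hmac.new(key, user_id.encode("utf-8"), hashlib.sha256).hexdigest()


def aggregate_by_company(records):
    """Collapse per-user records into company-level totals and distinct-user counts."""
    totals = defaultdict(float)
    users = defaultdict(set)
    for rec in records:
        company = rec["company"]
        totals[company] += rec["amount"]
        # Only the keyed hash is stored, so raw identifiers never leave this function.
        users[company].add(pseudonymize(rec["user_id"]))
    return {
        company: {"total": totals[company], "unique_users": len(users[company])}
        for company in totals
    }


# Made-up sample records, for illustration only.
records = [
    {"user_id": "alice", "company": "ACME", "amount": 10.0},
    {"user_id": "alice", "company": "ACME", "amount": 5.0},
    {"user_id": "bob", "company": "ACME", "amount": 2.5},
    {"user_id": "bob", "company": "Globex", "amount": 7.0},
]
summary = aggregate_by_company(records)
```

Note that a keyed hash is only pseudonymization; it is the aggregation step that removes the individual from the picture, which is all a quant needs for company-level analysis.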

Second, consider the stakeholders in your potential partner company. Credit card companies, for example, have two major stakeholders: merchants that accept credit cards and credit card holders. If you are a credit card holder, you probably do not care what the credit card company does with anonymous purchase data. However, a large merchant would not be pleased if the credit card company gave its transaction data to financial investors. Explore these concerns early.

Third, think outside the box about proxies: which types of data could stand in for the quantity you actually care about? One example is GPS data, which can tell you quite precisely where someone has been and for how long. If a person stays in a restaurant or a shop for a longer period, it is likely that the person is buying something.
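The GPS example can be sketched in a few lines. The code below estimates how long a device stayed near a venue from timestamped location pings. The ping format `(timestamp_seconds, lat, lon)`, the 50-metre radius, the 15-minute threshold and the function names are all assumptions chosen for illustration, not a standard method.

```python
import math


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two latitude/longitude points."""
    r = 6371000.0  # mean Earth radius in metres
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def dwell_seconds(pings, venue, radius_m=50.0):
    """Sum the time between consecutive pings that both fall within radius_m of the venue."""
    ordered = sorted(pings)  # tuples sort by timestamp first
    total = 0.0
    for (t0, lat0, lon0), (t1, lat1, lon1) in zip(ordered, ordered[1:]):
        inside0 = haversine_m(lat0, lon0, venue[0], venue[1]) <= radius_m
        inside1 = haversine_m(lat1, lon1, venue[0], venue[1]) <= radius_m
        if inside0 and inside1:
            total += t1 - t0
    return total


# Made-up venue and pings: (timestamp in seconds, latitude, longitude).
venue = (40.0, -74.0)
pings = [
    (0, 40.0, -74.0),
    (600, 40.0001, -74.0),
    (1200, 40.0, -74.0001),
    (1800, 40.01, -74.0),  # roughly a kilometre away: the visit has ended
]
dwell = dwell_seconds(pings, venue)

# Hypothetical rule of thumb: 15 minutes or more suggests a purchase.
likely_purchase = dwell >= 15 * 60
```

With these sample pings the device is counted as present for the first 20 minutes and absent afterwards. The threshold is the weak link in any such proxy, so it should be checked against whatever ground truth is available before the signal is trusted.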

Fourth, be prepared for a lot of cleaning work. Chances are that your potential partner had not thought of monetizing its data exhaust when the business was first established. That is why you should not expect this type of data to come to you without problems.

Fifth, data exhaust that does not have enough history may not look particularly useful. An unfortunate side effect of data exhaust is that it often comes with limited or no history, either because the company has only recently started to retain the data or because the potential partner has only recently started operating. Be careful: with limited history, you have no idea how the data will perform across different market regimes.