Don’t expect to find data of perfect quality, but be aware that every missing attribute can come at an additional cost. For example, if you want to use data that suffers from inconsistencies, you will have to put in place a process to resolve them, or choose to tolerate them, with risks to be assessed on a case-by-case basis. Here is a selection of the sites I use most often to look for data. Most of them offer data free of charge. This list is neither exhaustive nor ordered: there are others, and I invite you to leave a comment if you want to share your tips!
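
Whether you resolve inconsistencies or tolerate them, the first step is to find them. The sketch below is a minimal Python illustration, assuming records are plain dictionaries that share an identifier field; the field names (`id`, `city`, `zip`) and the helpers (`count_missing`, `find_conflicts`, `resolve_majority`) are hypothetical. The majority-vote rule is only one possible resolution policy, chosen here for simplicity.

```python
from collections import Counter, defaultdict

MISSING = (None, "")  # values treated as a missing attribute


def count_missing(records, fields):
    """Count, for each field, how many records lack a value."""
    return {f: sum(1 for r in records if r.get(f) in MISSING) for f in fields}


def find_conflicts(records, key, fields):
    """Return {key_value: {field: values}} where one entity has several distinct values."""
    seen = defaultdict(lambda: defaultdict(set))
    for rec in records:
        for field in fields:
            value = rec.get(field)
            if value not in MISSING:
                seen[rec[key]][field].add(value)
    conflicts = {}
    for entity, by_field in seen.items():
        clashing = {f: vals for f, vals in by_field.items() if len(vals) > 1}
        if clashing:
            conflicts[entity] = clashing
    return conflicts


def resolve_majority(records, key, field):
    """One possible policy: keep the most frequent value (ties go to the first value seen)."""
    counts = defaultdict(Counter)
    for rec in records:
        value = rec.get(field)
        if value not in MISSING:
            counts[rec[key]][value] += 1
    return {entity: c.most_common(1)[0][0] for entity, c in counts.items()}


# Hypothetical sample: two records disagree on the postcode of entity "A1".
records = [
    {"id": "A1", "city": "Lyon", "zip": "69001"},
    {"id": "A1", "city": "Lyon", "zip": "69002"},
    {"id": "B7", "city": "Paris", "zip": ""},
]
print(count_missing(records, ["city", "zip"]))
print(find_conflicts(records, "id", ["city", "zip"]))
print(resolve_majority(records, "id", "zip"))
```

Separating detection from resolution keeps the choice explicit: you can log the conflicts and accept the risk, or apply a policy such as `resolve_majority` where it makes sense.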

My first instinct when I’m looking for a dataset is to use Google Dataset Search (2). This engine indexes thousands of repositories and lets you run keyword searches. As of this writing (August 2021), it offers several filters, including update frequency and format. Its biggest downside is the scope it covers: there is no selection on the quantity or quality of the data available, for example. Kaggle (3) is a web platform that organizes data science competitions. Companies submit problems to be solved, sometimes with prize money attached, and make data available for the purpose.

The data is often anonymized, but not systematically. It is of much higher quality, but it will not be updated. It can therefore be used to build an initial database, for learning purposes, or even to train your own machine learning models. Here too you can run a keyword search, and the datasets are also listed on Google Dataset Search.

The platform (4) disseminates public data from the French State. Most of the data you will find there is in tabular or geographic format. Here too, you can filter by format, by producer (ministry, etc.), and by the geographical areas that interest you.
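
Because shared datasets are not systematically anonymized, a quick automated check before reuse can help. The Python sketch below scans a CSV for values that look like email addresses or French phone numbers. The regular expressions and the `flag_personal_data` helper are assumptions for illustration: they are rough heuristics, not a complete definition of personal data, and will produce both misses and false positives, so they complement a manual review rather than replace it.

```python
import csv
import io
import re

# Rough heuristics only: they will miss some personal data and flag some false positives.
PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "fr_phone": re.compile(r"(?:\+33\s?|0)[1-9](?:[ .-]?\d{2}){4}"),
}


def flag_personal_data(csv_text, delimiter=","):
    """Return {column_name: set_of_pattern_names} for columns whose values match a pattern."""
    flagged = {}
    reader = csv.DictReader(io.StringIO(csv_text), delimiter=delimiter)
    for row in reader:
        for column, value in row.items():
            for name, pattern in PATTERNS.items():
                if value and pattern.search(value):
                    flagged.setdefault(column, set()).add(name)
    return flagged


# Hypothetical sample file with one column that should have been anonymized.
SAMPLE_CSV = "id,contact,comment\n1,jane.doe@example.com,ok\n2,01 23 45 67 89,call back\n"
print(flag_personal_data(SAMPLE_CSV))
```

Reporting at the column level, rather than row by row, makes it easy to decide which columns to drop or pseudonymize before the data enters your pipeline.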


Since 2021, part of this data has also been referenced on the site (5), alongside data from other EU member states and EU institutions. For now, it offers geographic data, browsable by topic and filterable by metadata quality! For those who want to build applications on data provided by public services, the platform (6) is a gold mine that keeps growing. You can find everything there, from the SIRENE API to public transport schedules, including Chorus Pro. It is the best way to automate some of your processes, by connecting directly to the public services that offer APIs.
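
Automating against a public API usually comes down to building a query URL, sending an authenticated request and walking through paginated results. The standard-library Python sketch below shows that shape. The base URL, the `page`/`per_page` parameters, the `results` key and the bearer-token header are hypothetical conventions, not those of the SIRENE API or any other specific service, so check each API’s own documentation before adapting it.

```python
import json
import urllib.parse
import urllib.request

# Hypothetical endpoint: replace with the one documented by the API you use.
BASE_URL = "https://api.example.org/v1/records"


def build_url(base_url, **params):
    """Append URL-encoded query parameters to a base URL."""
    query = urllib.parse.urlencode(params)
    return f"{base_url}?{query}" if query else base_url


def fetch_json(url, token=None, timeout=10):
    """GET a URL and decode its JSON body, sending a bearer token if one is given."""
    headers = {"Accept": "application/json"}
    if token:
        headers["Authorization"] = f"Bearer {token}"
    request = urllib.request.Request(url, headers=headers)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


def fetch_all(base_url, fetch=fetch_json, per_page=100, max_pages=50, **params):
    """Collect items page by page until a page comes back empty.

    Assumes a hypothetical convention: numbered `page`/`per_page` parameters
    and items listed under a `results` key. `max_pages` caps the number of calls.
    """
    items = []
    for page in range(1, max_pages + 1):
        data = fetch(build_url(base_url, page=page, per_page=per_page, **params))
        batch = data.get("results", [])
        if not batch:
            break
        items.extend(batch)
    return items
```

For example, `fetch_all(BASE_URL, q="bus")` would request `?page=1&per_page=100&q=bus`, then page 2, and so on. Passing `fetch` as a parameter makes it easy to plug in a stub for testing, or `functools.partial(fetch_json, token="...")` when the service requires a key.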

INSEE (7) regularly produces, publishes and analyzes official statistics on France. In my opinion, its data is harder to use in a recurring, automated process unless you go through one of its four APIs (the SIRENE API among them). Even so, this data can enrich analyses based on economic activity (customer targeting, changes in income or activity levels, etc.). To conclude, I want to share a few examples of data that has proved useful in the past, or that I think has potential for reuse among our customers.
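
Enriching an analysis with statistics such as INSEE’s typically means joining them to your own records on a shared geographic key. The Python sketch below performs a left join with the standard library. The column names (`commune_code`, `median_income`), the codes and the figures are made up for illustration and are not real INSEE data; the semicolon delimiter is also an assumption, to be adjusted to the file you actually download.

```python
import csv
import io

# Made-up sample data: codes, field names and figures are illustrative only.
CUSTOMERS = [
    {"customer_id": "C1", "commune_code": "00001"},
    {"customer_id": "C2", "commune_code": "00002"},
    {"customer_id": "C3", "commune_code": "00003"},  # no matching statistics
]

STATS_CSV = """commune_code;median_income
00001;21000
00002;24000
"""


def load_stats(csv_text, key="commune_code", delimiter=";"):
    """Index statistics rows by their geographic key."""
    reader = csv.DictReader(io.StringIO(csv_text), delimiter=delimiter)
    return {row[key]: row for row in reader}


def enrich(customers, stats, key="commune_code", fields=("median_income",)):
    """Left join: every customer is kept; missing statistics become None."""
    enriched = []
    for customer in customers:
        match = stats.get(customer[key], {})
        enriched.append({**customer, **{f: match.get(f) for f in fields}})
    return enriched


for row in enrich(CUSTOMERS, load_stats(STATS_CSV)):
    print(row)
```

Values read with `csv` stay strings (which also preserves leading zeros in codes), so convert numeric fields with `int` or `float` before computing on them. Keeping unmatched customers with `None`, rather than dropping them, makes gaps in coverage visible.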

Sometimes there are arguments, but we resolve problems like a family and it always goes well. My associates and I are very close and have a real relationship of trust; we are rather serene and confident! We have nevertheless set up a shareholders’ agreement, as in any company… What makes it work well is that we try to keep the right balance in our professional and personal relationships. Above all, we have three different and complementary areas of expertise: the three of us are very curious and think in different ways, which lets us make things happen without stepping on each other’s toes.
