Saturday, April 25, 2020

Colombia Covid19 Pipeline and Datasets

Hi everybody,

In this post, I want to share my self-learning process in data science. For the past two weeks I have been working on the Covid19 dataset from Colombia, and now I have a dataset that I want to share with the open community.

The Project: Colombia Covid19 Pipeline

A pipeline that gets the data sources for the daily report of Covid19 cases in Colombia from the Instituto Nacional de Salud - INS and uses them to create datasets.


The number of new cases is increasing day by day around the world. This dataset has information about the cases reported in the 32 departments of Colombia.

Here you can find the result of my self-learning process in data science. This dataset contains the Instituto Nacional de Salud - INS daily report of Covid19 cases reported in Colombia, as well as the INS historical report of Covid19 samples processed in Colombia.


This dataset uses the INS Covid19 report as its data source. I cleaned the data source and filled the NaN values to generate this dataset, adding attributes such as the day of the week, the year, and the month of the year.
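The cleaning step described above can be sketched roughly as follows. This is not the code from the repository: the column name `report_date`, the `%d/%m/%Y` date format, the `-` placeholder for missing values and the set of strings treated as NaN are all hypothetical assumptions for illustration. The sketch trims every value, replaces missing ones with the placeholder, and derives the day of the week, year and month from the report date.

```python
import csv
import io
from datetime import datetime

# Hypothetical settings: the real INS column names and date format may differ.
DATE_COLUMN = "report_date"
DATE_FORMAT = "%d/%m/%Y"
FILL_VALUE = "-"  # placeholder written where a value is missing (NaN)
MISSING = {"", "nan", "NaN", "NA", "null"}
# Fixed English names, so the output does not depend on the system locale.
DAY_NAMES = ("Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday")


def clean_and_enrich(raw_csv: str) -> list:
    """Trim values, fill missing ones and add day-of-week, year and month."""
    rows = []
    for row in csv.DictReader(io.StringIO(raw_csv)):
        cleaned = {}
        for key, value in row.items():
            if key is None:  # skip surplus fields in malformed rows
                continue
            text = (value or "").strip()
            cleaned[key.strip()] = FILL_VALUE if text in MISSING else text

        date_text = cleaned.get(DATE_COLUMN, FILL_VALUE)
        if date_text == FILL_VALUE:
            # No usable date: fill the derived attributes as well.
            cleaned["day_of_week"] = cleaned["year"] = cleaned["month"] = FILL_VALUE
        else:
            date = datetime.strptime(date_text, DATE_FORMAT)
            cleaned["day_of_week"] = DAY_NAMES[date.weekday()]
            cleaned["year"] = date.year
            cleaned["month"] = date.month
        rows.append(cleaned)
    return rows


sample = "case_id,city,report_date,status\n1,Bogota,06/03/2020,Recovered\n2,Cali,,NaN\n"
for row in clean_and_enrich(sample):
    print(row)
```

With this sample, the first row gets `day_of_week` "Friday", `year` 2020 and `month` 3, while the second row, which has no date and a NaN status, gets the placeholder in every missing or derived field. Using an explicit tuple of day names instead of `strftime("%A")` keeps the output the same on any system locale.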

covid19co.csv -> Daily report, Cases reported in Colombia

covid19co_samples_processed.csv -> Daily report, Samples processed in Colombia
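As a starting point for exploring `covid19co.csv`, a count of reported cases per department could look like the sketch below. It assumes, hypothetically, that the file has one row per reported case and a column named `departamento`. Check the real header before using it.

```python
import csv
import io
from collections import Counter

# Hypothetical column name; check the real header of covid19co.csv first.
DEPARTMENT_COLUMN = "departamento"


def cases_by_department(csv_file) -> Counter:
    """Count reported cases per department, assuming one row per case."""
    reader = csv.DictReader(csv_file)
    return Counter(row[DEPARTMENT_COLUMN] for row in reader)


# With the downloaded file (local path assumed):
# with open("covid19co.csv", newline="", encoding="utf-8") as f:
#     print(cases_by_department(f).most_common(5))

sample = io.StringIO("id,departamento\n1,Antioquia\n2,Valle del Cauca\n3,Antioquia\n")
print(cases_by_department(sample).most_common(1))  # [('Antioquia', 2)]
```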

This dataset is updated by an automated pipeline. You can find the GitHub code repository here: Colombia Covid19 Pipeline
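For readers new to pipelines, the general extract-transform-load shape of such an automated update can be sketched as below. This is a generic illustration, not the repository's code. The `fetch` function stands in for downloading the INS report, whose exact URL is not given here, and `fill_missing` is a hypothetical stand-in for the real cleaning step. Passing both in as functions keeps the sketch testable without network access.

```python
import csv
import io
from typing import Callable


def run_pipeline(fetch: Callable[[], str],
                 transform: Callable[[list], list],
                 output) -> int:
    """Extract raw CSV text, transform the rows and write the new dataset."""
    raw_rows = list(csv.DictReader(io.StringIO(fetch())))  # extract
    rows = transform(raw_rows)  # transform: cleaning, NaN filling, new attributes
    if not rows:
        return 0  # nothing to write
    writer = csv.DictWriter(output, fieldnames=list(rows[0].keys()))  # load
    writer.writeheader()
    writer.writerows(rows)
    return len(rows)


def fill_missing(rows):
    # Hypothetical transform: replace empty values with "-".
    return [{k: (v if v else "-") for k, v in r.items()} for r in rows]


out = io.StringIO()
count = run_pipeline(lambda: "department,status\nAntioquia,\n", fill_missing, out)
print(count)
print(out.getvalue())
```

In a real deployment, a scheduler (for example a daily cron job, which is an assumption here) would call `run_pipeline` with a `fetch` that downloads the official report and an `output` opened on `covid19co.csv`.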


The dataset is obtained from the Instituto Nacional de Salud - INS daily Covid19 report for Colombia. You can get the official dataset here: INS - Official Report


What questions do you want to see answered?

You can view and contribute to the analysis here: colombia_covid_19_analysis Kaggle Notebook Kernel.

Work in progress...

1 comment:

  1. Hey there, I noticed you said this project has been a learning approach to data science and that you have not been working on it for too long either. What resources have you used to familiarize yourself with the analysis of data?