The strange world of coding

Photo by Priscilla Du Preez on Unsplash

The good news is that I am still on holiday. These days I am playing with some Python code that reads data from a few of our systems and runs some analysis before the new fiscal year officially starts.

It started as a small project and a relaxing activity. It was not strictly work; I was playing with data and Python. I love coding, and I always write some code when I have free time.

I ended up with five thousand lines of code.

This specific exercise was an ETL (Extract, Transform, Load) job. Two main systems were involved: Salesforce and Google Drive. I wanted everything in a SQLite database so I could wrangle the data with pandas and NumPy.

Extracting data was easy. Both Salesforce and Google Drive have very well-documented APIs. The transformation was tricky. Every system has its own way of representing data. In particular, handling dates and times is always a massive pain when moving data between systems. The load was a breeze.
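
To give an idea of the date and time pain, here is a minimal sketch of the kind of normalization step I mean. It is not my actual code. The helper name `to_utc_iso` and the sample timestamps are made up for illustration, and I am assuming the sources hand over ISO 8601 style strings with different offset spellings (`+0000`, `+02:00`, `Z`), with or without fractional seconds.

```python
from datetime import datetime, timezone

# Assumed input shapes: ISO 8601 style strings, with or without milliseconds.
# Since Python 3.7, %z accepts "+0000", "+00:00" and "Z".
_FORMATS = ("%Y-%m-%dT%H:%M:%S.%f%z", "%Y-%m-%dT%H:%M:%S%z")


def to_utc_iso(value: str) -> str:
    """Return the timestamp converted to UTC, in one canonical text form."""
    text = value.strip()
    for fmt in _FORMATS:
        try:
            parsed = datetime.strptime(text, fmt)
        except ValueError:
            continue  # try the next known format
        return parsed.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    raise ValueError(f"unrecognised timestamp: {value!r}")


# Hypothetical samples: the same instant written three different ways.
samples = [
    "2023-07-14T09:30:00.000+0000",
    "2023-07-14T11:30:00.000+02:00",
    "2023-07-14T09:30:00Z",
]
normalized = [to_utc_iso(s) for s in samples]
```

Storing one canonical UTC string means the values sort and compare correctly as plain text in SQLite, which has no dedicated datetime column type.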

Finally, I made it. My database was loaded with data.

I ran the data integrity tests, and something was wrong. After a SQL join, I expected 3036 rows and got 3108. It took me an hour to find the culprits: I had forgotten to put a uniqueness constraint on a database field, and there was duplicated data in Google Drive.
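
The effect is easy to reproduce on a toy scale. The sketch below is not my real schema: the table names (`sf_accounts`, `drive_rows`, `drive_rows_strict`) and the three sample rows are invented. It only shows how one duplicated key inflates a join, how a `GROUP BY ... HAVING` query finds the offender, and how a `UNIQUE` constraint would have rejected the duplicate at load time.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sf_accounts (account_id TEXT PRIMARY KEY, name TEXT);
    CREATE TABLE drive_rows (account_id TEXT, note TEXT);  -- no UNIQUE here
    INSERT INTO sf_accounts VALUES ('A1', 'Acme'), ('A2', 'Globex');
    INSERT INTO drive_rows VALUES ('A1', 'first'), ('A2', 'only'), ('A1', 'second');
""")

# The join returns one row per matching pair, so the duplicated key counts twice.
joined_rows = conn.execute(
    "SELECT COUNT(*) FROM sf_accounts s "
    "JOIN drive_rows d ON d.account_id = s.account_id"
).fetchone()[0]

# Keys that appear more than once in the table loaded from Drive.
duplicates = conn.execute(
    "SELECT account_id, COUNT(*) FROM drive_rows "
    "GROUP BY account_id HAVING COUNT(*) > 1"
).fetchall()

# With a UNIQUE constraint, the second insert fails instead of slipping through.
conn.execute("CREATE TABLE drive_rows_strict (account_id TEXT UNIQUE, note TEXT)")
try:
    conn.executemany(
        "INSERT INTO drive_rows_strict VALUES (?, ?)",
        [("A1", "first"), ("A1", "second")],
    )
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
```

Here two accounts produce three joined rows, the same kind of surplus I saw between 3036 and 3108.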

I could have deleted the duplicated data in Google Drive, but I did not want to do it. I am not the only one accessing and using that data.

I modified my code to cope with duplicated data. Well, it almost doubled the size of my codebase. I could not simply discard the duplicates. I had to merge the data in a table with 32 different fields, each with its own requirements.
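
To show why this grows so quickly, here is a rough sketch of a field-by-field merge. Everything in it is hypothetical: the three fields (`owner`, `amount`, `notes`), their rules, and the sample rows stand in for the real 32 fields, whose requirements I am not listing here.

```python
from collections import defaultdict


def first_non_empty(values):
    """Keep the first value that is neither None nor an empty string."""
    return next((v for v in values if v not in (None, "")), None)


def largest(values):
    """Keep the largest non-missing value, or None if all are missing."""
    return max((v for v in values if v is not None), default=None)


def join_distinct(values):
    """Concatenate distinct non-empty values, preserving their order."""
    seen = []
    for v in values:
        if v not in (None, "") and v not in seen:
            seen.append(v)
    return "; ".join(seen)


# One rule per field; with 32 fields this table is where the code piles up.
MERGE_RULES = {"owner": first_non_empty, "amount": largest, "notes": join_distinct}


def merge_duplicates(rows, key="account_id"):
    """Collapse rows sharing the same key into one record per key."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)
    merged = []
    for key_value, group in groups.items():
        record = {key: key_value}
        for field, rule in MERGE_RULES.items():
            record[field] = rule([r.get(field) for r in group])
        merged.append(record)
    return merged


rows = [
    {"account_id": "A1", "owner": "", "amount": 100, "notes": "call back"},
    {"account_id": "A1", "owner": "Dana", "amount": 250, "notes": "sent quote"},
    {"account_id": "A2", "owner": "Lee", "amount": None, "notes": ""},
]
merged = merge_duplicates(rows)
```

Keeping the rules in a dictionary at least makes each field's requirement explicit and testable on its own.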

It was exciting and intriguing.

Sometimes you spend more time coding edge case management than the core application logic.
