The good news is that I am still on holiday. These days I am playing with some Python code that reads data from some of our systems and performs some analysis before the new fiscal year officially starts.
It started as a small project and a relaxing activity. It was not strictly work; I was playing with data and Python. I love coding, and I always build something when I have free time.
I ended up with five thousand lines of code.
This specific exercise was an ETL (Extract, Transform, Load). Two main systems were involved: Salesforce and Google Drive. I wanted everything in a SQLite database so I could wrangle the data with pandas and NumPy.
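The "load" end of that pipeline really can be a one-liner with pandas. This is only a minimal sketch of the idea; the table and column names here are made up, not my actual schema:

```python
import sqlite3

import pandas as pd

# Pretend this DataFrame came out of the extract/transform steps.
extracted = pd.DataFrame({"id": [1, 2], "stage": ["Closed Won", "Open"]})

with sqlite3.connect(":memory:") as con:
    # to_sql creates the table and inserts the rows in one call.
    extracted.to_sql("opportunities", con, if_exists="replace", index=False)
    count = con.execute("SELECT COUNT(*) FROM opportunities").fetchone()[0]

print(count)  # 2
```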
Extracting data was easy. Both Salesforce and Google Drive have very well-documented APIs. The transformation was tricky: every system has its own way of representing data, and date and time handling in particular is always a massive pain across systems. The load was a breeze.
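A concrete flavor of the date-time pain: different APIs hand back ISO 8601 timestamps with different offsets, or a trailing `Z`, or no timezone at all. A small normalizer, sketched here with my own assumptions spelled out in the comments, keeps the database consistent:

```python
from datetime import datetime, timezone


def to_utc_iso(raw: str) -> str:
    """Parse an ISO 8601 timestamp and return it as UTC ISO text."""
    # fromisoformat rejects a trailing 'Z' before Python 3.11,
    # so rewrite it as an explicit offset first.
    dt = datetime.fromisoformat(raw.replace("Z", "+00:00"))
    if dt.tzinfo is None:
        # Assumption: naive timestamps are already UTC. This is a policy
        # choice you must document, not something the data tells you.
        dt = dt.replace(tzinfo=timezone.utc)
    return dt.astimezone(timezone.utc).isoformat()


print(to_utc_iso("2023-06-30T18:30:00+02:00"))  # 2023-06-30T16:30:00+00:00
print(to_utc_iso("2023-06-30T16:30:00Z"))       # 2023-06-30T16:30:00+00:00
```

Storing everything as UTC text means SQLite string comparison and sorting just work.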
Finally, I made it. My database was loaded with data.
I ran the data-integrity tests, and something was wrong. After a SQL join, I expected 3036 rows and got 3108. It took me an hour to find the culprits: I had forgotten to disallow duplicates in a database field, and there was duplicated data in Google Drive.
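Both sides of that bug are easy to reproduce in miniature. A `GROUP BY ... HAVING` query surfaces the keys that will inflate a join, and a `UNIQUE` constraint would have made the load fail fast instead of failing quietly. Table and column names below are illustrative, not my actual schema:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE docs (drive_id TEXT, title TEXT)")
con.executemany(
    "INSERT INTO docs VALUES (?, ?)",
    [("abc", "Q1 report"), ("abc", "Q1 report (copy)"), ("xyz", "Notes")],
)

# Find the keys that appear more than once -- each one multiplies join rows.
dupes = con.execute(
    "SELECT drive_id, COUNT(*) FROM docs GROUP BY drive_id HAVING COUNT(*) > 1"
).fetchall()
print(dupes)  # [('abc', 2)]

# With UNIQUE declared up front, the duplicate insert raises immediately.
con.execute("CREATE TABLE docs_strict (drive_id TEXT UNIQUE, title TEXT)")
con.execute("INSERT INTO docs_strict VALUES ('abc', 'Q1 report')")
try:
    con.execute("INSERT INTO docs_strict VALUES ('abc', 'Q1 report (copy)')")
except sqlite3.IntegrityError:
    print("duplicate rejected")
```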
I could have deleted the duplicated data in Google Drive, but I did not want to do it. I am not the only one accessing and using that data.
I modified my code to cope with the duplicated data, and it almost doubled the size of my codebase. I could not simply discard a duplicate; I had to merge the data in a table with 32 different fields, each with its own requirements.
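One way to express per-field merge rules compactly is a pandas `groupby().agg()` with a rule per column. This toy version, with invented field names and only a couple of the 32 columns, shows the shape of the approach rather than my actual logic:

```python
import pandas as pd

# Two rows share a drive_id but disagree field by field.
df = pd.DataFrame({
    "drive_id":   ["abc", "abc", "xyz"],
    "title":      ["Q1 report", "Q1 report (copy)", "Notes"],
    "updated_at": ["2023-06-01", "2023-06-15", "2023-05-20"],
})

# One merge rule per field: keep the first title seen, but the most
# recent timestamp (ISO date strings compare correctly as text).
rules = {"title": "first", "updated_at": "max"}
merged = df.groupby("drive_id", as_index=False).agg(rules)
print(merged)
```

The real version needs custom functions for fields where neither "first" nor "max" is right, which is exactly where the line count explodes.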
It was exciting and intriguing.
Sometimes you spend more time coding edge case management than the core application logic.