Analyzing the punctuality of flights using BlazingSQL and RAPIDS

Source: https://www.pexels.com

The COVID-19 pandemic impacted our lives in many ways: some of us (the lucky ones) started working from home full time, we started social distancing and wearing masks, some of us have not seen our families in a long time or missed friends’ weddings… The list goes on and on. But just as we were affected, so were numerous businesses and their employees around the world. One of the hardest hit was the hospitality industry: hotels, restaurants, and airlines (among others).

If you look at the TSA checkpoint travel numbers, you can see a sudden drop in the number of passengers…


Introducing BlazingSQL on Blazing Notebooks

Source: www.pexels.com

ETL (Extract, Transform, Load) is the backbone of a successful data warehousing solution in any enterprise. It fuels many business intelligence, data science, and machine learning applications down the line and accounts for the vast majority of development and processing time.

Historically, ETL was the domain of large databases such as Microsoft SQL Server, Oracle, or Teradata. With the advent of Hadoop-derived projects like Hive or Spark, the warehousing landscape evolved and became more distributed and resilient, while gaining the ability to process ever-growing streams of data in a reasonable time. However, the data volume grows every day: companies…


When the pandemic hit and virtually the whole world shut down, I, like many others, ended up working from home. I was one of the lucky ones: I still had a job and the ability to do it from home, and for that I am grateful! Two things, however, made this transition much less painful work-wise, and both came courtesy of Lenovo.

Full disclosure: I am neither sponsored nor paid by Lenovo in any way to write this article. I simply decided that I wanted to share my experience working on two systems that Lenovo makes and was kind enough to lend…


Part 1: Getting familiar with the data

As I publish this, we are entering the 9th month since COVID-19 froze the world. Since early January we have all experienced it differently: some of us were lucky and got locked up in our houses, able to do our work remotely and live relatively unchanged lives; some of us were left without such luxury; and some, sadly, passed away.

Source: https://unsplash.com/photos/EAgGqOiDDMg/download?force=true

Coronaviruses have been around for decades, but few, if any, have so far been as deadly and as easily spread as COVID-19. Earlier this year, the Allen Institute for AI (AI2) and a consortium of research institutes, along with the White House, curated…


Image credit: https://upload.wikimedia.org/wikipedia/commons/thumb/8/82/SARS-CoV-2_without_background.png/1200px-SARS-CoV-2_without_background.png

In the middle of the COVID-19 pandemic, the Allen Institute for AI, in collaboration with the White House and other parties, published a competition on Kaggle, for which they prepared a corpus of over 167k scientific papers on coronaviruses (of which 79k were provided with full text). Along with the data, 17 tasks were posted; the organizers asked the competitors to probe the scientific corpus and answer questions (among many others) on incubation periods of various coronaviruses (COVID-19, SARS, MERS etc.), …


Using RAPIDS to find more accurate walking distance

What’s wrong with this image?

In the previous story, we explored the cuSpatial and cuGraph libraries from the NVIDIA RAPIDS package and used them to find the walking distance to the nearest parking points near Seattle’s Space Needle. This was only possible with great help from John Murray of Fusion Data Science, who kindly provided a graph of King County roads in the form of a list of intersections (with geo coordinates) and a list of edges linking the intersections, with calculated lengths (in yards).

However, a quick look at the map above reveals a problem with the relatively simple approach we presented before: the distance from the…


Using new tools from NVIDIA RAPIDS to determine the shortest walking distance to a parking spot

Introduction

In the previous story, we explored the Paid Parking Occupancy dataset provided by the City of Seattle Department of Transportation. You could see (and, hopefully, test) how quickly all the calculations on this data can be done using NVIDIA RAPIDS.

Just to recap: the data we used can be downloaded here. It is a subset of the full dataset that has been published since the beginning of 2019 and includes all the transactions for May and June. The dataset is around 7GB in size and fits nicely on the NVIDIA Titan RTX with 24GB of VRAM; to use this dataset…


Using NVIDIA RAPIDS to mine Seattle Parking Data

“Seattle from Kerry Park” by Tom Drabas

Driving in Seattle is quickly becoming very similar to driving in places like San Francisco, Silicon Valley, or Los Angeles: more and more companies choose to settle or open their offices in Seattle so they can tap into the tech community that Seattle has to offer. With that, parking in Seattle is getting harder by the day.

The Paid Parking Occupancy dataset from the City of Seattle Department of Transportation provides a view into around 300 million parking transactions annually from around 12 thousand parking spots on roughly 1,500 block faces. The dataset does not include any transactions for Sundays as…

Tom Drabas

Data scientist, math lover, computer geek, tube-amps designer and builder, die-hard TOOL fan. Working for Blazing SQL. Ex Microsoft.
