Committing to documentation can change your (work) life: The GitLab data team’s approach - Emilie Schario
When we talk about documentation, there is an inevitable groan. In this talk, Emilie of GitLab will share how the team’s commitment to documentation has supported headcount 5x-ing in less than a year. She’ll share three principles you can apply to documentation that can help set your team up for success. Emilie was the first Data Analyst at GitLab and has built a career around being the first data analyst at growth-oriented startups. She is an Army Wife and Princeton University and Venture for America alumna. She currently lives in Savannah, GA.
Every change to the web app creates a review app and quality tests. On data teams, though, you verbalize the logic, cross your fingers, and hope it works. Why are the review standards for data teams not the same as those for developers? At GitLab, we’re adopting DataOps, applying the best practices of the DevOps lifecycle to data, furthering the premise that analytics is a subfield of software engineering. In this presentation, I will share the merge-request-first workflow we’ve adopted at GitLab, and its effects on the business. The entire analytics stack, from ELT to visualization, is version controlled. Any changes are done in merge requests for testability and accountability. Every merge request has its own clone of the data warehouse so that there are no discrepancies between development and production results. Through the open source tool dbt, all transformations in the data warehouse are version controlled, and documentation is created and stored. All of the ELT jobs, tests, and builds are orchestrated by GitLab CI and Airflow. Utilizing these processes has enabled a three-person data team to support the data needs of a billion-dollar company.
Deploying your first dbt project with GitLab CI
In this live-coding workshop, Emilie will deploy a dbt project using GitLab CI. Open source dbt is the T in ELT and is used to organize, cleanse, denormalize, filter, rename, and pre-aggregate the raw data in your warehouse so that it’s ready for analysis. In this tutorial, we will go from
run --target prod in less than an hour, empowering attendees working in data to understand the ease of implementing a new dbt project. Along the way, we’ll talk about the benefits of ELT over ETL and why SQL is the answer to all of your transformation problems. Finally, we’ll use open source GitLab CI to deploy the project.
Online Data Privacy in a Facebook and Google Internet
Facebook ads today can only be described as creepy. The amount of data tech giants are collecting on us is growing faster than most people imagine. In this tech-light talk, I’ll highlight how individuals should consider the business models of the online platforms they’re interacting with before deciding to be a user, and what kinds of personal data are being collected. I’ll leave the audience with five specific actions they can take to better secure their data online:
- Changing their browser
- Changing their search tool
- Avoiding mobile apps
- Using Ghostery to understand web tracking
- Using a password manager
Scrapbooking with Data
Scrapbooking has long been a popular form of memory keeping. For those who prefer other forms of creativity, I propose an alternative: scrapbooking with data. By leveraging auto-collected data, we can indulge in traditional memory keeping without the construction paper and glue. Applying the principles of data analysis to our individual lives, we can strengthen and hone skills related to making data-driven decisions. Using personal data, I will demonstrate each of the four key takeaways.
- Data is often incomplete. If your Apple Watch battery dies, your workout data will be incomplete.
- A single data point is rarely enough to base a decision on; you need at least a trend. It’s not enough to know you read 1 book in February. You need to know how many books you read in December, January, and February to make a decision from that data.
- Data is most powerful when enriched with domain expertise. When looking at weight data, we just see spikes, but knowing what was going on around that period of time creates a fuller picture.
- By the time you want a data point, it’s too late to collect it. We should proactively be collecting data in our lives (for privacy concerns, consider who owns that data) so that we can make informed decisions when we need to.
Understanding Data in Your Life
Data privacy is only the newest frontier for life in the technological age. In this presentation, Emilie will talk about the many ways we’re interacting with data in our day-to-day lives. Emilie will start by describing the state of online web tracking and data collection. She will share best practices to be aware of and small changes you can make to be more secure online. Finally, she will talk about the ways we can use the data in our lives for our own benefit. From Fitbits to RescueTime, data collection can be a force for good. We will use Emilie’s health and wellness data as a demonstration throughout the presentation.