Just the Facts: Extracting Only What You Need To Avoid Unecessary Work

This is a premium course, which you can purchase below.

Our inspection showed us how we can isolate the Enceladus data - now let's do it!

We'll start by creating a script to transform the imported data into something we can analyze later on. For that, we'll use bash:

mkdir enceladus #if you haven't already
touch enceladus/transform.sql

Once again, we'll create a schema to work in, but this time we'll ensure the script is idempotent from the start. As a convenience, we'll tell Postgres that every bit of SQL that's to come is to be run in the enceladus schema. We can do that by setting the search_path:

drop schema if exists enceladus cascade;
create schema enceladus;
set search_path='enceladus';

Creating our teams and plans table, which are related by a foreign key, which is the team_id in the plans table. It references the id of the teams table. We specify that using the keyword references:

drop schema if exists enceladus cascade;
create schema enceladus;
set search_path='enceladus';

create table teams(
  id serial primary key,
  name text not null
);

create table plans(
  id serial primary key,
  start timestamp not null,
  title text not null,
  team_id int not null references teams(id),
  description text
);

Now we need to add some data and we can do that using a new SQL friend, insert. You can insert data in two ways: using the results of a query or by creating a set of hard-coded values. We'll use a query:

insert into teams(name)
select distinct team from csvs.master_plan;

insert into plans(start, title, team_id, description)
select
  start_time_utc::timestamp,
  title,
  (select id from teams where name=csvs.master_plan.team),
  description
from csvs.master_plan
where target='Enceladus' and title is not null;