SQL

Structured Query Language (SQL) is a language for storing, manipulating, and retrieving data in databases

created: 2021-01-20 · modified: 2021-02-09

Basic Queries

Select: get values from database

// general select query
SELECT column1, column2
FROM table_name;

// select all columns
SELECT * FROM table_name;

// list only distinct values
SELECT DISTINCT column1, column2
FROM table_name;

// order results by column
SELECT column1, column2
FROM table_name
ORDER BY column2, column1 ASC|DESC;

// select with conditional where clause
SELECT column1, column2
FROM table_name
WHERE condition;

// select null/non-null values
SELECT column1, column2
FROM table_name
WHERE column1 IS (NOT) NULL;

// select top n rows from table
SELECT TOP n column1, column2
FROM table_name
WHERE condition;

// select top p percent of rows from table
SELECT TOP p PERCENT column1, column2
FROM table_name
WHERE condition;

// alternative to top for PSQL & MySQL
SELECT column1, column2
FROM table_name
WHERE condition
LIMIT n;

// min and max values 
SELECT MIN(column1) as Minimum
FROM table_name;

SELECT MAX(column1) as Maximum
FROM table_name;

// count num of records matching condition
SELECT COUNT(column1)
FROM table_name
WHERE condition

// avg value of records matching condition
SELECT AVG(column1)
FROM table_name
WHERE condition

// sum of values of records matching condition
SELECT SUM(column1)
FROM table_name
WHERE condition

// match records like a pattern, where pattern matches below
SELECT column1, column2
FROM table_name
WHERE columnN LIKE pattern

// IN is short for multiple OR operations
SELECT column1, column2
FROM table_name
WHERE column2 IN (value1, value2);

// IN can also be used with SELECT
SELECT column1, column2
FROM table_name
WHERE column2 IN (SELECT * FROM other_table);

// return values BETWEEN inclusive range
SELECT column1, column2
FROM table_name
WHERE column2 BETWEEN value1 AND value2;

// aliases, call columns by specified name
SELECT column1 as col1, column2 as col2
FROM table_name
WHERE condition;

// can also use aliases for tables
SELECT a.col1, a.col2, b.col1
FROM table_a as a, table_b as b
WHERE a.prop = 'prop';

// this shortens standard dot notation
SELECT table_a.col1, table_a.col2, table_b.col1
FROM table_a, table_b
WHERE table_a.prop = 'prop';

// joins are used to combine two tables based on a common column
// standard JOIN syntax
SELECT column1, column2
FROM table1
<JOIN_TYPE> JOIN table2 ON table1.common_col = table2.common_col;

// INNER JOIN is like intersection; joins on common values across both tables
SELECT column1, column2
FROM table1
INNER JOIN table2 ON table1.common_col = table2.common_col;

// LEFT JOIN matches all records from left table, and merges with corresponding values from right table
SELECT column1, column2
FROM table1
LEFT JOIN table2 ON table1.common_col = table2.common_col;

// RIGHT JOIN matches all records from right table, and merges with corresponding values from left table
SELECT column1, column2
FROM table1
RIGHT JOIN table2 ON table1.common_col = table2.common_col;

// OUTER JOIN is like union; joins on any matching records in either table
SELECT column1, column2
FROM table1
FULL OUTER JOIN table2 ON table1.common_col = table2.common_col;

// SELF JOIN joins a table with itself as if it were two; used to perform relative comparisons
SELECT A.column1, A.column2
FROM table1 A, table1 B
WHERE A.col1 < B.col1;

// UNION used to combine results from multiple select statements
SELECT column1, column2 FROM table1
UNION
SELECT column1, column2 FROM table2

// by default, union only selects distinct values; use ALL to allow duplicates
SELECT column1, column2 FROM table1
UNION ALL
SELECT column1, column2 FROM table2

// group by

// concat function

Update: change values already in table

// conditional update
UPDATE table_name
SET column1 = value1, column2 = value2
WHERE condition;

// omitted WHERE clause updates all records
UPDATE table_name
SET column1 = value1, column2 = value2;

Insert: add new records to table

// general insert query
INSERT INTO table_name (column1, column2)
VALUES (value1, value2)

// insert values for all columns
INSERT INTO table_name
VALUES (value1, ..., valuen)

Delete: delete existing records in table

// conditional delete statement
DELETE FROM table_name
WHERE condition;

// delete all records from table
DELETE FROM table_name;

// delete all records quickly for large tables
// it effectively drops and recreates the table
// instead of deleting rows. This frees storage space
// immediately, which can be critical in some cases (ahem).
TRUNCATE TABLE table_name;

// describe table schema
DESCRIBE table_name

// change column type
ALTER TABLE table_name
    ALTER COLUMN column1 TYPE newtype,
    ALTER COLUMN columnN TYPE newtype,

// rename column
ALTER TABLE table_name
RENAME COLUMN col1 TO col2

// add column after table creation
ALTER TABLE table_name
ADD column1 datatype

// add primary key after table creation
ALTER TABLE table_name
ADD PRIMARY KEY (column1, ..., columnN)

// rename table
ALTER TABLE table_name
RENAME TO new_table_name

// add/remove not null constraint
ALTER TABLE table_name 
ALTER COLUMN column1 SET [DROP] NOT NULL;

Standards

// operators for conditional statements
Arithmetic Operators
+ Addition
- Subtrsction
* Multiplication
/ Division
% Modulus

Comparison Operators (for WHERE conditions)
= check if values are equal
!= check if values arent equal
>, < greater than, less than comparisons
>=, <= greater than or equal, less than or equal

- BETWEEN: between an inclusive range
- LIKE: search for a pattern, where pattern can have wildcards
  - "%": represents 0, 1, or more characters
  - "_": represents 1 character
- IN: specify multiple possible values for column

Joining WHERE Conditions
AND 
OR
NOT

Logical Operators
ALL
AND
ANY
BETWEEN
EXISTS
IN
LIKE
NOT
OR
IS NULL
UNIQUE

Built-in Datatypes

Database Setup and Management

// create new database
CREATE DATABASE dbname

// drop/delete database
DROP DATABASE dbname

// select database for use
USE dbname

// create table within database
CREATE TABLE table_name(
    column1 datatype NOT NULL,
    column2 datatype,
    ...
    columnN datatype,
    PRIMARY KEY (column1, column2)
)

// create table within foreign key
CREATE TABLE table_name(
    column1 datatype NOT NULL,
    column2 datatype,
    ...
    columnN datatype,
    PRIMARY KEY (column1, column2)
    FOREIGN KEY (column3, column4) REFERENCES other_table (column3, column4)
        ON UPDATE CASCADE
)

// describe table
DESC table_name

// drop/delete table
DROP TABLE table_name

PostgreSQL

Setup

# install on ubunutu
$ sudo apt-get install postgresql

# installation creates a postgres user account
# enter interactive postgres session
$ sudo -u postgres psql

# the 'sudo -u postgres cmd' executes 'cmd' as user postgres.
$ sudo -i -u postgres
# will switch the account to postgres, and you can simply use 'psql' from there on

# interactively set up new user
$ sudo -u postgres createuser --interactive

# create corresponding database
$ sudo -u postgres createdb <newuser>

# by default, typing 'psql' will try to use a user with same name as linux user
# the pg_hba.conf also restricts using psql to user based authentication only, 
# meaning psql asks the OS for a linux user login name and uses that as authentication.
# change this by changing 'peer' to 'trust' in '/etc/postgresql/9.5/main/pg_hba.conf'
# for the first two uncommented config lines.
# then, restart postgres server
$ sudo service postgresql restart

# you can now access psql with the new user by calling
$ psql -U <newuser>
# note how you no longer have to execute commands from the postgres user

# login with user postgres:
$ psql -U postgres

# set password for new user or for postgres
postgres$ ALTER USER postgres WITH PASSWORD 'new_pass';

# now that user passwords are set, go back to pg_hba.conf and change 
# 'trust' to 'md5' to allow passwords for psql user login
# use psql (from any linux account) under new user with
$ psql -U <newuser>
# where you will prompted with the user's password. This will connect to
# the database sharing the name 'newuser'

Usage

# OTHER COMMANDS
$ psql -h <myhost> -p <port> -U <user> -d <dbname>
# use this to be explicit about the server, port, user, and database you're connecting to 

# gives info on current connection
psql$ \conninfo

# list databases
psql$ \l

# show all tables in database
psql$ \dt

# describe a table
psql$ \d <tablename>

# switch to database
psql$ \c <db-name>

# setting up a tablespace, a defined location for databases/tables to be stored,
# can be helpful if you need to store a particular db on a different partition
CREATE TABLESPACE <name> LOCATION '<path>';
# an existing db can then be set to this location
ALTER DATABASE <dbname> SET TABLESPACE <name>;

# see absolute table sizes on disk
SELECT nspname || '.' || relname AS "relation",
    pg_size_pretty(pg_total_relation_size(C.oid)) AS "total_size"
  FROM pg_class C
  LEFT JOIN pg_namespace N ON (N.oid = C.relnamespace)
  WHERE nspname NOT IN ('pg_catalog', 'information_schema')
    AND C.relkind <> 'i'
    AND nspname !~ '^pg_toast'
  ORDER BY pg_total_relation_size(C.oid) DESC
  LIMIT 5;

# restoring/creating a table in a .sql written with pg_dump
# note the "-1" stands for "single transaction"
# also note the table name will be the same as before; to avoid conflicts
# consider using a new test database and make sure the tablespace is correctly set
psql -U <user> -d <database> -1 -f /path/to/file.sql

Errors

No Disk Space

The main errors I’ve encountered when working with Postgres have occurred when I no longer have any space left on the system (mainly due to the database itself). I will then try to clear up some rows by running a delete command (if Postgres can even start), and will get a “PANIC” error and the server immediately shuts down, and can’t restart. This is mostly due to the fact that there is not space to write temp files during deletion, and then when restarting the server Postgres doesn’t have enough space to write to the important pg_xlog directory (see here for more details). In this case, Postgres won’t even be able to start for you to actually delete more rows. So far, the only solution I’ve seen to this problem is to attempt to free up just enough space elsewhere on the machine to allow Postgres to be able to write files to this directory and actually startup. So to address this problem, clear up some space (a few hundred MB if possible, the equivalent of a Python venv) and restart Postgres.

Backing up a table directly another machine

When the above situation comes up, I typically want to back up the largest tables or databases to my local machine without storing any sort of temporary files on the already full server. One way this can be done is pg_dumping straight over ssh using the following machine:

ssh -i <key> <user>@<ip> "PGPASSWORD='<password>' pg_dump -U <user> -d <db> -t <table>" \ >> <path>/backup.sql

Restoring a tablespace from a separate location

I faced this issue due to a changed drive name. I had an existing postgres tablespace under a partition named “data_ext” that I later renamed to “data”. Postgres was (understandably) unable to find this tablespace location. Apparently Postgres doesn’t allow you to change tablespace locations on the command line, so I had to find a way around it with the tbl_spc directory. This variable was not defined for me, but ended up being located at /var/lib/postgresql/10/main/pg_tblspc, and held a single symbolic link to the manually created tablespace at the old “data_ext” location. Simply rewriting this symbolic link with sudo ln -s /new/location /absolute/path/to/symlink updated the location properly. Note that I stopped the Postgres service before doing this to ensure nothing crazy happened while it was running.

SQL

Basic Queries

Standards

Database Setup and Management

PostgreSQL

Setup

Usage

Errors

No Disk Space

Backing up a table directly another machine

Restoring a tablespace from a separate location

Metadata

Collection

Link Subgraph

Backlinks

Webmentions