Monday, July 5, 2010

PostgreSQL-Stored Procedures

Stored Procedures

Stored Procedures are subroutines that are stored inside the database. They allow you to select and manipulate data, and, with the use of control structures and loops, perform complex computations and return the calculated result to the client. This saves considerable amounts of client/server communication.

PostgreSQL allows Stored Procedures to be written in several different Procedural Languages, including Perl, Python, TCL, and pgSQL - the PostgreSQL internal procedure language. User-defined Procedural Languages can also be used, and several of these languages are easily downloadable, e.g. PL/Java.

In this article, we will be using PL/pgSQL. PL/pgSQL is very similar to normal SQL, but adds many more features to it, like control structures and user-defined data types and functions.
Example 1: The Basic Stored Procedure

Let's get started with a very basic stored procedure that returns "Hello World!" - not very useful, I know, but it will get us started with the basic syntax of PL/pgSQL. Here's the code:

create or replace function hello() RETURNS text AS $$
DECLARE
hello text;
begin
hello := 'Hello World!';
return hello;
END;
$$ LANGUAGE plpgsql;

Here's what it does:

create or replace function hello() RETURNS text AS $$

Creates the function called hello which receives no parameters and returns text. You must always define what the function returns; use VOID if you don't need to return anything.

DECLARE

Opens the variable declarations block.

hello text;

Declares a variable called "hello" of type "text". To define multiple variables, use ";" as the separator. You can use any of standard types used in tables, like integer and float, and even user-defined types or domains.

BEGIN

Starts the actual function code.

hello := 'Hello World!';

Pupulates the variable "hello" with 'Hello World!'. Note that you have to use single quotes for string/text values.

return hello;

Returns our value.

END;

Ends the function.

$$ LANGUAGE plpgsql;

Defines what language we used - 'plpgsql' in this case. To call that function, you use the following SQL code.

select * from hello();

The output will be a text field called "hello", with the value of "Hello World!".
Example 2: Populating a Table with Test Data

This is another use of a Stored Procedure ('SP' from now on) - generating test data for your tables. Let's use last month's article as an example - we used a SP to generate 500K rows of data for one of our tables. Here's the code:

create or replace function test_data_computer()
RETURNS integer AS $$
DECLARE
count integer;
sql text;
begin
count = 1;
LOOP
sql = 'insert into computer(computer_id, computer_ram, cpu_id, video_id) values';
sql = sql || '('|| count ||', ' || random()*1024 || ', ' || (random()*49999)+1 || ', ' || (random()*49999)+1 || ')';
EXECUTE sql;
count = count + 1;
EXIT WHEN count > 500000;
END LOOP;
return count;
END;
$$ LANGUAGE plpgsql;

It starts much like our previous example, but this time we declare 2 variables instead of one. Things become different at line 8, where we introduce the LOOP statement. The loop is a basic repeating structure: it repeats the code inside indefinitely, until it finds a EXIT or EXIT WHEN clause.

Lines 9 and 10 are used to generate the SQL code to include a simple record in our tables. The double pipes ("||") is the concatenation operator. Random() generates a random float number between 0 and 1 (so "random()*49999)+1" will generate a random number between 1 and 50000).

Line 11 executes the SQL code stored inside the sql variable, adding the registry to the table.

Lines 12 and 13 are used to control the flow of the LOOP, and if omitted will make the loop an infinite one. "EXIT WHEN count > 500000;" makes the loop stop when the condition is met (when "count" goes over 500000 in this case.)

Line 14 closes the LOOP block, making the function go back to line 8, executing everything that is inside the loop again (and again, and again).

Line 15 returns the number of added registries (plus one in this case).
Example 3: Calculations and Date/Time Handling

Let's make up a scenario for this one. Imagine that you are building a system for a doctor, and one of the bits of data he wants is exactly how much time he spends with his patients (NOT just idling in the office.) Even more, he wants to be able to select the data for a given date or date interval, and he wants the option of selecting the records of either a single patient or all of them. Complex scenario, right? Well, we can solve it all with a single SP. These are the tables our database will have:

create table patient (
patient_id serial primary key,
patient_name text );

create table visits (
v_id serial primary key,
patient_id integer references patient,
v_date date,
v_time_start time,
v_time_end time );

One for the patients, another one to store the visits, with the date, start, and end time. Let's now populate the tables with some data:

insert into patient (patient_name) values ('Deividson');
insert into patient (patient_name) values ('John');
insert into patient (patient_name) values ('Benjamin');
insert into visits (patient_id, v_date, v_time_start, v_time_end) values (1, '10/04/2008', '08:00', '09:00');
insert into visits (patient_id, v_date, v_time_start, v_time_end) values (1, '14/04/2008', '13:00', '13:45');
insert into visits (patient_id, v_date, v_time_start, v_time_end) values (1, '18/04/2008', '10:00', '10:15');
insert into visits (patient_id, v_date, v_time_start, v_time_end) values (2, '11/04/2008', '14:00', '15:00');
insert into visits (patient_id, v_date, v_time_start, v_time_end) values (2, '12/04/2008', '14:00', '15:45');
insert into visits (patient_id, v_date, v_time_start, v_time_end) values (2, '17/04/2008', '14:00', '15:15');
insert into visits (patient_id, v_date, v_time_start, v_time_end) values (3, '15/04/2008', '08:00', '12:00');

Three patients, seven records - enough to test our SP. Here is the code:

CREATE OR REPLACE FUNCTION total( date1 date, date2 date, patient integer )
RETURNS interval AS $$
DECLARE
total interval;
rec record;
sql text;
BEGIN
total = '00:00:00'::time;
sql = 'select * from visits';

if date1 is not null OR patient is not null then
sql = sql || ' where ';
end if;

if patient is not null then
sql = sql || '(patient_id = ' || patient || ')';
end if;

if date2 is not null AND date1 is not null then
if patient is not null then
sql = sql || ' AND ';
end if;
sql = sql || '(v_date between ''' || date1 || ''' and ''' || date2 || ''')';
else
if date1 is not null then
if patient is not null then
sql = sql || ' AND ';
end if;
sql = sql || '(v_date = ''' || date1 || ''')';
end if;
end if;

for rec in EXECUTE sql loop
total = total + (rec.v_time_end - rec.v_time_start);
end loop;

return total;
END;
$$ LANGUAGE plpgsql;

Wow! Big one this time, eh? Let's take a look at it. The start of the code is pretty similar to the other examples, but we have 3 variables this time. 'total' will store the total time to return to the client, and 'rec' (of type record) is a variable that will hold the result of the query we will run.

On line 8, we start the variable total with the value of 00:00:00 - the "::" is a typecast - ":: time" means the string we're passing ("00:00:00") needs to be turned into a time.

From line 9 all the way down to line 31, all we are doing is creating the SQL statement that will select the data we want. Here, we use another type of structure - the IF. IFs are basic flow-control structures, and its syntax is (as in most programming languages):

IF (condition) THEN (commands) [ELSE (commands)] END IF;

The condition can be any logical comparison ( <, > , =, IS NULL, or IS NOT NULL), and you can combine multiple conditions using the logical operators (AND, OR, etc). If the condition is true, then the execution will continue with the commands inside the THEN clause, or if it's false, execution will move to the commands in the ELSE (if it exists), or to after the END IF.

It's in those IFs that we create the conditions (single date, date interval, single patient, etc.)

On line 33, we execute the SQL code we generated, looping over each of the records of the result. We then add the duration of each visit in the 'total' variable, and return the result when there are no more records available.

We can call this SP in one of several different ways, each way selecting a different data set and giving us a different result:

# All the records, from all patients
select * from total(NULL, NULL, NULL);

# All the records, from patient #3 only
select * from total(NULL, NULL, 3);

# Records from '14/04/2008', all patients
select * from total('14/04/2008', NULL, NULL);

# Records from '14/04/2008', patient #1 only
select * from total('14/04/2008', NULL, 1);

# Records from '14/04/2008' through '17/04/2008', all patients
select * from total('14/04/2008', '17/04/2008', NULL);

# Records from '14/04/2008' through '17/04/2008', patient #2 only.
select * from total('14/04/2008', '17/04/2008', 2);

Conclusion

Stored Procedures are powerful and flexible, and can be a very good way to help you pre-select and pre-process data, as well as allowing you to manipulate data and run code directly on the server.

PostgreSQL offers a comprehensive manual on their site, including a chapter about PL/pgSQL. You can find it here: http://www.postgresql.org/docs/8.3/static/plpgsql.html

That's it for Stored Procedures - see you next month, when we'll discuss Triggers!

No comments:

Post a Comment

Note: Only a member of this blog may post a comment.