In this issue, we introduce creating crosstab queries using the PostgreSQL tablefunc contrib module. Tablefunc is a contrib module that comes packaged with all PostgreSQL installations - we believe from versions 7. We will be assuming the one that comes with 8. Note that in prior versions, tablefunc was not documented in the standard PostgreSQL docs, but the new 8. Keep in mind that the functions are installed by default in the public schema.

There are a couple of key points to keep in mind which apply to both crosstab functions. The source SQL must always return 3 columns: the first is what to use for the row header, the second is the bucket slot, and the third is the value to put in the bucket.

postgresql crosstab tutorial

The crosstab functions are declared to return setof record. This means that in order to use them in a FROM clause, you need to either alias them by specifying the result type or create a custom crosstab that outputs a known type, as demonstrated by the crosstabN flavors.
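As a minimal sketch of the aliasing option, assuming PostgreSQL 9.1 or later (where CREATE EXTENSION is available) and made-up inline data in place of real tables, the result type can be spelled out in the FROM clause as follows. The names item_usage, ct, jan, feb and mar are illustrative only and do not come from the original article.

```sql
-- Enable tablefunc once per database (CREATE EXTENSION assumes 9.1 or later).
CREATE EXTENSION IF NOT EXISTS tablefunc;

-- The source query returns exactly three columns: row header, bucket, value.
-- The inline VALUES list is hypothetical sample data.
SELECT *
FROM crosstab(
  $$
  SELECT item::text, month::integer, qty::numeric
  FROM (VALUES ('bolt', 1, 5), ('bolt', 2, 3), ('bolt', 3, 8),
               ('nut',  1, 2), ('nut',  2, 6), ('nut',  3, 1)
       ) AS item_usage(item, month, qty)
  ORDER BY 1, 2
  $$
) AS ct(item text, jan numeric, feb numeric, mar numeric);  -- column definition list
```

The casts inside the source query keep the row header and value types in line with the types declared in the column definition list.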

Otherwise you get the common "a column definition list is required for functions returning "record"" error. As a corollary to the previous statement, it is best to cast those 3 columns to specific data types so that the returned data types are guaranteed and do not fail your row type casting. Each row should be unique for (row header, bucket); otherwise you get unpredictable results.

Setting up our test data

For our test data, we will be using our familiar inventory, inventory flow example.

Code to generate structure and test data is shown below. For this example we want to show the monthly usage of each inventory item for the year regardless of project. Note that the one-argument crosstab fills the buckets from left to right, so an item with no usage in some month has its later values shifted into earlier columns. This in most cases is not terribly useful and is confusing. To skirt around this inconvenience, one can write an SQL statement that guarantees a row for each permutation of (Item, Month) by doing a cross join. Below is the above written so that item month usage falls in the appropriate buckets.
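The original listing is not reproduced here, so the following is only a sketch of the cross join idea, assuming PostgreSQL 8.4 or later (for WITH) and a hypothetical item_usage data set with three months instead of twelve. Every item is paired with every month first, and the real usage is then left-joined onto that grid so that missing months yield NULL instead of disappearing.

```sql
-- Requires the tablefunc module; all table and column names are illustrative.
SELECT *
FROM crosstab(
  $$
  WITH item_usage(item, month, qty) AS (
    VALUES ('bolt'::text, 1, 5::numeric),
           ('bolt', 3, 8),      -- no February usage for bolt
           ('nut',  2, 6)       -- nut is used in February only
  )
  SELECT i.item, m.month, u.qty
  FROM (SELECT DISTINCT item FROM item_usage) AS i
  CROSS JOIN generate_series(1, 3) AS m(month)   -- one row per (item, month)
  LEFT JOIN item_usage AS u
         ON u.item = i.item AND u.month = m.month
  ORDER BY 1, 2
  $$
) AS ct(item text, jan numeric, feb numeric, mar numeric);
```

With the grid in place, bolt's March usage should land in the mar column rather than sliding into feb.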

There are a couple of situations that come to mind where the standard behavior of crosstab, not putting like items in the same column, is useful. One example is when it's not necessary to distinguish bucket names but the order of cell buckets is important, such as when doing column rank reports.
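A column rank report of this kind might be sketched as follows: for each item, the projects are listed from left to right by descending usage. This assumes PostgreSQL 8.4 or later (window functions and WITH) and uses hypothetical data and column names; it is not the article's original listing.

```sql
-- Requires the tablefunc module; item_usage and its contents are made up.
SELECT *
FROM crosstab(
  $$
  WITH item_usage(item, project, qty) AS (
    VALUES ('bolt'::text, 'alpha'::text, 5),
           ('bolt', 'beta',  9),
           ('nut',  'alpha', 4),
           ('nut',  'beta',  6),
           ('nut',  'gamma', 1)
  )
  SELECT item, rank_no, project
  FROM (
    SELECT item, project,
           row_number() OVER (PARTITION BY item
                              ORDER BY sum(qty) DESC) AS rank_no
    FROM item_usage
    GROUP BY item, project
  ) AS ranked
  ORDER BY item, rank_no    -- the row order decides which bucket is filled
  $$
) AS ct(item text, project_1 text, project_2 text, project_3 text);
```

Here the bucket value is the project name itself, and the left-to-right filling of the one-argument crosstab is exactly the behavior wanted: an item with fewer than three projects simply leaves the trailing columns NULL.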

For example, suppose you wanted to know, for each item, which projects it has been used in most, with the column order of projects based on highest usage.

Recall we said that crosstab requires exactly 3 columns output in the SQL source statement. No more and no less. So what do you do when you want your month crosstab by Item, Project, and month columns? One approach is to stuff more than one item into the row header slot, by using either a delimiter or an array.
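The array variant could look roughly like the sketch below. It assumes a hypothetical item_usage data set, PostgreSQL 8.4 or later, and, to keep the example short, the two-parameter form of crosstab with only three months; the original article's listing may have differed.

```sql
-- Requires the tablefunc module; all names and data are illustrative.
SELECT ct.row_name[1] AS item,      -- unpack the array row header again
       ct.row_name[2] AS project,
       ct.jan, ct.feb, ct.mar
FROM crosstab(
  $$
  WITH item_usage(item, project, month, qty) AS (
    VALUES ('bolt'::text, 'alpha'::text, 1, 5::numeric),
           ('bolt', 'alpha', 3, 2),
           ('bolt', 'beta',  2, 4),
           ('nut',  'alpha', 1, 7)
  )
  SELECT ARRAY[item, project] AS row_name, month, sum(qty)
  FROM item_usage
  GROUP BY 1, 2
  ORDER BY 1, 2
  $$,
  $$ SELECT generate_series(1, 3) $$   -- bucket list: months 1 to 3
) AS ct(row_name text[], jan numeric, feb numeric, mar numeric);
```

The source query still returns three columns; the first one just happens to be a text array that carries both row headers.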

We use the array approach in this article.

If month tabulations are something you do often, you will quickly become tired of writing out all the months.

One way to get around this inconvenience is to define a type and a crosstab alias that returns that well-defined type.

Adding a total column to a crosstab query using the crosstab function is a bit tricky. Recall we said the source SQL should have exactly 3 columns (row header, bucket, bucket value).

Well, that wasn't entirely accurate. The two-parameter form crosstab(source_sql, category_sql) also accepts extra columns between the row header and the bucket; their values are copied from the first row of each row header group. Don't get extraneous columns confused with row headers. They are not the same, and if you try to use them as we did for creating multi-column row headers, you will be leaving out data.

In our Metrics Maven series, Compose's data scientist shares database features, tips, tricks, and code you can use to get the metrics you need from your data. In this article, we'll look at the crosstab function in PostgreSQL to create a pivot table of our data with aggregate values.

If you've used spreadsheet software, then you're probably familiar with pivot tables since they're one of the key features of those applications. The same pivot functionality can be applied to data in your database tables. Typical relational database tables will contain multiple rows, often with repeating values in some columns.

In this way, the data extends downward through the table.

Aggregate functions and group by options can be applied at query time to determine metrics like count, sum, and average for categories of the data. It's all pretty straightforward, but sometimes having a pivot table that extends the data across, rather than downward, with those metrics at-the-ready makes it easier to do comparisons or to filter on certain attributes. Luckily PostgreSQL has a function for creating pivot tables.

It's called crosstab. In this article we're going to look at how to use the crosstab function to output a result set of aggregate values pivoted by category.

In our examples below, we'll pivot data from a product catalog, but you'll be able to see how it can be applied to a variety of data situations.

First things first. To run crosstab we'll need to enable the tablefunc module. Besides crosstab, the tablefunc module also contains functions for generating random values as well as creating a tree-like hierarchy from table data. We do this in the data browser by navigating to our database, then clicking on the "Extensions" option on the left side.

Once we're on the Extensions page, we just scroll down to "tablefunc" and select "install" from the right side. It will instantly be enabled.

There are a couple of different crosstab options that you can read about on the tablefunc page in the PostgreSQL documentation and experiment with for your particular situation.

If you want to learn more about the more basic crosstab option, check out our article called Crosstab Revisited, where we compare the two options and explain how they're different.

The first thing we want to know from our data is the average price of the products in each category by product line.

Typically we'd run a query that uses the avg aggregate function and group by to determine this.

When that query is passed to crosstab, note the single quotes around the original query. You'll get a syntax error without them. You might also have noticed our usage of the round function, which we covered in our previous article on how to make data pretty, in order to round the result to an appropriate number of decimal points: 2 in this case, since we're dealing with currency. We've then added another query (note the comma separating the two queries) to return the distinct categories in the order we're expecting.

It's important to know exactly which values the pivoted field will return, and in which order, so that we can name the new columns correctly. Ideally, the values are not changing often (if ever), since we're doing a bit of hard-coding here.
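Putting those pieces together, the call might look like the sketch below. The catalog_sketch table, its columns and its rows are hypothetical stand-ins for the product catalog, and CREATE EXTENSION assumes PostgreSQL 9.1 or later.

```sql
CREATE EXTENSION IF NOT EXISTS tablefunc;

-- Hypothetical stand-in for the product catalog.
CREATE TEMP TABLE catalog_sketch (product_line text, category text, price numeric);
INSERT INTO catalog_sketch VALUES
  ('Home',   'Lamps', 19.99),
  ('Home',   'Lamps', 29.99),
  ('Home',   'Rugs',  89.50),
  ('Office', 'Lamps', 45.00),
  ('Office', 'Desks', 250.00);

SELECT *
FROM crosstab(
  -- first query, in single quotes: row header, category, aggregate value
  'SELECT product_line, category, round(avg(price), 2)
   FROM catalog_sketch
   GROUP BY product_line, category
   ORDER BY 1, 2',
  -- second query: the distinct categories, in the order of the output columns
  'SELECT DISTINCT category FROM catalog_sketch ORDER BY 1'
) AS ct(product_line text, desks numeric, lamps numeric, rugs numeric);
```

The column names in the final list are hard-coded to match what the second query returns, which is why a stable set of categories matters.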

And that brings us to specifying the output column names and data types in an AS clause.

Some years ago, when PostgreSQL version 8. This extension provides a really interesting set of functions. One of them is the crosstab function, which is used for pivot table creation. The simplest way to explain how this function works is using an example with a pivot table. As you read this article, imagine yourself as a teacher at a primary (elementary) school.

We will assume that you teach every subject (language, music, etc.). The school provides a system for you to record all evaluation or test results.

In computer science, we call this kind of grid a pivot table. If you analyze how the pivot table is built, you will find that we use values from raw data as column headers or field names (in this case, geography, history, maths, etc.).

As we previously mentioned, the crosstab function is part of a PostgreSQL extension called tablefunc. To call the crosstab function, you must first enable the tablefunc extension with the CREATE EXTENSION command. We must also define the names and data types of the columns that will go into the final result.
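As a rough sketch of both steps, assuming PostgreSQL 9.1 or later and a hypothetical evaluations_sketch table (the article's own table definition and data are not reproduced here), the extension is enabled once and the shape of the final result is then declared in the AS clause:

```sql
-- Enable the extension once per database.
CREATE EXTENSION IF NOT EXISTS tablefunc;

-- Hypothetical raw data: one row per student and subject.
CREATE TEMP TABLE evaluations_sketch (
  student           text,
  subject           text,
  evaluation_result numeric
);
INSERT INTO evaluations_sketch VALUES
  ('Smith, John',    'Geography', 9.0),
  ('Smith, John',    'History',   7.5),
  ('Smith, John',    'Language',  8.0),
  ('Peterson, Kate', 'Geography', 6.0),
  ('Peterson, Kate', 'History',   9.5),
  ('Peterson, Kate', 'Language',  7.0);

SELECT *
FROM crosstab(
  'SELECT student, subject, evaluation_result
   FROM evaluations_sketch
   ORDER BY 1, 2'
) AS final_result(student text, geography numeric,
                  history numeric, language numeric);
```

This one-parameter call only lines up correctly because every student in the sample has a result for every subject.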

From a single data set, we can produce many different pivot tables.

For instance, suppose we want to obtain the average evaluations for John Smith from March to July. In a grid like the following, the table would look like this. The code would look like this. The following pivot table is the result of this query. Here we have grades for geography and language.

Of course, the second query is the correct one because it is showing raw data. The problem is in the pivot table building process: some categories are missing information.

Pandas offers several options for grouping and summarizing data, but this variety of options can be a blessing and a curse. My goal is to have this article be a resource that you can bookmark and refer to when you need to remind yourself what you can do with the crosstab function.

The pandas crosstab function builds a cross-tabulation table that can show the frequency with which certain groups of data appear.

In the table above, you can see that the data set contains 32 Toyota cars of which 18 are four door and 14 are two door. Pandas makes this process easy and allows us to customize the tables in several different manners. For this example, I wanted to shorten the table so I only included the 8 models listed above. The crosstab function can operate on numpy arrays, series or columns in a dataframe. For this example, I pass in df. Pandas does that work behind the scenes to count how many occurrences there are of each combination.

Before we go much further with this example, more experienced readers may wonder why we use crosstab instead of another pandas option. The question still remains: why even use a crosstab function?

The longer answer is that sometimes it can be tough to remember all the steps to make this happen on your own. In my experience, it is important to know about the options and use the one that flows most naturally from the analysis.

One common need in a crosstab is to include subtotals. We can add them using the margins keyword. The margins keyword instructs pandas to add a total for each row as well as a total at the bottom. All of these examples have simply counted the individual occurrences of the data combinations. In those areas where there is no car with those values, it displays NaN. We have seen how to count values and determine averages of values. However, there is another common case of data summarization where we want to understand the percentage of time each combination occurs.

This can be accomplished using the normalize parameter. This table shows us that 2. The normalize parameter is even smarter because it allows us to perform this summary on just the columns or rows. This view of the data shows that of the Mitsubishi cars in this dataset, ...

One of the most useful features of the crosstab is that you can pass in multiple dataframe columns and pandas does all the grouping for you.

First, I included the specific rownames and colnames that I want to include in the output. I want to make one final note on this table. It does include a lot of information and may be too difficult to interpret.

Does anyone know how to create crosstab queries in PostgreSQL?

For example I have the following table:

Install the additional module tablefunc once per database, which provides the function crosstab. Since Postgres 9. The second parameter can be any query that returns one row per attribute, matching the order of the column definition at the end. Often you will want to query distinct attributes from the underlying table.

Since you have to spell out all columns in a column definition list anyway (except for pre-defined crosstabN variants), it is typically more efficient to provide a short list in a VALUES expression. I used dollar quoting to make quoting easier. You can even output columns with different data types with crosstab(text, text), as long as the text representation of the value column is valid input for the target type.
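A minimal sketch of that two-parameter call is shown below. The question's table is not reproduced in this copy, so the inline tbl data (section, status, ct) is a hypothetical stand-in; the tablefunc module is assumed to be installed.

```sql
SELECT *
FROM crosstab(
  -- 1st parameter: row header, attribute, value (dollar-quoted)
  $$
  SELECT section, status, ct
  FROM (VALUES ('A'::text, 'Active'::text, 1),
               ('A', 'Inactive', 2),
               ('B', 'Active',   4),
               ('B', 'Inactive', 5),
               ('C', 'Inactive', 7)    -- C has no 'Active' row
       ) AS tbl(section, status, ct)
  ORDER BY 1, 2
  $$,
  -- 2nd parameter: one row per attribute, in output column order
  $$ VALUES ('Active'::text), ('Inactive') $$
) AS ct("Section" text, "Active" int, "Inactive" int);
```

Because the values are matched to columns by attribute name, section C ends up with NULL under "Active" and 7 under "Inactive".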

This way you might have attributes of different kinds and output text, date, numeric etc. There is a code example at the end of the chapter crosstab(text, text) in the manual. See also: Pivot on Multiple Columns using Tablefunc, which also demonstrates the mentioned "extra columns". Postgres 9. Similar result as above, but it's a representation feature on the client side exclusively. There are more code examples at the bottom of that page.

The previously accepted answer is outdated. The variant of the function crosstab(text, integer) is outdated. The second integer parameter is ignored. I quote the current manual:

Obsolete version of crosstab(text).

The parameter N is now ignored, since the number of value columns is always determined by the calling query.

The one-parameter form crosstab(text) fails if a row does not have all attributes: it raises no error, but values are filled in from left to right and can end up under the wrong column. See the safe variant with two input parameters above to handle missing attributes properly.
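To make that failure mode concrete, here is a small sketch with hypothetical inline data in which section C has no 'Active' row. It assumes the tablefunc module is installed; the names are illustrative.

```sql
SELECT *
FROM crosstab(
  $$
  SELECT section, status, ct
  FROM (VALUES ('A'::text, 'Active'::text, 1),
               ('A', 'Inactive', 2),
               ('C', 'Inactive', 7)    -- the only row for C
       ) AS tbl(section, status, ct)
  ORDER BY 1, 2
  $$
) AS ct(section text, active int, inactive int);
```

With the one-parameter form the attribute column is not used for matching, so the 7 for section C is placed in the first value column (active) even though it belongs to 'Inactive'.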

Sorry this isn't complete because I can't test it here, but it may get you off in the right direction. I'm translating from something I use that makes a similar query.

In the simplest case, we start from only two columns, one being a function of the other.

In the example above they represent years, in ascending order from left to right. If there is no corresponding V for (X,Y) in the dataset, the corresponding field in the grid is set to NULL or to a specific marker for empty values. When the number of columns is relatively limited, this representation has some interesting visual advantages over the top-down list of (X,Y,V) tuples.

A query pivoting three columns (x,y,v) can be written without crosstab, in a canonical form that lists each output column explicitly.
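One way such a canonical form might be written is sketched below, with city as x, year as y and raindays as v. It relies on the aggregate FILTER clause, which assumes PostgreSQL 9.4 or later (a CASE expression inside the aggregate would do the same job on older versions), and the rainfall figures are made up for illustration.

```sql
WITH rainfall(city, year, raindays) AS (
  VALUES ('Brest'::text, 2012, 178),   -- made-up sample figures
         ('Brest', 2013, 165),
         ('Nice',  2012, 60),
         ('Nice',  2013, 72)
)
SELECT city,
       sum(raindays) FILTER (WHERE year = 2012) AS "2012",
       sum(raindays) FILTER (WHERE year = 2013) AS "2013"
FROM rainfall
GROUP BY city
ORDER BY city;
```

Each pivoted column is one aggregate restricted to a single y value, which is why every new year has to be added to the query by hand.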

Often a pivot query will aggregate values simultaneously with pivoting, and the typical form of the query then wraps each pivoted value in an aggregate.

With crosstab, the first argument is the text of a query returning the data to pivot. The second argument is another query returning the names of columns after pivoting, in the desired order.
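The crosstab counterpart could be sketched like this, again with made-up rainfall figures and assuming the tablefunc module is installed and PostgreSQL 8.4 or later for WITH. The column names in the AS clause are written by hand and must follow the order returned by the second argument.

```sql
SELECT *
FROM crosstab(
  -- 1st argument: the data to pivot (row header, category, value)
  $$
  WITH rainfall(city, year, raindays) AS (
    VALUES ('Brest'::text, 2012, 178),
           ('Brest', 2013, 165),
           ('Nice',  2012, 60),
           ('Nice',  2013, 72)
  )
  SELECT city, year, sum(raindays)
  FROM rainfall
  GROUP BY city, year
  ORDER BY 1
  $$,
  -- 2nd argument: the categories that become columns, in order
  $$ VALUES (2012), (2013) $$
) AS ct(city text, "2012" bigint, "2013" bigint);
```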

Both the crosstab-based queries and the canonical form have the drawback that the output columns must be explicitly enumerated, so that when a new value appears in the rows to transpose, it must be added manually to the list.

Otherwise, with the canonical form the new data would be ignored, and with crosstab it would be likely to cause a mismatch error. These queries also lack flexibility: to change the order of the columns, or to transpose a different column of the source data (for instance, to have cities on the horizontal axis instead of years), they need to be rewritten.

So the difficulty of a dynamic pivot is: in an SQL query, the output columns must be determined before execution. To solve this chicken and egg problem, we need to loosen the constraints a bit.

Firstly, a SQL query can return the pivoted part encapsulated inside a single column with a composite or array type, rather than as multiple columns. The function would build a dynamic SQL query and instantiate a cursor over it. Another variant would be for the function to create a view or a table, temporary or permanent, returning the pivoted data. It would return its name for instance, and the client-side code would run a SELECT on this table or view, and probably drop it afterwards.
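The cursor variant might be sketched as follows. This is a simplified illustration rather than the article's own function: the rainfall_sketch table, the function name pivot_rainfall_sketch and the cursor name pivot_cur are all hypothetical, and the FILTER clause assumes PostgreSQL 9.4 or later.

```sql
-- Hypothetical sample table with made-up figures.
CREATE TEMP TABLE rainfall_sketch (city text, year int, raindays int);
INSERT INTO rainfall_sketch VALUES
  ('Brest', 2012, 178), ('Brest', 2013, 165),
  ('Nice',  2012, 60),  ('Nice',  2013, 72);

CREATE OR REPLACE FUNCTION pivot_rainfall_sketch(cur refcursor)
RETURNS refcursor
LANGUAGE plpgsql AS
$fn$
DECLARE
  cols text;
BEGIN
  -- Build one aggregate per distinct year found in the data.
  SELECT string_agg(
           format('sum(raindays) FILTER (WHERE year = %L) AS %I',
                  y.year, y.year::text),
           ', ' ORDER BY y.year)
    INTO cols
    FROM (SELECT DISTINCT year FROM rainfall_sketch) AS y;

  -- Open the caller's cursor over the dynamically built pivot query.
  OPEN cur FOR EXECUTE
    format('SELECT city, %s FROM rainfall_sketch GROUP BY city ORDER BY city',
           cols);
  RETURN cur;
END
$fn$;

-- The cursor only lives inside a transaction.
BEGIN;
SELECT pivot_rainfall_sketch('pivot_cur');
FETCH ALL FROM pivot_cur;
COMMIT;
```

The output columns are decided when the function runs, so the SELECT that calls it never has to name them; the price is the extra FETCH step. The sketch does not handle an empty table, where no column list can be built.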

The client-side presentation layer can also do the work of transposing rows into columns from a non-pivoted dataset. Since version 9. In interactive use, this method is probably the quickest way to visualize pivoted representations. For instance, say we want to have a look at the (year, city) couples in our example with more than a given number of rainy days per year.

The horizontal header holds the values of the 2nd column of the source data. To have the other transposition, we just have to give year and city as arguments to crosstabview in that order, without changing the query:.
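In psql this might look like the sketch below. \crosstabview is a psql meta-command (available from psql 9.6), not server-side SQL, so it only works in that client; the inline rainfall figures and the threshold of 100 are made up for illustration.

```sql
-- Default layout: 1st column down the side, 2nd column across the top.
SELECT city, year, raindays
FROM (VALUES ('Brest'::text, 2012, 178), ('Brest', 2013, 165),
             ('Nice',  2012, 60),        ('Lille', 2013, 120)
     ) AS rainfall(city, year, raindays)
WHERE raindays > 100
\crosstabview

-- Other transposition: same query, years down the side, cities across the top.
SELECT city, year, raindays
FROM (VALUES ('Brest'::text, 2012, 178), ('Brest', 2013, 165),
             ('Nice',  2012, 60),        ('Lille', 2013, 120)
     ) AS rainfall(city, year, raindays)
WHERE raindays > 100
\crosstabview year city
```

Note that the queries are not terminated with a semicolon: the meta-command itself sends the query buffer to the server and then lays out the result.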

For example, the following command sorts the city columns by rank of rainfall, adding it as a 4th column to the query and passing it to crosstabview.

We can see that the numbers get distributed in a way that shows a left-to-right gradient from the less to the more rainy, with Brest as the clear winner for this dataset.

Say for example that our weather data was structured as below, with a distinct column for each month of the year, spreadsheet-style. To get the final unpivoted result, we need to improve this query a bit to associate the year and city to each measurement, and filter and type-cast the month columns.
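One possible way to do this, sketched here under the assumption of PostgreSQL 9.3 or later (for LATERAL and json_each_text), is to turn each row into JSON and expand it back into key/value pairs. The rain_months_sketch table, its m01 to m03 columns and its figures are hypothetical, and the article's own query may use a different technique.

```sql
-- Hypothetical spreadsheet-style table: one column per month (3 shown).
CREATE TEMP TABLE rain_months_sketch (
  city text, year int, m01 int, m02 int, m03 int
);
INSERT INTO rain_months_sketch VALUES
  ('Brest', 2012, 20, 15, 12),
  ('Nice',  2012, 5,  6,  4);

SELECT r.city,
       r.year,
       substr(kv.key, 2)::int AS month,     -- 'm01' -> 1
       kv.value::int          AS raindays   -- values come back as text
FROM rain_months_sketch AS r
CROSS JOIN LATERAL json_each_text(row_to_json(r)) AS kv(key, value)
WHERE kv.key ~ '^m[0-9]{2}$'                -- keep only the month columns
ORDER BY r.city, r.year, month;
```

The filter on the key drops the city and year entries that row_to_json also produces, and the casts restore proper types for the month number and the measurement.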