Column Pivot in PySpark

Hey, welcome back !

This post is about pivoting a column in PySpark. Or to put it in simpler terms, it is about splitting a column with categorical values, into multiple dummy variable columns (similar to One-Hot Encoding). The column has its distinct values transposed into individual columns. The best part is that this can be done in just one line !

Let's say that we have a database Classroom and a column within that, as Gender. This column contains values: Male / Female. We intend to split this column to 2 columns: Gender_Male and; Gender_Female (each containing values as 1 and 0). This can be achieved by the following line:

Combining with GroupBy

Pivoting can also come in handy when we wish to group by certain columns and then perform a pivot to find the number of boys and girls per group. For example, let's say that we have a column "Age_Group", which has values 1,2,3. And now, we wish to know the number of boys and girls in every age group. The line below does just that:

Comments

Popular posts from this blog

Reading & Writing a file from / to Azure Blob Storage (Using PySpark)

Why PySpark is so Awesome !