Column Pivot in PySpark
Hey, welcome back! This post is about pivoting a column in PySpark. Or, to put it in simpler terms, it is about splitting a column with categorical values into multiple dummy variable columns (similar to One-Hot Encoding). The column has its distinct values transposed into individual columns. The best part is that this can be done in just one line!

Let's say that we have a dataset, Classroom, with a column named Gender. This column contains the values Male and Female. We intend to split this column into 2 columns, Gender_Male and Gender_Female, each containing the values 1 and 0. This can be achieved by the following line:

Combining with GroupBy

Pivoting can also come in handy when we wish to group by certain columns and then pivot to count the number of boys and girls per group. For example, let's say that we have a column "Age_Group", which has the values 1, 2, 3, and we wish to know the number of boys and girls in every age group. The line bel...