Column Pivot in PySpark
Hey, welcome back!
This post is about pivoting a column in PySpark. Put simply, it is about splitting a column of categorical values into multiple dummy variable columns (similar to One-Hot Encoding): each distinct value in the column is transposed into a column of its own. The best part is that this can be done in just one line!
Let's say that we have a DataFrame, Classroom, with a column named Gender that contains the values Male and Female. We intend to split this column into 2 columns, Gender_Male and Gender_Female, each containing 1 or 0. This can be achieved with a single chained expression built around pivot.