SQL
1 2 3 4 5 6 7 8 | |
- Use the
pymssqlAPI andread_sqlto retrieve and load data from a SQL Server instance. Select the columns we want from the SQL Server data and use SQL aliases to improve column names (for example,feduASfathereducation). Create a connection to the SQL Server data by passing database credentials to thepymssqlconnect function. Create a pandas data frame by passing theselectstatement and connection object toread_sql. Close the connection to return it to the pool on the server:
1 2 3 4 5 | |
1 2 3 4 5 6 7 8 9 | |
1 2 | |
1 | |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 | |
1 | |
| studentid | school | sex | age | ... | goout | gradeperiod1 | gradeperiod2 | gradeperiod3 | |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 001 | GP | F | 18 | ... | 4 | 5 | 6 | 6 |
| 1 | 002 | GP | F | 17 | ... | 3 | 5 | 5 | 6 |
| 2 | 003 | GP | F | 15 | ... | 2 | 7 | 8 | 10 |
| 3 | 004 | GP | F | 15 | ... | 2 | 15 | 14 | 15 |
| 4 | 005 | GP | F | 16 | ... | 2 | 6 | 10 | 10 |
5 rows × 16 columns
1 2 3 4 5 6 | |
1 | |
1 | |
1 | |
1 | |
1 | |
1 | |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 | |
- Replace coded data values with more informative values:
1 2 3 4 5 6 7 8 9 10 11 12 | |
1 | |
- Check for changes in memory usage:
- Change the type for columns with the changed data to
category:
1 | |
1 2 3 4 5 6 | |
1 2 | |
1 | |
1 2 3 4 5 6 | |
1 | |
1 2 3 4 5 6 | |
1 2 | |
| freetime | goout | |
|---|---|---|
| 1:very low | 0.05 | 0.06 |
| 2:low | 0.16 | 0.26 |
| 3:neutral | 0.40 | 0.33 |
| 4:high | 0.29 | 0.22 |
| 5:very high | 0.10 | 0.13 |
1 2 | |
| mothereducation | fathereducation | |
|---|---|---|
| 1:k-4 | 0.15 | 0.21 |
| 2:5-9 | 0.26 | 0.29 |
| 3:secondary ed | 0.25 | 0.25 |
| 4:higher ed | 0.33 | 0.24 |
1 | |