Getting Started
OK, so you have decided to learn Python. Excel just isn’t cutting it anymore, and the person you work with makes lots of nice plots in Python and you want to be able to do that too. Cool, let’s begin.
Here is what we are going to do:
- Get Python installed
- Install an IDE
- Configure your environment
- Read a .csv or .xlsx
- Make a scatter plot
Installing Python
First, my condolences to anyone with a corporate-managed computer or network, and especially to those on Windows: any one of those things is going to make your life more difficult.
But we won’t let that stop us. If you are working on a corporate network, the best thing to do is to get PowerUser, Admin or some other set of permissions that allows you to install software. Then you are a free bird with the power to install iTunes on your work computer.
If IT forces you to go via the ticketing system to get software installed, again, I’m sorry. Maybe you want to do this on your home computer? If you are still willing to persist and get this done at work, then you need to ask for a recent version of Python 3, as Python 2.7 is end of life.
For those lucky people with a computer that they control, installing Python is easy. For compatibility’s sake, don’t use the current release, because some packages take a while to support a brand-new version. If the current version is 3.10.xx, download the latest version of 3.9.xx.
Also, you want to stick to the 64-bit installer. With that out of the way, go get Python from here, install it by following the prompts, then come back.
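Once the installer has finished, a quick check like the one below can confirm which interpreter you ended up with. This is only a sketch using the standard library, and the helper name describe_python is made up for this example.

```python
import platform
import struct

def describe_python():
    """Return the version string and the bit width of the running interpreter."""
    version = platform.python_version()  # for example '3.9.13'
    bits = struct.calcsize('P') * 8      # size of a pointer in bits: 64 or 32
    return version, bits

version, bits = describe_python()
print(f'Python {version} ({bits}-bit)')
```

If this reports 32-bit, or a version you did not expect, you probably have more than one Python installed and the wrong one is being picked up.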
Use an IDE
Once you have Python installed, the next step is to get an IDE (Integrated Development Environment), as this greatly simplifies writing code. There are lots of options, but I recommend VS Code.
Once you have installed VS Code, there are a few steps you need to do to set up the Python environment. You should ideally follow the whole tutorial here, but I’m sure that many of you won’t. The things I have seen trip people over more than once are these two steps:
Configure your environment
Now that you have a functioning install of VS Code and Python, we will install the Python packages that make it easy to plot things and read .csv and .xlsx files.
In the terminal at the bottom of the VS Code window, enter the following text:
pip install pandas matplotlib xlrd openpyxl
Be aware that you are not running pip inside Python; pip is run from the command line (Windows cmd or bash).
This installs:
- pandas for dealing with tabular data like .csv files
- xlrd and openpyxl to decode Excel files (recent versions of xlrd read only the older .xls format, openpyxl handles .xlsx)
- matplotlib for plotting
- some dependencies, which are software libraries that the tools you installed (matplotlib, pandas, xlrd, openpyxl) need to function
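If you want to confirm that the install worked before going further, something like the sketch below can be run in Python. It only asks Python whether it can find each package and does not import them; the function name missing_packages is made up for this example.

```python
import importlib.util

# the packages installed with pip above
PACKAGES = ['pandas', 'matplotlib', 'xlrd', 'openpyxl']

def missing_packages(names):
    """Return the names from the list that python cannot find."""
    return [name for name in names if importlib.util.find_spec(name) is None]

missing = missing_packages(PACKAGES)
if missing:
    print('Not installed:', ', '.join(missing))
else:
    print('All packages found!')
```

Anything listed as not installed usually means pip ran against a different Python than the one VS Code is using, or that the install failed.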
Now, if you are doing this at work and it doesn’t work, you are probably having issues with the network proxy. The path of least resistance is to take your laptop home and work off the VPN.
If you can’t do that for some reason, you could ask IT to open access to pip in the proxy, or add your username and password to the call to pip.
You will need to know these:
- user_name
- password
- proxyname
- port, usually 8080 or 8081
The call to pip then looks like this:
pip install pandas matplotlib xlrd openpyxl --proxy https://user_name:password@proxyname:port
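One thing that can go wrong here is a password containing characters such as @ or :, which have a special meaning inside a URL and need to be percent-encoded first. The sketch below builds the proxy address with the standard library; build_proxy_url is a made-up helper, the values are placeholders, and whether your proxy wants http or https is something only IT can tell you.

```python
from urllib.parse import quote

def build_proxy_url(user_name, password, proxyname, port, scheme='https'):
    """Build the address to pass to pip --proxy, escaping special characters."""
    # quote() with safe='' percent-encodes reserved characters, so '@' becomes '%40'
    user = quote(user_name, safe='')
    secret = quote(password, safe='')
    return f'{scheme}://{user}:{secret}@{proxyname}:{port}'

# placeholder values, swap in your own
print(build_proxy_url('jane', 'p@ss:word', 'proxy.example.com', 8080))
```

The printed address is what goes after --proxy in the pip command.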
Read a csv or excel file
If you don’t have a file at hand, I have made a .csv available here, just to demonstrate some basic plotting.
One of the great things about Python is that it ships with a lot of prebaked libraries that make your life easier. One of them is urllib, and it comes installed by default.
We are going to use it to download the example data and save it to a .csv. If you prefer, you can download the file by hand and then change the path to suit.
import urllib.request # this is the url reading library included in python
import os # os is the operating system interface from python
# the line below sets a variable called url to the location of the example file
url = 'https://www.fractalgeoanalytics.com/python/getting_started.files/example.csv'
# this line sets a variable called outfile, the file name that will be saved
outfile = 'example_data.csv'
# we are checking to see if there is a file called outfile
# if it doesn't exist we are going to download it from the web
# and save it to the current working directory
if not os.path.exists(outfile):
    # using with here ensures that we open and close file resources
    # correctly in case something goes wrong
    with urllib.request.urlopen(url) as response:
        dataset = response.read().decode('utf-8') # read the url and decode it with utf-8
    # save the file, newline='' keeps the line endings exactly as downloaded
    with open(outfile, 'w', encoding='utf-8', newline='') as file:
        file.write(dataset)
    # this is a print statement, it is here so you know what is happening
    print(f'{outfile} has been saved!')
else:
    print(f'{outfile} already exists!')
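Before handing the file to pandas it can be worth a quick look at what was actually saved, since a proxy error page saved as a .csv is a classic surprise. Here is a small sketch using the csv library that comes with Python; preview_csv is a made-up helper name.

```python
import csv
import os

def preview_csv(path, n_rows=5):
    """Return the header and the first n_rows data rows of a .csv file."""
    with open(path, newline='') as file:
        reader = csv.reader(file)
        header = next(reader, [])  # first line holds the column names
        rows = [row for _, row in zip(range(n_rows), reader)]
    return header, rows

outfile = 'example_data.csv'  # same file name as used above
if os.path.exists(outfile):
    header, rows = preview_csv(outfile)
    print('columns:', header)
    for row in rows:
        print(row)
else:
    print(f'{outfile} was not found, run the download step first')
```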
Once you have downloaded the file, we need to read it into Python. I am a huge fan of just sticking to .csv files, but we are also going to save the file as an .xlsx to demonstrate the different functions.
# import pandas to read csv files, n.b. this will fail if pip install failed,
# as will the call to import matplotlib, so you will need to sort that out
# before you can run this section
import pandas as pd
data = pd.read_csv(outfile)
# here is how you write an .xlsx file
data.to_excel('example_data.xlsx',index=False)
# and reading from an .xlsx file
data = pd.read_excel('example_data.xlsx')
# if you want to read from a specific sheet use the syntax below, beware that
# python counts from 0 (0 indexing) so the first sheet is sheet 0
data = pd.read_excel('example_data.xlsx',sheet_name=0)
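Once the data is loaded it is worth a quick look before plotting anything. The sketch below uses a tiny made-up table rather than the real example file, and assumes the columns are called X, Y and C as in the plotting code further down.

```python
import io
import pandas as pd

# a tiny made-up table standing in for example_data.csv
sample = io.StringIO('X,Y,C\n1.0,2.0,0\n2.0,4.1,1\n3.0,6.2,1\n')
data = pd.read_csv(sample)

print(data.head())      # the first rows of the table
print(data.dtypes)      # the type pandas chose for each column
print(data.describe())  # count, mean, min, max and so on for numeric columns

# confirm the columns the plots need are present
required = ['X', 'Y', 'C']
missing = [name for name in required if name not in data.columns]
print('missing columns:', missing)
```

With your own file, replace the sample with pd.read_csv or pd.read_excel and change the required column names to suit.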
Make Plots
I’ve written a few basic plotting examples in matplotlib that I hope will get you started. These cover 99% of the plots that I and most other people use for data analytics.
from matplotlib import pyplot as plt
# now let's just make a very simple scatter plot but we will access the
# data in a few ways to demonstrate the various ways that pandas lets you
# access variables
# plot X against Y the default line plot
plt.plot(data.X, data.Y)
# show the plot
# you will need to close it to keep interacting with python
plt.show()
# plot X against Y as a scatter
plt.plot(data.X, data.Y,'.')
plt.show(block=False) # adding block=False allows you to keep interacting with python
plt.figure() # creates a new figure, important if you are making lots of plots
# and only running show at the end
# when you run plot more than once before running plt.show()
# matplotlib adds to the same plot
plt.plot(data.X, data.Y,'.')
# syntax data['X'] and data.X are the same
plt.plot(data['X'], data['Y'])
plt.show(block=False) # adding block=False allows you to keep interacting with python
plt.figure(figsize=[9,20]) # figsize sets the width and height of the figure in inches
plt.plot(data['X'], data['Y'])
plt.title('Amazing Data') # this adds a title
# adding the x and y axis labels
plt.xlabel('x')
plt.ylabel('y')
# set the axes so that X and Y are drawn at the same scale
plt.axis('equal')
plt.show(block=False)
# simple histogram
plt.hist(data["X"])
plt.show()
# easy way to have a scatter plot with colours by group
# the third argument (8) is the marker size and c sets the colour of each point
plt.scatter(data["X"],data["Y"],8,c=data['C'])
plt.axis('equal')
plt.show()
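If the plot is headed for a report or an email, you will probably want it saved as a file rather than shown in a window. A minimal sketch with made-up numbers standing in for data['X'] and data['Y'] is below; the file name scatter.png is just an example.

```python
import matplotlib
matplotlib.use('Agg')  # draw to files only, no window is opened
from matplotlib import pyplot as plt

# made-up values standing in for data['X'] and data['Y']
x = [1, 2, 3, 4]
y = [2.0, 4.1, 6.2, 7.9]

fig, ax = plt.subplots(figsize=(6, 4))  # width and height in inches
ax.plot(x, y, '.')
ax.set_xlabel('x')
ax.set_ylabel('y')
ax.set_title('Amazing Data')
fig.savefig('scatter.png', dpi=150)  # the file extension picks the format
plt.close(fig)                       # free the figure once it is saved
```

Leave out the matplotlib.use('Agg') line if you also want plt.show() to open a window in the same session.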
Thanks for following along. I hope that this helps someone impress the boss, the person at the next desk or your dog.
Ben.