Tip On Linear Fits Using Python

Hi Folks -

The community has been very helpful, so I thought I would give back with a tip.

If you've used Python to plot linear best fits by following the Plotly documentation (https://plot.ly/python/linear-fits/), you'll notice that the annotation text is typed in by hand. That becomes a problem when your data changes, because the R squared value and equation shown on the chart no longer match the fit.

Current method (from that page):

annotation = go.Annotation(
    x=3.5,
    y=23.5,
    text='$R^2 = 0.9551,\\Y = 0.716X + 19.18$',
    showarrow=False,
    font=go.Font(size=16)
)
To make the annotation follow your data without manual edits, compute the fit with SciPy's stats module (https://docs.scipy.org/doc/scipy/reference/stats.html) and take the line's properties from the result, like this:
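from scipy import stats

# x = third column, y = first column; swap in your own columns
slope, intercept, r_value, p_value, std_err = stats.linregress(df[df.columns[2]], df[df.columns[0]])
line = slope * df[df.columns[2]] + intercept  # fitted y value for each x

# Round the fit statistics and convert them to strings for the label
rval = str(round(r_value**2, 2))  # R squared
slp = str(round(slope, 4))
intr = str(round(intercept, 2))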
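If you're curious what `slope`, `intercept` and `r_value` actually are, here is a minimal pure-Python sketch of the closed-form least-squares formulas behind a simple linear fit. The helper name `least_squares_fit` and the sample data are made up for illustration; in practice `stats.linregress` does this for you (and also returns `p_value` and `std_err`). The last lines check that squaring `r_value` gives the same R squared as 1 - SS_res / SS_tot.

```python
import math


def least_squares_fit(xs, ys):
    """Hypothetical helper: return (slope, intercept, r_value) for y = slope*x + intercept.

    Uses the closed-form ordinary least squares formulas, i.e. the first
    three values that scipy.stats.linregress reports.
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Sums of squares and cross-products about the means
    sxx = sum((x - x_mean) ** 2 for x in xs)
    syy = sum((y - y_mean) ** 2 for y in ys)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        raise ValueError("need variation in both x and y")
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    r_value = sxy / math.sqrt(sxx * syy)  # Pearson correlation coefficient
    return slope, intercept, r_value


# Made-up sample data, roughly y = 2x
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
slope, intercept, r_value = least_squares_fit(xs, ys)

# For a simple linear fit with an intercept, R squared equals r_value**2,
# which is also 1 - (residual sum of squares) / (total sum of squares).
y_mean = sum(ys) / len(ys)
ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
ss_tot = sum((y - y_mean) ** 2 for y in ys)
print(round(r_value ** 2, 4), round(1 - ss_res / ss_tot, 4))
```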

From there, build the annotation text from those variables:
# A plain dict works as an annotation; go.Annotation and go.Font are deprecated in newer Plotly
annotation = dict(
    x=50,
    y=20,
    text="R squared = " + rval + ", y = " + slp + "x + " + intr,
    showarrow=False,
    font=dict(size=16)
)
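String concatenation works, but it prints something like `x + -3.2` when the intercept is negative, and `str(round(...))` drops trailing zeros (`str(round(0.5, 4))` is `'0.5'`). A small formatting helper can build the same label with f-strings and a proper sign. `fit_label` is a hypothetical name for this sketch, not part of Plotly or SciPy; it takes the raw values returned by `stats.linregress`.

```python
def fit_label(slope, intercept, r_value):
    """Hypothetical helper: build the annotation text from linregress output.

    Writes a negative intercept as "- 3.25" rather than "+ -3.25" and
    keeps a fixed number of decimals for each value.
    """
    sign = "-" if intercept < 0 else "+"
    return (f"R squared = {r_value ** 2:.2f}, "
            f"y = {slope:.4f}x {sign} {abs(intercept):.2f}")


print(fit_label(0.5, -3.25, 0.5))  # R squared = 0.25, y = 0.5000x - 3.25
```

In the annotation you would then pass `text=fit_label(slope, intercept, r_value)` instead of concatenating `rval`, `slp` and `intr`.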

Let me know if you have questions - hope this helps!

Here’s an example: https://plot.ly/dashboard/joel.alcedo:53/view

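Full code (API key setup omitted):

import quandl
import pandas as pd
from scipy import stats
import plotly.graph_objs as go
import chart_studio.plotly as py  # on Plotly < 4 this was: import plotly.plotly as py

# OPEC crude price, averaged to monthly values
df = quandl.get("OPEC/ORB")
df = df.resample('M').mean()
# U.S. Crude Oil Rotary Rigs in Operation, Monthly
df1 = quandl.get('EIA/PET_E_ERTRRO_XR0_NUS_C_M')

# Inner join on the date index keeps only months present in both series
df = pd.merge(df, df1, left_index=True, right_index=True)
df['Year'] = df.index.strftime("%B %Y")  # hover label such as "January 2017"
df.columns = ['Crude', 'US Oil Rigs', 'Time']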

# Linear fit of crude price (y) against rig count (x)
slope, intercept, r_value, p_value, std_err = stats.linregress(df[df.columns[1]], df[df.columns[0]])
line = slope * df[df.columns[1]] + intercept  # fitted prices for the trend line

rval = str(round(r_value**2, 2))  # R squared
slp = str(round(slope, 4))
intr = str(round(intercept, 2))

trace1 = go.Scatter(
    x=round(df[df.columns[1]], 2),
    y=df[df.columns[0]],
    mode='markers',
    marker=dict(color='#aa1e1e',
                size=7.5,
                line=dict(width=2)),
    text=df[df.columns[2]],  # month and year shown on hover
    name='Actual'
)

trace2 = go.Scatter(
    x=df[df.columns[1]],
    y=line,
    mode='lines',
    line=dict(color='#000000'),  # black trend line
    name='Best Fit'
)

data = [trace1, trace2]

# Annotation text is built from the fit, so it updates with the data
annotation = dict(
    x=1200,
    y=30,
    text="R squared = " + rval + ", y = " + slp + "x + " + intr,
    showarrow=False,
    font=dict(size=16)
)

layout = dict(legend=dict(orientation="h"),
              hovermode='closest',
              annotations=[annotation],
              yaxis=dict(title='OPEC Crude Oil Prices',
                         ticklen=5,
                         zeroline=True,
                         showgrid=False),
              xaxis=dict(title='U.S. Crude Oil Rotary Rigs in Operation',
                         ticklen=5,
                         zeroline=True,
                         showgrid=False)
              )

fig = dict(data=data, layout=layout)
py.iplot(fig, filename='oilpricesrigs')
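The annotation's position (`x=1200`, `y=30`) is still a manual input in data units, so the label can end up off the chart if the axis ranges change. One option, sketched here as an assumption rather than part of the original dashboard, is Plotly's paper-referenced coordinates: with `xref="paper"` and `yref="paper"`, `x` and `y` are fractions of the plotting area (0 to 1), so the label stays in the same corner whatever the data. `build_annotation` is a hypothetical helper name.

```python
def build_annotation(text, x=0.02, y=0.98):
    """Hypothetical helper: an annotation pinned to the plot area, not the data.

    With xref/yref set to "paper", x and y are fractions of the plotting
    area (0 = left/bottom, 1 = right/top), so the label's position does not
    depend on the axis ranges.
    """
    return dict(
        xref="paper",
        yref="paper",
        x=x,
        y=y,
        xanchor="left",  # anchor the text box's left edge at x
        yanchor="top",   # and its top edge at y
        text=text,
        showarrow=False,
        font=dict(size=16),
    )


annotation = build_annotation("R squared = 0.25, y = 0.5000x - 3.25")
print(annotation)
```

The returned dict can go straight into the layout's `annotations` list, the same way the plain-dict annotation in the full code does.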