
Scrape data online in a Dash application

Hi!

I am new to Dash and am trying to implement an application that scrapes data online and creates graphs from the scraped data.

The basic layout looks like this:
[Image: screenshot of the application layout]

It takes in the date as input and scrapes data online. The callback part looks like this:

@app.callback(
    [Output('overall-barchart-by-brand', 'figure'),
     Output('overall-piechart-by-brand', 'figure')],
    [Input('submit-date', 'n_clicks')],
    [State('start-year', 'value'),
     State('start-month', 'value'),
     State('end-year', 'value'),
     State('end-month', 'value')]
)
def update_main_charts_by_brand(n_clicks, start_year, start_month, end_year, end_month):
    # scrape data online:
    df = pd.DataFrame()
    url = 'https://xl.16888.com/brand-0-' + start_year + start_month + '-' + end_year + end_month + '-1.html'
    current_page = 1

    page = requests.get(url)
    soup = bs4.BeautifulSoup(page.content, 'lxml')
    num_of_pages = 2

    while current_page <= num_of_pages:
        page = requests.get(url)
        soup = bs4.BeautifulSoup(page.content, 'lxml')
        table = soup.find(name='table', attrs={'class': 'xl-table-def'})
        df = df.append(table_to_df(table))

        if current_page != num_of_pages:
            url = 'https://xl.16888.com/brand-0-' + start_year + start_month + '-' + end_year + end_month + '-' + str(current_page + 1) + '.html'
        current_page += 1

    df.columns = ['Rank', 'drop', 'Brand', 'Country', 'Sales', 'Percentage', 'Other_info']
    df = df.dropna()
    df.index = df.Rank

    bar_chart = go.Figure([go.Bar(x=df['Brand'].values, y=df['Sales'].values)])
    # clean data for pie chart
    df_percent = df[df['Percentage'] != '-']
    pie_chart = go.Figure([go.Pie(labels=df_percent['Brand'], values=[float(percent.strip('%'))/100 for percent in df_percent['Percentage']])])

    return bar_chart, pie_chart

I tried this function in a Jupyter notebook and it works fine, but I get this error when running the Dash application:

ValueError: Length mismatch: Expected axis has 1 elements, new values have 7 elements
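
For context, pandas raises this `ValueError` whenever the list assigned to `df.columns` has a different length than the number of columns the frame actually has. The message says the frame had exactly one column at that point. Below is a minimal sketch that reproduces the error and adds a check that reports what was scraped before renaming. The helper `set_columns_checked` and the one-column stand-in frame are hypothetical, made up for illustration; only the column names come from the callback above.

```python
import pandas as pd

# Column names taken from the callback in this post.
EXPECTED_COLUMNS = ['Rank', 'drop', 'Brand', 'Country', 'Sales', 'Percentage', 'Other_info']


def set_columns_checked(df, names=EXPECTED_COLUMNS):
    """Hypothetical helper: rename columns only if the column count matches."""
    if df.shape[1] != len(names):
        # Include the shape and first rows so the bad scrape is visible in the log.
        raise ValueError(
            f"expected {len(names)} columns but scraped frame has shape {df.shape}; "
            f"first rows:\n{df.head()}"
        )
    out = df.copy()
    out.columns = list(names)
    return out


# Stand-in for a scrape that did not return the expected table (assumption).
bad = pd.DataFrame({0: ["unexpected content"]})
try:
    bad.columns = EXPECTED_COLUMNS  # same kind of assignment as in the callback
except ValueError as exc:
    print(exc)
```

Calling `set_columns_checked(df)` in place of the direct assignment would not fix the scrape, but it should show in the Dash console what the frame actually contained.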

I think the data is not scraped correctly, which causes the problem when assigning the column names. Any idea how I can solve this?
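
One thing that might be worth ruling out, offered as a guess rather than a diagnosis: the values that arrive through `State` in the Dash app may not be the same strings that were typed in the notebook (for example a month of `1` instead of `01`), so the callback could be requesting a different page. A minimal sketch of building the URL defensively follows. The helper `build_page_url` is hypothetical, and the zero-padded `YYYYMM` format is an assumption about what the site expects; the post does not confirm it.

```python
def build_page_url(start_year, start_month, end_year, end_month, page=1):
    """Hypothetical helper: build one results-page URL from str or int inputs."""
    values = (start_year, start_month, end_year, end_month)
    if any(v is None or v == "" for v in values):
        raise ValueError("all four date inputs must be selected")
    sy, sm, ey, em = (int(v) for v in values)
    # Assumption: the site expects YYYYMM with a zero-padded month.
    return f"https://xl.16888.com/brand-0-{sy:04d}{sm:02d}-{ey:04d}{em:02d}-{page}.html"


# Mixed str/int inputs, as a dropdown or input component might deliver them.
print(build_page_url("2019", "1", 2019, 6))
print(build_page_url("2019", "1", 2019, 6, page=2))
```

Printing the generated URL from inside the callback and opening it in a browser would show whether the app and the notebook are requesting the same page. Inside a callback, a missing value could also be handled by raising `dash.exceptions.PreventUpdate` instead of a `ValueError`.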

I have the same issue, except I am using Selenium. Did you find a solution or workaround?