
How to list the existing files at the URL of a GitHub repository?

I tried this, but it doesn’t work. Any help is welcome. Thanks a lot.

import os
import pathlib

pathdirectory = pathlib.Path('https://github.com/thegithub/therepository')

def list_files():
    files = []
    for filename in os.listdir(pathdirectory):
        path = os.path.join(pathdirectory, filename)
        if os.path.isfile(path):
            files.append(filename)
    return files

found = list_files()

OSError: [WinError 123] The filename, directory name, or volume label syntax is incorrect

You won’t be able to use os.listdir for this: it only lists directories on the local filesystem, and a URL is not a filesystem path. You might be able to do something by making a request to the GitHub API instead. Check out this related StackOverflow post.
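As a rough illustration of the API route, here is a minimal sketch that uses only the standard library and GitHub's "repository contents" REST endpoint. The helper names (contents_url, file_names, list_files) are made up for this example, and USERNAME, REPOSITORY and FOLDER are placeholders. It assumes a public repository and unauthenticated access, which GitHub rate-limits.

```python
import json
import urllib.request

API_ROOT = "https://api.github.com"


def contents_url(owner, repo, folder="", ref=None):
    """Build the 'repository contents' endpoint URL for a folder (hypothetical helper)."""
    url = f"{API_ROOT}/repos/{owner}/{repo}/contents"
    folder = folder.strip("/")
    if folder:
        url += f"/{folder}"
    if ref:
        # ref is a branch, tag or commit; the default branch is used when omitted
        url += f"?ref={ref}"
    return url


def file_names(entries, suffix=""):
    """Keep regular files only (no subdirectories) whose name ends with suffix."""
    return [
        entry["name"]
        for entry in entries
        if entry["type"] == "file" and entry["name"].endswith(suffix)
    ]


def list_files(owner, repo, folder="", suffix="", ref=None):
    """Ask the GitHub API for a folder listing and return the matching file names."""
    request = urllib.request.Request(
        contents_url(owner, repo, folder, ref),
        headers={"Accept": "application/vnd.github+json"},
    )
    with urllib.request.urlopen(request) as response:
        entries = json.load(response)  # for a folder, a JSON array of entries
    return file_names(entries, suffix)
```

Calling list_files("USERNAME", "REPOSITORY", "FOLDER", suffix=".csv") with real names should return the CSV file names in that folder. Unlike scraping the HTML page, this does not depend on how GitHub renders its website.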

I actually found a way. It might not be the best one, but it works.
I used the Python requests and BeautifulSoup libraries.
Here I listed only the *.csv files:

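import re

import requests
from bs4 import BeautifulSoup

# URL on GitHub where the csv files are stored
github_url = 'https://github.com/USERNAME/REPOSITORY/tree/master/FOLDER'  # replace USERNAME, REPOSITORY and FOLDER with the actual names

# Download the HTML page of the folder
result = requests.get(github_url)
result.raise_for_status()  # stop early if the page could not be fetched

# Parse the page and keep the elements whose title attribute ends in .csv
soup = BeautifulSoup(result.text, 'html.parser')
csvfiles = soup.find_all(title=re.compile(r"\.csv$"))

# Collect the visible text of each match, i.e. the file name
filename = []
for i in csvfiles:
    filename.append(i.get_text())

print(filename)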