Automating File Transfer from an FTP Site With Python

I wanted to share a project I’m working on as part of my involvement with the South Central Task Force GIS Workgroup. As part of our monthly regional data update process, a file geodatabase is created that contains updated versions of various datasets, ranging from road centerlines to parks. The geodatabase is zipped and placed in a shared directory on our FTP site. At our group’s last meeting, it was suggested that we develop a script to automate downloading the file. The reasoning was that people are more likely to download (and hopefully use) the geodatabase if a scheduled task does the work, rather than expecting people who are juggling a hundred responsibilities to manually log into the FTP site and download the file.

While the script is relatively simple, it did require some research to figure out how to download the zipped file.  In addition to the standard modules I typically use (datetime and os), it uses zipfile and ftplib.  I also added a few checks to make sure the file being downloaded exists, and that it is a zip file.  Whether these are necessary is debatable, but I’m trying to make my scripts more airtight.
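
For reference, the top of the script imports something along these lines (see the GitHub version for the exact list; sys is included here for the exit-the-script step mentioned later):

# modules used by the script
import datetime
import os
import sys
import zipfile
from ftplib import FTP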

As it currently stands, the script would need to be run by each County on the same day the geodatabase is created. This wouldn’t be too difficult to accomplish, as the geodatabase is created on a specified day of the month (the first or second Tuesday?). I match the date stamps using currentTime.strftime("%Y%m%d") to build the date component of the file name (a short sketch of this follows the diagram). A next step for the project is to look into downloading the newest zip file; I’m hoping there are some ftplib methods that return file metadata. The code for this project can be found on GitHub. I’ll highlight some of the code, but first, here is a diagram outlining the logic:

Diagram of the script flow: log into the FTP site, download the zipped geodatabase, and extract it. Icons provided by Freepik and Anton Saputro from FlatIcon.
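
To make the date matching concrete, here is roughly how the expected file name gets built; the 'RegionalData_' prefix is just a placeholder for the real naming convention:

# get today's date and format it as YYYYMMDD to match the file name's date stamp
currentTime = datetime.datetime.now()
dateStamp = currentTime.strftime("%Y%m%d")
# build the expected zip file name; the prefix here is a placeholder
regionalGdb = 'RegionalData_{}.zip'.format(dateStamp)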

After logging into the FTP site, we change to the desired directory using the ftp.cwd method. Next, we get a list of the files in the current directory using the ftp.nlst method. We can create a variable representing the zipped file geodatabase (or any zipped file) and perform a conditional test to check whether our file is in the current directory.
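
Here is a sketch of that part of the script; the host, credentials, and directory name are placeholders:

# connect and log into the FTP site (host and credentials are placeholders)
ftp = FTP('ftp.example.org')
ftp.login('username', 'password')
# change to the shared directory that holds the zipped geodatabase
ftp.cwd('/regional/data')
# get a list of the files in the current FTP directory
filesInFtpDir = ftp.nlst()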

If the file exists, we open a file with the same name on our computer/file server (even though it doesn’t exist there yet) in write-binary mode, and then call the ftp.retrbinary method to copy/download it. This is the part of the script that was most difficult to figure out. Here’s the code sample:

# check if file is in list
# regionalGdb is a variable representing the zipped file geodatabase
# filesInFtpDir was assigned to the ftp.nlst() method
if regionalGdb in filesInFtpDir:
   # download zipped geodatabase to local directory
   # localDir is a variable representing the folder where the file will be copied to
   # open() creates the local file, and ftp.retrbinary streams the remote file's
   # contents into it through the local_file.write callback
   with open(os.path.join(localDir, regionalGdb), 'wb') as local_file:
      ftp.retrbinary('RETR ' + regionalGdb, local_file.write)
      print 'Downloaded regional geodatabase to {}'.format(localDir)

If the file is not in the current FTP directory, we write a message, close the FTP connection, and exit the script. The other branch of that conditional is only a few lines; something like the following, where the sys.exit() call is my assumption for how the script bails out:
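
else: # continuing the conditional from the previous snippet
   # file was not found on the FTP site, so report it, disconnect, and stop
   print 'The file: {}, was not found on the FTP site'.format(regionalGdb)
   ftp.quit()
   sys.exit()

But if the file is downloaded, the next step is to extract it.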

While it might be overkill, I employ a test to verify the downloaded file is actually a zip file. I start by calling os.listdir() on the directory where the file was downloaded to get a list of files. I then build the full path to the downloaded file with os.path.join(), and call the zipfile.is_zipfile() function to test whether the downloaded file is in fact a zip file. If it is, I extract the zip file into the same directory it was downloaded to:

# Unzip zipped file
# get listing of files in local directory
filesInLocalDir = os.listdir(localDir)
# check if regional geodatabase is in directory
if regionalGdb in filesInLocalDir:
   # file object for geodatabase zip file
   gdbZipFile = os.path.join(localDir, regionalGdb)
   # verify file is a zipped file
   if zipfile.is_zipfile(gdbZipFile):
      # create zip file object
      with zipfile.ZipFile(gdbZipFile) as z:
         # unzip file
         z.extractall(localDir)
         print 'Completed unzipping regional geodatabase'
   else: # file is not a zip file
      print 'The file: {}, is not a zipped file'.format(regionalGdb)
else: # file was not found on local directory.
   print 'The file: {}, was not found in {}'.format(regionalGdb, localDir)

While the script is relatively simple, running it as a scheduled task takes an easily overlooked manual task and automates it behind the scenes. I’m also interested to see if I can figure out how to check for the most recent zip file in the FTP directory as an alternative to having the file names match.
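
One idea I want to test (untested, so treat it as a sketch): ftplib’s sendcmd method can issue the FTP MDTM command, which returns a file’s modification time, so the script could pick whichever zip file on the site is newest. Not every FTP server supports MDTM, so this would need to be verified against ours:

# untested sketch: find the newest zip file on the FTP site using MDTM
zipNames = [name for name in ftp.nlst() if name.lower().endswith('.zip')]
if zipNames:
   # MDTM replies look like '213 YYYYMMDDHHMMSS', so the reply strings sort
   # chronologically and max() picks the most recently modified file
   newestZip = max(zipNames, key=lambda name: ftp.sendcmd('MDTM ' + name))
   print 'Newest zip file on the FTP site: {}'.format(newestZip)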

The entire script can be found on GitHub.
