Tutorial: Get the complete folder structure from a Seafile Server with a Python script

Hey everybody,

Today a customer approached me with the following question:

Would it be possible to get all files and folders from a Seafile Server and write the complete structure into a SeaTable base?

The answer is definitely yes, and I will show you how you can do this.


Step 1: set up a new base

First, you have to create a new base with the following structure. My table has the name Files in Repo.

Step 2: create a new python script

Copy-and-paste the following code and adapt it to your needs. I will explain later on how the script works.

import os
import sys
import requests
from seatable_api import Base, context
from datetime import datetime

server_url = context.server_url
api_token = context.api_token
table_name = 'Files in Repo'

seafile_url = 'https://your-seafile-server-url'
seafile_repo = 'add-the-repo-id-like: (43107b73-d90d-4552-937a-c1aa8baef43b)'

####################
# Option 1: Obtain Seafile Auth-token (username and password)
####################

#seafile_user = 'your-seafile-user'
#seafile_pw = 'your-seafile-password'
#url = seafile_url + "/api2/auth-token/"
#data = {'username': seafile_user, 'password': seafile_pw}
#token = requests.post(url, json=data).json()['token']

####################
# Option 2: use seafile api-token
####################

token = 'add-the-api-token-of-a-user-who-has-access-to-this-repo'

####################
# SeaTable Base-Auth
####################

base = Base(api_token, server_url)
base.auth()

####################
# get all files from this repo (including subfolders)
####################

def get_files(path):
  url = seafile_url + '/api2/repos/' + seafile_repo + '/dir/?p=/' + path
  headers = {
    'Authorization': 'Token {}'.format(token),
    'Accept': 'application/json; indent=4'
  }
  resp = requests.get(url, headers=headers)

  for f in resp.json():
    # for debugging: 
    # print(f)
    if f['type'] == 'file':
      size = str(round(f['size'] / 1000000,2)) + ' MB'
      mtime = str(datetime.fromtimestamp(f['mtime']))
      base.append_row(table_name, {'Size': size, 'Name': f['name'], 'Last Update': mtime, 'Path': path, 'Last Modifier': f['modifier_name']})
    else:
      get_files(path + '/' + f['name'])

# start with root-folder of the repo
get_files("")

print("Hasta la vista, baby. I'll be back...")

How this looks

Here is a short video of how the result will look. The Python script walks through the complete folder structure of one library in Seafile and writes all the information to the SeaTable base. The size is rounded to MB and the modification timestamp is translated into a human-readable date.
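The two conversions mentioned here can be reproduced in isolation. The helper names are mine, not part of the script:

```python
from datetime import datetime

def human_size(size_bytes):
    # the script divides by 1,000,000, i.e. decimal megabytes
    return str(round(size_bytes / 1000000, 2)) + ' MB'

def human_mtime(mtime):
    # Seafile returns the modification time as a Unix timestamp
    return str(datetime.fromtimestamp(mtime))

print(human_size(5242880))  # → 5.24 MB
```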

(Video: get-files-from-seafile)


The Python script explained step by step

Part 0

import os
import sys
import requests
from seatable_api import Base, context
from datetime import datetime

server_url = context.server_url
api_token = context.api_token
table_name = 'Files in Repo'

seafile_url = 'https://your-seafile-server-url'
seafile_repo = 'add-the-repo-id-like: (43107b73-d90d-4552-937a-c1aa8baef43b)'

Well, what should I say: first I load all the required Python modules, then I define some basic variables. You have to change these variables to match your setup.

Part 1

####################
# Option 1: Obtain Seafile Auth-token (username and password)
####################

#seafile_user = 'your-seafile-user'
#seafile_pw = 'your-seafile-password'
#url = seafile_url + "/api2/auth-token/"
#data = {'username': seafile_user, 'password': seafile_pw}
#token = requests.post(url, json=data).json()['token']

####################
# Option 2: use seafile api-token
####################

token = 'add-the-api-token-of-a-user-who-has-access-to-this-repo'

In the next section, I want to use the Seafile API to get all files from a repo. Every API call has to be authenticated with a token. This token can either be generated from a username + password combination (Option 1), or you can add your token directly as a variable (Option 2).
I prefer Option 2 because you skip one API call per script execution.

You can find more information about the authentication of API-calls at this page:
https://download.seafile.com/published/web-api/home.md

Part 2

base = Base(api_token, server_url)
base.auth()

This authenticates against the SeaTable base, which is necessary to append the file information later on.

Part 3

def get_files(path):
  url = seafile_url + '/api2/repos/' + seafile_repo + '/dir/?p=/' + path
  headers = {
    'Authorization': 'Token {}'.format(token),
    'Accept': 'application/json; indent=4'
  }
  resp = requests.get(url, headers=headers)

  for f in resp.json():
    # for debugging: 
    # print(f)
    if f['type'] == 'file':
      size = str(round(f['size'] / 1000000,2)) + ' MB'
      mtime = str(datetime.fromtimestamp(f['mtime']))
      base.append_row(table_name, {'Size': size, 'Name': f['name'], 'Last Update': mtime, 'Path': path, 'Last Modifier': f['modifier_name']})
    else:
      get_files(path + '/' + f['name'])

# start with root-folder of the repo
get_files("")

Seafile has no API call to get all files of a library including all subdirectories. Therefore, I define a function that gets all files from one folder and loop through the results. The result contains both files and folders.

  • If it is a file, I append the file info to the SeaTable base.
  • If it is a folder, I call the function again with the new path.

With this setup, it is possible to walk through the complete library with all subfolders.
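The recursive walk can be tried out locally. In this sketch, list_dir is a stand-in for the Seafile directory call, so the same pattern runs against a local folder; the function names and the return shape of list_dir are my own choices, modeled on the API response:

```python
import os

def list_dir(path):
    # stand-in for the Seafile "/api2/repos/<repo>/dir/" call:
    # returns entries that look like the API response ('type' and 'name')
    return [{'type': 'file' if e.is_file() else 'dir', 'name': e.name}
            for e in os.scandir(path)]

def get_files(root, path=''):
    files = []
    for f in list_dir(os.path.join(root, path.lstrip('/'))):
        if f['type'] == 'file':
            files.append(path + '/' + f['name'])
        else:
            # same trick as in the script: recurse with the extended path
            files.extend(get_files(root, path + '/' + f['name']))
    return files
```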
If you have questions, please let me know.


This helps a lot for people like me who use Seafile and SeaTable together. Thank you for providing such a useful script. As a next step, I'll use the Python runner to add each file's link to a URL column and create a button that opens the file link. That should make it very convenient to view these files directly from the File-info base.
:grinning: :grinning: :grinning:


Hi, cdb
I tested the script and encountered a problem today.
When a library contains more than a certain number of files, it shows a rate-limit alert.


Accordingly, I adjusted the rate limit on the Seafile server as shown below, but the problem still exists.

Is there any other configuration I should change? Thanks!


Hey Quant,
I am not totally sure, but I assume that this 429 error is not a Seafile error but a SeaTable error.
If you execute too many API calls, a 429 can also be returned by the SeaTable API. You find the limits here:
https://manual.seatable.io/limitations/system_limitations/

Max. 300 calls per minute or 5,000 per day. The problem is base.append_row: it appends every file with a single API call. To avoid the 429, the files have to be stored in an array and then added in one call.
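A sketch of that idea: collect the rows in a list first, then send them in chunks. The chunk size of 100 and the helper name chunked are my own choices; the batch call is shown commented out because it needs a live base, and assumes the batch_append_rows method of the seatable-api package:

```python
def chunked(rows, size=100):
    # split the collected rows into chunks of at most `size`
    for i in range(0, len(rows), size):
        yield rows[i:i + size]

rows = []  # fill this inside get_files() instead of calling append_row per file

# for chunk in chunked(rows):
#     base.batch_append_rows(table_name, chunk)
```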

Best regards
Christoph

Hi cdb,
I think I found a bug here (SeaTable 3.0.0 all good vs. SeaTable 3.3.0 reports error 429).
I checked my previous Python script, let's call it sync_local_library.py. It is very similar to this script and uses the same method, base.append_row(table_name, row_data), to write the file information from a Seafile library (local dir) to SeaTable.
When I run sync_local_library.py with SeaTable 3.0.0 (Developer Version), all files (over 40,000) can be synced to SeaTable without any error. Now I use Dev 3.3.0, and it reports error 429 after 1,807 rows have been synced.
Other information:

  1. The API call limit has been set to a very large number in both versions of the SeaTable server.
  2. I have run your script both with the Python runner and on my local PC. The Python runner triggers error 429 when the number of appended rows reaches 500 to 700; on my local PC, it triggers error 429 at approximately 1,100 rows.

Hope you can help figure out the problem, thank you sincerely!

I found the problem today.

I missed the API rate settings in the file dtable_server_config.json.
The adjusted file should look like this:


And I adjusted some code to prevent random read errors from causing the code to stop running.

Part 3

import time  # needed for time.sleep() in the error handler

def get_files(path):
    url = seafile_url + "/api2/repos/" + seafile_repo + "/dir/?p=/" + path
    headers = {
        "Authorization": "Token {}".format(token),
        "Accept": "application/json; indent=4",
    }
    resp = requests.get(url, headers=headers)
    # print("status code:", resp.status_code)
    n = 0  # rows appended in this folder
    m = 0  # errors seen in this folder
    for f in resp.json():
        # for debugging:
        # print(f)
        try:
            if f["type"] == "file":
                size = str(round(f["size"] / 1000000, 2)) + " MB"
                mtime = str(datetime.fromtimestamp(f["mtime"]))
                base.append_row(table_name, {'Size': size, 'Name': f['name'], 'Last Update': mtime, 'Path': path, 'Last Modifier': f['modifier_name']})
                n += 1
            else:
                get_files(path + "/" + f["name"])
        except Exception as e:
            m += 1
            print(e)
            # to see how many times an error has occurred so far
            print(m)
            time.sleep(0.1)
            continue

Cheers.
