scitacean.transfer.sftp.SFTPFileTransfer

class scitacean.transfer.sftp.SFTPFileTransfer(*, host, port=22, source_folder=None, connect=None)

Upload / download files using SFTP.

Configuration & Authentication

The file transfer connects to the server at the address given as the host constructor argument. This may be

  • a fully qualified hostname such as some.fileserver.edu,

  • or an IP address like 127.0.0.1.

By default, the file transfer can currently authenticate only through an SSH agent. The agent must be set up for the chosen host and hold a valid key. If that is not possible, pass a custom connect function that authenticates in a different way. See the examples below.

Upload folder

The file transfer can take an optional source_folder as a constructor argument. If it is given, SFTPFileTransfer uploads all files to it and ignores the source folder set in the dataset. If it is not given, SFTPFileTransfer uses the dataset’s source folder.

The source_folder argument to SFTPFileTransfer may be a Python format string. In that case, all format fields are replaced by the corresponding fields of the dataset. All non-ASCII characters and most special ASCII characters are replaced. This guards against broken paths caused by arbitrary content in dataset fields.

Examples

Given

dset = Dataset(
    type="raw",
    name="my-dataset",
    source_folder="/dataset/source",
)

This uploads to /dataset/source:

file_transfer = SFTPFileTransfer(host="fileserver")

This uploads to /transfer/folder:

file_transfer = SFTPFileTransfer(host="fileserver",
                                 source_folder="/transfer/folder")

This uploads to /transfer/my-dataset, because {name} is replaced by dset.name:

file_transfer = SFTPFileTransfer(host="fileserver",
                                 source_folder="/transfer/{name}")

A useful approach is to include a unique ID in the source folder, for example, "/some/base/folder/{uid}", to avoid clashes between different datasets. Scitacean will fill in the "{uid}" placeholder with a new UUID4.
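The placeholder substitution described above can be approximated with a small standalone sketch. The helper name fill_source_folder and the exact sanitization rule (keep ASCII letters, digits, '.', '-', and '_'; replace everything else with '_') are assumptions made for illustration; Scitacean's actual replacement rules may differ.

```python
import re
import uuid


def fill_source_folder(template: str, fields: dict[str, str]) -> str:
    """Hypothetical helper: fill format fields and sanitize the inserted values."""

    def sanitize(value: str) -> str:
        # Assumption: characters outside this ASCII set are replaced by '_'.
        return re.sub(r"[^A-Za-z0-9._-]", "_", value)

    # A fresh UUID4 stands in for the "{uid}" placeholder.
    values = {"uid": str(uuid.uuid4()), **fields}
    return template.format(**{key: sanitize(str(val)) for key, val in values.items()})


# Plain names pass through unchanged.
print(fill_source_folder("/transfer/{name}", {"name": "my-dataset"}))
# Each call produces a different folder because of the new UUID4.
print(fill_source_folder("/some/base/folder/{uid}", {}))
```

Sanitizing only the inserted values, not the template itself, keeps the path separators that the caller wrote deliberately while neutralizing separators that come from dataset content.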

The connection and authentication method can be customized using the connect argument. For example, to use a specific username and SSH key file, use the following:

def connect(host, port):
    from paramiko import SSHClient

    client = SSHClient()
    client.load_system_host_keys()
    client.connect(
        hostname=host,
        port=port,
        username="<username>",
        key_filename="<key-file-name>",
    )
    return client.open_sftp()

file_transfer = SFTPFileTransfer(host="fileserver", connect=connect)

The paramiko.client.SSHClient can be configured as needed in this function.
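As one possible variation, the sketch below authenticates with a password instead of a key file and refuses servers whose host key is not already known. The function name connect_with_password, the placeholder credentials, the timeout value, and the fallback to port 22 are assumptions for illustration; only the paramiko calls themselves are part of paramiko's public API.

```python
def connect_with_password(host, port):
    # Imported lazily so paramiko is only needed when a connection is made.
    from paramiko import RejectPolicy, SSHClient

    client = SSHClient()
    client.load_system_host_keys()
    # Refuse servers whose host key is not in the known hosts.
    client.set_missing_host_key_policy(RejectPolicy())
    client.connect(
        hostname=host,
        # Assumption: fall back to the standard SSH port if none is given.
        port=port if port is not None else 22,
        username="<username>",
        password="<password>",
        allow_agent=False,    # do not ask an SSH agent for keys
        look_for_keys=False,  # do not search ~/.ssh for key files
        timeout=10.0,         # seconds to wait for the TCP connection
    )
    return client.open_sftp()
```

It would be passed in the same way as above, that is, SFTPFileTransfer(host="fileserver", connect=connect_with_password). In real code, the password should come from a secure source rather than a literal string.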

Constructors

__init__(*, host[, port, source_folder, connect])

Construct a new SFTP file transfer.

Methods

connect_for_download()

Create a connection for downloads; use it as a context manager.

connect_for_upload(dataset)

Create a connection for uploads; use it as a context manager.

source_folder_for(dataset)

Return the source folder used for the given dataset.

__init__(*, host, port=22, source_folder=None, connect=None)

Construct a new SFTP file transfer.

Parameters:
  • host (str) – URL or name of the server to connect to.

  • port (int, default: 22) – Port of the server.

  • source_folder (str | RemotePath | None, default: None) – Upload files to this folder if set. Otherwise, upload to the dataset’s source_folder. Ignored when downloading files.

  • connect (Callable[[str, int | None], SFTPClient] | None, default: None) – If set, this function is called to create a client for the server instead of the built-in method. It receives host and port as passed to __init__.

connect_for_download()

Create a connection for downloads; use it as a context manager.

Return type:

Iterator[SFTPDownloadConnection]
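An Iterator return type combined with context-manager usage is the typical signature of a generator function wrapped in contextlib.contextmanager. The standalone sketch below illustrates that pattern; it presumably resembles how the method works but is not Scitacean's code, and FakeConnection is a hypothetical stand-in for the real connection class.

```python
from collections.abc import Iterator
from contextlib import contextmanager


class FakeConnection:
    """Hypothetical stand-in for an SFTP connection object."""

    def __init__(self) -> None:
        self.closed = False

    def close(self) -> None:
        self.closed = True


@contextmanager
def connect_for_download() -> Iterator[FakeConnection]:
    con = FakeConnection()
    try:
        # The with-block body runs while the generator is suspended here.
        yield con
    finally:
        # Runs when the with-block exits, even if it raised an exception.
        con.close()


with connect_for_download() as con:
    print("open inside the block:", not con.closed)
print("closed after the block:", con.closed)
```

The practical consequence is that the connection should only be used inside the with block, since it is closed as soon as the block exits.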

connect_for_upload(dataset)

Create a connection for uploads; use it as a context manager.

Parameters:

dataset (Dataset) – The connection will be used to upload files of this dataset. Used to determine the target folder.

Return type:

Iterator[SFTPUploadConnection]

source_folder_for(dataset)

Return the source folder used for the given dataset.

Return type:

RemotePath