Testing code with Scitacean#

Testing programs that use Scitacean can be tricky, as those tests might require access to a SciCat server or a fileserver. Scitacean provides two ways to help with this: tools for deploying servers on the local machine, and fakes for performing tests without any actual servers. This guide describes both methods.

Firstly, faking is implemented by FakeClient and FakeFileTransfer. These two classes follow the same separation of concerns as the real classes; that is, FakeClient handles metadata and FakeFileTransfer handles files. They can be mixed and matched freely with the real client and file transfers, but it is generally recommended to use the two fakes together.

Secondly, SciCat servers and fileservers are managed by the scicat_backend and sftp_fileserver pytest fixtures.

First, create a test dataset and file.

[1]:
from scitacean import Dataset

dataset = Dataset(
    type="raw",
    owner_group="faculty",
    owner="ridcully",
    principal_investigator="Ridcully",
    contact_email="ridcully@uu.am",
    data_format="spellbook-9000",
    source_folder="/upload/abcd",
    creation_location="UnseenUniversity",
)
[2]:
from pathlib import Path

path = Path("test-data/spellbook.txt")
path.parent.mkdir(parents=True, exist_ok=True)
with path.open("w") as f:
    f.write("fireball power=1000 mana=123")
[3]:
dataset.add_local_files("test-data/spellbook.txt", base_path="test-data")

FakeClient#

scitacean.testing.client.FakeClient has the same interface as the regular Client but never connects to any SciCat server. Instead, it maintains an internal record of datasets and datablocks. It is easiest to explain with an example. First, create a FakeClient. The url is completely arbitrary and only needs to be passed for parity with the real client.

[4]:
from scitacean.testing.client import FakeClient
from scitacean.testing.transfer import FakeFileTransfer

client = FakeClient.without_login(
    url="https://fake.scicat",
    file_transfer=FakeFileTransfer())

Upload#

And now we can upload our test dataset as usual:

[5]:
finalized = client.upload_new_dataset_now(dataset)
str(finalized)
[5]:
"Dataset(type=raw, contact_email=ridcully@uu.am, created_at=2024-05-29 08:17:06.956150+00:00, created_by=fake, creation_location=UnseenUniversity, creation_time=2024-05-29 08:17:06.878800+00:00, data_format=spellbook-9000, owner=ridcully, owner_group=faculty, pid=PID.prefix.a0b1/15aa0e60-7f01-44a5-b934-6900293ac51c, principal_investigator=Ridcully, source_folder=RemotePath('/upload/abcd'), updated_at=2024-05-29 08:17:06.956150+00:00, updated_by=fake)"

However, no SciCat server was involved. We can check whether the fake upload succeeded by inspecting the client: client.datasets is a dict that contains all datasets known to the fake server, keyed by PID:

[6]:
client.datasets.keys()
[6]:
dict_keys([PID(prefix='PID.prefix.a0b1', pid='15aa0e60-7f01-44a5-b934-6900293ac51c')])
[7]:
pid = list(client.datasets.keys())[0]
client.datasets[pid]
[7]:
DownloadDataset(contactEmail='ridcully@uu.am', creationLocation='UnseenUniversity', creationTime=datetime.datetime(2024, 5, 29, 8, 17, 6, 878800, tzinfo=datetime.timezone.utc), inputDatasets=None, investigator=None, numberOfFilesArchived=0, owner='ridcully', ownerGroup='faculty', principalInvestigator='Ridcully', sourceFolder=RemotePath('/upload/abcd'), type=<DatasetType.RAW: 'raw'>, usedSoftware=None, accessGroups=None, version=None, classification=None, comment=None, createdAt=datetime.datetime(2024, 5, 29, 8, 17, 6, 956150, tzinfo=datetime.timezone.utc), createdBy='fake', dataFormat='spellbook-9000', dataQualityMetrics=None, description=None, endTime=None, instrumentGroup=None, instrumentId=None, isPublished=None, jobLogData=None, jobParameters=None, keywords=None, license=None, datasetlifecycle=None, scientificMetadata=None, datasetName=None, numberOfFiles=1, orcidOfOwner=None, ownerEmail=None, packedSize=0, pid=PID(prefix='PID.prefix.a0b1', pid='15aa0e60-7f01-44a5-b934-6900293ac51c'), proposalId=None, relationships=None, sampleId=None, sharedWith=None, size=28, sourceFolderHost=None, techniques=None, updatedAt=datetime.datetime(2024, 5, 29, 8, 17, 6, 956150, tzinfo=datetime.timezone.utc), updatedBy='fake', validationStatus=None)

The client has recorded the upload from earlier. However, it stored the dataset as a model, not as a regular Dataset object. In addition, since the dataset has a file, an original datablock was uploaded as well (datablocks store the metadata and paths of files in SciCat):

[8]:
client.orig_datablocks.keys()
[8]:
dict_keys([PID(prefix='PID.prefix.a0b1', pid='15aa0e60-7f01-44a5-b934-6900293ac51c')])
[9]:
# use the pid of the dataset
client.orig_datablocks[pid]
[9]:
[DownloadOrigDatablock(dataFileList=[DownloadDataFile(path='spellbook.txt', size=28, time=datetime.datetime(2024, 5, 29, 8, 17, 6, 882646, tzinfo=datetime.timezone.utc), chk='5cfe25239cedd2b6fdd72b50d40e2535d59329dcb3c576a36f8c8bfb90f78c1cbd1030ae8801bd8e6f130fa8b441562c2e04b52552260f378df5be24e8b16fee', gid=None, perm=None, uid=None)], datasetId=PID(prefix='PID.prefix.a0b1', pid='15aa0e60-7f01-44a5-b934-6900293ac51c'), size=28, id=None, accessGroups=None, chkAlg='blake2b', createdAt=datetime.datetime(2024, 5, 29, 8, 17, 6, 956578, tzinfo=datetime.timezone.utc), createdBy='fake', instrumentGroup=None, isPublished=None, ownerGroup=None, updatedAt=datetime.datetime(2024, 5, 29, 8, 17, 6, 956578, tzinfo=datetime.timezone.utc), updatedBy='fake')]

When writing tests, those recorded dataset and datablock models can be used to check if an upload worked.

Download#

FakeClient can also download datasets that are stored in its datasets dictionary:

[10]:
downloaded = client.get_dataset(pid)
str(downloaded)
[10]:
"Dataset(type=raw, contact_email=ridcully@uu.am, created_at=2024-05-29 08:17:06.956150+00:00, created_by=fake, creation_location=UnseenUniversity, creation_time=2024-05-29 08:17:06.878800+00:00, data_format=spellbook-9000, owner=ridcully, owner_group=faculty, pid=PID.prefix.a0b1/15aa0e60-7f01-44a5-b934-6900293ac51c, principal_investigator=Ridcully, source_folder=RemotePath('/upload/abcd'), updated_at=2024-05-29 08:17:06.956150+00:00, updated_by=fake)"

This is now an actual Dataset object like you would get from a real client.

If we want to test downloads independently of uploads, we can populate client.datasets and client.orig_datablocks manually. Keep in mind that these store models, not Dataset objects; see the model reference for an overview. Also note that orig_datablocks stores a list of models for each dataset, as there can be multiple datablocks per dataset.

Fidelity#

Although FakeClient is sufficient for many tests, it does not behave exactly like a real client. For example, it does not validate datasets or handle credentials. In addition, it does not modify uploaded datasets the way a real server would. This can be seen in both finalized (the dataset returned by client.upload_new_dataset_now(dataset) above) and downloaded.

If a test requires these properties, consider using a locally deployed SciCat server. See in particular the developer documentation on testing.

FakeFileTransfer#

On its own, FakeClient only fakes a SciCat server, i.e., the handling of metadata. If we also want to test file uploads and downloads, we can use scitacean.testing.transfer.FakeFileTransfer.

Starting from a clean slate, create a fake client with a fake file transfer as above:

[11]:
from scitacean.testing.client import FakeClient
from scitacean.testing.transfer import FakeFileTransfer

client = FakeClient.without_login(
    url="https://fake.scicat",
    file_transfer=FakeFileTransfer())

And upload a dataset:

[12]:
finalized = client.upload_new_dataset_now(dataset)

The file transfer has recorded the upload of the file without actually uploading it anywhere. We can inspect all files on the fake fileserver using:

[13]:
client.file_transfer.files
[13]:
{RemotePath('/upload/abcd/spellbook.txt'): b'fireball power=1000 mana=123'}

This is a dictionary that maps the remote_access_path of each file to the file's content as bytes.

We can also download the file.

[14]:
downloaded = client.get_dataset(finalized.pid)
with_downloaded_file = client.download_files(downloaded, target="test-data/download")
[15]:
file = list(with_downloaded_file.files)[0]
file
[15]:
File(local_path=PosixPath('test-data/download/spellbook.txt'), remote_path=RemotePath('spellbook.txt'), remote_gid=None, remote_perm=None, remote_uid=None, checksum_algorithm='blake2b')
[16]:
with file.local_path.open() as f:
    print(f.read())
fireball power=1000 mana=123

If we want to test downloads independently of uploads, we can populate client.file_transfer.files manually.

Local SciCat server#

scitacean.testing.backend provides tools to set up a SciCat backend and API in a Docker container on the local machine. It is primarily intended to be used via the pytest fixtures in scitacean.testing.backend.fixtures.

The fixtures can configure, spin up, and seed a SciCat server and database in Docker containers. They also provide easy access to the server by building clients, and they clean up after the test session by stopping the Docker containers.

Note the caveats in scitacean.testing.backend about clean up and use of pytest-xdist.

Set up#

First, ensure that Docker is installed and running on your machine. Then, configure pytest by

  • registering the fixtures and

  • adding a command line option to enable backend tests.

To this end, add the following in your conftest.py:

[17]:
import pytest
from scitacean.testing.backend import add_pytest_option as add_backend_option


pytest_plugins = (
    "scitacean.testing.backend.fixtures",
)

def pytest_addoption(parser: pytest.Parser) -> None:
    add_backend_option(parser)

The backend will only be launched when the corresponding command line option is given. By default, this is --backend-tests but it can be changed via the option argument of add_pytest_option.

Use SciCat in tests#

Tests that require the server can now request it as a fixture:

[18]:
def test_something_with_scicat(require_scicat_backend):
    # test something
    ...

The require_scicat_backend fixture will ensure that the backend is running during the test. If backend tests have not been enabled by the command line option, the test will be skipped.

The simplest way to connect to the server is to request the client or real_client fixture:

[19]:
def test_something_with_scicat_client(client):
    # test something
    ...

The client fixture provides both a client connected to the SciCat server and a fake client. (Neither has a file transfer.) If backend tests are enabled, the test runs twice, once with each client. If they are disabled, the test runs only with the fake client.

If your test does not work with a fake client, you can request real_client instead of client to get only the real client. In this case, make sure to also request require_scicat_backend so that the test is skipped if backend tests are disabled. Alternatively, skip the test explicitly:

[20]:
import pytest


def test_something_with_real_client(real_client):
    if real_client is None:
        # real_client is None when backend tests are disabled.
        # Skip the test here (or do something else instead).
        pytest.skip("Backend tests disabled")

    # do the actual tests

Seed data#

The database used by the local SciCat server is seeded with a number of datasets from scitacean.testing.backend.seed. These datasets are accessible via both real and fake clients.

To access the seed, use for example:

[21]:
from scitacean.testing.backend import seed

def test_download_raw(client):
    dset = seed.INITIAL_DATASETS["raw"]
    downloaded = client.get_dataset(dset.pid)
    assert downloaded.owner == dset.owner

Both clients, i.e., also the fake client, require that the database has been seeded, even when backend tests are disabled. You can ensure this by requesting either scicat_backend or require_scicat_backend along with fake_client in your test. To write a test that uses only a fake client but still has access to the seed, use:

[22]:
def test_seeded_fake(fake_client, scicat_backend):
    dset = seed.INITIAL_DATASETS["raw"]
    downloaded = fake_client.get_dataset(dset.pid)
    assert downloaded.owner == dset.owner

This runs the test both when backend tests are enabled and when they are disabled. In the latter case, the server is never launched and fake_client is seeded in a different way. This alternative seeding corresponds to how scitacean.testing.client.FakeClient processes uploaded files, so the result may not be exactly the same as with a real backend. See in particular the Fidelity section.

Local SFTP fileserver#

scitacean.testing.sftp provides tools to set up an SFTP server in a Docker container on the local machine. It is primarily intended to be used via the pytest fixtures in scitacean.testing.sftp.fixtures.

The fixtures can configure, spin up, and seed an SFTP server in a Docker container. They also clean up after the test session by stopping the Docker container. (Strictly speaking, the server is an SSH server, but all users except root are restricted to SFTP.)

Note the caveats in scitacean.testing.sftp about clean up and use of pytest-xdist.

Set up#

First, ensure that Docker is installed and running on your machine. Then, configure pytest by

  • registering the fixtures and

  • adding a command line option to enable sftp tests.

To this end, add the following in your conftest.py: (Or merge it into the setup for backend tests from above.)

[23]:
import pytest
from scitacean.testing.sftp import add_pytest_option as add_sftp_option


pytest_plugins = (
    "scitacean.testing.sftp.fixtures",
)

def pytest_addoption(parser: pytest.Parser) -> None:
    add_sftp_option(parser)

The SFTP server will only be launched when the corresponding command line option is given. By default, this is --sftp-tests but it can be changed via the option argument of add_pytest_option.

Use SFTP in tests#

Tests that require the server can now request it as a fixture:

[24]:
def test_something_with_sftp(require_sftp_fileserver):
    # test something
    ...

The require_sftp_fileserver fixture will ensure that the SFTP server is running during the test. If SFTP tests have not been enabled by the command line option, the test will be skipped.

Connecting to the server is not as straightforward as for the SciCat backend. It requires passing a special connect function to the file transfer, which the sftp_connect_with_username_password fixture provides. For example, the following opens a connection to the server to upload a file:

[25]:
from scitacean.transfer.sftp import SFTPFileTransfer

def test_sftp_upload(
    sftp_access,
    sftp_connect_with_username_password,
    require_sftp_fileserver,
    sftp_data_dir,
):
    sftp = SFTPFileTransfer(host=sftp_access.host,
                            port=sftp_access.port,
                            connect=sftp_connect_with_username_password)
    ds = Dataset(...)
    with sftp.connect_for_upload(dataset=ds) as connection:
        # do upload
        ...
    # assert that the file has been copied to sftp_data_dir
    ...

Uploaded files are readable on the host, so the test can read from sftp_data_dir to check whether the upload succeeded. This directory is mounted as /data on the server.
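To perform that check, a test has to translate a path on the server into the corresponding path on the host. The helper below is hypothetical and not part of Scitacean. It relies only on the mount described above: /data on the server corresponds to sftp_data_dir on the host. Paths outside /data raise a ValueError.

```python
from pathlib import Path, PurePosixPath


def host_path_of(remote_path: str, sftp_data_dir: Path) -> Path:
    """Hypothetical helper: map a path on the SFTP server to the host.

    Assumes sftp_data_dir is mounted as /data on the server.
    """
    # Raises ValueError if remote_path is not inside /data.
    relative = PurePosixPath(remote_path).relative_to("/data")
    return sftp_data_dir / relative
```

Inside test_sftp_upload, the final check could then read the uploaded file from host_path_of(...) and compare its bytes with the local original.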

Using an SFTP file transfer with Client requires some extra steps. An example is given by test_client_with_sftp in SciCatProject/scitacean. It uses a subclass of SFTPFileTransfer to pass sftp_connect_with_username_password to the connection, because Client cannot do this itself.

Seed data#

The server’s filesystem gets seeded with some files from SciCatProject/scitacean. Those files are copied to sftp_data_dir on the host which is mounted to /data/seed on the server.