Sources

A source is a file that can be ingested by Crunch, either to define a dataset or to add data to it (or both). When used in an import operation, the file becomes the source for part or all of a dataset. Sources is not a general-purpose file-store; it is intended only for files used for Crunch imports.

An uploaded file is registered as a source for a particular dataset when it is used in an import operation. Once a source has been "registered" as part of an import, it cannot be deleted unless the target dataset has been deleted.

Each source belongs to a specific user. A sources catalog response will contain only those sources belonging to the requesting user and a newly added source will be assigned to the requesting user. Sources are unordered.

Access to sources is limited to users with create_datasets permissions.

Catalog


A Shoji Catalog representing the Sources added by this User.

GET

A GET request on the catalog endpoint returns a shoji:catalog element containing all sources belonging to the requesting user.

    {
        "description": "List of data sources added by this user",
        "element": "shoji:catalog",
        "index": {
            "": {
                "name": "uploaded.txt",
                "description": ""
            },
            "": {
                "name": "export-3AF276CC.csv",
                "description": "Export of prior survey"
            }
        },
        "self": ""
    }

Each source-entity reference in the catalog index includes the name and description of the source.
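As a rough illustration of reading such a response, the sketch below parses a catalog document and lists its sources. It assumes the index is keyed by each source's URL, as is usual for Shoji catalogs (the keys are blank in the sample above); the `example.invalid` URLs and the `list_sources` helper are hypothetical.

```python
import json

# Hypothetical catalog response; the index keys and "self" are made-up URLs.
SAMPLE_CATALOG = """
{
    "element": "shoji:catalog",
    "description": "List of data sources added by this user",
    "index": {
        "https://example.invalid/sources/abc/": {
            "name": "uploaded.txt",
            "description": ""
        },
        "https://example.invalid/sources/def/": {
            "name": "export-3AF276CC.csv",
            "description": "Export of prior survey"
        }
    },
    "self": "https://example.invalid/sources/"
}
"""


def list_sources(catalog_text):
    """Return (url, name, description) tuples from a shoji:catalog document."""
    catalog = json.loads(catalog_text)
    if catalog.get("element") != "shoji:catalog":
        raise ValueError("not a shoji:catalog document")
    rows = (
        (url, entry.get("name", ""), entry.get("description", ""))
        for url, entry in catalog.get("index", {}).items()
    )
    # Sources are unordered, so sort by name to get a stable listing.
    return sorted(rows, key=lambda row: row[1])


sources = list_sources(SAMPLE_CATALOG)
for url, name, description in sources:
    print(f"{name!r:24} {description!r:28} {url}")
```

Sorting is done on the client only because the catalog itself makes no ordering promise.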


POST

A POST request on the Sources Catalog endpoint adds a new source for the requesting user.

Three alternative payloads are supported.

Upload file as part of multipart/form-data POST request

POST a multipart form with an uploaded_file field containing the file to upload. A 201 (Created) response indicates success and returns the URL of the new source in its Location header.

The filename field provided for each file is used for the source name. A source added in this way receives the empty string ("") as its description.

While a multipart form request can contain multiple files, only the file contained in the uploaded_file field will be used to create a source; each source requires a separate POST request.

The Content-Type value (MIME type) provided for the file determines the parser used for that file. Take care to set it accurately, for example by distinguishing a CSV file (text/csv) from a plain text file (text/plain).

POST /sources/ HTTP/1.1
Content-Length: 8874357
Content-Type: multipart/form-data; boundary=df5b17ff463a4cb3aa61cf02224c7303

--df5b17ff463a4cb3aa61cf02224c7303
Content-Disposition: form-data; name="uploaded_file"; filename="my.csv"
Content-Type: text/csv

<contents of my.csv>
--df5b17ff463a4cb3aa61cf02224c7303--

HTTP/1.1 201 Created
Location: /sources/{source_id}/
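One way to assemble such a request body by hand is sketched below, using only the Python standard library. The `build_upload_request` helper is hypothetical, the CSV bytes are made-up sample data, and nothing is sent: the API base URL and authentication are not covered in this section.

```python
import uuid


def build_upload_request(filename, content, mime_type):
    """Return (headers, body) for a multipart POST with one uploaded_file part.

    Assumes `filename` contains no double quotes or line breaks.
    """
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="uploaded_file"; '
        f'filename="{filename}"\r\n'
        # The per-file Content-Type is what selects the parser.
        f"Content-Type: {mime_type}\r\n"
        "\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    body = head + content + tail
    headers = {
        "Content-Type": f"multipart/form-data; boundary={boundary}",
        "Content-Length": str(len(body)),
    }
    return headers, body


# Made-up CSV content, sent as text/csv rather than text/plain.
headers, body = build_upload_request("my.csv", b"id,q1\n1,yes\n", "text/csv")
print(headers["Content-Type"])
print(body.decode("utf-8"))
```

The returned headers and body could then be passed to any HTTP client; on success the new source's URL would be read from the Location header of the 201 response.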

POST a shoji:entity with a file URL as location

A source can also be created by POSTing a shoji:entity whose location attribute gives the URL of the file to be used.

  {
    "element": "shoji:entity",
    "body": {
      "location": "<url>",
      "name": "Optional name",
      "description": "Optional description"
    }
  }

The description attribute is optional; when provided, it becomes the description attribute of the created source.
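A minimal sketch of building this payload in Python follows. The `build_url_source_request` helper and the `example.invalid` URLs are hypothetical, and the `application/json` Content-Type is an assumption rather than something stated here. The request object is only constructed, not sent.

```python
import json
import urllib.request


def build_url_source_request(catalog_url, file_url, name=None, description=None):
    """Build (but do not send) a POST of a shoji:entity pointing at file_url."""
    body = {"location": file_url}
    # Optional attributes are left out entirely when not supplied.
    if name is not None:
        body["name"] = name
    if description is not None:
        body["description"] = description
    payload = json.dumps({"element": "shoji:entity", "body": body})
    return urllib.request.Request(
        catalog_url,
        data=payload.encode("utf-8"),
        headers={"Content-Type": "application/json"},  # assumed media type
        method="POST",
    )


request = build_url_source_request(
    "https://example.invalid/sources/",          # hypothetical catalog URL
    "https://example.invalid/files/survey.csv",  # hypothetical file URL
    description="Export of prior survey",
)
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```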

POST a form (urlencoded or multipart/form) with a source_url field

Alternatively, you may POST a form with a source_url field that points to a publicly accessible URL. Both the "http" and the "s3" schemes are supported. This endpoint will download that file synchronously and verify that it is a valid source file.
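The urlencoded variant of this form might be prepared as sketched below. The `build_source_url_form` helper and the bucket path are hypothetical; the scheme check mirrors only the two schemes named in the text and is a client-side convenience, not a statement about what the server accepts.

```python
from urllib.parse import urlencode, urlsplit

# Only the schemes named in the text; adjust if others turn out to work.
SUPPORTED_SCHEMES = ("http", "s3")


def build_source_url_form(source_url, supported=SUPPORTED_SCHEMES):
    """Return (headers, body) for a urlencoded form with a source_url field."""
    scheme = urlsplit(source_url).scheme.lower()
    if scheme not in supported:
        raise ValueError(
            f"unsupported scheme {scheme!r}; expected one of {supported}"
        )
    headers = {"Content-Type": "application/x-www-form-urlencoded"}
    body = urlencode({"source_url": source_url}).encode("ascii")
    return headers, body


headers, body = build_source_url_form("s3://example-bucket/exports/survey.csv")
print(body.decode("ascii"))
```

Because the download is synchronous, a client would likely want a generous timeout when sending this request for a large file.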

Entity


A Shoji Entity representing a single Source. Its "body" member contains:

  • name: A friendly name for the Source.
  • type: a string declaring the media type of the source. One of ("csv", "spss").
  • user_id: the id of the User who created the Source.
  • location: an absolute URI to the data. Currently, the only supported scheme is "crunchfile://", which indicates a file uploaded to
  • settings: an object containing configuration for translating the source to Crunch internals. Its members vary by type:
    • csv:
      • strict: an integer. If 1, extra columns or undefined category ids in the CSV will raise an error. If 0, they will be added to the dataset.
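To make the effect of the strict setting concrete, here is a toy model of the column half of that rule. It is not Crunch's implementation: `reconcile_columns` is a hypothetical helper, it ignores category ids, and the column names are made up.

```python
def reconcile_columns(csv_columns, dataset_columns, strict):
    """Toy model of the csv 'strict' setting, for column names only."""
    extra = [col for col in csv_columns if col not in dataset_columns]
    if extra and strict == 1:
        # strict=1: columns the dataset does not know about are an error.
        raise ValueError(f"unexpected columns in strict mode: {extra}")
    # strict=0: unknown columns are appended to the dataset's columns.
    return list(dataset_columns) + extra


merged = reconcile_columns(["id", "q1", "q2"], ["id", "q1"], strict=0)
print(merged)

try:
    reconcile_columns(["id", "q1", "q2"], ["id", "q1"], strict=1)
except ValueError as error:
    print("rejected:", error)
```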

A PUT must contain a JSON object with members from the Shoji Entity "body" which the client intends to update. 204 indicates success.
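A PUT body containing only the members to change could be built as in the sketch below. The `build_source_update` helper is hypothetical, and the set of accepted member names is an assumption drawn from the attributes mentioned in this section; which of them the server actually allows to be updated is not stated.

```python
import json

# Attribute names mentioned in this section; writability is an assumption.
KNOWN_MEMBERS = {"name", "description", "type", "user_id", "location", "settings"}


def build_source_update(**changes):
    """Return the JSON bytes for a PUT that updates only the given members."""
    unknown = set(changes) - KNOWN_MEMBERS
    if unknown:
        raise ValueError(f"not Source body members: {sorted(unknown)}")
    return json.dumps(changes, sort_keys=True).encode("utf-8")


payload = build_source_update(
    name="Wave 2 export",
    description="Export of prior survey",
)
print(payload.decode("utf-8"))
```

A client would send these bytes in a PUT to the source's URL and treat a 204 response as success.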

A DELETE destroys the Source resource. 204 indicates success.


A GET returns the original source file.