Uploading data with DocumentUploader() creates a new pipeline


I am trying to upload a collection of documents using squirro_client. Whenever I do this, I am unable to select a pipeline using ‘pipeline_workflow_id’. Instead, a new pipeline called ‘Item Uploader’ appears in the user interface. I use the following code:

import os
from squirro_client import DocumentUploader

uploader = DocumentUploader(project_title=project_title, token=TOKEN, cluster=CLUSTER)
uploader.upload(filename=os.path.expanduser(path_to_pdf), mime_type='application/pdf',
                title=document_title, doc_id=id_counter, created_at=CURRENT_TIME, Keywords=KEYWORDS, priority=1, pipeline_workflow_id=pipeline_id)

I am using Python 3.9. The squirro_client package is version 3.8.3.


Hi TalitaAnthonio, thanks for reaching out to us!

I just wanted to let you know that I was able to reproduce the issue.

I will update you once we have a solution.

All the best,


Thank you :)!!!

Hi again TalitaAnthonio,

We looked into it, and it seems that for the pipeline_workflow_id parameter to be respected, you must pass it when instantiating the DocumentUploader class, like this:

uploader = DocumentUploader(project_id=PROJECT_ID, token=TOKEN, cluster=CLUSTER, pipeline_workflow_id=pipeline_id)

We are looking into updating the documentation to reflect this. Sorry for any confusion this may have caused.
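For reference, here is a slightly fuller sketch of the same fix that uploads several PDFs through one pipeline. The key point is that pipeline_workflow_id goes to the DocumentUploader constructor, not to upload(). The function name upload_pdfs, using the file name as the title, and the sequential string doc IDs are hypothetical choices for illustration. The uploader class is passed in as a parameter only so the logic can be tried without a live cluster. The sketch also assumes the uploader batches items and exposes flush() to send whatever is still queued; please double-check this against the squirro_client version you run.

```python
import os


def upload_pdfs(uploader_cls, pdf_paths, *, project_id, token, cluster, pipeline_id):
    """Hypothetical helper: upload PDFs so every item uses one pipeline workflow.

    uploader_cls is expected to be squirro_client.DocumentUploader; it is a
    parameter only so this function can be exercised without a live cluster.
    """
    # The pipeline is chosen when the uploader is created; passing
    # pipeline_workflow_id to upload() alone was not respected.
    uploader = uploader_cls(
        project_id=project_id,
        token=token,
        cluster=cluster,
        pipeline_workflow_id=pipeline_id,
    )
    for doc_id, path in enumerate(pdf_paths):
        full_path = os.path.expanduser(path)
        uploader.upload(
            filename=full_path,
            mime_type="application/pdf",
            title=os.path.basename(full_path),  # hypothetical: file name as title
            doc_id=str(doc_id),                 # hypothetical: sequential IDs
        )
    # Assumption: the uploader batches items, so flush() sends the final batch.
    uploader.flush()
    return uploader
```

With the real library this would be called as upload_pdfs(DocumentUploader, paths, project_id=PROJECT_ID, token=TOKEN, cluster=CLUSTER, pipeline_id=pipeline_id), with DocumentUploader imported from squirro_client.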

This resolved the behaviour for us. Let us know if you still experience it, or if there's anything else we can assist you with.

Best of luck!

Hi Aaron,

Thank you so much! This solved the problem!!
