Data Load error - Job ID Hashing

Hello,
I am getting an error in my data loader Python file, raised from getJobId: “TypeError: Unicode-objects must be encoded before hashing”. Does anyone have experience with this error? I am using the standard template code for this function:

    def getJobId(self):
        """
        Return a unique string for each different select
        :returns a string
        """
        # Generate a stable id that changes with the main parameters
        # m = hashlib.sha256()
        m = hashlib.blake2b(digest_size=20)
        m.update(repr(os.getcwd()))
        job_id = m.hexdigest()
        log.debug("Job ID: %s", job_id)
        return job_id

ERROR:

Traceback (most recent call last):
  File "/Users/manutej.mulaveesala/python_virtual_environments/squirro/lib/python3.7/site-packages/squirro/dataloader/sq_data_load.py", line 693, in main
    launcher.execute()
  File "/Users/manutej.mulaveesala/python_virtual_environments/squirro/lib/python3.7/site-packages/squirro/dataloader/sq_data_load.py", line 472, in execute
    self.config.source, cli_mode=cli_mode, max_inc_value=max_inc_value
  File "/Users/manutej.mulaveesala/python_virtual_environments/squirro/lib/python3.7/site-packages/squirro/dataloader/sq_data_load.py", line 148, in load_from_source
    job_id = self._get_job_id(source, source_name)
  File "/Users/manutej.mulaveesala/python_virtual_environments/squirro/lib/python3.7/site-packages/squirro/dataloader/sq_data_load.py", line 337, in _get_job_id
    job_id = source.getJobId()
  File "nytimes_dataloader.py", line 179, in getJobId
    m.update(repr(os.getcwd()))
TypeError: Unicode-objects must be encoded before hashing

I was able to figure out the issue. Some of the code was outdated Python 2 code and needed to be adjusted. The key change is that the query must be explicitly encoded as UTF-8 before hashing: Python 2 accepted a str implicitly, but in Python 3 the hash object's update() only accepts bytes.

See below:

    def getJobId(self):
        """
        Return a unique string for each different select
        :returns a string
        """
        # Generate a stable id that changes with the main parameters

        m = hashlib.sha256()
        m.update(self.args.query[0].encode("utf-8"))
        job_id = m.hexdigest()
        log.debug("Job ID: %s", job_id)
        return job_id
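
For anyone hitting the same error, the Python 3 behaviour can be reproduced outside the data loader. This is only a minimal sketch: hash_text and update_with_str_fails are hypothetical helper names for illustration, not part of the Squirro template.

```python
import hashlib


def hash_text(text):
    """Return the SHA-256 hex digest of a str, encoding it to UTF-8 first."""
    m = hashlib.sha256()
    # Python 3 hash objects accept only bytes-like input, so encode explicitly.
    m.update(text.encode("utf-8"))
    return m.hexdigest()


def update_with_str_fails():
    """Return True if passing a str straight to update() raises TypeError."""
    m = hashlib.sha256()
    try:
        m.update("not bytes")  # same mistake as m.update(repr(os.getcwd()))
    except TypeError:
        return True
    return False
```

Note that repr() returns a str in Python 3, so repr(os.getcwd()) on its own still needs an .encode("utf-8") before it can be passed to update().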

Where did you get this template code from?

The template I’m aware of, custom_plugin.py in the squirro/training repository on GitHub (commit 0422e9eaa291c63063712d286ff5addd965b8546), has the right handling, which is to use repr and encode in combination to get the right input to the hashing function.

Your revised code does not necessarily handle the arguments correctly. It seems that args.query is a multi-value option, but only the first value will be used for the job id.

You can also find additional information on this page: https://squirro.atlassian.net/wiki/spaces/DOC/pages/2425979318/Writing+a+custom+one-click+connector#getJobId
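
To illustrate the repr-plus-encode idea with a multi-value option, here is a rough sketch. It is not the template code itself: make_job_id and the values passed to it are hypothetical, and the real template may differ in its details.

```python
import hashlib


def make_job_id(*params):
    """Hash every parameter so the id changes when any of them changes."""
    m = hashlib.blake2b(digest_size=20)
    for value in params:
        # repr() gives a text form for str, list, int, None, and so on;
        # encode() turns that text into the bytes the hash object requires.
        m.update(repr(value).encode("utf-8"))
    return m.hexdigest()


# Hypothetical usage: the whole list is hashed, not only its first element.
job_id = make_job_id(["first query", "second query"], "/some/working/dir")
```

Because repr is applied to the complete list, adding or changing any value of a multi-value option produces a different job id. With digest_size=20, blake2b returns a 20-byte digest, so the hex id is 40 characters long.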

Yes, I saw that, Patrice. I did not realize that was the standard code that should always be in there.
What does blake2b do?

I used another version because I didn’t realize this piece shouldn’t be customized. Good to know, thanks.

Are the first custom param and second custom param where I would include my arguments?