rosette package
Submodules
rosette.api module
Python client for the Babel Street Analytics API.
Copyright (c) 2014-2024 Basis Technology Corporation.
Licensed under the Apache License, Version 2.0 (the “License”); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an “AS IS” BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
- class rosette.api.API(user_key=None, service_url='https://analytics.babelstreet.com/rest/v1/', retries=5, refresh_duration=0.5, debug=False)
Bases: object
Analytics Python Client Binding API; representation of an Analytics server. Each endpoint method of this object (for example, L{API.entities}) creates an L{EndpointCaller} bound to the corresponding Analytics server endpoint, calls it, and returns the result as a python dictionary.
- address_similarity(parameters)
Create an L{EndpointCaller} to perform address similarity scoring and call it. @param parameters: An object specifying the data, and possible metadata, to be processed by the address matcher. @type parameters: L{AddressSimilarityParameters} @return: A python dictionary containing the results of address matching.
- categories(parameters)
Create an L{EndpointCaller} to identify the category of the text to which it is applied and call it. @param parameters: An object specifying the data, and possible metadata, to be processed by the category identifier. @type parameters: L{DocumentParameters} or L{str} @return: A python dictionary containing the results of categorization.
- clear_custom_headers()
Clears custom headers
- clear_options()
Clears all options
- clear_url_parameters()
Clears all URL parameters
- entities(parameters)
Create an L{EndpointCaller} to identify named entities found in the texts to which it is applied and call it. @param parameters: An object specifying the data, and possible metadata, to be processed by the entity identifier. @type parameters: L{DocumentParameters} or L{str} @return: A python dictionary containing the results of entity extraction.
- events(parameters)
Create an L{EndpointCaller} to identify events found in the texts. @param parameters: An object specifying the data, and possible metadata, to be processed by the ‘events’ identifier. @type parameters: L{DocumentParameters} or L{str} @return: A python dictionary containing the results of event extraction.
- get_binding_version()
Return the current binding version
- get_custom_headers()
Get custom headers
- get_http(url, headers)
Simple wrapper for the GET request
@param url: endpoint URL @param headers: request headers
- get_option(name)
Gets an option
@param name: name of option
@return: value of option
- get_pool_size()
Returns the maximum pool size, which is the x-rosetteapi-concurrency value returned by the server.
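A client that issues its own requests from several threads can use this limit to cap its worker count. The sketch below is stand-alone and makes assumptions: `pool_size_from_headers` and `analyze_all` are hypothetical helpers, the value is read from a plain header dictionary so no server is needed, and falling back to 1 when the value is missing or malformed is a choice of this sketch, not documented behavior.

```python
from concurrent.futures import ThreadPoolExecutor

HEADER = "x-rosetteapi-concurrency"


def pool_size_from_headers(headers, default=1):
    """Read the concurrency limit from response headers (hypothetical helper)."""
    # Header names are case-insensitive, so compare them in lower case.
    for name, value in headers.items():
        if name.lower() == HEADER:
            try:
                return max(1, int(value))
            except ValueError:
                break  # malformed value: fall back to the default
    return default


def analyze_all(texts, analyze, pool_size):
    """Apply `analyze` to every text using at most `pool_size` worker threads."""
    with ThreadPoolExecutor(max_workers=pool_size) as pool:
        # map() returns results in the same order as `texts`.
        return list(pool.map(analyze, texts))
```

With the binding itself, the limit would come from get_pool_size(), and `analyze` could wrap one of the endpoint methods such as entities.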
- get_url_parameter(name)
Gets a URL parameter
@param name: name of parameter
@return: value of parameter
- get_user_agent_string()
Return the User-Agent string
- info()
Create an L{EndpointCaller} for the server and issue an info request. @return: A python dictionary containing the server version and other identifying data.
- language(parameters)
Create an L{EndpointCaller} for language identification and call it. @param parameters: An object specifying the data, and possible metadata, to be processed by the language identifier. @type parameters: L{DocumentParameters} or L{str} @return: A python dictionary containing the results of language identification.
- matched_name(parameters)
Deprecated. Calls name_similarity to perform name matching. @param parameters: An object specifying the data, and possible metadata, to be processed by the name matcher. @type parameters: L{NameSimilarityParameters} @return: A python dictionary containing the results of name matching.
- morphology(parameters, facet='')
Create an L{EndpointCaller} to return a specific facet of the morphological analyses of the texts to which it is applied, and call it. @param parameters: An object specifying the data, and possible metadata, to be processed by the morphology analyzer. @type parameters: L{DocumentParameters} or L{str} @param facet: The facet desired, to be returned by the created L{EndpointCaller}. @type facet: An element of L{MorphologyOutput}. @return: A python dictionary containing the results of morphological analysis.
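Each facet is served by its own sub-endpoint under morphology. The sketch below builds those URLs; the facet names are assumed to match the members of L{MorphologyOutput}, and treating the empty default facet as the complete analysis is an assumption about the binding. `morphology_url` is a hypothetical helper, not part of the binding.

```python
# Assumed facet names, mirroring MorphologyOutput.
MORPHOLOGY_FACETS = {
    "complete",
    "lemmas",
    "parts-of-speech",
    "compound-components",
    "han-readings",
}


def morphology_url(service_url, facet=""):
    """Build the URL for one morphology facet (hypothetical helper)."""
    # Assumption: the empty default facet selects the complete analysis.
    facet = facet or "complete"
    if facet not in MORPHOLOGY_FACETS:
        raise ValueError(f"unknown morphology facet: {facet!r}")
    return service_url.rstrip("/") + "/morphology/" + facet
```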
- name_deduplication(parameters)
Fuzzy de-duplication of a list of names. @param parameters: An object specifying a list of names as well as a threshold. @type parameters: L{NameDeduplicationParameters} @return: A python dictionary containing the results of de-duplication.
- name_similarity(parameters)
Create an L{EndpointCaller} to perform name similarity scoring and call it. @param parameters: An object specifying the data, and possible metadata, to be processed by the name matcher. @type parameters: L{NameSimilarityParameters} @return: A python dictionary containing the results of name matching.
- name_translation(parameters)
Create an L{EndpointCaller} to perform name analysis and translation upon the name to which it is applied and call it. @param parameters: An object specifying the data, and possible metadata, to be processed by the name translator. @type parameters: L{NameTranslationParameters} @return: A python dictionary containing the results of name translation.
- ping()
Create a ping L{EndpointCaller} for the server and ping it. @return: A python dictionary including the ping message of the L{API}
- post_http(url, data, headers)
Simple wrapper for the POST request
@param url: endpoint URL @param data: request data @param headers: request headers
- record_similarity(parameters)
Create an L{EndpointCaller} to compute similarity scores between records and call it. @param parameters: An object specifying the data, and possible metadata, to be processed by the record matcher. @type parameters: L{RecordSimilarityParameters} @return: A python dictionary containing the results of record matching.
- relationships(parameters)
Create an L{EndpointCaller} to identify the relationships between entities in the text to which it is applied and call it. @param parameters: An object specifying the data, and possible metadata, to be processed by the relationships identifier. @type parameters: L{DocumentParameters} or L{str} @return: A python dictionary containing the results of relationship extraction.
- semantic_vectors(parameters)
Create an L{EndpointCaller} to identify text vectors found in the texts to which it is applied and call it. @type parameters: L{DocumentParameters} or L{str} @return: A python dictionary containing the results of semantic vectors.
- sentences(parameters)
Create an L{EndpointCaller} to break a text into sentences and call it. @param parameters: An object specifying the data, and possible metadata, to be processed by the sentence identifier. @type parameters: L{DocumentParameters} or L{str} @return: A python dictionary containing the results of sentence identification.
- sentiment(parameters)
Create an L{EndpointCaller} to identify the sentiment of the text to which it is applied and call it. @param parameters: An object specifying the data, and possible metadata, to be processed by the sentiment identifier. @type parameters: L{DocumentParameters} or L{str} @return: A python dictionary containing the results of sentiment identification.
- set_custom_headers(name, value)
Sets a custom header
@param name: name of the custom header @param value: value of the custom header
- set_option(name, value)
Sets an option
@param name: name of option @param value: value of option
- set_pool_size(new_pool_size)
Sets the connection pool size. @param new_pool_size: pool size to set
- set_url_parameter(name, value)
Sets a URL parameter
@param name: name of parameter @param value: value of parameter
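The get, set, and clear methods for URL parameters behave like a small dictionary whose entries end up in the request URL. The sketch below models that with a hypothetical `UrlParameters` class; that the binding encodes these entries as an ordinary query string is an assumption, and the parameter name used in the usage example is illustrative only.

```python
from urllib.parse import urlencode


class UrlParameters:
    """Hypothetical stand-in for the binding's URL-parameter store."""

    def __init__(self):
        self._params = {}

    def set(self, name, value):  # analogous to set_url_parameter()
        self._params[name] = value

    def get(self, name):  # analogous to get_url_parameter()
        return self._params.get(name)

    def clear(self):  # analogous to clear_url_parameters()
        self._params.clear()

    def apply(self, url):
        """Append the stored parameters to `url` as a query string."""
        if not self._params:
            return url
        separator = "&" if "?" in url else "?"
        return url + separator + urlencode(self._params)
```

For example, after `p.set("output", "rosette")`, `p.apply(".../entities")` yields the same URL with `?output=rosette` appended, and `p.clear()` restores the bare URL.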
- similar_terms(parameters)
Create an L{EndpointCaller} to identify the terms most similar to the input in the requested languages and call it. @type parameters: L{DocumentParameters} @return: A python dictionary containing the similar terms and their similarity scores.
- syntax_dependencies(parameters)
Create an L{EndpointCaller} to identify the syntactic dependencies in the texts to which it is applied and call it. @type parameters: L{DocumentParameters} or L{str} @return: A python dictionary containing the results of syntactic dependencies identification.
- text_embedding(parameters)
Deprecated. Create an L{EndpointCaller} to identify text vectors found in the texts to which it is applied and call it. @type parameters: L{DocumentParameters} or L{str} @return: A python dictionary containing the results of text embedding.
- tokens(parameters)
Create an L{EndpointCaller} to break a text into tokens and call it. @param parameters: An object specifying the data, and possible metadata, to be processed by the tokens identifier. @type parameters: L{DocumentParameters} or L{str} @return: A python dictionary containing the results of tokenization.
- topics(parameters)
Return keyphrases and concepts related to the provided content. @type parameters: L{DocumentParameters} @return: A python dictionary containing the results.
- translated_name(parameters)
Deprecated. Calls name_translation to perform name analysis and translation upon the name to which it is applied. @param parameters: An object specifying the data, and possible metadata, to be processed by the name translator. @type parameters: L{NameTranslationParameters} @return: A python dictionary containing the results of name translation.
- transliteration(parameters)
Transliterate the given content. @type parameters: L{DocumentParameters} @return: A python dictionary containing the results of the transliteration.
- class rosette.api.AddressSimilarityParameters
Bases: _RequestParametersBase
Parameter object for C{address-similarity} endpoint.
C{address1} and C{address2} are required.
C{parameters} is optional.
C{address1} The address to be matched, an C{address} object or an address string.
C{address2} The address to be matched, an C{address} object or an address string.
- The C{address} object contains these optional fields:
city, island, district, stateDistrict, state, countryRegion, country, worldRegion, postCode, poBox
parameters is a dictionary listing any parameter overrides to include. For example, postCodeAddressFieldWeight. Setting parameters is not cumulative. Define all overrides at once. If defined multiple times, only the final declaration is used.
See examples/address_similarity.py
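Because the parameter object is filled in through subscripts, assigning C{parameters} a second time replaces the whole dictionary, which is why overrides are not cumulative. The sketch below uses a plain dict as a stand-in for L{AddressSimilarityParameters} and shows the JSON body such an object would plausibly serialize to (an assumption). The address values are made up, `exampleOverride` is a hypothetical override name, and the string values for overrides are illustrative only.

```python
import json

params = {}  # plain-dict stand-in for AddressSimilarityParameters
# Address objects use the optional fields listed above; values are made up.
params["address1"] = {"city": "Cambridge", "state": "MA", "postCode": "02142"}
# A plain address string is accepted as well.
params["address2"] = "Cambridge, MA 02142"

# Mistake: the second assignment replaces the first, so the
# hypothetical "exampleOverride" is silently dropped.
params["parameters"] = {"exampleOverride": "1"}
params["parameters"] = {"postCodeAddressFieldWeight": "2"}
lost = "exampleOverride" not in params["parameters"]

# Correct: declare every override at once, in a single dictionary.
params["parameters"] = {"exampleOverride": "1", "postCodeAddressFieldWeight": "2"}

body = json.dumps(params)
```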
- validate()
Internal. Do not use.
- class rosette.api.DocumentParameters
Bases: _RequestParametersBase
Parameter object for all operations that take a document as input, that is, every operation except those with a dedicated parameter object (name translation, name similarity, name deduplication, address similarity, and record similarity). Two fields, C{content} and C{contentUri}, are set via the subscript operator, e.g., C{params["content"]}, or via the convenience instance methods L{DocumentParameters.load_document_file} and L{DocumentParameters.load_document_string}.
Using subscripts instead of instance variables facilitates diagnosis.
If the field C{contentUri} is set to the URL of a web page (only the protocols C{http}, C{https}, C{ftp}, and C{ftps} are accepted), the server fetches the content from that page. In this case, C{content} must not be set.
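The rules for C{content} and C{contentUri} can be restated as a small check. `check_document` is a hypothetical helper that works on a plain dict; the binding performs its own (internal) validation, so this only mirrors the documented constraints. Requiring at least one of the two fields is an assumption that the text above does not state.

```python
from urllib.parse import urlparse

ALLOWED_SCHEMES = {"http", "https", "ftp", "ftps"}


def check_document(params):
    """Stand-alone sketch of the content/contentUri rules (hypothetical helper)."""
    content = params.get("content")
    uri = params.get("contentUri")
    # Assumption: a request needs one of the two fields.
    if content is None and uri is None:
        raise ValueError("supply one of content or contentUri")
    # Documented: content must not be set together with contentUri.
    if content is not None and uri is not None:
        raise ValueError("content must not be set when contentUri is set")
    # Documented: only these protocols are accepted for contentUri.
    if uri is not None and urlparse(uri).scheme.lower() not in ALLOWED_SCHEMES:
        raise ValueError(f"unsupported protocol in {uri!r}")
```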
- load_document_file(path)
Loads a file into the object. The file is read as bytes; the server determines the appropriate conversion. @param path: Pathname of a file acceptable to the C{open} function.
- load_document_string(content_as_string)
Loads a string into the object. The string is taken as bytes or as Unicode, depending on its native python type. @param content_as_string: A string, possibly a unicode string, to be loaded for subsequent analysis.
- serialize(options)
Internal. Do not use.
- validate()
Internal. Do not use.
- class rosette.api.EndpointCaller(api, suburl)
Bases: object
L{EndpointCaller} objects are invoked via their instance methods to obtain results from the Analytics server described by the L{API} object from which they are created. Each L{EndpointCaller} object communicates with a specific endpoint of the Analytics server, specified at its creation. Use the specific instance methods of the L{API} object to create L{EndpointCaller} objects bound to corresponding endpoints.
Use L{EndpointCaller.ping} to ping, and L{EndpointCaller.info} to retrieve server info. For all other types of requests, use L{EndpointCaller.call}, which accepts an argument specifying the data to be processed and certain metadata.
The results of all operations are returned as python dictionaries, whose keys and values correspond exactly to those of the corresponding JSON return value described in the Analytics web service documentation.
- call(parameters, paramtype=None)
Invokes the endpoint to which this L{EndpointCaller} is bound, passing the data and metadata specified by C{parameters} to that server endpoint. For C{name-translation}, C{parameters} must be a L{NameTranslationParameters} object; for C{name-similarity}, it must be a L{NameSimilarityParameters} object. The C{address-similarity}, C{record-similarity}, and C{name-deduplication} endpoints likewise take their own parameter objects (see the corresponding L{API} methods). For all other endpoints, including C{relationships}, it must be a L{DocumentParameters} object or a string.
In all cases, the result is returned as a python dictionary conforming to the JSON object described in the endpoint’s entry in the Analytics web service documentation.
@param parameters: An object specifying the data, and possible metadata, to be processed by the endpoint. See the details for those object types. @type parameters: Parameters types or L{str} for document request. @param paramtype: Required parameters type. @return: A python dictionary expressing the result of the invocation.
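The parameter rules can be summarized as a lookup table. The sketch below restates them from the @type entries of the L{API} methods; `normalize_parameters` is a hypothetical helper, parameter objects are represented by plain dicts, and wrapping a bare string as the document's C{content} is an assumption about how string input is handled.

```python
# Endpoints that require a dedicated parameter object, restated from the
# @type entries of the API methods.
SPECIAL_PARAMETER_TYPES = {
    "name-translation": "NameTranslationParameters",
    "name-similarity": "NameSimilarityParameters",
    "address-similarity": "AddressSimilarityParameters",
    "record-similarity": "RecordSimilarityParameters",
    "name-deduplication": "NameDeduplicationParameters",
}


def normalize_parameters(endpoint, parameters):
    """Turn a call argument into a request dictionary (hypothetical helper)."""
    required = SPECIAL_PARAMETER_TYPES.get(endpoint)
    if isinstance(parameters, str):
        if required is not None:
            raise TypeError(f"{endpoint} requires {required}, not a string")
        # Assumption: a bare string is treated as the document's content.
        return {"content": parameters}
    # Copy so the caller's object is not modified later.
    return dict(parameters)
```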
- info()
Issues an “info” request to the L{EndpointCaller}’s specific endpoint. @return: A dictionary telling server version and other identifying data.
- ping()
Issues a “ping” request to the L{EndpointCaller}’s (server-wide) endpoint. @return: A dictionary if OK. If the server cannot be reached, is not the right server, or some other error occurs, an error is signalled.
- class rosette.api.NameDeduplicationParameters
Bases: _RequestParametersBase
Parameter object for C{name-deduplication} endpoint.
Required:
C{names} A list of C{name} objects.
C{threshold} Threshold used to restrict cluster size; may be null to use the default value.
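A request combines the list of names with a threshold. The sketch below builds a plausible JSON body with a plain dict; the names are made up, the name objects are assumed to have the same shape as those used by C{name-similarity}, and whether the binding sends a null threshold or omits the key is not stated here.

```python
import json

# Hypothetical name-deduplication request body (plain-dict stand-in).
params = {
    "names": [
        {"text": "Alice Smith", "entityType": "PERSON"},
        {"text": "Alyce Smith", "entityType": "PERSON"},
        {"text": "Bob Jones", "entityType": "PERSON"},
    ],
    # json.dumps encodes None as null, i.e. "use the default threshold".
    "threshold": None,
}
body = json.dumps(params)
```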
- validate()
Internal. Do not use.
- class rosette.api.NameSimilarityParameters
Bases: _RequestParametersBase
Parameter object for C{name-similarity} endpoint.
C{name1} and C{name2} are required.
parameters is optional.
C{name1} The name to be matched, a C{name} object.
C{name2} The name to be matched, a C{name} object.
The C{name} object contains these fields:
C{text} Text of the name, required.
C{language} Language of the name as an ISO 639 three-letter code, optional.
C{script} The ISO 15924 script code of the name, optional.
C{entityType} The entity type, can be “PERSON”, “LOCATION” or “ORGANIZATION”, optional.
parameters is a dictionary listing any parameter overrides to include. For example, deletionScore. Setting parameters is not cumulative. Define all overrides at once. If defined multiple times, only the final declaration is used.
See examples/name_similarity.py
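The C{name} object described above maps naturally onto a small builder. `make_name` is a hypothetical helper that emits only the fields that are given and applies the documented constraints (required text, three-letter language code, one of three entity types); the sample names and codes are illustrative.

```python
import json

ENTITY_TYPES = {"PERSON", "LOCATION", "ORGANIZATION"}


def make_name(text, language=None, script=None, entity_type=None):
    """Build a name object; only `text` is required (hypothetical helper)."""
    if not text:
        raise ValueError("text is required")
    if language is not None and len(language) != 3:
        raise ValueError("language must be an ISO 639 three-letter code")
    if entity_type is not None and entity_type not in ENTITY_TYPES:
        raise ValueError(f"unsupported entityType: {entity_type!r}")
    name = {"text": text}
    # Include optional fields only when they are given.
    optional = (("language", language), ("script", script), ("entityType", entity_type))
    for key, value in optional:
        if value is not None:
            name[key] = value
    return name


params = {
    "name1": make_name("Michael Jackson", language="eng", entity_type="PERSON"),
    "name2": make_name("迈克尔·杰克逊", language="zho", script="Hani", entity_type="PERSON"),
}
body = json.dumps(params, ensure_ascii=False)
```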
- validate()
Internal. Do not use.
- class rosette.api.NameTranslationParameters
Bases: _RequestParametersBase
Parameter object for C{name-translation} endpoint. The following values may be set by the indexing operator (e.g., C{params["name"]}). The values are all strings (when not C{None}). All are optional except C{name} and C{targetLanguage}. Scripts are ISO 15924 codes, and languages are ISO 639 (two- or three-letter) codes. See the Name Translation documentation for more description of these terms, as well as the content of the return result.
C{name} The name to be translated.
C{targetLanguage} The language into which the name is to be translated.
C{entityType} The entity type of the name. PERSON (default), LOCATION, or ORGANIZATION
C{sourceLanguageOfOrigin} The language of origin of the name.
C{sourceLanguageOfUse} The language of use of the name.
C{sourceScript} The script in which the name is supplied.
C{targetScript} The script into which the name should be translated.
C{targetScheme} The transliteration scheme by which the translated name should be rendered.
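Because these values are set by key, a misspelled key is easy to miss. `build_translation_request` is a hypothetical helper that checks keys against the list above, requires C{name} and C{targetLanguage}, and leaves out values that are C{None}; the sample name and codes are illustrative.

```python
REQUIRED = ("name", "targetLanguage")
OPTIONAL = (
    "entityType",
    "sourceLanguageOfOrigin",
    "sourceLanguageOfUse",
    "sourceScript",
    "targetScript",
    "targetScheme",
)


def build_translation_request(**fields):
    """Check field names and required values (hypothetical helper)."""
    unknown = set(fields) - set(REQUIRED) - set(OPTIONAL)
    if unknown:
        raise KeyError(f"unknown fields: {sorted(unknown)}")
    missing = [key for key in REQUIRED if fields.get(key) is None]
    if missing:
        raise ValueError(f"missing required fields: {missing}")
    # Unset values (None) are left out of the request.
    return {key: value for key, value in fields.items() if value is not None}


request = build_translation_request(
    name="Владимир Путин",
    targetLanguage="eng",
    targetScript="Latn",
    entityType="PERSON",
    sourceScript=None,
)
```

A misspelled key such as `targetLangauge` raises a KeyError instead of being sent silently.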
- validate()
Internal. Do not use.
- class rosette.api.RecordSimilarityParameters
Bases: _RequestParametersBase
Parameter object for C{record-similarity} endpoint.
Required:
C{records} The records to be compared, where each left record is compared to the associated right record.
C{properties} Parameters used in the call.
C{fields} The definition of the fields used in the comparison. At least 1 and at most 5 fields may be defined.
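The required keys and the field limit can be checked before a request is sent. `check_record_request` is a hypothetical helper; it assumes C{fields} is a mapping or list whose length is the number of field definitions, and the left/right layout of C{records} in the test data is an assumption based on the description above.

```python
REQUIRED_KEYS = ("records", "properties", "fields")
MAX_FIELDS = 5


def check_record_request(params):
    """Check the documented record-similarity constraints (hypothetical helper)."""
    missing = [key for key in REQUIRED_KEYS if key not in params]
    if missing:
        raise ValueError(f"missing required keys: {missing}")
    # Documented limit: at least 1 and at most 5 field definitions.
    count = len(params["fields"])
    if not 1 <= count <= MAX_FIELDS:
        raise ValueError(f"expected 1 to {MAX_FIELDS} fields, got {count}")
```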
- validate()
Internal. Do not use.
- exception rosette.api.RosetteException(status, message, response_message)
Bases: Exception
Exception thrown by all Analytics API operations for errors local and remote.
TBD. Right now, the only valid operation is conversion to __str__.