Report

If you're using Pub Analyzer as an external library, you'll find pretty much everything you need here. This is where all the magic happens. 🪄

Suppose you already have an author model. Let's see how to generate a scientific production report for that author.

import asyncio

from pub_analyzer.internal.report import make_author_report
from pub_analyzer.models.author import Author

author = Author(**kwargs) # (1)!
report = asyncio.run(make_author_report(author=author)) # (2)!
  1. Use real information instead of the **kwargs placeholder.
  2. Functions are defined as asynchronous since their primary use occurs within a TUI context. We apologize for any inconvenience this may cause.
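Since the report functions are coroutines, `asyncio.run` fits a plain synchronous script, while code that already runs inside an event loop would `await` the call instead. The sketch below shows both patterns using `fake_make_report`, a hypothetical stand-in that is not part of Pub Analyzer, so it runs without network access.

```python
import asyncio


async def fake_make_report(author: str) -> dict[str, str]:
    """Hypothetical stand-in for an async report function (illustration only)."""
    await asyncio.sleep(0)  # yield control once, as a real network call would
    return {"author": author}


async def main() -> dict[str, str]:
    # Inside a coroutine, await the call; asyncio.run would fail here
    # because an event loop is already running.
    return await fake_make_report("A1")


# From synchronous code, asyncio.run starts the loop and returns the result.
report = asyncio.run(main())
print(report)
```

In a real script, the stand-in would be swapped for `make_author_report(author=author)` as shown above.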

And that's it! Well, maybe you want to export the report to a format like JSON. That's where pydantic does its magic.

with open("report.json", mode="w", encoding="utf-8") as file:
    file.write(report.model_dump_json(indent=2, by_alias=True)) # (1)!
  1. It is important to pass by_alias=True; otherwise, you will not be able to import the file correctly using the Pub Analyzer models.

✨ ta-da!

Early stages

In the early phases of the project, before Pub Analyzer existed as a TUI, the main goal was to emulate an Excel file. This file, based on input tables containing the works of an author and the works that reference them, categorized the types of citations. Later, the idea was expanded to automate the retrieval of works. It was during this period that I stumbled across OpenAlex, and as they say, one thing led to another.

Functions to make reports.

FromDate module-attribute

FromDate = NewType('FromDate', datetime)

DateTime marker for works published from this date.

ToDate module-attribute

ToDate = NewType('ToDate', datetime)

DateTime marker for works published up to this date.
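As a minimal sketch, here is how the two markers can be created and how they might translate into an OpenAlex publication date filter. `FromDate` and `ToDate` are redefined locally so the snippet is self-contained. The `date_filter` helper is hypothetical, and the way Pub Analyzer actually builds its request URLs is an assumption here; only the `from_publication_date` and `to_publication_date` filter names come from the OpenAlex API.

```python
from __future__ import annotations

from datetime import datetime
from typing import NewType

# Same definitions as documented above, repeated to keep the sketch standalone.
FromDate = NewType("FromDate", datetime)
ToDate = NewType("ToDate", datetime)


def date_filter(from_date: FromDate | None, to_date: ToDate | None) -> str:
    """Hypothetical helper: build an OpenAlex-style publication date filter."""
    parts: list[str] = []
    if from_date is not None:
        parts.append(f"from_publication_date:{from_date:%Y-%m-%d}")
    if to_date is not None:
        parts.append(f"to_publication_date:{to_date:%Y-%m-%d}")
    return ",".join(parts)


# NewType wrappers are called like constructors on an existing datetime.
pub_from = FromDate(datetime(2020, 1, 1))
pub_to = ToDate(datetime(2023, 12, 31))
print(date_filter(pub_from, pub_to))
```

At runtime both markers are plain `datetime` objects; the distinct names only help a type checker catch swapped arguments.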

_add_work_abstract

def _add_work_abstract(
    work: dict[str, Any],
) -> dict[str, Any]

Build the work abstract from abstract_inverted_index and insert it under a new abstract key.

Parameters:

Name Type Description Default
work dict[str, Any]

Raw work.

required

Returns:

Type Description
dict[str, Any]

Work with the new abstract key.
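OpenAlex ships abstracts as an inverted index that maps each word to the positions where it appears. The sketch below shows one way to turn that index back into plain text. `rebuild_abstract` is a hypothetical name, and the handling of a missing index (setting `abstract` to `None`) is an assumption, not necessarily what `_add_work_abstract` does.

```python
from typing import Any


def rebuild_abstract(work: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of ``work`` with an ``abstract`` key rebuilt from the index."""
    inverted_index = work.get("abstract_inverted_index")
    if not inverted_index:
        # Assumption: works without an index get an explicit None abstract.
        return {**work, "abstract": None}
    # Map every position back to its word, then read the words in order.
    words_by_position = {
        position: word
        for word, positions in inverted_index.items()
        for position in positions
    }
    abstract = " ".join(words_by_position[i] for i in sorted(words_by_position))
    return {**work, "abstract": abstract}


raw_work = {
    "title": "Example",
    "abstract_inverted_index": {"the": [0, 3], "cat": [1], "chased": [2], "mouse": [4]},
}
print(rebuild_abstract(raw_work)["abstract"])
```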

_get_author_profiles_keys

def _get_author_profiles_keys(
    author: Author,
    extra_profiles: list[
        Author | AuthorResult | DehydratedAuthor
    ]
    | None,
) -> list[AuthorOpenAlexKey]

Create a list of profile IDs by joining the main author profile with the extra author profiles.

Parameters:

Name Type Description Default
author Author

Main OpenAlex author object.

required
extra_profiles list[Author | AuthorResult | DehydratedAuthor] | None

Extra OpenAlex author objects related to the main author.

required

Returns:

Type Description
list[AuthorOpenAlexKey]

List of Author OpenAlex Keys.

_get_authors_list

def _get_authors_list(
    authorships: list[Authorship],
) -> list[str]

Collect the OpenAlex IDs of the authors in a list of authorships.

Parameters:

Name Type Description Default
authorships list[Authorship]

List of authorships.

required

Returns:

Type Description
list[str]

Author key IDs.

_get_citation_type

def _get_citation_type(
    original_work_authors: list[str],
    cited_work_authors: list[str],
) -> CitationType

Compare two lists of authors and return the citation type.

Based on the authors of a given work and the authors of another work that cites the analyzed work, calculate the citation type.

Parameters:

Name Type Description Default
original_work_authors list[str]

List of the authors of the evaluated work.

required
cited_work_authors list[str]

List of the authors of the citing document.

required

Returns:

Type Description
CitationType

Calculated citation type (Type A or Type B).

Info

Type A: Citations made in documents where neither the evaluated author nor any of the co-authors appears among the authors of the citing document.

Type B: Citations generated by the author or one of the co-authors of the work being analyzed.
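The Type A / Type B rule boils down to checking whether the two author lists share at least one ID. Here is a minimal sketch of that comparison. The `CitationType` enum below is a local stand-in whose member names are assumptions, and `classify_citation` is a hypothetical name rather than the library function.

```python
from enum import Enum


class CitationType(Enum):
    """Local stand-in for the library enum; member names are assumptions."""

    TypeA = "A"  # no author in common with the evaluated work
    TypeB = "B"  # at least one author in common (self-citation)


def classify_citation(
    original_work_authors: list[str], cited_work_authors: list[str]
) -> CitationType:
    """Classify a citation by looking for shared author IDs."""
    shared_authors = set(original_work_authors) & set(cited_work_authors)
    return CitationType.TypeB if shared_authors else CitationType.TypeA


print(classify_citation(["A1", "A2"], ["A3"]))
print(classify_citation(["A1", "A2"], ["A2", "A4"]))
```

Using sets keeps the check independent of author order and of duplicated IDs.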

_get_institution_keys

def _get_institution_keys(
    institution: Institution,
    extra_profiles: list[
        Institution
        | InstitutionResult
        | DehydratedInstitution
    ]
    | None,
) -> list[InstitutionOpenAlexKey]

Create a list of profile IDs by joining the main institution profile with the extra institution profiles.

Parameters:

Name Type Description Default
institution Institution

Main OpenAlex institution object.

required
extra_profiles list[Institution | InstitutionResult | DehydratedInstitution] | None

Extra OpenAlex institution objects related to the main institution.

required

Returns:

Type Description
list[InstitutionOpenAlexKey]

List of Institution OpenAlex Keys.

_get_source async

def _get_source(client: AsyncClient, url: str) -> Source

Get source given a URL.

Parameters:

Name Type Description Default
client AsyncClient

HTTPX asynchronous client to be used to make the requests.

required
url str

URL of works with all filters.

required

Returns:

Type Description
Source

Source Model.

Raises:

Type Description
HTTPStatusError

A response from the OpenAlex API returned a 4xx or 5xx HTTP status.

_get_valid_works

def _get_valid_works(
    works: list[dict[str, Any]],
) -> list[dict[str, Any]]

Skip works that do not contain enough data.

Parameters:

Name Type Description Default
works list[dict[str, Any]]

List of raw works.

required

Returns:

Type Description
list[dict[str, Any]]

List of raw works with enough data to pass the Works validation.

Danger

Sometimes OpenAlex provides works with insufficient information to be considered. In response, we have chosen to exclude such works at this stage, thus avoiding the need to handle exceptions within the Model validators.
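To illustrate the idea of filtering before validation, here is a minimal sketch. The exact criteria used by `_get_valid_works` are not stated on this page, so the rule below (a work needs a non-null `id` and `title`) is purely an assumption, and `keep_valid_works` and `REQUIRED_KEYS` are hypothetical names.

```python
from typing import Any

# Assumption: these keys stand in for "enough data"; the real check may differ.
REQUIRED_KEYS = ("id", "title")


def keep_valid_works(works: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Drop raw works where any required key is missing or null."""
    return [
        work
        for work in works
        if all(work.get(key) is not None for key in REQUIRED_KEYS)
    ]


raw_works = [
    {"id": "W1", "title": "A complete record"},
    {"id": "W2", "title": None},  # null title: skipped
    {"title": "No identifier"},  # missing id: skipped
]
print(keep_valid_works(raw_works))
```

Filtering the raw dictionaries first means the models only ever see records they can validate.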

_get_works async

def _get_works(client: AsyncClient, url: str) -> list[Work]

Get all works given a URL.

Iterate over all pages of the URL.

Parameters:

Name Type Description Default
client AsyncClient

HTTPX asynchronous client to be used to make the requests.

required
url str

URL of works with all filters and sorting applied.

required

Returns:

Type Description
list[Work]

List of Works Models.

Raises:

Type Description
HTTPStatusError

A response from the OpenAlex API returned a 4xx or 5xx HTTP status.
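The page-by-page iteration can be sketched as follows. To keep the example runnable offline, the HTTP request is replaced by `fake_fetch`, a hypothetical in-memory stand-in. `collect_all_pages` is also a hypothetical name, and the sketch assumes the response carries `meta.count` and `meta.per_page` alongside `results`, as OpenAlex list responses do. How `_get_works` actually pages through results is not shown on this page.

```python
import asyncio
import math
from collections.abc import Awaitable, Callable
from typing import Any

Page = dict[str, Any]


async def collect_all_pages(
    fetch_page: Callable[[int], Awaitable[Page]],
) -> list[dict[str, Any]]:
    """Fetch page 1, read the totals from ``meta``, then fetch the other pages."""
    first = await fetch_page(1)
    meta = first["meta"]
    total_pages = math.ceil(meta["count"] / meta["per_page"])
    results = list(first["results"])
    for page_number in range(2, total_pages + 1):
        page = await fetch_page(page_number)
        results.extend(page["results"])
    return results


async def fake_fetch(page_number: int) -> Page:
    """Hypothetical stand-in for an HTTP request: serves 5 items, 2 per page."""
    items = [{"id": f"W{i}"} for i in range(5)]
    per_page = 2
    start = (page_number - 1) * per_page
    return {
        "meta": {"count": len(items), "per_page": per_page, "page": page_number},
        "results": items[start : start + per_page],
    }


works = asyncio.run(collect_all_pages(fake_fetch))
print([work["id"] for work in works])
```

With a real client, `fetch_page` would issue the request for the given page and call `raise_for_status()` on the response, which is where an `HTTPStatusError` would surface.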

make_author_report async

def make_author_report(
    author: Author,
    extra_profiles: list[
        Author | AuthorResult | DehydratedAuthor
    ]
    | None = None,
    pub_from_date: FromDate | None = None,
    pub_to_date: ToDate | None = None,
    cited_from_date: FromDate | None = None,
    cited_to_date: ToDate | None = None,
) -> AuthorReport

Make a scientific production report by Author.

Parameters:

Name Type Description Default
author Author

Author for whom the report is generated.

required
extra_profiles list[Author | AuthorResult | DehydratedAuthor] | None

List of author profiles whose works will be attached.

None
pub_from_date FromDate | None

Filter works published from this date.

None
pub_to_date ToDate | None

Filter works published up to this date.

None
cited_from_date FromDate | None

Filter works that cite the author, published from this date.

None
cited_to_date ToDate | None

Filter works that cite the author, published up to this date.

None

Returns:

Type Description
AuthorReport

Author's scientific production report Model.

Raises:

Type Description
HTTPStatusError

A response from the OpenAlex API returned a 4xx or 5xx HTTP status.

make_institution_report async

def make_institution_report(
    institution: Institution,
    extra_profiles: list[
        Institution
        | InstitutionResult
        | DehydratedInstitution
    ]
    | None = None,
    pub_from_date: FromDate | None = None,
    pub_to_date: ToDate | None = None,
    cited_from_date: FromDate | None = None,
    cited_to_date: ToDate | None = None,
) -> InstitutionReport

Make a scientific production report by Institution.

Parameters:

Name Type Description Default
institution Institution

Institution for which the report is generated.

required
extra_profiles list[Institution | InstitutionResult | DehydratedInstitution] | None

List of institution profiles whose works will be attached.

None
pub_from_date FromDate | None

Filter works published from this date.

None
pub_to_date ToDate | None

Filter works published up to this date.

None
cited_from_date FromDate | None

Filter works that cite the institution, published from this date.

None
cited_to_date ToDate | None

Filter works that cite the institution, published up to this date.

None

Returns:

Type Description
InstitutionReport

Institution's scientific production report Model.

Raises:

Type Description
HTTPStatusError

A response from the OpenAlex API returned a 4xx or 5xx HTTP status.