pagerduty ¤

admin ¤

forms ¤

create_pagerduty_incident ¤

models ¤

PagerDutyEscalationPolicy ¤

Bases: Model

An Escalation Policy determines which Users or Schedules are notified, and in what order, when an Incident is triggered.

Escalation Policies can be used by one or more Services.

Escalation Rules

  • An Escalation Policy is made up of multiple Escalation Rules. Each Escalation Rule represents a level of On-Call duty.
  • It specifies one or more Users or Schedules to be notified when an unacknowledged Incident reaches that Escalation Rule.
  • The first Escalation Rule in the Escalation Policy determines which User is notified first about the triggered Incident.
  • If no On-Call User for a given Escalation Rule has acknowledged an Incident before the Escalation Rule's escalation delay has elapsed, the Incident escalates to the next Escalation Rule.
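
The escalation behaviour described above can be sketched in plain Python. This is a minimal illustration, not PagerDuty's implementation: `EscalationRule`, `delay_minutes`, and `current_rule_index` are hypothetical names, and the sketch assumes the Incident stays unacknowledged and remains on the last rule once every delay has elapsed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EscalationRule:
    """Hypothetical stand-in for one level of an Escalation Policy."""

    targets: tuple[str, ...]  # Users or Schedules notified at this level
    delay_minutes: int  # time to wait for an acknowledgement before escalating


def current_rule_index(rules: list[EscalationRule], minutes_unacknowledged: int) -> int:
    """Return the index of the rule an unacknowledged Incident has reached."""
    elapsed = 0
    for index, rule in enumerate(rules):
        elapsed += rule.delay_minutes
        # The Incident stays on this rule until its delay has fully elapsed.
        if minutes_unacknowledged < elapsed:
            return index
    # Assumption: once every delay has elapsed, stay on the last rule.
    return len(rules) - 1


policy = [
    EscalationRule(targets=("alice",), delay_minutes=15),
    EscalationRule(targets=("bob", "ops-schedule"), delay_minutes=30),
    EscalationRule(targets=("carol",), delay_minutes=30),
]
```

With this policy, an Incident left unacknowledged for 20 minutes has moved past the first rule and sits on the second one.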

PagerDutyIncident ¤

Bases: Model

An Incident represents a problem or an issue that needs to be addressed and resolved.

Incidents are problems or issues within your Service that need to be addressed and resolved; they are normalized and de-duplicated.

Incidents can be triggered, acknowledged, or resolved, and are assigned to a User based on the Service's Escalation Policy.

A triggered Incident prompts a Notification to be sent to the current On-Call User(s) as defined in the Escalation Policy used by the Service.

Incidents are triggered through the Events API or are created by Integrations.

PagerDutyOncall ¤

Bases: Model

An On-Call represents a contiguous unit of time for which a User will be On-Call for a given Escalation Policy and Escalation Rule.

This may be the result of that User always being On-Call for the Escalation Rule, or a block of time during which the computed result of a Schedule on that Escalation Rule puts the User On-Call.

During an On-Call, the User is expected to bear responsibility for responding to any Notifications they receive and working to resolve the associated Incident(s).

On-Calls cannot be created directly through the API; they are the computed result of how Escalation Policies and Schedules are configured. The API provides read-only access to the On-Calls generated by PagerDuty.
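
As a rough illustration of an On-Call being a contiguous unit of time, the sketch below checks whether an On-Call window covers a given instant. `OncallWindow` and `is_active` are hypothetical names, not the Django model, and treating a missing `end` as "always On-Call" is an assumption based on the description above.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass(frozen=True)
class OncallWindow:
    """Hypothetical stand-in for an On-Call record."""

    user: str
    start: Optional[datetime]
    end: Optional[datetime]  # None: assumed to mean the User is always On-Call


def is_active(window: OncallWindow, at: datetime) -> bool:
    """Return True if the window covers the instant `at`."""
    if window.end is None:
        return True
    return window.start is not None and window.start <= at <= window.end


now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
shift = OncallWindow("alice", now - timedelta(hours=1), now + timedelta(hours=1))
permanent = OncallWindow("bob", None, None)
past = OncallWindow("carol", now - timedelta(hours=3), now - timedelta(hours=2))
```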

PagerDutyOncallManager ¤

Bases: Manager['PagerDutyOncall']

get_current_oncalls_per_escalation_policy ¤

get_current_oncalls_per_escalation_policy() -> list[tuple[PagerDutyEscalationPolicy, list[PagerDutyOncall]]]

Returns the current On-Calls grouped by Escalation Policy, as a list of (PagerDutyEscalationPolicy, list[PagerDutyOncall]) tuples. Each On-Call has its PagerDutyUser associated.

Source code in src/firefighter/pagerduty/models.py
def get_current_oncalls_per_escalation_policy(
    self,
) -> list[tuple[PagerDutyEscalationPolicy, list[PagerDutyOncall]]]:
    """Returns the current On-Calls grouped by Escalation Policy.

    Each On-Call has its PagerDutyUser associated.
    """
    now = timezone.now()
    oncalls = (
        self.select_related(
            "pagerduty_user",
            "schedule",
            "escalation_policy",
            "pagerduty_user__user__slack_user",
        )
        .filter(
            # Currently active On-Calls, or On-Calls with no end date.
            models.Q(start__lte=now, end__gte=now)
            | models.Q(end__isnull=True)
        )
        .exclude(escalation_policy__pagerdutyservice__ignore__exact=True)
        # A single order_by: chained order_by() calls replace each other, and
        # DISTINCT ON requires the ordering to start with the distinct fields.
        .order_by(
            "escalation_policy__pagerdutyservice",
            "pagerduty_user",
        )
        .distinct("escalation_policy__pagerdutyservice", "pagerduty_user")
    )
    oncalls_grouped: list[
        tuple[PagerDutyEscalationPolicy, list[PagerDutyOncall]]
    ] = []
    for escalation_policy, oncalls_grouper in groupby(
        oncalls, key=lambda x: x.escalation_policy
    ):
        oncalls_list = sorted(oncalls_grouper, key=lambda x: x.escalation_level)
        oncalls_grouped.append((escalation_policy, oncalls_list))
    return oncalls_grouped
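
`itertools.groupby` only merges consecutive items that share a key, so the grouping above depends on the queryset ordering keeping each Escalation Policy's rows adjacent. The sketch below shows the same grouping on plain objects, sorting by the grouping key first. `Oncall` and `group_by_policy` are hypothetical stand-ins for illustration, not the Django models, and the data is made up.

```python
from dataclasses import dataclass
from itertools import groupby


@dataclass(frozen=True)
class Oncall:
    """Hypothetical stand-in for a PagerDutyOncall row."""

    policy: str
    user: str
    escalation_level: int


def group_by_policy(oncalls: list[Oncall]) -> list[tuple[str, list[Oncall]]]:
    """Group On-Calls per policy, lowest escalation level first in each group."""
    # Sort by the grouping key so groupby sees each policy as one run.
    ordered = sorted(oncalls, key=lambda o: o.policy)
    return [
        (policy, sorted(group, key=lambda o: o.escalation_level))
        for policy, group in groupby(ordered, key=lambda o: o.policy)
    ]


oncalls = [
    Oncall("payments", "bob", 2),
    Oncall("search", "dana", 1),
    Oncall("payments", "alice", 1),
]
grouped = group_by_policy(oncalls)
# First responder per policy: the lowest escalation level in each group.
first_responders = [group[0].user for _, group in grouped]
```

Taking the first element of each group is the same idea as picking the level-1 responder per Escalation Policy.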

PagerDutySchedule ¤

Bases: Model

A Schedule determines the time periods that Users are On-Call.

Only On-Call Users are eligible to receive Notifications from Incidents.

The details of the On-Call Schedule specify which single User is On-Call for that Schedule at any given point in time.

An On-Call Schedule consists of one or more Schedule Layers that rotate a group of Users through the same shift at a set interval.

Schedules are used by Escalation Policies as an escalation target for a given Escalation Rule.

PagerDutyService ¤

Bases: Model

A Service represents an entity you monitor (such as a web Service, email Service, or database Service). It is a container for related Incidents that associates them with Escalation Policies. A Service is the focal point for Incident management: it specifies how Incidents triggered on it behave, including their urgency and any automated actions based on time of day, Incident duration, and other factors.

PagerDutyUserManager ¤

Bases: Manager['PagerDutyUser']

get_current_on_call_users_l1 staticmethod ¤

get_current_on_call_users_l1() -> list[User]

Returns the list of oncall first responders in each escalation policy. Only the lowest escalation level user is returned per Escalation Policy.

These users have their PagerDutyUser associated.

Source code in src/firefighter/pagerduty/models.py
@staticmethod
def get_current_on_call_users_l1() -> list[User]:
    """Returns the list of oncall first responders in each escalation policy. Only the lowest escalation level user is returned per Escalation Policy.

    These users have their PagerDutyUser associated.
    """
    ep_user = PagerDutyOncall.objects.get_current_oncalls_per_escalation_policy()
    users: list[User] = [oncall[1][0].pagerduty_user.user for oncall in ep_user]

    return users

upsert_by_pagerduty_id staticmethod ¤

upsert_by_pagerduty_id(pagerduty_id: str, email: str, phone_number: str, name: str) -> User | None

Returns a User built from PagerDuty info. It creates or updates a PagerDutyUser and its associated User when the name, phone number, or PagerDuty ID changes.

Source code in src/firefighter/pagerduty/models.py
@staticmethod
def upsert_by_pagerduty_id(
    pagerduty_id: str,
    email: str,
    phone_number: str,
    name: str,
) -> User | None:
    """Returns a User built from PagerDuty info.

    It creates or updates a PagerDutyUser and its associated User when the
    name, phone number, or PagerDuty ID changes.
    """
    # Get a user by its email and update the name. Create the user if necessary.
    ff_user, _ = User.objects.update_or_create(
        email=email, defaults={"name": name, "username": email.split("@")[0]}
    )

    # Update or create a PD User, with the key being its user. Update other fields.
    try:
        pd_user, _ = PagerDutyUser.objects.update_or_create(
            user=ff_user,
            defaults={
                "pagerduty_id": pagerduty_id,
                "phone_number": phone_number,
            },
        )

        if pd_user.user == ff_user:
            return ff_user
        logger.warning(
            "PD and FF users not matching. PDID=%s, email=%s",
            pagerduty_id,
            email,
        )

    except IntegrityError:
        logger.warning(
            "IntegrityError! Could not upsert PagerDuty User. PDID=%s, email=%s.",
            pagerduty_id,
            email,
            exc_info=True,
        )

        pd_user, _ = PagerDutyUser.objects.update_or_create(
            pagerduty_id=pagerduty_id,
            defaults={
                "user": ff_user,
                "phone_number": phone_number,
            },
        )
        return ff_user
    return None

service ¤

PagerdutyService ¤

PagerdutyService()

XXX Rename to PagerDutyClient to avoid confusion with PagerDutyService Django model.

Source code in src/firefighter/pagerduty/service.py
def __init__(self) -> None:
    self.client = PagerdutyClient()

signals ¤

get_invites_from_pagerduty ¤

incident_channel_done_oncall ¤

tasks ¤

PagerDuty Celery tasks.

fetch_oncall ¤

fetch_oncalls ¤

fetch_oncalls() -> None

Celery task to fetch PagerDuty oncalls and save them in the database. Will try to update services, users, schedules and escalation policies if needed.

Source code in src/firefighter/pagerduty/tasks/fetch_oncall.py
@celery_app.task(name="pagerduty.fetch_oncalls")
def fetch_oncalls() -> None:
    """Celery task to fetch PagerDuty oncalls and save them in the database.
    Will try to update services, users, schedules and escalation policies if needed.
    """
    services = pagerduty_service.get_all_oncalls()
    return create_oncalls(services)

fetch_services ¤

fetch_services ¤

fetch_services() -> None

Celery task to fetch PagerDuty services and save them in the database.

Source code in src/firefighter/pagerduty/tasks/fetch_services.py
@shared_task(name="pagerduty.fetch_services")
@transaction.atomic
def fetch_services() -> None:
    """Celery task to fetch PagerDuty services and save them in the database."""
    fetched_services_key = []
    for service in pagerduty_service.client.session.iter_all("services"):
        fetched_services_key.append(service["id"])
        PagerDutyService.objects.update_or_create(
            pagerduty_id=service["id"],
            defaults={
                "name": service["name"][:128],
                "status": service["status"][:128],
                "summary": service["summary"][:256],
                "web_url": service["html_url"][:256],
                "api_url": service["self"][:256],
            },
        )

    # Check that we don't have stale services
    if len(fetched_services_key) != PagerDutyService.objects.count():
        logger.warning("Stale PagerDuty Services found in DB. Manual action needed.")
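
The count comparison above signals that something is stale but not which rows. A set difference, sketched below on plain lists, identifies the stale IDs. `find_stale_ids` is a hypothetical helper and the IDs are made up; in the task, the stored IDs would come from the database.

```python
def find_stale_ids(fetched_ids: list[str], stored_ids: list[str]) -> list[str]:
    """Return stored IDs that were not seen in the latest fetch, sorted."""
    return sorted(set(stored_ids) - set(fetched_ids))


stale = find_stale_ids(
    fetched_ids=["P1", "P2"],
    stored_ids=["P1", "P2", "P3"],
)
```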

fetch_users ¤

fetch_users ¤

fetch_users(*, delete_stale_user: bool = True) -> None

Celery task to fetch PagerDuty users and save them in the database.

Source code in src/firefighter/pagerduty/tasks/fetch_users.py
@shared_task(name="pagerduty.fetch_users")
def fetch_users(*, delete_stale_user: bool = True) -> None:
    """Celery task to fetch PagerDuty users and save them in the database."""
    fetched_users_id = []
    for user in pagerduty_service.client.get_all_users():
        logger.debug(user)
        fetched_users_id.append(user["id"])
        main_user = SlackUser.objects.upsert_by_email(user["email"])
        if main_user is None:
            logger.warning("Could not find user with email %s", user["email"])
            continue

        phone_number = pagerduty_service.get_phone_number_from_body(user)
        if phone_number is None:
            phone_number = ""

        pd_user, _ = PagerDutyUser.objects.update_or_create(
            pagerduty_id=user["id"],
            defaults={
                "name": user["name"],
                "user": main_user,
                "phone_number": phone_number,
                "pagerduty_url": user["html_url"][:256],
                "pagerduty_api_url": user["self"][:256],
            },
        )
        pd_teams = user.get("teams", [])
        pd_teams_models: list[PagerDutyTeam] = []
        for pd_team in pd_teams:
            pd_team_model, _ = PagerDutyTeam.objects.update_or_create(
                pagerduty_id=pd_team["id"],
                defaults={
                    "name": pd_team["summary"],
                    "pagerduty_url": pd_team["html_url"][:256],
                    "pagerduty_api_url": pd_team["self"][:256],
                },
            )
            pd_teams_models.append(pd_team_model)

        pd_user.teams.set(pd_teams_models)

    # Check that we don't have stale users
    if len(fetched_users_id) != PagerDutyUser.objects.count():
        stale_user_ids = PagerDutyUser.objects.exclude(
            pagerduty_id__in=fetched_users_id
        ).values_list("pagerduty_id", flat=True)
        logger.info("Stale PagerDuty users found: %s.", list(stale_user_ids))

        if delete_stale_user:
            nb_deleted, _ = PagerDutyUser.objects.filter(
                pagerduty_id__in=stale_user_ids
            ).delete()
            logger.info("Deleted %s stale PagerDuty users.", nb_deleted)

trigger_oncall ¤

trigger_oncall ¤

trigger_oncall(
    oncall_service: PagerDutyService,
    title: str,
    details: str,
    incident_key: str,
    conference_url: str,
    incident_id: int | None = None,
    triggered_by: User | None = None,
) -> PagerDutyIncident

Celery task to trigger an on-call in PagerDuty, from a FireFighter incident.

XXX Trigger from PD user if it exists, instead of admin.

XXX Should be a service ID instead of a service object.

Source code in src/firefighter/pagerduty/tasks/trigger_oncall.py
@shared_task(name="pagerduty.trigger_oncall")
def trigger_oncall(
    oncall_service: PagerDutyService,
    title: str,
    details: str,
    incident_key: str,
    conference_url: str,
    incident_id: int | None = None,
    triggered_by: User | None = None,
) -> PagerDutyIncident:
    """Celery task to trigger an on-call in PagerDuty, from a FireFighter incident.

    XXX Trigger from PD user if it exists, instead of admin.
    XXX Should be a service ID instead of a service object.
    """
    service = oncall_service
    if incident_id:
        incident = Incident.objects.get(id=incident_id)
        details = f"""Triggered from {APP_DISPLAY_NAME} incident #{incident.id} {f'by {triggered_by.full_name}' if triggered_by else ''}
Priority: {incident.priority}
Environment: {incident.environment}
Component: {incident.component}
FireFighter page: {incident.status_page_url + '?utm_medium=FireFighter+PagerDuty&utm_source=PagerDuty+Incident&utm_campaign=OnCall+Message+In+Channel' }
Slack channel #{incident.slack_channel_name}: {incident.slack_channel_url}

Incident Details:
{details}
"""
    try:
        res = pagerduty_service.client.create_incident(
            title=title,
            pagerduty_id=service.pagerduty_id,
            details=details,
            incident_key=incident_key,
            conference_url=conference_url,
        )

    except PDHTTPError as e:
        if e.response.status_code == 404:
            logger.exception("User not found")
        else:
            logger.exception("Transient network error: %s", e.msg)
        # Re-raise in every case: `res` is unbound if the request failed.
        raise
    except PDClientError as e:
        logger.exception("Non-transient network or client error: %s", e.msg)
        raise
    # TODO Error handling

    if not 200 <= res.status_code < 300:
        logger.error(
            {
                "message": "Error when calling PagerDuty API",
                "request": res.request.__dict__.get("body"),
                "response": res.json(),
            }
        )
        err_msg = f"Error when calling PagerDuty API. {res.json()}"
        raise ValueError(err_msg)

    pd_incident = res.json()["incident"]

    pd_incident_db, _ = PagerDutyIncident.objects.update_or_create(
        incident_key=pd_incident["incident_key"],
        defaults={
            "title": pd_incident["title"][:128],
            "urgency": pd_incident["urgency"][:128],
            "incident_number": pd_incident["incident_number"],
            "status": pd_incident["status"][:128],
            "service_id": service.id,
            "details": pd_incident["body"]["details"][:3000],  # get_in
            "summary": pd_incident["summary"][:256],  # get_in
            "web_url": pd_incident["html_url"][:256],
            "api_url": pd_incident["self"][:256],
            "incident_id": incident_id,
        },
    )

    return pd_incident_db
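
The `# get_in` comments above appear to point at a safe nested lookup for the response payload, which would avoid a `KeyError` when a nested key is missing. A minimal sketch of such a helper follows. `get_in` here is a hypothetical function written for illustration, not an existing FireFighter utility, and the payload is invented.

```python
from collections.abc import Mapping, Sequence
from typing import Any


def get_in(data: Mapping[str, Any], path: Sequence[str], default: Any = None) -> Any:
    """Follow `path` through nested mappings, returning `default` on a miss."""
    current: Any = data
    for key in path:
        if not isinstance(current, Mapping) or key not in current:
            return default
        current = current[key]
    return current


payload = {"incident": {"body": {"details": "Database is down"}, "summary": "DB outage"}}
# Truncate after the lookup, as the task does for its model fields.
details = get_in(payload, ["incident", "body", "details"], default="")[:3000]
missing = get_in(payload, ["incident", "body", "type"], default="")
```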

urls ¤

views ¤

oncall_list ¤

OncallListView ¤

Bases: ListView[PagerDutyOncall]

get_context_data ¤

get_context_data(**kwargs: Any) -> dict[str, Any]

No *args to pass.

Source code in src/firefighter/pagerduty/views/oncall_list.py
def get_context_data(self, **kwargs: Any) -> dict[str, Any]:
    """No *args to pass."""
    # Call the base implementation first to get a context
    context = super().get_context_data(**kwargs)
    last_fetched = (
        PagerDutyOncall.objects.values("updated_at").order_by("-updated_at").first()
    )
    context["last_updated"] = last_fetched["updated_at"] if last_fetched else None
    context["page_title"] = "On-call Overview"
    return context

oncall_trigger ¤