Components

The four constituent parts of a Protocol, and by extension of any Deidentifier.

@startuml
!includeurl  https://raw.githubusercontent.com/plantuml-stdlib/C4-PlantUML/master/C4_Context.puml
!includeurl  https://raw.githubusercontent.com/plantuml-stdlib/C4-PlantUML/master/C4_Container.puml

'# remove <<system>> above each block
HIDE_STEREOTYPE()


Boundary(protocol, "Protocol / Deidentifier", "Processes dataset") {
    Container(tags, "Tags", "Transforms DICOM\n metadata") #AA98E2
    Container(filters, "Filters", "Optionally rejects\n dataset") #AA98E2
    Container(pixel, "Pixel", "Processes\n image data") #AA98E2
    Container(private, "Private", "Processes \n private tags") #AA98E2
}

tags -[hidden]r- filters
filters -[hidden]r- pixel
pixel -[hidden]r- private

@enduml

Tags, Filters, Pixel and Private together define the complete handling of any incoming DICOM Dataset.
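As a rough illustration, a protocol could be held in memory as a single record with one field per component. The class and field names below are hypothetical. Each component is stored in the simple mapping or textual form described in its own section; a deidentifier would replace these with runnable implementations.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Tag = Tuple[int, int]  # (group, element)


@dataclass
class Protocol:
    """Hypothetical container for the four components of a protocol."""

    # Tags: tag -> action code, e.g. (0x0010, 0x0010) -> "X"
    tags: Dict[Tag, str] = field(default_factory=dict)
    # Filters: boolean reject formulas, kept as plain text here
    filters: List[str] = field(default_factory=list)
    # Pixel: (condition, black-out regions) pairs for burnt-in PHI
    pixel: List[Tuple[str, List[List[int]]]] = field(default_factory=list)
    # Private: safe private tags, e.g. '0013,["Company_A"]01'
    private: List[str] = field(default_factory=list)
```

Keeping the four parts as separate fields mirrors the rule that together they define the complete handling of a dataset, while each can be reviewed on its own.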

Tags

@startuml
!includeurl  https://raw.githubusercontent.com/plantuml-stdlib/C4-PlantUML/master/C4_Context.puml
!includeurl  https://raw.githubusercontent.com/plantuml-stdlib/C4-PlantUML/master/C4_Container.puml

'# remove <<system>> above each block
HIDE_STEREOTYPE()


Boundary(protocol, "Protocol / Deidentifier") {
    Container(tags, "Tags", "Transforms DICOM\n metadata") #AA98E2
    Container(filters, "Filters") #A8A8A8
    Container(pixel, "Pixel") #A8A8A8
    Container(private, "Private") #A8A8A8
}

tags -[hidden]r- filters
filters -[hidden]r- pixel
pixel -[hidden]r- private

@enduml

Tags processing handles all DICOM elements in a DICOM Dataset except for PixelData and private tags (see ‘Excluded elements’ below). Tags takes a different form for protocols and for deidentifiers:

  • For protocols, Tags defines what should be done with each DICOM tag, expressed in the language of Action Codes: per tag, a protocol states whether to clean, remove, keep (etc.) its value.

  • For deidentifiers, Tags defines what is actually done to each tag. For each tag, a deidentifier implements a procedure that maps to one of the action codes. For example, the action clean might be implemented by writing a dummy value, by obtaining a pseudonym from a secure source, or by writing an aggregated value.

Excluded elements

Two types of DICOM elements are specifically excluded from tags processing:

  1. Image data elements are excluded, as their structure requires specialized treatment. They have their own dedicated component, Pixel.

  2. Similarly, all private tags are excluded from tags processing and handled in Private.
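Deciding which component receives an element can be sketched as a simple partition. This sketch assumes that "image data elements" means only PixelData (7FE0,0010), which a real implementation may need to extend, and relies on the DICOM rule that private elements have an odd group number. The function name `route_elements` is hypothetical.

```python
from typing import Dict, Iterable, List, Tuple

Tag = Tuple[int, int]  # (group, element)

# Assumption: PixelData is the only image data element considered here
IMAGE_DATA_TAGS = {(0x7FE0, 0x0010)}


def route_elements(tags: Iterable[Tag]) -> Dict[str, List[Tag]]:
    """Split element tags over the components that process them."""
    routes: Dict[str, List[Tag]] = {"tags": [], "pixel": [], "private": []}
    for tag in tags:
        group, _ = tag
        if tag in IMAGE_DATA_TAGS:
            routes["pixel"].append(tag)    # handled by Pixel
        elif group % 2 == 1:
            routes["private"].append(tag)  # odd group: private, handled by Private
        else:
            routes["tags"].append(tag)     # everything else: Tags processing
    return routes
```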

Syntax

How to define tags processing

For protocols, tags processing is simply defined as a list of tag -> action code mappings.

For deidentifiers, the definition is more involved. It should be a list of tag -> implemented function mappings, plus an additional mapping, with explanation, of how each implemented function maps to an action code.
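The two forms can be sketched side by side. This is an illustrative sketch, not taken from any real profile: the chosen tags, their action codes and the functions `remove`, `write_dummy` and `keep` are hypothetical. The action codes follow the DICOM convention (X = remove, C = clean, K = keep). The final check reports every tag where the deidentifier's function does not realize the protocol's action code.

```python
from typing import Callable, Dict, List, Optional, Tuple

Tag = Tuple[int, int]  # (group, element)
Operation = Callable[[Optional[str]], Optional[str]]

# Protocol form: tag -> action code (illustrative values only)
PROTOCOL: Dict[Tag, str] = {
    (0x0010, 0x0010): "X",  # PatientName
    (0x0010, 0x0020): "C",  # PatientID
    (0x0008, 0x0060): "K",  # Modality
}


# Hypothetical implemented functions
def remove(value: Optional[str]) -> Optional[str]:
    return None  # element is dropped


def write_dummy(value: Optional[str]) -> Optional[str]:
    return "ANONYMOUS"  # one possible way to implement 'clean'


def keep(value: Optional[str]) -> Optional[str]:
    return value


# Deidentifier form: tag -> implemented function ...
DEIDENTIFIER: Dict[Tag, Operation] = {
    (0x0010, 0x0010): remove,
    (0x0010, 0x0020): write_dummy,
    (0x0008, 0x0060): keep,
}

# ... plus the mapping from each implemented function to its action code
FUNCTION_TO_ACTION: Dict[Operation, str] = {remove: "X", write_dummy: "C", keep: "K"}


def nonconforming_tags(protocol, deidentifier, function_to_action) -> List[Tag]:
    """Tags for which the deidentifier does not implement the protocol's action code."""
    return [
        tag
        for tag, action in protocol.items()
        if function_to_action.get(deidentifier.get(tag)) != action
    ]
```

With the tables above, `nonconforming_tags(PROTOCOL, DEIDENTIFIER, FUNCTION_TO_ACTION)` returns an empty list; swapping `remove` for `keep` on PatientName would report that tag.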

Filters

@startuml
!includeurl  https://raw.githubusercontent.com/plantuml-stdlib/C4-PlantUML/master/C4_Context.puml
!includeurl  https://raw.githubusercontent.com/plantuml-stdlib/C4-PlantUML/master/C4_Container.puml

'# remove <<system>> above each block
HIDE_STEREOTYPE()


Boundary(protocol, "Protocol / Deidentifier") {
    Container(tags, "Tags") #A8A8A8
    Container(filters, "Filters", "Optionally rejects\n dataset") #AA98E2
    Container(pixel, "Pixel") #A8A8A8
    Container(private, "Private") #A8A8A8
}

tags -[hidden]r- filters
filters -[hidden]r- pixel
pixel -[hidden]r- private

@enduml

Filters checks each dataset and either accepts it for further processing or rejects it. Common reasons for rejection are unknown DICOM types with burnt-in information, non-conformant DICOM, or an unknown SOPClass.

Filters can be applied at multiple points in a deidentification process. In particular, it can reject a dataset outright at the start, but it can also be called after Pixel, because Pixel can change the tag ‘PatientIdentityRemoved’, which is a potential input to Filters.

The Filters component is solely responsible for rejecting datasets. No other component can do this.
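The ordering described above can be sketched as a small pipeline in which only the filter steps may reject. Everything here is hypothetical: datasets are plain keyword -> value dicts, the Pixel step is a stub that assumes it can clean ultrasound images only, and the split into pre- and post-Pixel filter lists is one possible design.

```python
from typing import Callable, Dict, List, Optional

Dataset = Dict[str, str]            # hypothetical: DICOM keyword -> value
Filter = Callable[[Dataset], bool]  # returns True to reject the dataset


def rejected(ds: Dataset, filters: List[Filter]) -> bool:
    return any(f(ds) for f in filters)


def process(ds: Dataset, pre_filters: List[Filter],
            pixel: Callable[[Dataset], Dataset],
            post_filters: List[Filter]) -> Optional[Dataset]:
    """Return the processed dataset, or None if a filter rejects it.
    Only the filter steps can reject; Pixel merely transforms."""
    if rejected(ds, pre_filters):
        return None  # rejected outright, before any processing
    ds = pixel(ds)
    if rejected(ds, post_filters):
        return None  # rejected based on what Pixel changed
    return ds


def pixel_step(ds: Dataset) -> Dataset:
    """Hypothetical Pixel stub: assumes it can clean ultrasound images only."""
    out = dict(ds)
    if out.get("Modality") == "US":
        out["PatientIdentityRemoved"] = "YES"
    return out


def missing_sop_class(ds: Dataset) -> bool:
    return "SOPClassUID" not in ds  # treat as non-conformant


def burnt_in_not_removed(ds: Dataset) -> bool:
    return (ds.get("BurnedInAnnotation") == "YES"
            and ds.get("PatientIdentityRemoved") != "YES")
```

An ultrasound dataset with burnt-in annotation passes because the stub Pixel step sets PatientIdentityRemoved; the same dataset with Modality MR is rejected by the post-Pixel filter.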

Syntax

A filter is defined in the form of a boolean or propositional truth function. For example:

<Modality == "MR"> and <Manufacturer contains "Company A"> -> Reject
  • Relationships between propositions use only the standard logical connectives and, or and not, with parentheses ( ) for grouping.

  • Each proposition in the formula is a boolean function over a tag value. The test performed inside a proposition can take any form, as long as the outcome is boolean (yes/no).

  • The outcome of the formula is always Reject: yes or no.

For a deidentifier, Filters is implemented so that it can actually be run. For a protocol, Filters can be written in any formal language that implements boolean logic.
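One way to make the filter syntax runnable is to build formulas from small combinators. This is a sketch under stated assumptions: a dataset is a hypothetical keyword -> value dict, and `equals`, `contains`, `and_`, `or_` and `not_` are illustrative names, not an existing API. Nesting the connective calls plays the role of parentheses.

```python
from typing import Callable, Dict

Dataset = Dict[str, str]  # hypothetical: DICOM keyword -> value
Proposition = Callable[[Dataset], bool]


# Propositions: boolean tests over a single tag value
def equals(keyword: str, expected: str) -> Proposition:
    return lambda ds: ds.get(keyword) == expected


def contains(keyword: str, text: str) -> Proposition:
    return lambda ds: text in ds.get(keyword, "")


# Standard logical connectives
def and_(*props: Proposition) -> Proposition:
    return lambda ds: all(p(ds) for p in props)


def or_(*props: Proposition) -> Proposition:
    return lambda ds: any(p(ds) for p in props)


def not_(prop: Proposition) -> Proposition:
    return lambda ds: not prop(ds)


# <Modality == "MR"> and <Manufacturer contains "Company A"> -> Reject
reject = and_(equals("Modality", "MR"), contains("Manufacturer", "Company A"))
```

Calling `reject` on a dataset returns the Reject outcome as a plain boolean.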

Pixel

@startuml
!includeurl  https://raw.githubusercontent.com/plantuml-stdlib/C4-PlantUML/master/C4_Context.puml
!includeurl  https://raw.githubusercontent.com/plantuml-stdlib/C4-PlantUML/master/C4_Container.puml

'# remove <<system>> above each block
HIDE_STEREOTYPE()

Boundary(protocol, "Protocol / Deidentifier") {
    Container(tags, "Tags") #A8A8A8
    Container(filters, "Filters") #A8A8A8
    Container(pixel, "Pixel", "Processes\n image data") #AA98E2
    Container(private, "Private") #A8A8A8
}

tags -[hidden]r- filters
filters -[hidden]r- pixel
pixel -[hidden]r- private

@enduml

Pixel processes all image data elements so that PHI is removed from the images. This includes burnt-in text, implant serial numbers and faces.

The tag PatientIdentityRemoved can be set by Pixel; it is not touched by Tags processing.

Syntax

The Pixel processing definition in a protocol differs for the two types of pixel-based PHI: burnt-in image PHI and dynamic image PHI.

For burnt-in PHI

For burnt-in image PHI, Pixel processing is defined as a boolean function over tags from the Image Type ID subspace only, followed by one or more rectangular pixel regions to black out. For example:

<Modality == "MR"> and <Manufacturer contains "Company A"> ->
[0,0,512,30], [0,400,512,30]

The format for a black-out region is [top, left, size-x, size-y], where top and left are the pixel coordinates of the region's top-left corner, counted from the top-left corner of the image (0,0), and size-x and size-y are the width and height of the region in pixels.
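Applying such rules can be sketched in plain Python. Two assumptions here should be checked against the actual implementation: the first two region values are read as the column (x) and row (y) of the top-left corner, which matches the example regions spanning the full 512-pixel width; and an image is a hypothetical list of pixel rows. `regions_for`, `black_out` and `RULES` are illustrative names.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Dataset = Dict[str, str]  # hypothetical: DICOM keyword -> value
Image = List[List[int]]   # image[y][x], rows from top to bottom
Region = Sequence[int]    # assumed order: [x, y, size_x, size_y]
Rule = Tuple[Callable[[Dataset], bool], List[Region]]


def regions_for(ds: Dataset, rules: List[Rule]) -> List[Region]:
    """Collect black-out regions from every rule whose condition holds."""
    regions: List[Region] = []
    for condition, rule_regions in rules:
        if condition(ds):
            regions.extend(rule_regions)
    return regions


def black_out(image: Image, regions: List[Region]) -> Image:
    """Return a copy of image with each region set to 0, clipped to the image."""
    out = [row[:] for row in image]
    height = len(out)
    width = len(out[0]) if height else 0
    for x0, y0, size_x, size_y in regions:
        for y in range(max(y0, 0), min(y0 + size_y, height)):
            for x in range(max(x0, 0), min(x0 + size_x, width)):
                out[y][x] = 0
    return out


# <Modality == "MR"> and <Manufacturer contains "Company A"> -> [0,0,512,30], [0,400,512,30]
RULES: List[Rule] = [
    (lambda ds: ds.get("Modality") == "MR" and "Company A" in ds.get("Manufacturer", ""),
     [[0, 0, 512, 30], [0, 400, 512, 30]]),
]
```

For a 512 x 512 MR image from Company A, this blanks rows 0 to 29 and rows 400 to 429 across the full width and leaves the input image unchanged.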

Note

In the future, pixel data processing will probably move to OCR-type techniques, where text is recognized in any image regardless of its ‘type’. This would make the approach described here unnecessary, although a list of type -> black-out region mappings could still be useful for testing purposes.

For dynamic image PHI

For dynamic image PHI, there is no set method or syntax. A protocol should document, in a human-readable description with no prescribed format, whether any dynamic image PHI should be removed.

For a deidentifier, the description should include the methods used, if any. The evidence should make it

Private

@startuml
!includeurl  https://raw.githubusercontent.com/plantuml-stdlib/C4-PlantUML/master/C4_Context.puml
!includeurl  https://raw.githubusercontent.com/plantuml-stdlib/C4-PlantUML/master/C4_Container.puml

'# remove <<system>> above each block
HIDE_STEREOTYPE()


Boundary(protocol, "Protocol / Deidentifier") {
    Container(tags, "Tags") #A8A8A8
    Container(filters, "Filters") #A8A8A8
    Container(pixel, "Pixel") #A8A8A8
    Container(private, "Private", "Processes \n private tags") #AA98E2
}

tags -[hidden]r- filters
filters -[hidden]r- pixel
pixel -[hidden]r- private

@enduml

Private tag handling boils down to maintaining a list of ‘safe private’ tags. The DICOM standard allows a deidentification method to indicate whether it retains safe private tags (option ‘Rtn. Safe Priv. Opt’ in table E.1-1), but it does not define which private tags are considered safe. Several organizations maintain their own lists.

Syntax

If a protocol retains safe private tags, these are defined as a list of private tags deemed safe. For example:

0013,["Company_A"]01
0013,["Company_A"]02

0075,["Company_B"]01
0075,["Company_B"]0e
0075,["Company_B"]31

Looking at the first example 0013,["Company_A"]01 in detail:

  • 0013 is the element group number

  • Company_A is the value of the private creator tag

  • 01 is the last part of the element number; the first part is set dynamically by the private creator tag
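Checking a concrete element against such a list can be sketched as follows. In DICOM, a private creator element at (gggg,00xx) reserves the block (gggg,xx00) to (gggg,xxFF), so the dynamically set first part is xx and the listed last part is the low byte. The entry parser, its regular expression and the `creators` lookup table are hypothetical helpers for this sketch.

```python
import re
from typing import Dict, Set, Tuple

# Matches entries such as 0013,["Company_A"]01
SAFE_PATTERN = re.compile(r'^([0-9a-fA-F]{4}),\["([^"]*)"\]([0-9a-fA-F]{2})$')


def parse_safe_entry(entry: str) -> Tuple[int, str, int]:
    """Parse an entry into (group, private creator, last part of element)."""
    m = SAFE_PATTERN.match(entry.strip())
    if m is None:
        raise ValueError(f"not a safe private tag entry: {entry!r}")
    return int(m.group(1), 16), m.group(2), int(m.group(3), 16)


def is_safe(tag: Tuple[int, int],
            creators: Dict[Tuple[int, int], str],
            safe: Set[Tuple[int, str, int]]) -> bool:
    """creators maps (group, block) -> private creator value, as found at (gggg,00xx)."""
    group, element = tag
    if group % 2 == 0 or element < 0x1000:
        return False  # not a private data element
    block = element >> 8    # xx, assigned via the private creator
    offset = element & 0xFF  # the listed last part
    creator = creators.get((group, block))
    return creator is not None and (group, creator, offset) in safe
```

The same listed entry can thus match different full element numbers in different datasets, depending on which block the private creator reserved.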

See Private tag for more information on private tag structure.