Spaces and codes

For several MIDOM objects, it is useful to describe the space they inhabit. Inside this space, all possible objects of a certain type are represented. By defining the space, it becomes easier to reason about relationships and operations.

Dataset Space

All possible DICOM datasets. Each DICOM tag is a dimension, and the possible values of that tag span that dimension. Each unique dataset is a unique point in dataset space.

@startuml
!includeurl  https://raw.githubusercontent.com/plantuml-stdlib/C4-PlantUML/master/C4_Context.puml
!includeurl  https://raw.githubusercontent.com/plantuml-stdlib/C4-PlantUML/master/C4_Container.puml

'# remove <<system>> above each block
HIDE_STEREOTYPE()

Boundary(space, "Dataset Space", "All possible DICOM datasets") {
    Container(ds1, "Ds1")
    Container(ds2, "Ds2")
    Container(ds3, "Ds3")
    Container(dsX, "etc..") #A8A8A8
}

ds1 -[hidden]r- ds2
ds2 -[hidden]r- ds3
dsX -[hidden]l- ds3

Container(ds1_detail, "DataSet 1", "", "| tag1 | 'A' |\n| tag2 | 10 |\n etc...") #6E97BE
ds1 -- ds1_detail

Container(ds2_detail, "DataSet 2", "", "| tag1 | 'A' |\n| tag2 | 11 |\n etc...") #6E97BE
ds2 -- ds2_detail

Container(ds3_detail, "DataSet 3", "", "| tag1 | 'B' |\n| tag2 | 8 |\n etc...") #6E97BE
ds3 -- ds3_detail


@enduml

Dataset Space contains all possible values for all non-private DICOM tags. Each point is a unique dataset.

Private tags are not part of dataset space.
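To make "a dataset is a point" concrete, here is a minimal sketch. It assumes a dataset is modelled as a plain dict from a 32-bit tag number (group << 16 | element) to a value; `dataset_point` and `is_private` are hypothetical helper names, not MIDOM code. Private data elements are recognised by their odd group number and dropped before the point is formed, so two datasets that differ only in private tags land on the same point.

```python
from typing import Any, Dict, Tuple

# Hypothetical model: tag number -> value, e.g. 0x00100010 is Patient's Name.
Dataset = Dict[int, Any]


def is_private(tag: int) -> bool:
    """Private data elements have an odd group number.

    Simplification: the reserved odd groups (0001, 0003, 0005, 0007, FFFF)
    are not treated specially here.
    """
    group = tag >> 16
    return group % 2 == 1


def dataset_point(ds: Dataset) -> Tuple[Tuple[int, Any], ...]:
    """Return a hashable, order-independent point in dataset space.

    Private tags are not part of dataset space, so they are dropped.
    Tags are unique, so sorting never needs to compare values.
    """
    return tuple(sorted((tag, value) for tag, value in ds.items() if not is_private(tag)))


ds1 = {0x00100010: "A", 0x00080060: "CT", 0x00091001: "vendor blob"}
ds2 = {0x00080060: "CT", 0x00100010: "A"}

# ds1 and ds2 differ only in a private tag (group 0x0009), so they are the same point.
print(dataset_point(ds1) == dataset_point(ds2))  # True
```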

Size

This space has one dimension for each DICOM tag, slightly over 4000 in total. The size of each dimension is bounded by the tag's Value Representation (VR). Some VRs, such as Other Byte (OB), are effectively unbounded in length. Tags like PixelData, WaveformData, SpectroscopyData and EncapsulatedDocument can therefore be arbitrarily large, constrained only by practical implementation limits.

The number of permutations is so large as to be practically infinite.

Delta Space

All possible delta sets.

@startuml
!includeurl  https://raw.githubusercontent.com/plantuml-stdlib/C4-PlantUML/master/C4_Context.puml
!includeurl  https://raw.githubusercontent.com/plantuml-stdlib/C4-PlantUML/master/C4_Container.puml

'# remove <<system>> above each block
HIDE_STEREOTYPE()



Boundary(space, "Delta Space", "All possible delta sets") {
    Container(ds1, "Ds1") #65B27D
    Container(ds2, "Ds2") #65B27D
    Container(ds3, "Ds3") #65B27D
    Container(dsX, "etc..") #A8A8A8
}

ds1 -[hidden]r- ds2
ds2 -[hidden]r- ds3
dsX -[hidden]l- ds3

Container(ds1_detail, "DeltaSet 1", "", "| tag1 | UNCHANGED |\n| tag2 | CHANGED |\n etc...") #96B7A0
ds1 -- ds1_detail

Container(ds2_detail, "DeltaSet 2", "", "| tag1 | UNCHANGED |\n| tag2 | REMOVED |\n etc...") #96B7A0
ds2 -- ds2_detail

Container(ds3_detail, "DeltaSet 3", "", "| tag1 | EMPTIED |\n| tag2 | REMOVED |\n etc...") #96B7A0
ds3 -- ds3_detail


@enduml

Delta Space contains all possible perceived changes to all non-private DICOM tags. Each point is a unique delta set.

Size

Delta Space is smaller than Dataset Space and, most importantly, it is finite. It compresses all variability of an element down to 5 delta codes, so the total number of delta sets is \(5^n\), with \(n\) the number of possible elements. For 4000+ elements this is still astronomical, but bounded nonetheless. For the 611-dimensional E1-1 subspace the size is roughly \(10^{427}\).
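The size of the E1-1 delta subspace can be checked directly. This sketch assumes only the 5 delta codes and the 611 elements used above; `delta_space_size_log10` is a hypothetical helper name. Python integers are arbitrary precision, so the exact count \(5^{611}\) can be formed and its digits counted. It has 428 digits, i.e. it is about \(10^{427}\), which in natural-log terms is about \(e^{983}\).

```python
import math

N_DELTA_CODES = 5  # UNCHANGED, CHANGED, REMOVED, EMPTIED, CREATED


def delta_space_size_log10(n_elements: int) -> float:
    """log10 of the number of delta sets, 5**n, for n elements."""
    return n_elements * math.log10(N_DELTA_CODES)


# Exact count for the 611-element E1-1 subspace.
exact = N_DELTA_CODES ** 611
print(len(str(exact)))                         # 428 (digits)
print(round(delta_space_size_log10(611), 2))   # 427.07
```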

Action Space

All possible action sets.

@startuml
!includeurl  https://raw.githubusercontent.com/plantuml-stdlib/C4-PlantUML/master/C4_Context.puml
!includeurl  https://raw.githubusercontent.com/plantuml-stdlib/C4-PlantUML/master/C4_Container.puml

'# remove <<system>> above each block
HIDE_STEREOTYPE()



Boundary(space, "Action Space", "All possible action sets") {
    Container(ds1, "AS1") #C09C59
    Container(ds2, "AS2") #C09C59
    Container(ds3, "AS3") #C09C59
    Container(dsX, "etc..") #A8A8A8
}

ds1 -[hidden]r- ds2
ds2 -[hidden]r- ds3
dsX -[hidden]l- ds3

Container(ds1_detail, "ActionSet 1", "", "| tag1 | CLEAN |\n| tag2 | REMOVE |\n etc...") #BEAC8D
ds1 -- ds1_detail

Container(ds2_detail, "ActionSet 2", "", "| tag1 | CLEAN |\n| tag2 | UID |\n etc...") #BEAC8D
ds2 -- ds2_detail

Container(ds3_detail, "ActionSet 3", "", "| tag1 | REMOVE |\n| tag2 | EMPTY |\n etc...") #BEAC8D
ds3 -- ds3_detail


@enduml

Action Space contains all possible actions on all non-private DICOM tags. Each point is a unique action set.

Subspaces

A subspace is the space spanned by a subset of all DICOM tags.
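Restricting a point to a subspace amounts to keeping only the dimensions (tags) that belong to it. A minimal sketch, assuming datasets and delta sets are both modelled as dicts keyed by tag keyword; `project` is a hypothetical name and the two-tag subspace is illustrative only.

```python
from typing import Any, Dict, Iterable


def project(point: Dict[str, Any], subspace_tags: Iterable[str]) -> Dict[str, Any]:
    """Keep only the tags that belong to the subspace.

    Works the same for a dataset (tag -> value) and a delta set (tag -> delta code).
    Tags of the subspace that are absent from the point stay absent.
    """
    wanted = set(subspace_tags)
    return {tag: v for tag, v in point.items() if tag in wanted}


subspace = ["Modality", "Rows"]  # illustrative subset, not a real MIDOM subspace

dataset = {"PatientName": "Doe^John", "Modality": "CT", "Rows": 512}
delta = {"PatientName": "REMOVED", "Modality": "UNCHANGED"}

print(project(dataset, subspace))  # {'Modality': 'CT', 'Rows': 512}
print(project(delta, subspace))    # {'Modality': 'UNCHANGED'}
```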

Image Type ID subspace

Sixteen tags that describe what type of DICOM image this is and which system produced it. This subspace is mainly used to map regions of burned-in PHI. In code it looks like this:

```python
class ImageTypeIDSubspace:
    """Tags used to determine what type of DICOM image this is.
    Used to match burnt-in information locations.
    """

    tags = [
        "BurnedInAnnotation",
        "CodeMeaning",
        "Columns",
        "CommentsOnRadiationDose",
        "ConvolutionKernel",
        "ImageComments",
        "ImageType",
        "InstanceNumber",
        "Modality",
        "Manufacturer",
        "ManufacturerModelName",
        "ProtocolName",
        "Rows",
        "SeriesDescription",
        "StationName",
        "SoftwareVersions",
    ]
```

E1-1 subspace

All non-private tags mentioned in DICOM PS3.15 Table E.1-1. If a tag is in this list, the official DICOM de-identification rules specify how that tag should be handled. These 640 tags are the only ones that have an action code associated with them.

Python code for the E1-1 subspace is available on GitHub.

This subspace can be applied to both Delta Space and Dataset Space.

Action Codes

The codes used in the DICOM Attribute Confidentiality Profiles (PS3.15 Annex E) to denote, per DICOM tag, the action that should be taken to de-identify it.

Main action codes:

| Code | Description |
|------|-------------|
| D | Replace with dummy |
| Z | Replace with empty or dummy |
| X | Remove |
| K | Keep |
| C | Clean |
| U | Replace with UID |

There are 5 more action codes that are combinations of the 6 main ones above. All 11 action codes are listed in PS3.15 table E.1-1a.

Action codes are used to describe what should happen to a tag. In contrast, delta codes are used to describe an observable change in a tag value.
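The six main codes and the five composite ones could be modelled as follows. This is a sketch, not MIDOM's published code: `ActionCode` and `expand` are hypothetical names, and the composite codes are written as they appear in Table E.1-1a, with the trailing `*` of `X/Z/U*` simply stripped when splitting.

```python
from enum import Enum
from typing import Tuple


class ActionCode(Enum):
    """The six main action codes (PS3.15 Table E.1-1a)."""

    D = "Replace with dummy"
    Z = "Replace with empty or dummy"
    X = "Remove"
    K = "Keep"
    C = "Clean"
    U = "Replace with UID"


# The five composite codes. Roughly: the leftmost code applies unless a later
# alternative is required to keep the object conformant to its IOD.
COMPOSITE_CODES = ["Z/D", "X/Z", "X/D", "X/Z/D", "X/Z/U*"]
ALL_CODES = [code.name for code in ActionCode] + COMPOSITE_CODES


def expand(code: str) -> Tuple[ActionCode, ...]:
    """Split a (possibly composite) code into its main action codes, in order."""
    return tuple(ActionCode[part.rstrip("*")] for part in code.split("/"))


print(len(ALL_CODES))                       # 11
print([c.name for c in expand("X/Z/U*")])   # ['X', 'Z', 'U']
```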

Action Set

A set of action codes, one per DICOM tag. All possible action sets are contained in Action Space. Example action set:

| DICOM Tag Name | Action |
|----------------|--------|
| Patient’s Name | REMOVE |
| Study Date | KEEP |
| Series Instance UID | UID |
| Accession Number | EMPTY |
| Referring Physician’s Name | REMOVE |
| Modality | KEEP |

When characterizing the Tags component of a Protocol or Deidentifier, an action set is often called an action profile.
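To show how an action set acts on a dataset, here is a minimal sketch using the example action set above. Several choices are assumptions for illustration, not MIDOM behaviour: datasets are dicts keyed by tag keyword, tags missing from the action set are removed, EMPTY means an empty string, and UID replacement uses UUID-derived UIDs under the `2.25` root with a cache so the same input UID always maps to the same new one. `apply_action_set` and `new_uid` are hypothetical names.

```python
import uuid
from typing import Any, Dict

# The example action set from the table above, keyed by tag keyword.
ACTION_SET = {
    "PatientName": "REMOVE",
    "StudyDate": "KEEP",
    "SeriesInstanceUID": "UID",
    "AccessionNumber": "EMPTY",
    "ReferringPhysicianName": "REMOVE",
    "Modality": "KEEP",
}

_uid_map: Dict[str, str] = {}


def new_uid(old: str) -> str:
    """Map an old UID to a new UUID-derived UID, consistently across calls."""
    if old not in _uid_map:
        _uid_map[old] = f"2.25.{uuid.uuid4().int}"
    return _uid_map[old]


def apply_action_set(ds: Dict[str, Any], actions: Dict[str, str]) -> Dict[str, Any]:
    out: Dict[str, Any] = {}
    for tag, value in ds.items():
        action = actions.get(tag, "REMOVE")  # assumption: unlisted tags are removed
        if action == "REMOVE":
            continue
        if action == "KEEP":
            out[tag] = value
        elif action == "EMPTY":
            out[tag] = ""
        elif action == "UID":
            out[tag] = new_uid(value)
        else:
            raise ValueError(f"unsupported action {action!r} for {tag}")
    return out


ds = {
    "PatientName": "Doe^John",
    "StudyDate": "20240101",
    "SeriesInstanceUID": "1.2.3.4",
    "AccessionNumber": "A123",
    "Modality": "MR",
    "InstitutionName": "General Hospital",
}
result = apply_action_set(ds, ACTION_SET)
print(sorted(result))  # ['AccessionNumber', 'Modality', 'SeriesInstanceUID', 'StudyDate']
```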

Delta Codes

A Delta Code describes an observable change in a DICOM element, typically between its value before and after processing.

| Delta Code | Description |
|------------|-------------|
| UNCHANGED | No modification |
| CHANGED | Modified |
| REMOVED | Deleted |
| EMPTIED | Cleared/Set to empty |
| CREATED | Newly added |

Note that multiple action codes can cause the same delta code to be observed.
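A delta set can be derived by comparing a dataset before and after processing. This minimal sketch assumes datasets are dicts keyed by tag keyword and that a value counts as empty when it is `None` or zero-length; `delta_code` and `delta_set` are hypothetical names. In the example, a dummy replacement (D), a cleaned value (C) and a new UID (U) all surface as CHANGED, which is why the mapping from action codes to delta codes is many-to-one.

```python
from enum import Enum
from typing import Any, Dict

_MISSING = object()  # sentinel: element not present in the dataset


class DeltaCode(Enum):
    UNCHANGED = "No modification"
    CHANGED = "Modified"
    REMOVED = "Deleted"
    EMPTIED = "Cleared/Set to empty"
    CREATED = "Newly added"


def _is_empty(value: Any) -> bool:
    return value is None or (hasattr(value, "__len__") and len(value) == 0)


def delta_code(before: Any, after: Any) -> DeltaCode:
    if before is _MISSING:
        return DeltaCode.CREATED
    if after is _MISSING:
        return DeltaCode.REMOVED
    if after == before:  # checked first, so empty -> empty counts as UNCHANGED
        return DeltaCode.UNCHANGED
    if _is_empty(after):
        return DeltaCode.EMPTIED
    return DeltaCode.CHANGED


def delta_set(before: Dict[str, Any], after: Dict[str, Any]) -> Dict[str, DeltaCode]:
    """One delta code per tag present in either dataset."""
    tags = sorted(set(before) | set(after))
    return {t: delta_code(before.get(t, _MISSING), after.get(t, _MISSING)) for t in tags}


before = {
    "PatientName": "Doe^John",            # D: replaced with a dummy
    "InstitutionName": "General Hospital",  # C: cleaned
    "StudyInstanceUID": "1.2.3",           # U: new UID
    "AccessionNumber": "A123",             # Z: emptied
    "ReferringPhysicianName": "Smith^J",   # X: removed
    "Modality": "CT",                      # K: kept
}
after = {
    "PatientName": "ANONYMOUS",
    "InstitutionName": "Hospital",
    "StudyInstanceUID": "2.25.123",
    "AccessionNumber": "",
    "Modality": "CT",
    "PatientIdentityRemoved": "YES",       # added by the de-identifier
}
deltas = delta_set(before, after)
print({tag: code.name for tag, code in deltas.items()})
```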