Advancements in surgical technology, devices, concepts, and technical expertise have allowed minimally invasive surgery (MIS) to propagate through the hepatobiliary and pancreatic (HBP) field, although these advancements arrived relatively late compared with other subdivisions of general surgery because of the complex and technically demanding nature of HBP procedures [1]. The move from open surgery to the MIS era has one crucial consequence: video is a by-product of surgeries performed with laparoscopic or robotic cameras. These videos allow us to assess surgical procedures and provide tools for workflow analysis, performance benchmarking, and education [2]. Following this trend, some societies have attempted to use video-based assessment for certification [3]. However, current video-based analysis is labor-intensive and time-consuming because it requires manual review.
Artificial intelligence (AI) has been extensively used to compensate for this limitation. AI is currently used in several medical fields, such as pathology, radiology, and internal medicine, to support diagnosis [4-6]. Following this trend, AI technology has been introduced into general surgery [7,8], where tasks such as quality assessment, education, and surgical performance evaluation have been attempted using AI.
However, only a few such studies have been published in the field of general surgery. One notable hurdle is the large amount of raw data that must be analyzed. Unlike pathomics or radiomics, which analyze a limited number of images, or short videos of medical procedures, surgical videos run from dozens of minutes to several hours depending on the procedure, so their analysis requires substantial resources. Consequently, much attention has been focused on automating the surgical workflow analysis of surgical videos for efficient evaluation [2,9,10].
Laparoscopic cholecystectomy has been one of the most widely accepted laparoscopic surgeries since the 20th century and is performed at hospitals of all levels [11]. It requires a relatively short time, usually within an hour. Owing to these features, laparoscopic cholecystectomy is a suitable procedure with which to begin AI modeling. Moreover, in cases of acute cholecystitis, severe inflammation or fibrosis can increase the difficulty of laparoscopic cholecystectomy and result in adverse events such as bleeding and bile spillage. In such situations, an AI model can be used to evaluate recognition accuracy under various unwanted conditions with prolonged surgical times [9].
This study aimed to design a deep learning model that automatically identifies surgical phases in laparoscopic cholecystectomy videos and to assess the accuracy of its phase recognition.
A dataset of 130 cholecystectomy videos was constructed from 80 videos in the public Cholec80 dataset [12] and 50 laparoscopic cholecystectomy videos from a single institution; the institutional videos were recorded between July 2022 and December 2022. Operation videos had a resolution of at least 720 × 480 and a frame rate of 25 frames per second. All videos were anonymized and included full-length footage from trocar insertion to trocar removal. After excluding videos that could not be annotated because of poor quality, the remaining 120 videos were analyzed. These videos were split into training and testing datasets for the AI model at a 2:1 ratio (80 training, 40 testing).
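For illustration, a minimal sketch of such a reproducible 2:1 split (the video identifiers are hypothetical):

```python
import random

# Hypothetical identifiers for the 120 usable videos.
video_ids = [f"case_{i:03d}" for i in range(1, 121)]

random.seed(42)           # fix the seed so the split is reproducible
random.shuffle(video_ids)

split = 2 * len(video_ids) // 3                      # 2:1 ratio -> 80 / 40
train_ids, test_ids = video_ids[:split], video_ids[split:]
```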
All datasets were annotated according to the following seven phases: (1) preparation, (2) Calot’s triangle dissection, (3) clipping and cutting, (4) gallbladder dissection, (5) gallbladder packaging, (6) cleaning and coagulation, and (7) gallbladder retraction (Table 1). In addition, frames containing bleeding events were marked using a previously trained model [13]. Videos were annotated using the Hutom Video Annotation Tool (HVAT, Hutom Corp.) by three HBP surgeons with 12, 10, and 8 years of experience, respectively.
Table 1. Definitions of various surgical phases of laparoscopic cholecystectomy

| Phase | Name | Definition |
|---|---|---|
| 1 | Preparation | Time period from the start of the surgery until the first laparoscopic tool appears |
| 2 | Calot’s triangle dissection | Time period from when the first laparoscopic tool appears until the first clip applier appears |
| 3 | Clipping and cutting | Time period from when the first clip applier appears until clipping and cutting of the cystic artery and duct are finished (the end point is when the next tool touches the tissue) |
| 4 | Gallbladder dissection | Time period from when the tool touches the tissue until the specimen bag appears |
| 5 | Gallbladder packaging | Time period from when the specimen bag appears until it is closed (or until a suction tip or other aiding tool appears) |
| 6 | Cleaning and coagulation | Time period from when the specimen bag is closed until the tool touches the bag strap (or until the suction tip is withdrawn or the trocar appears on the screen) |
| 7 | Gallbladder retraction | Time period from the first touch of the bag strap until the specimen bag is withdrawn |
3D-ResNet models were trained using the annotations from the laparoscopic cholecystectomy videos [14]. Test scenarios were constructed according to the structural characteristics of the trained model. No pre- or post-processing of the input data or inference output was performed, so that the effect of the labels on model training could be analyzed accurately. However, during training and inference of each 3D-ResNet model, basic data augmentation was performed as regularization at the input stage of the architecture. All models were implemented using PyTorch v1.10.0 and trained on an NVIDIA V100 GPU [15].
The loss function for the final cross-entropy output in the recognition of all surgical phases is as follows:

$$\mathcal{L} = -\frac{1}{M}\sum_{i=1}^{M}\sum_{c=1}^{C} y_{i,c}\,\log p_{i,c}$$

where $M$ is the size of the input minibatch, $C$ is the total number of surgical phases of each surgery, $y_{i,c}$ is 1 if sample $i$ belongs to phase $c$ and 0 otherwise, and $p_{i,c}$ is the softmax confidence assigned to phase $c$ for sample $i$.
3D-ResNet extends the residual learning structure designed for efficient training of convolutional neural networks (CNNs) into three dimensions by simultaneously considering temporal information [14,16]. We utilized a 34-layer 3D-ResNet pre-trained on the Kinetics dataset to efficiently learn spatiotemporal information about the surgical phase in the surgery video [17]. The input clip comprised 16 frames sampled at 1 frame per second. Stochastic gradient descent was used as the optimizer with an initial learning rate of 0.001, a momentum of 0.9, and a weight decay of 5.0 × 10−5. A learning-rate scheduler reduced the learning rate at plateaus. The model was trained for a total of 200 epochs with a minibatch size of 128. The spatial size of the input sequence was resized to 256 × 256 and then randomly cropped to 224 × 224 to form the training input (Appendix 1).
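A minimal PyTorch sketch of this training configuration, assuming torchvision's Kinetics-pretrained r3d_18 as a stand-in for the 34-layer 3D-ResNet (which torchvision does not ship) and a tiny dummy batch in place of the real data loader:

```python
import torch
import torch.nn as nn
from torchvision.models.video import r3d_18

NUM_PHASES = 7  # seven annotated surgical phases

# Kinetics-pretrained 3D ResNet. The study used a 34-layer variant;
# torchvision only ships an 18-layer one, used here as a stand-in.
model = r3d_18(pretrained=True)
model.fc = nn.Linear(model.fc.in_features, NUM_PHASES)

criterion = nn.CrossEntropyLoss()  # final cross-entropy output
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3,
                            momentum=0.9, weight_decay=5e-5)
# Reduce the learning rate when the monitored loss plateaus.
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer)

# Input layout: (batch, channels, frames, height, width).
# The study used minibatches of 128; a batch of 2 keeps this sketch light.
clips = torch.randn(2, 3, 16, 224, 224)
labels = torch.randint(0, NUM_PHASES, (2,))

logits = model(clips)
loss = criterion(logits, labels)
optimizer.zero_grad()
loss.backward()
optimizer.step()
scheduler.step(loss.item())  # in practice, step on the validation loss
```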
The inference scenario for obtaining the testing performance received a sequence of the same size as in the training process as input and output a final confidence value for each surgical phase. We used sequence information of 16 frames as the inference input for the 3D-ResNet.
The inference scenario waited until the first 16 frames were stored in the input buffer before performing inference. Subsequently, operating as a queue, the oldest frame of the input sequence was replaced by the newest frame for each subsequent inference. Finally, the current surgical phase was determined as the class with the maximum confidence value. Considering the real-time scenario, no augmentation techniques other than resizing of the input image were applied [18].
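A sketch of this queue-style sliding-window inference; the frame source is hypothetical, and an untrained r3d_18 again stands in for the trained phase model:

```python
from collections import deque

import torch
from torchvision.models.video import r3d_18

WINDOW = 16  # the model consumes 16-frame clips sampled at 1 frame per second

model = r3d_18(num_classes=7)  # untrained stand-in for the trained phase model
model.eval()

buffer = deque(maxlen=WINDOW)  # queue: appending evicts the oldest frame

# Hypothetical frame source; in practice, decoded video frames at 1 fps,
# resized to the network's input resolution.
frame_stream = (torch.randn(3, 224, 224) for _ in range(32))

with torch.no_grad():
    for frame in frame_stream:
        buffer.append(frame)
        if len(buffer) < WINDOW:  # wait until the first 16 frames are buffered
            continue
        clip = torch.stack(tuple(buffer), dim=1).unsqueeze(0)  # (1, 3, 16, H, W)
        confidences = model(clip).softmax(dim=1)
        phase = int(confidences.argmax(dim=1))  # class with maximum confidence
```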
Our model was evaluated on the test set using accuracy metrics. The formulas used to calculate each metric are as follows:

$$\text{Precision} = \frac{TP}{TP + FP},\qquad \text{Recall} = \frac{TP}{TP + FN},\qquad \text{F1 score} = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}$$

where TP is true positive, FP is false positive, and FN is false negative.
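A sketch of computing these per-phase metrics from frame-level predictions (the label arrays are hypothetical):

```python
import numpy as np

NUM_PHASES = 7

# Hypothetical frame-level ground-truth and predicted phase indices.
rng = np.random.default_rng(0)
y_true = rng.integers(0, NUM_PHASES, size=1000)
y_pred = rng.integers(0, NUM_PHASES, size=1000)

for c in range(NUM_PHASES):
    tp = np.sum((y_pred == c) & (y_true == c))  # true positives for phase c
    fp = np.sum((y_pred == c) & (y_true != c))  # false positives
    fn = np.sum((y_pred != c) & (y_true == c))  # false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    print(f"phase {c}: precision={precision:.4f} recall={recall:.4f} f1={f1:.4f}")

accuracy = np.mean(y_pred == y_true)  # overall frame-level accuracy
print(f"overall accuracy={accuracy:.4f}")
```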
IBM SPSS Statistics version 25.0 (IBM Corp.) was used for all statistical analyses. Nominal data are described as frequencies and percentages. Parametric data are described as mean and standard deviation, and non-parametric data as median and interquartile range. This study was approved by our Institutional Review Board (IRB) (IRB No. 4-2023-0212). The requirement for written informed consent was waived by the IRB owing to the retrospective study design.
The mean surgery time of the 40 laparoscopic cholecystectomy videos enrolled in the analysis set was 1,855 ± 1,067 seconds. Each surgical phase demonstrated high variance in duration, with the Calot’s triangle dissection phase being the longest (909 [569–1,129] frames) (Fig. 2). On review of the enrolled videos, seven (17.5%) cases showed bleeding and bile spillage events during cholecystectomy. In addition, one (2.5%) patient had acute-on-chronic cholecystitis.
A total of 98,234 frames were extracted from the 40 cases as test data. The overall accuracy of the model was 91.2%. The most accurately recognized phase was Calot’s triangle dissection (F1 score: 0.9421), whereas the least accurately recognized was clipping and cutting (F1 score: 0.7761) (Table 2, Fig. 3). All test sets are shown in Supplementary Fig. 1. No significant correlation was detected between operation time and overall accuracy (r = −0.265, p = 0.098). In subgroup analysis, the overall accuracy for the 13 (32.5%) videos in which surgical events such as bleeding, bile spillage, and acute status were detected was 92.7% (F1 scores: 0.9232, 0.9372, and 0.9114, respectively). In addition, these events did not affect the operation time (odds ratio: 1.000 [0.999–1.001], p = 0.745).
Table 2. Recognition results for the test set

| Phase | Total frames | Precision | Recall | F1 score |
|---|---|---|---|---|
| Preparation | 4,816 | 0.9650 | 0.8756 | 0.9181 |
| Calot’s triangle dissection | 37,940 | 0.9349 | 0.9493 | 0.9421 |
| Clipping and cutting | 6,751 | 0.7642 | 0.7882 | 0.7761 |
| Gallbladder dissection | 34,314 | 0.9364 | 0.9137 | 0.9249 |
| Gallbladder packaging | 3,902 | 0.7887 | 0.9052 | 0.8430 |
| Cleaning and coagulation | 7,113 | 0.8722 | 0.8950 | 0.8834 |
| Gallbladder retraction | 3,398 | 0.8944 | 0.8205 | 0.8559 |
| Overall | 98,234 | 0.9121 | 0.9121 | 0.9121 |
In a study by Bar et al. [19], a 3D-CNN-LSTM-based model achieved an accuracy of 91.7% on 311 test videos. Czempiel et al. [20] achieved an accuracy of 88.6% on the Cholec80 dataset using a multi-stage temporal convolutional network (MS-TCN)-based approach. In another study, Czempiel et al. [21] achieved an accuracy of 91.3% on the Cholec80 dataset by applying temporal refinement to the multi-head attention mechanism of a transformer. Yi et al. [22] combined refinements in a TCN architecture to achieve an accuracy of 92.8% on the Cholec80 dataset. Our 3D-CNN-based model demonstrated accuracy comparable to these recently proposed state-of-the-art models [19-22].
This study presents an AI model that automates surgical phase recognition, developed on a laparoscopic cholecystectomy dataset of 120 annotated videos. The model successfully detected surgical phases with an overall accuracy of more than 90%, similar to previously reported models [2,9,10,19-22].
Complications after laparoscopic cholecystectomy are rare, occurring in approximately 0.2% of cases [23]. However, surgical events such as bleeding and bile spillage, which are not regarded as complications, are common [24,25]. Although such events are often neglected in the clinical field because they usually have little effect on the clinical course, they matter for annotation modeling because they affect the AI model’s performance. Previous studies have reported surgical events in approximately 30% of procedures [9,10], and a similar rate (32.5%) was observed in this study. Moreover, recognition accuracy in these cases was not inferior to that in other cases, in line with previous reports [9,10].
On review of individual cases with low accuracy, the F1 scores of two (5.0%) cases were under 0.7, with no specific events during laparoscopic cholecystectomy. In these two cases, the model could not distinguish between the Calot’s triangle dissection and gallbladder dissection phases, and the accuracies of these phases were less than 60%. Because these two phases accounted for 70% of all frames, the overall F1 scores were low.
This result reflects a limitation of the proposed model. The model recognizes a laparoscopic cholecystectomy video as a sum of individual clips and does not account for chronological order. Under ground-truth conditions, surgeons can easily discriminate between the Calot’s triangle dissection and gallbladder dissection phases: they know that Calot’s triangle dissection precedes gallbladder dissection and that the clipping and cutting phase lies between the two. Without this logic, discriminating between the two phases from individual frames alone is difficult even for surgeons. Some models have attempted to resolve this problem by fixing the chronological order of operations [9], as sketched below.
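One simple way to impose such an order prior (a hedged sketch, not the method of [9]): restrict transitions so that, given per-frame confidence scores, the predicted phase can only stay the same or advance.

```python
import numpy as np

def enforce_phase_order(confidences: np.ndarray) -> np.ndarray:
    """Greedy post-processing under a strict-order prior: the predicted
    phase may only stay the same or advance to the next phase.

    confidences: (num_frames, num_phases) per-frame softmax outputs.
    Returns one constrained phase index per frame. Real procedures may
    revisit or skip phases, so this is deliberately simplistic.
    """
    num_phases = confidences.shape[1]
    phases = np.empty(len(confidences), dtype=int)
    current = 0
    for t, conf in enumerate(confidences):
        candidates = [current] + ([current + 1] if current + 1 < num_phases else [])
        current = max(candidates, key=lambda c: conf[c])
        phases[t] = current
    return phases
```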
However, accurate discrimination of the clipping and cutting phase must precede any such fix, and this remains difficult for technical reasons. Previous studies likewise showed relatively low accuracy for this phase (approximately 70%) or did not report it separately [2,9]. In our study, the accuracy of the clipping and cutting phase was 78.8%, higher than previously reported [2,9].
To improve the accuracy of this phase, our model incorporates a tool-recognition model [26]. Devices specific to this phase, such as endo clips or scissors, were detected, which facilitated phase recognition. Despite this advance, the relatively low accuracy of this phase remains a problem to be solved. Under ground-truth conditions, surgeons also distinguish this phase by the anatomical structures of the critical view of safety [27]; introducing this concept could help improve accuracy in particular phases.
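A hedged sketch of how tool detections might be fused with phase confidences; the boost factor and the upstream clip-applier detector are hypothetical, not the study's implementation:

```python
import numpy as np

CLIPPING_PHASE = 2   # index of the clipping and cutting phase
TOOL_BOOST = 1.5     # hypothetical multiplicative boost factor

def fuse_tool_evidence(phase_conf: np.ndarray, clip_applier_seen: bool) -> int:
    """Raise the clipping-and-cutting confidence when a tool-recognition
    model reports a clip applier in the frame, then take the argmax."""
    conf = phase_conf.copy()
    if clip_applier_seen:
        conf[CLIPPING_PHASE] *= TOOL_BOOST
    return int(np.argmax(conf))

# Example: an ambiguous frame tipped toward clipping and cutting by the tool cue.
conf = np.array([0.05, 0.40, 0.35, 0.10, 0.05, 0.03, 0.02])
print(fuse_tool_evidence(conf, clip_applier_seen=True))   # -> 2
print(fuse_tool_evidence(conf, clip_applier_seen=False))  # -> 1
```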
Although there is still room for improvement, video-based AI analysis could reduce labor-intensive review time and provide support in the surgical field. Video-based coaching is a feasible modality for surgical education and is helpful for decision-making and for individualizing feedback to trainees [28]. Adopting an AI model could help surgeons review videos quickly and is expected to significantly improve the efficiency of video-based coaching. Moreover, an AI model can aid in the assessment of surgical performance. Although reviewing an entire video has the advantage of analyzing the complete procedure, edited videos are used for practical reasons in most cases [29]. However, edited videos may be biased compared with unedited ones. In this situation, an AI model could help extract the critical components of the surgery and assign them to reviewers for assessing surgical performance.
In addition, archiving more detailed parameters could help enhance clinical outcomes. For surgeons, intraoperative adverse events are among the most important factors affecting patient outcomes [30]. However, such events are difficult to measure quantitatively because of their complexity, and existing parameters, such as intraoperative hypotension, bleeding, and operation time, are relatively simplified [31-33]. These parameters are also limited in describing and predicting clinical outcomes because their broad definitions span the entire surgical procedure. For example, time consumed by adhesiolysis differs from time consumed by clot dissection, and oozing during surgery differs from bleeding caused by major vessel injury, yet the current definitions cannot distinguish these events. With the aid of phase recognition by our model, detailed and sub-classified event detection becomes possible. Furthermore, based on such data, novel benchmark parameters could be suggested for the intraoperative phase [34]. To achieve this goal, further studies on more accurate detection, including structure recognition, are mandatory [35].
To the best of our knowledge, this is the first study in Korea to define annotations for laparoscopic cholecystectomy for AI modeling. In addition, our AI model demonstrated improved performance in specific phases, compensating for previously known limitations. However, this study has several limitations. First, only two datasets were used to train and test the AI model: the publicly available Cholec80 dataset and our institutional dataset. Videos from various surgeons at our institution were included to mitigate this limitation; however, because surgical devices and techniques within a single institution are relatively similar, a multicenter study is mandatory as the next step. Second, this study was conducted on a relatively small sample of just over 100 cases. This sample size may nevertheless satisfy the minimum requirement, as some studies have reported that performance plateaus at around 100 cholecystectomy cases [19]; even so, further study with a larger sample is essential. Third, our model is non-real-time and did not assist during surgery. Real-time systems are a promising development direction, as they can provide real-time guidance and mitigate the risk of adverse events [36]. However, reviewing operation videos with a non-real-time system has unique strengths: beyond education and evaluation of surgical performance, retrospectively analyzed operation videos can serve as objective performance indicators to predict the immediate postoperative course and aid postoperative management [37]. A future direction would be to develop real-time and non-real-time systems with complementary interactions.
In conclusion, our AI model can identify the different phases of laparoscopic cholecystectomy with high accuracy. Further studies using various supplementary methods are needed to improve its accuracy.
Supplementary data related to this article can be found at https://doi.org/10.14701/ahbps.24-091.
The authors received financial support from the Korean Association of Hepato-Biliary-Pancreatic Surgery (KAHPBS-22-04).
This work was supported by the Korea Medical Device Development Fund grant funded by the Korea government (the Ministry of Science and ICT; the Ministry of Trade, Industry and Energy; the Ministry of Health & Welfare; and the Ministry of Food and Drug Safety) (Project Numbers: 1711197872, RS-2023-00254563).
No potential conflict of interest relevant to this article was reported.
Conceptualization: SHK, MKC. Data curation: SSH, DHH, GHC, YY. Methodology: JY, SHK, MKC. Visualization: BP, MKC. Writing - original draft: SHK, HYY. Writing - review & editing: SSH, MKC.