
Mumbai, March 23: IIT researchers have developed an efficient method to track surgical instruments in 3D using standard 2D video and basic geometry.
Surgeons and patients increasingly choose laparoscopic surgery, also known as keyhole surgery, because it causes less pain and allows faster recovery.
When surgeons manipulate robotic arms to guide a tiny tool inside the body in 3D space, they rely on experience and skill to perceive depth from the 2D video captured by a tiny camera at the operating site, itself inserted through a keyhole.
While some high-end robotic surgery systems with 3D visualizations are available in tertiary healthcare facilities in major Indian cities, these are limited and expensive.
Dr. Shubhangi Nema and Prof. Leena Vachhani from the Indian Institute of Technology (IIT) Bombay, and Abhishek Mathur from the Indian Institute of Technology (IIT) Goa, have developed a novel software technique that enables 3D tracking of surgical instruments using standard video feeds, eliminating the need for expensive sensors and high-end computing.
This cost-effective approach, based on fundamental geometry, can enhance virtual reality training and has the potential to significantly lower the cost of 3D visualization systems in surgeries.
"We chose a geometric approach because geometry is fundamentally reliable and interpretable. We leveraged geometric cues such as perspective projection, instrument shape constraints, and interval-based uncertainty modelling (using a range of possible position coordinates instead of an exact position)," said Dr. Nema.
The researchers developed an algorithm that treats surgical tools as connected geometric shapes and tracks them using bounding boxes in 2D video.
By analyzing how the boxes' size, shape, position, and angles change, according to the rules of perspective, the system estimates the instrument's depth, movement, and rotation in 3D.
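The core perspective rule behind this kind of depth estimation can be illustrated with the standard pinhole-camera relation: an object of known physical width appears smaller in the image the farther it is from the camera. The sketch below is not the authors' code; the focal length and instrument width are assumed values chosen only to show the principle.

```python
def estimate_depth_mm(known_width_mm: float,
                      focal_px: float,
                      bbox_width_px: float) -> float:
    """Pinhole-camera perspective: apparent size shrinks inversely with
    depth, so depth = f * W / w for a known physical width W (mm),
    focal length f (pixels), and observed bounding-box width w (pixels)."""
    return focal_px * known_width_mm / bbox_width_px

# Example: a tool shaft 10 mm wide that spans 50 px in an image taken
# with an 800 px focal length lies roughly 160 mm from the camera.
print(estimate_depth_mm(10.0, 800.0, 50.0))  # 160.0
```

As the bounding box shrinks or grows between frames, the same relation yields the tool's motion toward or away from the camera.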
Accurately estimating depth from 2D images is challenging due to unclear object outlines caused by poor lighting, camera noise, or motion blur.
"From a single camera view, multiple 3D configurations can produce the same 2D projection. We introduced geometric constraints and interval-based bounds to narrow the feasible solution space," explained Dr. Nema.
Instead of saying that the tool tip is at an exact point P, the algorithm gives a range, or an interval, in which the tip can be present, she said.
"By incorporating known instrument dimensions and motion continuity, we reduced ambiguity. This approach makes 3D estimation more stable and robust," added Dr. Nema.
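The interval idea described above can be sketched in a few lines: the tip's position is reported as a range rather than a point, and each additional constraint (known instrument dimensions, motion continuity) narrows that range by intersection. This is an illustrative sketch, not the published algorithm; all numbers and bound names are hypothetical.

```python
def intersect(a, b):
    """Intersect two intervals (lo, hi), keeping only values consistent
    with both constraints."""
    lo, hi = max(a[0], b[0]), min(a[1], b[1])
    if lo > hi:
        raise ValueError("constraints are inconsistent")
    return (lo, hi)

# Raw depth interval for the tool tip from the noisy 2D measurement (mm).
measured = (140.0, 180.0)
# Bound implied by the instrument's known length and its visible base.
shape_bound = (150.0, 200.0)
# Bound from motion continuity: the tip cannot jump far between frames.
motion_bound = (130.0, 170.0)

feasible = intersect(intersect(measured, shape_bound), motion_bound)
print(feasible)  # (150.0, 170.0)
```

Each constraint discards 3D configurations that could not have produced the observed 2D projection, which is how the ambiguity of a single camera view is reduced.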
Furthermore, the study revealed that the system is efficient enough to run on a standard computer processor without specialized graphics hardware, processing video at speeds of approximately 50 frames per second, which is well within the requirements for real-time applications.
To validate their method, the team set up a physical experiment, recording known motions of a scaled physical model with both a highly precise motion-capture system and a stationary webcam. The researchers found the errors to be negligible, making the method usable for labelling and motion tracking of instruments in future applications.
The researchers plan to implement their strategy in an experimental setup to provide real-time training or assistance to surgeons.
"This work demonstrates that a three-dimensional visual experience for surgeons can be achieved using the existing monocular laparoscopic camera itself, offering a cost-effective and practical pathway toward improved depth perception in minimally invasive surgery," added Prof. Vachhani.