Chennai: Home services startup Pronto’s admission last week that it was piloting in-home video recordings to train physical AI systems shines a light on a fast-growing and loosely regulated industry of AI data capture and labelling for the global robotics supply chain.
Pronto is not alone. Startups such as Human Archive, Humyn Labs, Egolab AI and Neocambrian are collecting what is called egocentric data, or first-person video captured through wearables or head-mounted cameras. They partner with cloud kitchens, hotels, home services platforms, small textile and garment factories, and warehouse operators to record everyday tasks, from cooking meals and washing dishes to stitching garments, assembling components and sorting inventory. In some cases, startups have built dedicated ‘data factories’ with motion-tracking rigs.
“Typical clients are robotics, vision-language-action model and world model companies,” said Abhinav Kukreja, founder of Neocambrian AI, which raised funds from angel investors including Dalmia Family Office Trust. “There is no equivalent repository of physical behaviour on the internet. Robots need to learn from messy homes, crowded factories, small shops and repair stations, which India offers.
“When done right, it can become an additional source of paid work for many workers and households, and we compensate both environment owners and data collectors,” he said.
This data is used to train world models and physical AI systems, teaching robots to navigate and act in messy, unstructured environments and helping smart glasses recognise objects. One industry insider said there is significant demand from the defence industry, particularly for autonomous drone applications. The practice also raises questions about privacy, legality and compensation, as in some cases videos are recorded without the workers’ consent or pay.
TOI learnt that some factories have paused such pilots after the recent backlash.
Manish Agarwal, co-founder of Humyn Labs, which works with leading frontier labs, said demand is growing from robotics OEMs, software makers and enterprises. “We collect and convert this into episodic strings for robot memory, which helps build low to mid-level agentic capabilities including physical action, voice, sight and mobility,” he said. “We are using verified networks of workers across 16 countries as robots cannot be trained only in Indian environments. For European domestic robotics to navigate better, we need training data similar to that environment,” he added.
Startups argue that this is India’s entry into the global AI value chain, and that working with frontier labs could help the country train competitive models of its own in the future. But sceptics see a familiar cost-arbitrage play. Madhukar Yarra, CEO of Bengaluru-based BPO NextWealth, which annotates these videos, called it a flash in the pan. Much of the data is collected through unorganised gig work, he said.
Sangeeta Gupta, SVP at Nasscom, said physical AI data could diversify India’s AI services beyond traditional data labelling. “But issues around informed consent, anonymisation, worker awareness and ethical use will require continued industry responsibility and evolving safeguards,” she said.