You don't need a computer science degree to understand how deepfakes work. The core idea is straightforward, and knowing the basics helps you recognise when something has been faked -- and why it's getting harder to tell.
The Engine: Generative Adversarial Networks (GANs)
Many deepfakes are built using a system called a GAN, though some tools rely on related techniques such as autoencoders or diffusion models. Think of a GAN as two AI programs locked in a competition.
How GANs Work (Plain English)
The Generator creates fake content. It starts out terrible -- blurry faces, wrong proportions, obvious artefacts. The Discriminator tries to tell the difference between real content and the Generator's fakes. Every time the Discriminator catches a fake, the Generator learns from the mistake and tries again. Every time the Generator fools the Discriminator, the Discriminator gets better at spotting fakes. They push each other forward. After millions of rounds, the Generator produces output that the Discriminator can't reliably distinguish from real content.
This adversarial loop is why a GAN's output improves so rapidly. The system largely trains itself. No human needs to manually fix each error: the Discriminator exposes the Generator's weaknesses, and the Generator corrects them automatically.
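To make the competition concrete, here is a deliberately tiny sketch in plain Python (standard library only). It is a toy, not how real deepfake tools are built. The "real data" is just numbers clustered around 4.0. The Generator is the line `a * z + b` applied to random noise `z`, and the Discriminator is a single logistic unit `sigmoid(w * x + c)` giving its estimate that `x` is real. All names and learning rates are illustrative assumptions, as is the small weight-decay term, which is added to damp the back-and-forth oscillation that simple GANs are prone to.

```python
import math
import random


def sigmoid(s: float) -> float:
    """Logistic function, written to avoid overflow for large |s|."""
    if s >= 0:
        return 1.0 / (1.0 + math.exp(-s))
    e = math.exp(s)
    return e / (1.0 + e)


def train_toy_gan(steps: int = 2000, seed: int = 0):
    """Alternate Discriminator and Generator updates on 1-D toy data.

    Real data:      numbers drawn from a normal distribution around 4.0
    Generator:      fake = a * z + b, where z is random noise
    Discriminator:  D(x) = sigmoid(w * x + c), its estimate that x is real
    Returns the final (a, b) and the generator offset b after every round.
    """
    rng = random.Random(seed)
    a, b = 1.0, 0.0            # Generator starts out producing numbers around 0
    w, c = 0.0, 0.0            # Discriminator starts with no opinion (D = 0.5)
    lr_d, lr_g = 0.05, 0.05    # illustrative learning rates
    decay = 0.1                # small weight decay on w to damp oscillation
    history = []

    for _ in range(steps):
        real = rng.gauss(4.0, 0.5)
        z = rng.gauss(0.0, 1.0)
        fake = a * z + b

        # Discriminator round: gradient step on -log D(real) - log(1 - D(fake)),
        # i.e. learn to score real samples high and fakes low.
        d_real = sigmoid(w * real + c)
        d_fake = sigmoid(w * fake + c)
        grad_w = (d_real - 1.0) * real + d_fake * fake + decay * w
        grad_c = (d_real - 1.0) + d_fake
        w -= lr_d * grad_w
        c -= lr_d * grad_c

        # Generator round: gradient step on -log D(fake) against the updated
        # Discriminator, i.e. learn to make fakes it scores as more real.
        d_fake = sigmoid(w * fake + c)
        grad_fake = (d_fake - 1.0) * w      # derivative of -log D(fake) w.r.t. fake
        a -= lr_g * grad_fake * z           # chain rule: d(fake)/da = z
        b -= lr_g * grad_fake               # chain rule: d(fake)/db = 1
        history.append(b)

    return a, b, history


if __name__ == "__main__":
    a, b, history = train_toy_gan()
    for step in range(0, len(history), 250):
        print(step, round(history[step], 3))
```

Each round mirrors the plain-English description: the Discriminator is nudged to score the real sample higher and the fake lower, then the Generator is nudged so the updated Discriminator scores its fake higher. The limits are deliberate. A straight-line Discriminator can only compare where the two sets of numbers sit, so once the Generator's average matches the real average it has nothing left to exploit, even if the spreads differ. Real systems use deep neural networks for both players so the Discriminator can catch far subtler differences.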
Three Types of Deepfake
Not all deepfakes work the same way. The three main categories each use different techniques and require different amounts of source material.
1. Face Swapping
This is the most common type. The AI takes one person's face and places it on another person's body in a video. The body movements, head angle, and lighting all come from the original footage -- only the face is replaced. A basic face swap can be done with as few as 10 to 20 clear photos of the target. The result will have visible artefacts, but at thumbnail size on a phone screen, it can be convincing enough to go viral.
2. Face Reenactment
More advanced than a swap. Face reenactment takes a real video of Person A and transfers their facial expressions, lip movements, and head turns onto Person B, sometimes in real time. The output is a video of Person B saying and doing things they never actually did. This requires significantly more data -- typically several hours of video footage of Person B from multiple angles. The result is far more convincing than a basic swap because the expressions are driven by real human motion.
3. Voice Cloning
A separate but related technology. Voice cloning AI analyses recordings of someone's speech and learns their vocal patterns -- pitch, cadence, accent, breathing habits, the way they emphasise certain words. Modern voice cloning tools can produce a passable clone from as little as 3 to 5 minutes of clear audio. Given an hour of recordings, the clone becomes extremely convincing, able to speak new sentences in the target's voice with natural intonation.
The most dangerous deepfakes combine all three: a face-swapped video with reenacted expressions and cloned audio. These are rare because they require significant source data and compute power, but they're the hardest to detect.
How Much Data Does It Take?
- Basic face swap: 10 to 20 photos of the target's face. Takes minutes to produce. Quality is low but good enough for social media thumbnails.
- Convincing face swap: 200 to 500 photos or several minutes of video. Takes hours to train. Holds up at full-screen resolution.
- Real-time face reenactment: Several hours of video from multiple angles. Takes days to train. Can fool most viewers.
- Voice clone (basic): 3 to 5 minutes of clear speech audio. Usable for short clips.
- Voice clone (convincing): 30 to 60 minutes of varied speech. Can sustain a full phone conversation.
Public figures are the easiest targets because thousands of hours of their video and audio already exist online. Anyone who posts regularly on YouTube, TikTok, or Instagram provides more than enough training data for a high-quality deepfake.
Why It's Getting Easier
Three years ago, creating a convincing deepfake required expensive hardware, technical knowledge, and days of processing time. That has changed.
Open-source tools. Free, open-source deepfake software is available on GitHub. The interfaces have gone from command-line-only to drag-and-drop desktop apps. Anyone with a mid-range gaming PC can run them.
Cheaper compute. Cloud GPU rental costs have dropped dramatically. You can rent the processing power needed for a high-quality deepfake for under $20 through services that don't ask what you're using it for.
Better models. The underlying AI models improve every few months. Each generation reduces the training data needed, shortens processing time, and produces higher-quality output. Artefacts that were reliable detection signals last year have been fixed in current models.
Mobile apps. Face-swapping apps on phones can produce basic deepfakes in seconds. The quality is low, but low quality is enough to fool people scrolling quickly through a social media feed.
Real-World Damage
This isn't theoretical. Deepfakes are causing real harm right now.
- Political misinformation: Deepfake videos of political figures making fabricated statements have gone viral in multiple elections globally. By the time fact-checkers respond, millions have already seen and shared the original.
- Financial fraud: In 2024, a finance worker in Hong Kong transferred US$25 million after a deepfake video call that appeared to show their company's CFO and other executives.
- Revenge content: Non-consensual intimate deepfakes are the fastest-growing category. They're used for blackmail, harassment, and reputational destruction.
- Celebrity scams: Fake endorsement videos featuring AI-generated versions of well-known Australians are used to promote cryptocurrency scams and investment fraud.
The Arms Race: Creation vs Detection
Detection technology exists, but it's always playing catch-up. Every time researchers publish a new detection method, deepfake creators study it and adjust their models to avoid triggering it.
Current detection approaches:
- Biological signal analysis -- looking for natural blood flow patterns under the skin (real faces show subtle colour changes with each heartbeat; many deepfakes fail to reproduce them consistently)
- Frequency analysis -- breaking the image down into its frequency components to find regular patterns, often left by the upsampling layers of neural networks, that human eyes can't see
- Temporal consistency -- checking whether details like moles, wrinkles, and skin texture stay consistent across frames
- Provenance tracking -- cryptographic signatures attached to content by cameras and editing software, which let viewers verify where it came from and whether it has been altered (the C2PA standard)
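The frequency-analysis idea can be sketched in a few lines. The example below is a simplified, assumption-laden illustration. It works on a single row of pixel brightness values rather than a full 2-D image, uses a slow textbook discrete Fourier transform from Python's standard library, and reports what share of the row's variation sits in fine, rapidly alternating detail. The function name `high_frequency_ratio` and the default cutoff (the upper half of the frequency range) are hypothetical choices; real detectors analyse full 2-D spectra and feed them to a trained classifier rather than relying on one hand-picked number.

```python
import cmath
import math


def high_frequency_ratio(row, cutoff_fraction=0.5):
    """Share of a signal's (non-constant) spectral energy in its high frequencies.

    `row` is a sequence of pixel brightness values. Frequencies are the DFT
    bins k = 1 .. N//2; bins with k >= cutoff_fraction * (N//2) count as "high".
    """
    n = len(row)
    if n < 2:
        return 0.0
    mean = sum(row) / n
    centred = [v - mean for v in row]   # remove overall brightness (the k = 0 term)

    half = n // 2
    energies = {}
    for k in range(1, half + 1):
        # Naive O(N^2) discrete Fourier transform coefficient for frequency k.
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(centred))
        # For real input, bin k mirrors bin n - k, so count it twice,
        # except the Nyquist bin (2k == n), which has no mirror.
        weight = 1 if 2 * k == n else 2
        energies[k] = weight * abs(coeff) ** 2

    total = sum(energies.values())
    if total < 1e-12:                   # flat row: no variation to analyse
        return 0.0
    cutoff = cutoff_fraction * half
    high = sum(e for k, e in energies.items() if k >= cutoff)
    return high / total


if __name__ == "__main__":
    smooth = [math.cos(2 * math.pi * t / 16) for t in range(16)]
    checker = [t % 2 for t in range(16)]
    print(round(high_frequency_ratio(smooth), 3))
    print(round(high_frequency_ratio(checker), 3))
```

A smooth brightness gradient scores near 0, while a pixel-by-pixel alternating pattern, the kind of regular grid some upsampling layers can leave behind, scores near 1. On its own the ratio proves nothing; it only becomes useful when compared against values measured on known-real images from similar cameras.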
The provenance approach is the most promising long-term solution. Rather than trying to prove something is fake after the fact, it lets genuine content prove where it came from, starting at the point of capture. Major camera manufacturers and tech companies are adopting the C2PA standard, but it will take years before it's widespread enough to be reliable.
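The core provenance idea (sign at capture, verify later, and any edit breaks the check) fits in a few lines. This is a conceptual sketch only, not C2PA. The real standard attaches a signed manifest using public-key signatures backed by certificates, so anyone can verify content without being able to forge it. Python's standard library has no public-key signing, so the sketch substitutes an HMAC tag over the raw bytes with a hypothetical shared key, `DEVICE_KEY`; the function names are illustrative too.

```python
import hashlib
import hmac

# Hypothetical device key. Real C2PA signing uses a private key plus a
# certificate; a shared HMAC secret stands in here because it is in the stdlib.
DEVICE_KEY = b"demo-key-held-inside-the-camera"


def sign_at_capture(content: bytes) -> str:
    """Camera side: compute a tag bound to the exact bytes captured."""
    return hmac.new(DEVICE_KEY, content, hashlib.sha256).hexdigest()


def verify(content: bytes, tag: str) -> bool:
    """Viewer side: recompute the tag and compare in constant time."""
    expected = hmac.new(DEVICE_KEY, content, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, tag)


if __name__ == "__main__":
    original = b"raw sensor bytes from the camera"
    tag = sign_at_capture(original)
    print(verify(original, tag))               # True: untouched content
    print(verify(original + b" edited", tag))  # False: any change breaks the tag
```

The catch with this shortcut is that anyone able to verify an HMAC tag holds the same key and could therefore forge one, which is exactly why provenance systems use public-key signatures instead. It also only checks that the bytes are unchanged, so even a legitimate crop would fail, whereas C2PA is designed to record such edits in a signed history.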
You don't need to become a detection expert. The most effective defence is a sceptical mindset: question unexpected or emotionally charged video content, check the source, reverse-image search key frames, and never share something you can't verify. Read our guide to spotting deepfakes for specific visual and audio red flags to check.
The technology will keep getting better. Detection will keep catching up. But in the gap between a deepfake going viral and being debunked, real damage happens -- to reputations, to elections, to bank accounts, to people's lives. Understanding how the technology works is the first step to not being fooled by it.