Anki is an incredibly powerful tool for spaced repetition learning. However, manually creating high-quality Anki cards from multiple-choice questions (MCQs) can be time-consuming. In this post, I will share my streamlined approach to generating Anki cards from MCQs using AI, significantly reducing the time and effort required.
We will cover the following steps:
Extract MCQs from a PDF file using AI.
Convert extracted MCQs into Anki cards using AI.
Add the generated Anki cards to an Anki collection for study.
I will be using the latest gemini-2.0-flash-exp model for this tutorial. You can use other models as well.
Due to the model's maximum output token limit of 8192, it's recommended to
process PDF files containing 50-80 MCQs at a time. If you have more MCQs,
consider splitting them into multiple files.
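One way to plan such a split is to compute the page ranges up front and then write each range to its own PDF. The helper below is a minimal sketch: `chunk_page_ranges` and the figure of 10 pages per chunk are illustrative assumptions, not part of the original script, and writing each range to a file would still need a PDF library such as PyMuPDF or PyPDF2.

```python
def chunk_page_ranges(total_pages: int, pages_per_chunk: int) -> list[tuple[int, int]]:
    """Return half-open (start, end) page ranges that together cover total_pages."""
    if total_pages < 0:
        raise ValueError("total_pages must be non-negative")
    if pages_per_chunk <= 0:
        raise ValueError("pages_per_chunk must be positive")
    return [
        (start, min(start + pages_per_chunk, total_pages))
        for start in range(0, total_pages, pages_per_chunk)
    ]


if __name__ == "__main__":
    # Hypothetical example: a 23-page PDF split into chunks of 10 pages.
    for start, end in chunk_page_ranges(23, 10):
        print(f"pages {start + 1}-{end}")
```

How many pages fit in one chunk depends on how densely the MCQs are printed, so the chunk size is best tuned by checking whether the extracted text is cut off.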
We define extract_mcqs() to extract the MCQs and correct grammatical mistakes along the way. My PDF files consist of scanned images, so I use AI for better accuracy. For text-based PDFs, you can also use PyMuPDF or PyPDF2.
main.py
def extract_mcqs(filepath: pathlib.Path) -> str:
    # Validate that the file exists and is a PDF
    if not filepath.exists():
        raise FileNotFoundError(f"PDF file not found: {filepath}")
    if filepath.suffix.lower() != '.pdf':
        raise ValueError(f"File must be a PDF: {filepath}")

    # Short user prompt; the detailed instruction goes in system_instruction
    prompt = "Extract MCQs"
    response = client.models.generate_content(
        model="gemini-2.0-flash-exp",  # Using Gemini 2.0 flash model for fast processing
        config=types.GenerateContentConfig(
            # Set system instruction for the AI model
            system_instruction="Extract all the MCQs from the following pdf file. Make sure the text is grammatically correct and structured according to MCQ",
            temperature=1,  # Sampling temperature (higher values give more varied output)
            top_p=0.95,  # Nucleus sampling: keep tokens within 95% cumulative probability
            top_k=40,  # Consider top 40 tokens for each step
            max_output_tokens=8192,  # Maximum length of generated response
            response_mime_type="text/plain",
        ),
        contents=[
            # Send the PDF as raw bytes for API consumption
            types.Part.from_bytes(
                data=filepath.read_bytes(),
                mime_type='application/pdf',
            ),
            prompt
        ]
    )
    return response.text
Next, we define convert_to_anki_cards() to convert the extracted MCQs into Anki cards.
The function takes the MCQ text (returned by extract_mcqs) as input and returns
a list of AnkiCard objects. It relies on the model's structured output feature, with response_schema enforcing the shape of the JSON.
main.py
def convert_to_anki_cards(mcq_text: str) -> list[AnkiCard]:
    # Convert extracted MCQs to Anki card format using AI
    response = client.models.generate_content(
        model="gemini-2.0-flash-exp",
        config=types.GenerateContentConfig(
            # Detailed system prompt explaining conversion rules
            system_instruction="""
            I am converting multiple-choice questions (MCQs) into Anki cloze deletion cards for MBBS students.

            Instructions:
            - The output should be in JSON array format, where each MCQ is converted into a JSON object.
            - Each JSON object must contain:
                1. "Text" – A well-structured cloze deletion statement ensuring the key concept from the MCQ is retained.
                2. "Extra" – A concise (1-2 sentences) explanation providing relevant contextual or anatomical details.
            - You can also restructure the MCQ itself if needed to improve readability and make the cloze deletion card more effective and to ensure the card communicates the concept effectively.
            - Convert questions into assertive statements where feasible to enhance clarity and learning.
            - For MCQs that ask “Which of the following is true/false,” convert them into assertive statements and use multiple cloze deletions if necessary to retain all relevant information.
            - When multiple correct answers exist, use separate cloze deletions for each.
            - If there are more than one key information points, use more than one cloze (but a maximum of three clozes.)
            - Avoid negative (with never/no) cloze statements.

            <examples>
            MCQs:
            A young boy suffering from inflammation of parotid gland complained of severe pain in the region of the gland, in the auricle and external acoustic meatus. The accompanied pain in the ear is due to common nerve supply by:
            (A) Auriculotemporal & greater auricular
            (B) Auriculotemporal & chorda tympani
            (C) Auriculotemporal & superior alveolar
            (D) Posterior auricular & greater auricular

            Which nerve does not supply the presulcal part of the tongue?
            A. Facial nerve
            B. Trigeminal nerve
            C. Hypoglossal nerve
            D. Vagus nerve

            Generated JSON:
            [
              {
                "Extra": "The auriculotemporal nerve and greater auricular nerve share sensory innervation of the parotid gland, auricle, and external acoustic meatus. Inflammation can cause referred pain.",
                "Text": "Inflammation of the {{c1::parotid gland}} can cause {{c2::ear pain}} due to common nerve supply by the {{c3::auriculotemporal}} and {{c3::greater auricular}} nerves."
              },
              {
                "Text": "The presulcal part of the tongue is supplied by {{c1::trigeminal}}, {{c2::facial}}, and {{c3::hypoglossal}} nerves.",
                "Extra": "The anterior two-thirds of the tongue receives general sensation from the mandibular division of the trigeminal nerve (V3) and taste sensation from the facial nerve (via the chorda tympani). The hypoglossal nerve controls tongue movements."
              }
            ]
            </examples>

            Ensure that the generated cloze deletion cards clearly communicate the concept from the MCQ while maintaining accuracy and readability.
            """,
            temperature=0.7,  # Balanced creativity vs consistency
            response_mime_type="application/json",
            response_schema=list[AnkiCard]  # Enforce response structure
        ),
        contents=[mcq_text]
    )
    # Parse JSON response into AnkiCard objects
    return [AnkiCard(**card) for card in json.loads(response.text)]
Anki cloze deletions are fill-in-the-blank style cards where parts of text are hidden for testing. In our code, they are marked with {{c1::text}}, where:
c1, c2, c3 etc. indicate different cloze groups
The text after :: (up to the closing }}) is what gets hidden
Multiple clozes with the same number will be hidden simultaneously
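To make the grouping rule concrete, here is a small sketch that mimics how a card's question side could be rendered for one cloze group. It is only an illustration: `cloze_groups` and `render_question` are hypothetical helpers, not Anki's own rendering code, and cloze hints ({{c1::text::hint}}) are deliberately not handled.

```python
import re

# Matches {{c<number>::<hidden text>}}; hints such as {{c1::text::hint}} are not handled.
CLOZE_RE = re.compile(r"\{\{c(\d+)::(.*?)\}\}")


def cloze_groups(text: str) -> list[int]:
    """Return the distinct cloze numbers used in text, in ascending order."""
    return sorted({int(match.group(1)) for match in CLOZE_RE.finditer(text)})


def render_question(text: str, active: int) -> str:
    """Hide every deletion in group `active` and reveal all other groups."""
    def replace(match: re.Match) -> str:
        number, hidden = int(match.group(1)), match.group(2)
        return "[...]" if number == active else hidden

    return CLOZE_RE.sub(replace, text)


if __name__ == "__main__":
    sample = (
        "Pain is carried by the {{c1::auriculotemporal}} and "
        "{{c1::greater auricular}} nerves to the {{c2::ear}}."
    )
    for group in cloze_groups(sample):
        print(group, render_question(sample, group))
```

Each distinct number yields one card, which is why two deletions sharing c1 are blanked out together on the same card.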
The conversion prompt used in this tutorial is specifically engineered for
basic medical sciences MCQs. If you're working with questions from other
subjects, you'll need to modify the prompt to better suit your domain and
desired card structure.
The prompt opens by stating the task and the target audience:
I am converting multiple-choice questions (MCQs) into Anki cloze deletion cards for MBBS students.
Then, we provide detailed instructions:
- The output should be in JSON array format, where each MCQ is converted into a JSON object.
- Each JSON object must contain:
    1. "Text" – A well-structured cloze deletion statement ensuring the key concept from the MCQ is retained.
    2. "Extra" – A concise (1-2 sentences) explanation providing relevant contextual or anatomical details.
- You can also restructure the MCQ itself if needed to improve readability and make the cloze deletion card more effective and to ensure the card communicates the concept effectively.
- Convert questions into assertive statements where feasible to enhance clarity and learning.
- For MCQs that ask “Which of the following is true/false,” convert them into assertive statements and use multiple cloze deletions if necessary to retain all relevant information.
- When multiple correct answers exist, use separate cloze deletions for each.
- If there are more than one key information points, use more than one cloze (but a maximum of three clozes.)
- Avoid negative (with never/no) cloze statements.
Finally, we provide examples of the input and expected output:
<examples>
MCQs:
A young boy suffering from inflammation of parotid gland complained of severe pain in the region of the gland, in the auricle and external acoustic meatus. The accompanied pain in the ear is due to common nerve supply by:
(A) Auriculotemporal & greater auricular
(B) Auriculotemporal & chorda tympani
(C) Auriculotemporal & superior alveolar
(D) Posterior auricular & greater auricular

Which nerve does not supply the presulcal part of the tongue?
A. Facial nerve
B. Trigeminal nerve
C. Hypoglossal nerve
D. Vagus nerve

Generated JSON:
[
  {
    "Extra": "The auriculotemporal nerve and greater auricular nerve share sensory innervation of the parotid gland, auricle, and external acoustic meatus. Inflammation can cause referred pain.",
    "Text": "Inflammation of the {{c1::parotid gland}} can cause {{c2::ear pain}} due to common nerve supply by the {{c3::auriculotemporal}} and {{c3::greater auricular}} nerves."
  },
  {
    "Text": "The presulcal part of the tongue is supplied by {{c1::trigeminal}}, {{c2::facial}}, and {{c3::hypoglossal}} nerves.",
    "Extra": "The anterior two-thirds of the tongue receives general sensation from the mandibular division of the trigeminal nerve (V3) and taste sensation from the facial nerve (via the chorda tympani). The hypoglossal nerve controls tongue movements."
  }
]
</examples>
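Because the model may not always follow every rule in the prompt, it can help to sanity-check the returned JSON before it reaches Anki. The following is a minimal sketch under that assumption; `find_problems` and `MAX_CLOZES` are hypothetical names that are not part of the original script, and the check assumes the response is a JSON array of objects.

```python
import json
import re

CLOZE_NUMBER_RE = re.compile(r"\{\{c(\d+)::")
MAX_CLOZES = 3  # mirrors the "maximum of three clozes" rule in the prompt


def find_problems(raw_json: str) -> list[str]:
    """Return human-readable problems found in a JSON array of card objects."""
    problems = []
    for index, card in enumerate(json.loads(raw_json)):
        text = card.get("Text", "")
        numbers = {int(n) for n in CLOZE_NUMBER_RE.findall(text)}
        if not numbers:
            problems.append(f"card {index}: no cloze deletion in Text")
        elif len(numbers) > MAX_CLOZES:
            problems.append(f"card {index}: {len(numbers)} cloze groups (max {MAX_CLOZES})")
        if not card.get("Extra", "").strip():
            problems.append(f"card {index}: empty Extra")
    return problems


if __name__ == "__main__":
    sample = '[{"Text": "No cloze here", "Extra": ""}]'
    for problem in find_problems(sample):
        print(problem)
```

Cards that fail the check could be logged and reviewed by hand instead of being added silently.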
Now, we will define a function add_cards_to_anki to add the generated Anki cards to an Anki collection.
main.py
def add_cards_to_anki(notes, deck_name="Default", subdeck_name=None, model_name="Basic", anki_path=None, tags=None):
    """
    Add cards to an Anki deck or subdeck

    Args:
        notes (list): List of dictionaries with "Text", "Extra" keys
        deck_name (str): Name of the target deck
        subdeck_name (str): Name of the subdeck (optional)
        model_name (str): Name of the note type/model to use
        anki_path (str): Path to Anki collection (optional)
        tags (list): List of tags to add to notes
    """
    # Check if we're running inside Anki
    RUNNING_IN_ANKI = False
    try:
        import aqt
        from aqt import mw
        from aqt.operations.note import add_note
        from aqt.utils import showInfo, tooltip
        # Only set to True if mw is properly initialized
        if mw and hasattr(mw, 'col') and mw.col is not None:
            RUNNING_IN_ANKI = True
    except ImportError:
        # aqt not available, definitely not running in Anki
        pass
    except Exception:
        # Something else went wrong with Anki imports
        pass

    # Use default Anki collection path if not provided
    if anki_path is None and not RUNNING_IN_ANKI:
        if os.name == 'nt':  # Windows
            anki_path = os.path.expanduser("~/AppData/Roaming/Anki2/User 1/collection.anki2")
        elif os.name == 'posix':  # macOS/Linux
            if os.path.exists(os.path.expanduser("~/Library/Application Support/")):  # macOS
                anki_path = os.path.expanduser("~/Library/Application Support/Anki2/User 1/collection.anki2")
            else:  # Linux
                anki_path = os.path.expanduser("~/.local/share/Anki2/User 1/collection.anki2")

    try:
        # Handle differently based on whether we're in Anki or not
        if RUNNING_IN_ANKI:
            # Use the Anki main window's collection
            col = mw.col
        else:
            # Running standalone - open the collection directly
            try:
                col = Collection(anki_path)
            except Exception as e:
                logger.error(f"Could not open Anki collection. Is Anki running? Error: {str(e)}")
                return False

        # Retrieve the specified note type (model)
        model = col.models.by_name(model_name)
        if not model:
            error_msg = f"Model '{model_name}' not found"
            logger.error(error_msg)
            return False

        # Construct full deck name including subdeck if provided
        full_deck_name = deck_name
        if subdeck_name:
            full_deck_name = f"{deck_name}::{subdeck_name}"

        # Get or create the deck
        deck_id = col.decks.id(full_deck_name)

        # Associate model with deck
        col.models.set_current(model)

        # Process and add each note to the deck
        added_count = 0

        # Add notes in batches to avoid blocking the main thread for too long
        batch_size = 5
        for i in range(0, len(notes), batch_size):
            batch_notes = notes[i:i+batch_size]
            for note_data in batch_notes:
                try:
                    # Create new note with selected model
                    note = Note(col, model)

                    # Set front and back of card
                    note.fields[0] = note_data["Text"]
                    note.fields[1] = note_data["Extra"]

                    # Add tags if provided
                    if tags:
                        note.tags.extend(tags)

                    # Set the deck for this note
                    note.note_type()["did"] = deck_id

                    if RUNNING_IN_ANKI:
                        # Add note using the Anki operation when inside Anki
                        add_note(
                            parent=mw,
                            note=note,
                            target_deck_id=deck_id
                        ).run_in_background()
                    else:
                        # Direct addition when outside Anki
                        col.add_note(note, deck_id)
                    added_count += 1
                except Exception as e:
                    logger.error(f"Error adding note: {str(e)}")

            # Log progress for large batches
            if len(notes) > batch_size and i + batch_size < len(notes):
                logger.info(f"Added batch {i//batch_size + 1}/{(len(notes) + batch_size - 1)//batch_size}...")

        logger.info(f"Successfully added {added_count} cards to deck '{full_deck_name}'")
        return True

    except Exception as e:
        logger.error(f"Error working with Anki collection: {str(e)}")
        return False
    finally:
        # Ensure collection is properly closed if we opened it
        if not RUNNING_IN_ANKI and 'col' in locals() and col:
            try:
                col.close(save=True)
            except TypeError:
                # Older versions might not accept the save parameter
                col.close()
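The deck naming in this function relies on Anki treating :: as the separator between deck levels. The sketch below isolates that logic so it can be tried without an Anki collection; `build_deck_name` and `deck_levels` are hypothetical helpers written for illustration, not functions from the anki package.

```python
from typing import Optional


def build_deck_name(deck_name: str, subdeck_name: Optional[str] = None) -> str:
    """Join a deck and an optional subdeck with the "::" separator."""
    return f"{deck_name}::{subdeck_name}" if subdeck_name else deck_name


def deck_levels(full_deck_name: str) -> list[str]:
    """List every deck in the hierarchy, from the top-level deck down."""
    parts = full_deck_name.split("::")
    return ["::".join(parts[: i + 1]) for i in range(len(parts))]


if __name__ == "__main__":
    full_name = build_deck_name("UHS_MS2", "GIT::Anatomy")
    print(full_name)
    print(deck_levels(full_name))
```

A subdeck name may itself contain ::, as in "GIT::Anatomy", which places the cards two levels below the top-level deck.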
Save the PDF file containing your MCQs in the pdfs directory.
Set your API key as an environment variable in a .env file.
.env
GOOGLE_AI_STUDIO_API_KEY=your_api_key_here
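A missing or placeholder key otherwise only surfaces as an API error later on, so an early check can save some debugging. This is a minimal sketch: `require_env` is a hypothetical helper that is not part of the original script, and it assumes it runs after load_dotenv() has populated the environment.

```python
import os
from typing import Mapping


def require_env(name: str, environ: Mapping[str, str] = os.environ) -> str:
    """Return the value of an environment variable or raise a clear error."""
    value = environ.get(name, "").strip()
    if not value or value == "your_api_key_here":
        raise RuntimeError(f"{name} is missing or still set to the placeholder value")
    return value


if __name__ == "__main__":
    try:
        api_key = require_env("GOOGLE_AI_STUDIO_API_KEY")
        print("API key found")
    except RuntimeError as error:
        print(error)
```

The `environ` parameter exists only so the function can be exercised with a plain dictionary; in the script it would default to the real environment.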
Update the main() function with your PDF path and desired deck name. If your PDF file is in the pdfs directory next to the script, a relative path is enough:
main.py
pdf_path = pathlib.Path("pdfs/your_mcqs.pdf")
If your PDF files are in a different directory, you can specify the full path to the file:
main.py
pdf_path = pathlib.Path("path/to/your_mcqs.pdf")
Run the script:
Terminal
python main.py
After the script finishes, a new output folder contains a .txt file with the extracted MCQs and a .json file with the generated Anki cards, and the cards are added to the specified Anki deck.
output/
    your_mcqs.txt
    your_mcqs.json
pdfs/
    your_mcqs.pdf
.env
main.py
requirements.txt
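The names of the output files are derived from the PDF's file name (its stem), which is how main() builds txt_path and json_path. As a small sketch of that mapping, with `output_paths` being a hypothetical helper rather than a function from the script:

```python
import pathlib


def output_paths(pdf_path: pathlib.Path, output_dir: pathlib.Path) -> tuple[pathlib.Path, pathlib.Path]:
    """Return the .txt and .json paths that correspond to pdf_path."""
    stem = pdf_path.stem  # file name without its extension
    return output_dir / f"{stem}.txt", output_dir / f"{stem}.json"


if __name__ == "__main__":
    txt_path, json_path = output_paths(pathlib.Path("pdfs/your_mcqs.pdf"), pathlib.Path("output"))
    print(txt_path)
    print(json_path)
```

Two PDFs with the same file name in different directories would therefore overwrite each other's output.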
Troubleshooting
ImportError / ModuleNotFoundError: No module named ...
Make sure you've activated the virtual environment
Verify all dependencies are installed: pip list
API Key errors
Ensure the GOOGLE_AI_STUDIO_API_KEY is set correctly in .env
Ensure the .env file is in the same directory as main.py
Invalid PDF errors
Ensure the PDF file exists at the specified path
Verify the PDF file is not corrupted and follows the expected format
Anki Collection errors
Ensure Anki is closed when running the script
Verify the collection path exists
Check if you have proper permissions to access the collection file
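For the invalid PDF case, a quick local check can rule out obviously wrong files before any API call is made. The sketch below tests for the %PDF- signature that PDF files begin with; `has_pdf_signature` and `looks_like_pdf` are hypothetical helpers, and the check is only a heuristic that says nothing about whether the rest of the file is intact.

```python
import pathlib

PDF_SIGNATURE = b"%PDF-"


def has_pdf_signature(data: bytes) -> bool:
    """Return True if the given bytes begin with the PDF signature."""
    return data.startswith(PDF_SIGNATURE)


def looks_like_pdf(path: pathlib.Path) -> bool:
    """Return True if path is an existing file that begins with the PDF signature."""
    if not path.is_file():
        return False
    with path.open("rb") as handle:
        return has_pdf_signature(handle.read(len(PDF_SIGNATURE)))


if __name__ == "__main__":
    print(looks_like_pdf(pathlib.Path("pdfs/your_mcqs.pdf")))
```

A file that passes this check but still fails during extraction is more likely damaged further in, or too large for a single request.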
The complete script, for reference:
main.py
import logging
import pathlib
import json
import os

from pydantic import BaseModel
from google import genai
from google.genai import types
from anki.collection import Collection
from anki.notes import Note
from dotenv import load_dotenv
# aqt is imported lazily inside add_cards_to_anki, so the script also runs without it

# Load environment variables from .env file
load_dotenv()

# Configure basic logging with INFO level for debugging and tracking
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)


class AnkiCard(BaseModel):
    Text: str  # Front side of the card containing cloze deletions
    Extra: str  # Back side of the card with explanations


# Initialize the Gemini client
client = genai.Client(api_key=os.getenv("GOOGLE_AI_STUDIO_API_KEY"))


def extract_mcqs(filepath: pathlib.Path) -> str:
    # Validate that the file exists and is a PDF
    if not filepath.exists():
        raise FileNotFoundError(f"PDF file not found: {filepath}")
    if filepath.suffix.lower() != '.pdf':
        raise ValueError(f"File must be a PDF: {filepath}")

    # Short user prompt; the detailed instruction goes in system_instruction
    prompt = "Extract MCQs"
    response = client.models.generate_content(
        model="gemini-2.0-flash-exp",  # Using Gemini 2.0 flash model for fast processing
        config=types.GenerateContentConfig(
            # Set system instruction for the AI model
            system_instruction="Extract all the MCQs from the following pdf file. Make sure the text is grammatically correct and structured according to MCQ",
            temperature=1,  # Sampling temperature (higher values give more varied output)
            top_p=0.95,  # Nucleus sampling: keep tokens within 95% cumulative probability
            top_k=40,  # Consider top 40 tokens for each step
            max_output_tokens=8192,  # Maximum length of generated response
            response_mime_type="text/plain",
        ),
        contents=[
            # Send the PDF as raw bytes for API consumption
            types.Part.from_bytes(
                data=filepath.read_bytes(),
                mime_type='application/pdf',
            ),
            prompt
        ]
    )
    return response.text


def convert_to_anki_cards(mcq_text: str) -> list[AnkiCard]:
    # Convert extracted MCQs to Anki card format using AI
    response = client.models.generate_content(
        model="gemini-2.0-flash-exp",
        config=types.GenerateContentConfig(
            # Detailed system prompt explaining conversion rules
            system_instruction="""
            I am converting multiple-choice questions (MCQs) into Anki cloze deletion cards for MBBS students.

            Instructions:
            - The output should be in JSON array format, where each MCQ is converted into a JSON object.
            - Each JSON object must contain:
                1. "Text" – A well-structured cloze deletion statement ensuring the key concept from the MCQ is retained.
                2. "Extra" – A concise (1-2 sentences) explanation providing relevant contextual or anatomical details.
            - You can also restructure the MCQ itself if needed to improve readability and make the cloze deletion card more effective and to ensure the card communicates the concept effectively.
            - Convert questions into assertive statements where feasible to enhance clarity and learning.
            - For MCQs that ask “Which of the following is true/false,” convert them into assertive statements and use multiple cloze deletions if necessary to retain all relevant information.
            - When multiple correct answers exist, use separate cloze deletions for each.
            - If there are more than one key information points, use more than one cloze (but a maximum of three clozes.)
            - Avoid negative (with never/no) cloze statements.

            <examples>
            MCQs:
            A young boy suffering from inflammation of parotid gland complained of severe pain in the region of the gland, in the auricle and external acoustic meatus. The accompanied pain in the ear is due to common nerve supply by:
            (A) Auriculotemporal & greater auricular
            (B) Auriculotemporal & chorda tympani
            (C) Auriculotemporal & superior alveolar
            (D) Posterior auricular & greater auricular

            Which nerve does not supply the presulcal part of the tongue?
            A. Facial nerve
            B. Trigeminal nerve
            C. Hypoglossal nerve
            D. Vagus nerve

            Generated JSON:
            [
              {
                "Extra": "The auriculotemporal nerve and greater auricular nerve share sensory innervation of the parotid gland, auricle, and external acoustic meatus. Inflammation can cause referred pain.",
                "Text": "Inflammation of the {{c1::parotid gland}} can cause {{c2::ear pain}} due to common nerve supply by the {{c3::auriculotemporal}} and {{c3::greater auricular}} nerves."
              },
              {
                "Text": "The presulcal part of the tongue is supplied by {{c1::trigeminal}}, {{c2::facial}}, and {{c3::hypoglossal}} nerves.",
                "Extra": "The anterior two-thirds of the tongue receives general sensation from the mandibular division of the trigeminal nerve (V3) and taste sensation from the facial nerve (via the chorda tympani). The hypoglossal nerve controls tongue movements."
              }
            ]
            </examples>

            Ensure that the generated cloze deletion cards clearly communicate the concept from the MCQ while maintaining accuracy and readability.
            """,
            temperature=0.7,  # Balanced creativity vs consistency
            response_mime_type="application/json",
            response_schema=list[AnkiCard]  # Enforce response structure
        ),
        contents=[mcq_text]
    )
    # Parse JSON response into AnkiCard objects
    return [AnkiCard(**card) for card in json.loads(response.text)]


def add_cards_to_anki(notes, deck_name="Default", subdeck_name=None, model_name="Basic", anki_path=None, tags=None):
    """
    Add cards to an Anki deck or subdeck

    Args:
        notes (list): List of dictionaries with "Text", "Extra" keys
        deck_name (str): Name of the target deck
        subdeck_name (str): Name of the subdeck (optional)
        model_name (str): Name of the note type/model to use
        anki_path (str): Path to Anki collection (optional)
        tags (list): List of tags to add to notes
    """
    # Check if we're running inside Anki
    RUNNING_IN_ANKI = False
    try:
        import aqt
        from aqt import mw
        from aqt.operations.note import add_note
        from aqt.utils import showInfo, tooltip
        # Only set to True if mw is properly initialized
        if mw and hasattr(mw, 'col') and mw.col is not None:
            RUNNING_IN_ANKI = True
    except ImportError:
        # aqt not available, definitely not running in Anki
        pass
    except Exception:
        # Something else went wrong with Anki imports
        pass

    # Use default Anki collection path if not provided
    if anki_path is None and not RUNNING_IN_ANKI:
        if os.name == 'nt':  # Windows
            anki_path = os.path.expanduser("~/AppData/Roaming/Anki2/User 1/collection.anki2")
        elif os.name == 'posix':  # macOS/Linux
            if os.path.exists(os.path.expanduser("~/Library/Application Support/")):  # macOS
                anki_path = os.path.expanduser("~/Library/Application Support/Anki2/User 1/collection.anki2")
            else:  # Linux
                anki_path = os.path.expanduser("~/.local/share/Anki2/User 1/collection.anki2")

    try:
        # Handle differently based on whether we're in Anki or not
        if RUNNING_IN_ANKI:
            # Use the Anki main window's collection
            col = mw.col
        else:
            # Running standalone - open the collection directly
            try:
                col = Collection(anki_path)
            except Exception as e:
                logger.error(f"Could not open Anki collection. Is Anki running? Error: {str(e)}")
                return False

        # Retrieve the specified note type (model)
        model = col.models.by_name(model_name)
        if not model:
            error_msg = f"Model '{model_name}' not found"
            logger.error(error_msg)
            return False

        # Construct full deck name including subdeck if provided
        full_deck_name = deck_name
        if subdeck_name:
            full_deck_name = f"{deck_name}::{subdeck_name}"

        # Get or create the deck
        deck_id = col.decks.id(full_deck_name)

        # Associate model with deck
        col.models.set_current(model)

        # Process and add each note to the deck
        added_count = 0

        # Add notes in batches to avoid blocking the main thread for too long
        batch_size = 5
        for i in range(0, len(notes), batch_size):
            batch_notes = notes[i:i+batch_size]
            for note_data in batch_notes:
                try:
                    # Create new note with selected model
                    note = Note(col, model)

                    # Set front and back of card
                    note.fields[0] = note_data["Text"]
                    note.fields[1] = note_data["Extra"]

                    # Add tags if provided
                    if tags:
                        note.tags.extend(tags)

                    # Set the deck for this note
                    note.note_type()["did"] = deck_id

                    if RUNNING_IN_ANKI:
                        # Add note using the Anki operation when inside Anki
                        add_note(
                            parent=mw,
                            note=note,
                            target_deck_id=deck_id
                        ).run_in_background()
                    else:
                        # Direct addition when outside Anki
                        col.add_note(note, deck_id)
                    added_count += 1
                except Exception as e:
                    logger.error(f"Error adding note: {str(e)}")

            # Log progress for large batches
            if len(notes) > batch_size and i + batch_size < len(notes):
                logger.info(f"Added batch {i//batch_size + 1}/{(len(notes) + batch_size - 1)//batch_size}...")

        logger.info(f"Successfully added {added_count} cards to deck '{full_deck_name}'")
        return True

    except Exception as e:
        logger.error(f"Error working with Anki collection: {str(e)}")
        return False
    finally:
        # Ensure collection is properly closed if we opened it
        if not RUNNING_IN_ANKI and 'col' in locals() and col:
            try:
                col.close(save=True)
            except TypeError:
                # Older versions might not accept the save parameter
                col.close()


def main():
    # Configuration
    pdf_path = pathlib.Path("pdfs/your_mcqs.pdf")
    deck_name = "UHS_MS2"
    subdeck_name = "GIT::Anatomy"

    try:
        # Create output directory if it doesn't exist
        output_dir = pathlib.Path("output")
        output_dir.mkdir(exist_ok=True)

        # Step 1: Extract MCQs
        mcq_text = extract_mcqs(pdf_path)

        # Save extracted text to output directory
        txt_path = output_dir / f"{pdf_path.stem}.txt"
        txt_path.write_text(mcq_text, encoding='utf-8')
        logger.info(f"MCQs saved to {txt_path}")

        # Step 2: Convert to Anki cards
        anki_cards = convert_to_anki_cards(mcq_text)

        # add_cards_to_anki indexes notes by key, so convert the models to plain dicts
        card_dicts = [card.model_dump() for card in anki_cards]

        # Step 3: Save as JSON to output directory
        json_path = output_dir / f"{pdf_path.stem}.json"
        json_path.write_text(
            json.dumps(card_dicts, indent=2, ensure_ascii=False),
            encoding='utf-8'
        )

        # Step 4: Add to Anki
        add_cards_to_anki(
            notes=card_dicts,
            deck_name=deck_name,
            subdeck_name=subdeck_name,
            model_name="AnKingOverhaul (AnKing / AnKingMed)",
            tags=["Past_Papers", f"#{deck_name}::{subdeck_name}"]
        )
    except Exception as e:
        logger.error(f"Error processing MCQs: {e}")


if __name__ == "__main__":
    main()