Bulk inserting users with preset passwords to Overleaf CE

Nov 20, 2025 by Boris André

https://cylab.be/blog/453/bulk-inserting-users-with-preset-passwords-to-overleaf-ce

We were recently asked to onboard a large number of users with pre-selected passwords onto one of our Overleaf Community Edition instances. The web UI of Overleaf CE only allows adding users one by one (by email address), and the bundled Node.js scripts (accessible from inside the sharelatex container) do not let admins set a password for a new user. We ended up adding users directly to the MongoDB instance backing Overleaf CE, which is what this post explains.

The actors

Our story has only a few characters: our <local> machine, the <remote> host that runs the Docker stack of Overleaf CE, and the MongoDB <container> that serves as Overleaf’s database.

Connection to the DB

The container that runs MongoDB only listens inside the default Docker network created by the Overleaf CE toolkit. To connect to it, we set up an SSH tunnel from our local machine to the container, but to do that we first need the container's IP address:

remote> docker inspect mongo | grep IPAddress

The tunnel is then brought up with:

local> ssh -fNL 127.0.0.1:27017:<container_ip>:27017 <remote>

With it, MongoDB will be reachable from our local machine using the following connection string:

mongodb://127.0.0.1:27017/?directConnection=true

The connection string can be tested with, for example, MongoDB Compass.
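
Before reaching for Compass, it can be handy to check that the tunnel itself is up. The sketch below is only an illustration: `port_is_open` is a hypothetical helper (not part of the original script) that tries to open a plain TCP connection to the forwarded port. It says nothing about whether MongoDB is actually answering behind it.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        # create_connection raises OSError (refused, timeout, unreachable...)
        # when nothing is listening on the other side
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Typical use with the tunnel described above:
#   port_is_open("127.0.0.1", 27017)
```

If this returns False, the tunnel is most likely not running, or it forwards a different local port.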

Setting up Python

To automate the process of inserting new users in the DB, we decided to use Python.

In a virtual environment, we installed pymongo and bcrypt:

pip install pymongo bcrypt

Getting a template

After a while spent nosing around the database with Compass, we found out that user information is self-contained in the “users” collection (naming things is not intractable after all).

We set up the connection to the database like so:

import sys
from typing import Any, Mapping

import pymongo
from pymongo.collection import Collection
from pymongo.errors import ServerSelectionTimeoutError


def get_users_collection_from_mongodb(cnx_string: str) -> Collection[Mapping[str, Any] | Any]:
    try:
        print(f"Connecting to MongoDB: {cnx_string}")
        # The default server selection timeout is 30 seconds, which we found
        # to be a tad much for a typical LAN connection
        client = pymongo.MongoClient(cnx_string, serverSelectionTimeoutMS=5000)
        # `server_info()` is used to make sure we are actually connected to the DB
        # (every step so far is lazy)
        client.server_info()
        print("Connection successful")
    except ServerSelectionTimeoutError as exc:
        print(f"Timeout while connecting to MongoDB:\n\n{exc}")
        # Exit with a non-zero status to signal the failure
        sys.exit(1)
    db = client["sharelatex"]
    return db["users"]

And we extract a template for what a user should look like:

import json

col = get_users_collection_from_mongodb("...")
# Any existing user will do as a template
bob = col.find_one()
with open("user_template.json", mode="w", encoding="utf8") as f:
    json.dump(bob, f, default=str, indent=2, sort_keys=True)

default=str naively casts any object that JSON cannot serialize (e.g. datetime or ObjectId) to its text representation. Note that these values are then plain strings in the template, not their original types.

The template is later loaded in the script with:

# Read with the same encoding the template was written with
with open("user_template.json", mode="r", encoding="utf8") as f:
    USER = json.load(f)
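
One consequence of dumping with default=str is worth seeing on a small example: values that were not JSON-serializable come back as strings after the round trip. The sketch below uses a made-up document (the field values are illustrative, not taken from a real Overleaf user) and only the standard library.

```python
import datetime
import json

# Illustrative document; a real Overleaf user has many more fields
original = {
    "signUpDate": datetime.datetime(2025, 11, 20, 9, 30),
    "loginCount": 3,
}

# default=str is called for every value json cannot serialize natively
dumped = json.dumps(original, default=str)
reloaded = json.loads(dumped)

# The datetime is now a plain string; native JSON types are untouched
sign_up_text = reloaded["signUpDate"]

# If the original type matters, it has to be restored explicitly
restored = datetime.datetime.fromisoformat(sign_up_text)
```

Any date field that is kept from the template, rather than overwritten when the new user is built, will therefore be stored as a string unless it is converted back first.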

Preparing new user objects

We then wrote a small helper function to construct new user objects from the base template. What we were given as input for the batch of users to insert are a name, an email address and a preset password, so those will be the input of our helper:

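import copy
import datetime


def prepare_user(email: str, firstname: str, password: str) -> dict:
    # Deep copy so that nested template values are not shared between users
    user: dict = copy.deepcopy(USER)
    # The template still carries the `_id` of the user it was extracted from;
    # drop it so that MongoDB generates a fresh one for every inserted user
    user.pop("_id", None)
    now = datetime.datetime.now()
    user["signUpDate"] = now
    user["lastLoggedIn"] = now
    user["lastLoginIp"] = '127.0.0.1'
    user["email"] = email
    user["emails"] = [
        {
            '_id': 1,
            'createdAt': now,
            'email': email,
            # Domain part of the address, reversed character by character
            'reversedHostname': email.split("@")[1][::-1],
        }
    ]
    user["hashedPassword"] = hash_password(password)
    user["first_name"] = firstname
    return user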
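
The reversedHostname expression is terse, so here is a minimal standalone sketch of what it computes. `reversed_hostname` is a hypothetical helper introduced for illustration; unlike the one-liner above, it rejects addresses without a local part or a domain instead of failing with an IndexError.

```python
def reversed_hostname(email: str) -> str:
    """Return the domain part of an email address, reversed character by character."""
    # rpartition splits on the last "@", which is the one that precedes the domain
    local, sep, host = email.rpartition("@")
    if not sep or not local or not host:
        raise ValueError(f"not a valid email address: {email!r}")
    return host[::-1]


example = reversed_hostname("bob@example.com")
```

For "bob@example.com", the domain is "example.com" and the stored value is "moc.elpmaxe".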

The password is hashed using bcrypt:

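import bcrypt


def hash_password(password: str) -> str:
    # bcrypt only accepts a cost factor between 4 and 31; 12 is the library
    # default and is assumed here to be compatible with Overleaf's settings
    salt = bcrypt.gensalt(rounds=12, prefix=b"2a")
    hashed_password = bcrypt.hashpw(password.encode("utf8"), salt)
    return hashed_password.decode("utf8")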

Collecting the list of users to add

The list of users to add was given to us as a CSV file that we could parse with:

from csv import DictReader


def read_students(input_file: str) -> list[dict[str, str]]:
    with open(input_file, mode='r', newline='') as f:
        reader = DictReader(f)
        expected_fields = ["firstname", "email", "password"]
        # `fieldnames` is None when the file is empty
        fieldnames = reader.fieldnames or []
        missing_fields = [field for field in expected_fields if field not in fieldnames]
        if missing_fields:
            raise ValueError(
                f"malformed CSV file: \"{input_file}\"; "
                f"missing fields: {', '.join(missing_fields)}"
            )
        # Keep only the expected columns and ignore any extra ones
        return [{k: v for k, v in row.items() if k in expected_fields} for row in reader]
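
To make the expected input concrete, here is a small sketch that applies the same header check to an in-memory CSV. The sample rows are made up, and `parse_students` is a hypothetical variant that reads from any text stream rather than from a file path, so it can be tried without touching the disk.

```python
import csv
import io

EXPECTED_FIELDS = ("firstname", "email", "password")


def parse_students(stream) -> list[dict[str, str]]:
    reader = csv.DictReader(stream)
    # fieldnames is None for an empty input, hence the fallback to a list
    missing = [f for f in EXPECTED_FIELDS if f not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing column(s): {', '.join(missing)}")
    # Extra columns (here: "comment") are silently dropped
    return [{k: row[k] for k in EXPECTED_FIELDS} for row in reader]


sample = (
    "firstname,email,password,comment\n"
    "Alice,alice@example.com,pw-alice,first year\n"
    "Bob,bob@example.com,pw-bob,\n"
)
students = parse_students(io.StringIO(sample))
```

Each resulting dictionary has exactly the keys firstname, email and password, which is what allows it to be unpacked directly into prepare_user.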

A small detour for convenience

For the sake of self-documentation and user friendliness, should the issue present itself in the future, we spent a few minutes adding an argument parser to our script:

import argparse
import os
from pathlib import Path


def parse_args():
    parser = argparse.ArgumentParser(description="Bulk insert users into Overleaf's DB")

    # MongoDB connection string parameter
    parser.add_argument(
        "--mongo",
        default=os.getenv("OL_MONGO"),
        help="MongoDB connection string (can also be set with OL_MONGO environment variable)",
        required=not bool(os.getenv("OL_MONGO"))
    )

    # Existing file path parameter
    parser.add_argument(
        "--students",
        type=Path,
        default=os.getenv("OL_STUDENTS"),
        help="Path to an existing file (can also be set with OL_STUDENTS environment variable)",
        required=not bool(os.getenv("OL_STUDENTS"))
    )

    args = parser.parse_args()

    # Validate file exists
    if not args.students.is_file():
        parser.error(f"File path '{args.students}' does not exist or is not a file.")

    return args

It only takes two arguments: the MongoDB connection string and the path to an existing local file.
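
The interplay between default and required can be surprising at first, so here is a reduced sketch of the pattern. `build_parser` is a hypothetical helper that takes the environment as a plain dictionary (instead of reading os.environ) to make the three cases easy to try: value from the environment, value from the command line, and neither.

```python
import argparse


def build_parser(env: dict[str, str]) -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Environment fallback demo")
    parser.add_argument(
        "--mongo",
        # The environment provides the default value...
        default=env.get("OL_MONGO"),
        # ...and the option is only mandatory when that default is missing
        required=not env.get("OL_MONGO"),
    )
    return parser


env = {"OL_MONGO": "mongodb://127.0.0.1:27017/"}

# No option given: the value comes from the environment
from_env = build_parser(env).parse_args([])

# Option given: the command line wins over the environment
from_cli = build_parser(env).parse_args(["--mongo", "mongodb://other-host:27017/"])
```

With an empty environment and no --mongo option, argparse prints a usage message and exits with status 2.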

Bringing it all together

Finally, all that is left to do is to put all the pieces in the correct order:

if __name__ == "__main__":
    args = parse_args()
    students = read_students(args.students)
    users_collection = get_users_collection_from_mongodb(args.mongo)
    users_to_add = []
    for student in students:
        # Computing the hash is expensive; a fair bit of time can be saved
        # by checking for existing users first
        if users_collection.count_documents({'email': student['email']}) > 0:
            print(f"WARNING: a user for {student['email']} already exists, not adding it")
        else:
            print(f"INFO: will create user for {student['email']}")
            new_user = prepare_user(**student)
            users_to_add.append(new_user)
    if users_to_add:
        if confirm(f"There are {len(users_to_add)} user(s) on the verge of being inserted into the database; continue?"):
            users_collection.insert_many(users_to_add)

The confirmation prompt is straightforward:

def confirm(prompt="Continue? (y/n): "):
    while True:
        response = input(prompt).strip().lower()
        if response in ('y', 'yes'):
            return True
        elif response in ('n', 'no'):
            return False
        else:
            print("Please enter 'y' or 'n'.")

And the tunnel can now be closed:

ps -aux | grep 'ssh -fNL 127.0.0.1:27017' | grep -v grep | awk '{print $2}' | xargs -I{} kill {}

This blog post is licensed under CC BY-SA 4.0
