CMSC 435 PROBLEM STATEMENTS

Last edited 2026-02-12. Projects under consideration for Spring 2026.

Timelines for the semester will be as called out in a separate document. How a team establishes its intermediate goals in order to meet requirements and hit hard deadlines is up to the team. I suggest that you conduct a risk assessment right away, but how long you put off discovering thorny issues is also up to the team. You can even treat the project as a big hackathon at the end if you like; it's your career, after all, and it isn't like campus doesn't promote such things. But, I bet you'll also find that hackathons are better for campus than for you, and in any event this approach has yet to end well in a 435 project. Your call.

All students on a project are equal stakeholders in the effort. No one person must work at the direction of another; we cooperate in order to win. The incentive to hold others accountable is clear: all of these projects are scoped with the expectation we'll have full effort from everyone, and there is no partial credit for partial success. We tolerate others' lack of engagement at a cost paid in our own time and grades.

The incentive to fully participate should be clear. First, the final exam is constructed to reward those who have done the work all along. It chiefly addresses deep technical issues involving this project. Those who didn't do the work won't know how to answer the questions, and so won't pass the class. Similarly, if we have done the work but not established a paper trail to that effect along the way, then we won't be able to support answers to questions which require reflection. There is only one snapshot in time from which to work - the end of the project. Without the paper trail to analyze what the perspectives were historically, there is not much credibility in the answers. Said another way, nobody has yet been able to recreate a project record that is credible enough for exam answers, and of course there is no ability later to go back in time to insert material into the record. (Translation: Waiting until near the final exam in order to manufacture history for convenience of answering questions is deemed "not credible".)

Second, the cover sheet submitted (as our academic integrity and intellectual property statement) lists who gets credit; no name, no credit. The decision of who signs the sheet is ultimately a team consensus. Basically the rest of the team can vote someone off the island, though this is not a common occurrence, and we'd like to have exhausted our inventory of practices to promote positive engagement before it reaches that point.

My advice: do the work and document it to pass the class. It might just be that these practices actually work too. Bonus!

I offer these projects as an opportunity for us to practice substantive application of software engineering principles. We will learn by trying them out, making design decisions and then studying the nuanced consequences. We can't close that loop if we don't have a detailed record of the decisions we made, however, and that is the most important reason for our serious obligations to log activity and articulate our reasoning as we go. Working code alone won't tell us we reached our learning objectives. Please take this process seriously from the start and we will win best value from 435.

-- Jim Purtilo

 


ANT POSING

Long form: Automated Identification and Annotation of Movable Joints in 3D CAD Models using Vision-Language Models (VLMs)

Current-era digital technologies have been a boon to scientists, who are now able to create sophisticated models of their subjects from nature for detailed study. This is especially true for zoologists, who can learn much from these studies and, importantly, apply much of what they learn. Whole new avenues of investigation into locomotion and robotics, for example, can be opened by the study of ants. Engineers are eager to learn more about how nature managed to do it. Moreover, such digital capture can unlock the shape and form of thousands of species to educators and artists: from kids interacting with species in VR, to sculptors creating art, to 3D designers in gaming and Hollywood.

The problem, however, is that creating these detailed models is labor intensive, especially for high-acuity systems where (as an example of interest to us) we must capture the joints in insect limbs accurately for the model to be of any use. The axis of movement, range of motion and more are critical.

The overall research opportunity is to develop a software pipeline that leverages Vision-Language Models (VLMs) to analyze 3D CAD models of these subjects (ants), identify movable joints, and generate an annotated 3D model with labeled articulations. We'd like the system to facilitate automated recognition of mechanical components' degrees of freedom, enhancing efficiency in robotics, simulation and digital twin applications. We'd like that to be true - and for it to reduce the cost of creating such models ... but we don't know. This project is intended to let us find out.
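
To make "annotated with labeled articulations" concrete, here is a minimal sketch (in Python) of the kind of per-joint record such a pipeline might emit. The field names and values are illustrative assumptions for discussion, not a committed schema.

# Minimal, illustrative sketch of a per-joint annotation record such a
# pipeline might emit.  Field names are assumptions for discussion only.
from dataclasses import dataclass, asdict
import json

@dataclass
class JointAnnotation:
    joint_id: str              # e.g. "left_front_leg/femur-tibia"
    parent_part: str           # mesh segment on the proximal side
    child_part: str            # mesh segment on the distal side
    pivot: tuple               # (x, y, z) pivot point in model coordinates
    axis: tuple                # (x, y, z) unit vector for the axis of rotation
    range_deg: tuple           # (min, max) rotation in degrees
    dof: int = 1               # degrees of freedom at this joint
    confidence: float = 0.0    # VLM confidence, to guide human review

    def to_json(self) -> str:
        return json.dumps(asdict(self))

# Example: one hinge-like articulation, flagged with modest confidence.
femur_tibia = JointAnnotation(
    joint_id="left_front_leg/femur-tibia",
    parent_part="femur_L1", child_part="tibia_L1",
    pivot=(12.3, -4.1, 2.0), axis=(0.0, 1.0, 0.0),
    range_deg=(-60.0, 35.0), confidence=0.72)
print(femur_tibia.to_json())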

Last year we made a spectacular start on this problem. The prior team came up with a way to process a static 3D scan of some specimen and construct a dynamic model. This still needs a lot of shaking out but so far it looks like it works. Cool!

The next technical step is thus to figure out how the captured specimen should be posed. After all, it may have movable joints but that doesn't mean we can move them any old way for purposes of scholarly study. This is where the AI comes in: we want to pose them in a realistic way as predicted by study of prior, validated models.

The overall goal remains: we want to engineer an effective tool that enables scientists to conduct research and opens a world of biological form to artists. This semester we should be able to reach this goal by producing a software tool capable of analyzing and annotating mechanical joints in CAD models, and posing them in 'realistic' ways that can be tuned. This means also that we must be on top of the work flow of users - enabling them to check and correct models smoothly. Our initial domain will be modeling ants, though the design should not limit us in application to other domains.

We will recognize success when we are able to observe scholars prepare (and improve) accurate and dynamic models by mostly automatic means, at substantially less cost than by manual techniques; and export models compatible with robotic simulation and kinematic analysis. So yes, in the end, our AI system should allow scientists to start with images of ants and create animations that teach them how to walk correctly.

As with any project supporting research needs, many ambiguities are involved and the team should be prepared to handle change smoothly. There is no one-and-done solution to be had in this project. This project offers spectacular publication potential, however, and there is plenty of follow-on research opportunity if successful. Our client is Prof. Evan Economo in the Department of Entomology.


VTEAM

The Neuromotor Control and Learning (NMCL) lab is home to a research group at UMD that examines the cognitive-motor mechanisms that arise when individuals collaborate with robotic systems having various planning and control capabilities. An emerging research interest of the lab is the study of neurocognitive processes of individuals when engaged in shared autonomy and joint action assistive robotics contexts. In one new such envisioned research project, a user would share, with an artificial controller, blended control of a robotic arm to perform a 3-D reach-and-grasp task.

This capstone project will support this research effort by developing an interactive software platform for virtual shared-autonomy robot manipulation. The platform will simulate, in a 3D virtual environment, a single anthropomorphic robotic arm with a gripper capable of reaching toward and grasping objects (e.g., cylinders, spheres). A user will control the robot via a low-dimensional input device (e.g., joystick, dataglove), while an autonomous controller simultaneously generates assistive motion commands. These command sources will be combined through a parameterized blending mechanism that spans a range from full human control to full robotic autonomy, including intermediate combinations.
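
As one concrete illustration of what a parameterized blending mechanism could look like, the sketch below linearly mixes the human and autonomous velocity commands under a single parameter alpha. The actual blending law (linear, confidence-weighted, intent-inferring, etc.) is exactly the kind of design decision the team will work out with the lab; treat this as an assumption, not the plan.

# Illustrative sketch only: a linear blend of human and autonomous commands.
# alpha = 1.0 is full human control, alpha = 0.0 is full autonomy.
import numpy as np

def blend_commands(human_cmd, auto_cmd, alpha: float):
    """Return a blended end-effector velocity command."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    human_cmd = np.asarray(human_cmd, dtype=float)
    auto_cmd = np.asarray(auto_cmd, dtype=float)
    return alpha * human_cmd + (1.0 - alpha) * auto_cmd

# Example: joystick input and controller suggestion disagree about direction.
joystick = [0.10, 0.00, -0.05]     # m/s, from the low-dimensional input device
assistive = [0.08, 0.02, 0.00]     # m/s, from the autonomous controller
print(blend_commands(joystick, assistive, alpha=0.6))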

This interactive software platform will enable a variety of experimental manipulations and will support future investigations across a broad set of questions in human-robot interaction. To be clear: this system will support research, meaning that at this point much is yet to be figured out about best ways to proceed. There is no one-and-done program to grind out and toss over the wall. A big part of the effort will involve crafting clear ways to recognize system success, that is to say, the active discovery processes to be discussed in class are key to our success. That's another way of saying that this project is not performative where we win just by building something; we win by building something that yields new research insights. We engage accordingly.

Our client in this project is the NMCL lab, and the primary point of contact is Geoffrey Short, a Ph.D. student in the lab, working under the supervision of Dr. Rodolphe Gentili.


MAPLEGROVE MANAGED APPLICATIONS

Maplegrove Hosting is seeking to expand its product offerings by moving beyond web, email, and dedicated instance offerings and into managed hosted applications. Customers increasingly want a minimal-touch experience and cost structure to launch specific applications or containers versus performing web-system-based installs (e.g., LAMP stack applications). In this project a team will explore a platform integration for selling and orchestrating the provisioning of managed applications.

Candidates for experimentation include NextCloud/OwnCloud, BitWarden, and NetBox as sellables. What do all of these have in common? They all have Docker images that can provide customers portability, isolation, and scale-up. To offer managed applications, the project team will develop and integrate automation into FOSSBilling, an extensible platform that manages customers, invoicing, and a basic control panel.

A successful customer experience will enable customers to order a managed application through FOSSBilling and have it provision on associated compute. Customers will be able to manage their product through the control panel, create configuration backups, and perform administrative tasks such as starting/stopping their associated application container and performing updates. The solution approach will require automation in other areas, to include publishing DNS records, proxying requests through a webserver front-end configuration, and handling all integration with associated workflows for container management. The project may also orthogonally explore VM provisioning orchestration as an alternative means of meeting the stated objectives. A successful client outcome will result in fielding an enterprise-ready solution into the Maplegrove Hosting environment as a pilot for new customer outcomes.
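
To give a flavor of the orchestration layer a FOSSBilling provisioning module might call into, here is a minimal sketch of container lifecycle operations using the Docker SDK for Python. The image name, port mapping, and the idea of keying containers to a FOSSBilling order ID are assumptions for illustration only; DNS publication and reverse-proxy updates would hang off the same order events.

# Illustrative sketch of the container lifecycle a provisioning module might
# drive.  Keying containers to a FOSSBilling order ID is an assumption here.
import docker

client = docker.from_env()

def provision(order_id: int, image: str = "nextcloud:latest", port: int = 8080):
    """Start a managed application container for a customer order."""
    return client.containers.run(
        image,
        name=f"managed-{order_id}",
        detach=True,
        ports={"80/tcp": port},                  # front-end proxy would map this
        labels={"fossbilling.order": str(order_id)})

def stop(order_id: int):
    client.containers.get(f"managed-{order_id}").stop()

def start(order_id: int):
    client.containers.get(f"managed-{order_id}").start()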

Our client is MGP, for which the point of contact is Christian Johnson.


LEXICOGRAPHY

As part of its regular operations, Echtralex monitors all major PRC press conferences and other events featuring questions and answers or official statements by ministry spokespersons (as well as some additional PAI). Notable among these events are the daily Ministry of Foreign Affairs PC, the weekly Ministry of Commerce PC, and the bi-weekly Ministry of National Defense PC.

The three press events specifically mentioned above are of particular interest to Echtralex because, in addition to a Chinese transcript, official English translations are provided - sometimes very quickly after the event itself (in the case of MoFA), and other times within a few days or weeks of the event (in the case of MOFCOM and MoND).

Echtralex provides end users with parallel versions of these events featuring the Chinese and English set against each other, aligned at the sentence level. Staff members construct these documents from both sides of each event, making everything available on the archive portion of its website.

While the process of preparing these documents is relatively straightforward, the task itself is somewhat onerous. There is a need to wait for both sides of the event to be posted; the text must then be manually aligned at the sentence level (though the transcript and translation generally appear aligned at the sentence level when they are posted). This does not necessarily make the work simpler or less time-consuming.

The opportunity is to provide Echtralex with a software solution that will smooth the work flow and make the process more efficient. We would like to automate this to the maximum extent possible. However, because of a number of well-understood factors related to the translation of Chinese into English (and perhaps because of other issues as yet unknown), we anticipate challenges.

The core functionality will be reliable production of high quality documents with the text alignments. However, the task of identifying relevant documents and moving them to and through the tools could itself be a costly manual activity, so we anticipate that the team will need to model this work flow carefully and ensure their solution automates the capture of text in the first place and integrates it with the rest of the Echtralex processing tools which handle the publication of documents. To be clear, this project is not about building a web site for public access; it is about streamlining the preparation of documents for Echtralex staff to then push to their web site.
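
Since the transcript and translation typically arrive already roughly sentence-aligned, a first-cut automation could split both sides into sentences, pair them positionally, and flag any block whose counts disagree for staff review. The sketch below shows that idea; it is an assumption about one possible approach, not a settled design, and the well-understood translation issues mentioned above are exactly where it would need refinement.

# First-cut sketch: pair Chinese and English sentences positionally and flag
# blocks whose sentence counts disagree so staff can review them.
import re

def split_zh(text: str):
    # Split on Chinese full stops, question marks, and exclamation marks.
    return [s for s in re.split(r"(?<=[。！？])\s*", text) if s]

def split_en(text: str):
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

def align(zh_par: str, en_par: str):
    zh, en = split_zh(zh_par), split_en(en_par)
    flagged = len(zh) != len(en)          # mismatch -> manual review
    pairs = list(zip(zh, en))             # positional pairing when counts match
    return {"pairs": pairs, "needs_review": flagged}

print(align("你好。再见。", "Hello. Goodbye."))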

Our client in this project is Echtralex, for which the point of contact is Michael Horlick.


A NEURAL MODEL OF WORKING MEMORY IN PILOTING

The sophisticated capabilities of the human brain (e.g., planning, decision-making) stem from its emergent high-level cognitive abilities. These are collectively known as the brain's executive function, a core component of which is the mechanism to keep and manipulate recently acquired information, known as working memory (WM). The human WM capacity is relatively small (prior work has suggested a range of 3-9 items) but is enough to accomplish a wide range of complex tasks. For example, for a pilot to successfully operate an aircraft, they must be able to simultaneously accomplish several subgoals. Examples include communicating with ground stations, controlling the flight path, managing fuel resources, and monitoring system behavior. WM allows the pilot to manipulate different streams of incoming information to accomplish these tasks.

Coincidentally, the cognitive-motor task described above was modeled by the National Aeronautics and Space Administration in a software package called the Multi-Attribute Task Battery (MATB). Human participants have completed the MATB in many studies, where it has been used to study WM, mental workload, and other cognitive-motor processes. However, to our knowledge, there is no biologically plausible neural network model which can solve the MATB. Such a model would be incredibly helpful, as it would allow us to study the simulated neural dynamics underlying completion of the task, facilitating comparison with data from humans who completed the task.

A successful project will allow users to have full control over the parameters of the task (e.g., the difficulty of the sub-tasks), neural model (e.g., the spatial orientation of the neural units), learning procedure (e.g., synaptic plasticity), and other important components, all using a graphical user interface. The system should then be able to run simulations of the model solving the MATB. During a simulation, the dynamics of the model (e.g., activity of sensors placed onto the neural network) are to be visualized in real time in a manner that is comparable with human performance of the task. Given the difficulty of the task to be solved, a high-quality project will leverage cutting-edge technology (e.g., GPUs, automatic differentiation, just-in-time compilation) to create an efficient simulation platform.
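
As one example of the kind of efficient simulation core this calls for, here is a minimal sketch of a jitted update step for a rate-based recurrent network written with JAX (which covers the GPU, automatic differentiation, and just-in-time compilation points). The network form, sizes, and parameters are placeholders, not the model the lab will ultimately specify.

# Minimal sketch of a jitted simulation step for a rate-based recurrent
# network; sizes and dynamics are placeholders, not the lab's model.
import jax
import jax.numpy as jnp

N_UNITS, N_INPUTS, DT, TAU = 256, 16, 0.01, 0.1

@jax.jit
def step(state, params, inputs):
    """One Euler step of leaky firing-rate dynamics: tau*dx/dt = -x + W r + B u."""
    W, B = params
    rates = jnp.tanh(state)                       # unit activity read by "sensors"
    dx = (-state + W @ rates + B @ inputs) / TAU
    return state + DT * dx

key = jax.random.PRNGKey(0)
k1, k2 = jax.random.split(key)
params = (jax.random.normal(k1, (N_UNITS, N_UNITS)) / jnp.sqrt(N_UNITS),
          jax.random.normal(k2, (N_UNITS, N_INPUTS)) / jnp.sqrt(N_INPUTS))
state = jnp.zeros(N_UNITS)
state = step(state, params, jnp.ones(N_INPUTS))   # learning rules would wrap this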

Our client in this project is Arya Teymourlouei.


BIODIVERSITY

Biodiversity Data ETL and Machine Learning Architecture for Route Optimization

The University of Maryland maintains diverse ecological datasets related to campus vegetation and biodiversity, along with growing streams of external observations such as iNaturalist records. While previous mapping and routing initiatives have focused on visualization and interactive exploration, the underlying data architecture has not yet been designed to support large-scale ingestion, transformation, and machine-learning–driven analysis. As biodiversity data expands in volume, structure, and source diversity, a more flexible and scalable data pipeline becomes essential.

This project proposes the design and implementation of an ETL-focused biodiversity data architecture positioned at the forefront of machine learning workflows. Rather than extending the existing campus mapping stack directly, we will develop a parallel ingestion and processing ecosystem that can accommodate heterogeneous biodiversity datasets — including structured campus inventories, semi-structured ecological surveys, and crowdsourced platforms such as iNaturalist — while remaining logically separated from established campus data standards used in previous projects. The goal is to create a robust backend foundation where biodiversity data can be normalized, stored, and made readily accessible for downstream machine learning models that support objective-driven route generation.

Our opportunity is to shift the focus from front-end mapping features to data engineering and algorithm readiness. By building a modular ETL pipeline and scalable storage layer, we enable future machine learning experimentation without disrupting legacy systems. This architecture will support rapid iteration on predictive models that optimize routes according to biodiversity objectives, such as maximizing species diversity exposure or prioritizing educational or conservation goals. At the same time, the system will ensure that each newly ingested biodiversity dataset can be easily represented as a spatial layer within OpenLayers, allowing data exploration and visualization to evolve alongside the backend pipeline.
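
To make the "normalize and expose as a spatial layer" idea concrete, here is a minimal sketch that maps one raw iNaturalist-style observation into a common record and then into a GeoJSON feature that OpenLayers can load. The raw field names are assumptions about a single source, and the common schema shown is illustrative rather than final.

# Sketch: normalize one crowdsourced observation into a common record and a
# GeoJSON feature.  Raw field names are assumptions about one source only.
def normalize(raw: dict) -> dict:
    lat, lon = (float(x) for x in raw["location"].split(","))
    return {
        "source": "inaturalist",
        "source_id": raw["id"],
        "species": raw.get("taxon", {}).get("name"),
        "observed_at": raw.get("observed_on"),
        "lat": lat,
        "lon": lon,
    }

def to_geojson_feature(rec: dict) -> dict:
    return {
        "type": "Feature",
        "geometry": {"type": "Point", "coordinates": [rec["lon"], rec["lat"]]},
        "properties": {k: v for k, v in rec.items() if k not in ("lat", "lon")},
    }

raw = {"id": 12345, "location": "38.9897,-76.9378",
       "taxon": {"name": "Quercus alba"}, "observed_on": "2025-10-03"}
print(to_geojson_feature(normalize(raw)))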

This project is exploratory by design. Biodiversity data sources vary widely in structure and reliability, and the most effective machine learning approaches for route optimization remain an open question. Success will depend on iterative development, collaboration with domain stakeholders, and continuous refinement of the data architecture to support emerging analytical needs. One driving goal for this project is to support experimental study of wellness walks and other scientific findings of the ecowellness initiative. In practice, this infrastructure would enable researchers and students to design and run controlled science projects around wellness walks, for example, generating candidate routes, instrumenting them with ecological and experiential data, and systematically measuring how different environmental features correlate with participant wellbeing outcomes. By providing a reproducible framework for route design, data collection, and analysis, the project turns wellness walks into an experimental platform rather than an end goal, allowing collaborators, students, and future employers to clearly see how the system supports measurable ecowellness research.

Deliverables for this project include:

Primary collaborators and data providers include UMD biodiversity stakeholders and external observation platforms such as iNaturalist, with technical development informed by prior CMSC435 routing work while maintaining architectural independence from earlier campus data implementations. The main client communication will be with Juanita Choo of the Plant Sciences department and Parker Homann, the architect of the existing product.


ENGAGEMENT

How early can we predict teaching challenges in a large CS class? Can we do so in time to intervene and help students reach a better outcome?

The purpose of this project is to improve the quality of course content delivery, and one of our assumptions is that cost-effective harvesting of student activity in a class is a means to such an end. Good data and analysis (we hope) can shine a light on best practices. Another assumption is that we should be thinking outside the box to find ways to stimulate more interaction with students in order to improve outcomes.

There are the obvious data to track, of course, such as attendance. (We think there might be a correlation between attending a class and passing a class.) But what other opportunities are there? How about class participation - can we find that? Would knowing this help predict outcomes? What better insights can we get by tracking use of TA time and office hours? Are there labels to the kinds of time which let us discover new insights about what is going on in class? Can we tie these data to specific topics and concepts in the class? If so then narrowly we might be able to initiate early contact with a student who may otherwise be at risk, and broadly, should many students appear to miss concepts, we may discover material which could be re-taught with good effect.

Some tech activity will involve integrating existing data sources, such as grades (which are one of the important variables). Other activities will involve finding ways to capture likely data for consideration. For attendance and participation, we need a streamlined and reliable means of sampling presence and attention at various points in time. The procedure must not be disruptive, and in particular should not become a distraction to the instructor, who after all is trying to juggle many things during a busy class. Similarly, we need to model and analyze the work flow of instructors (possibly a set of TAs as well) to ensure the reporting operations are intuitive and responsive to client needs. The attendance/participation data will need to be integrated with topical data (what is covered when in the course) and potentially the scheduling data from our office hours systems. Do we know what needs to be predicted by the system? No, not a clue! This is a project to find out the art of the possible.
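
To illustrate one lightweight way the integrated data could be queried once captured, here is a sketch that joins attendance samples with grade records and flags students who might merit early contact. The table shapes, the attendance threshold, and the grade cutoff are illustrative assumptions meant only to seed the discovery discussion, not a proposed predictor.

# Sketch: join attendance samples with grades and flag possible at-risk
# students.  Table shapes and thresholds are illustrative assumptions.
from collections import defaultdict

def attendance_rate(samples):
    """samples: list of (student_id, lecture_id, present: bool)."""
    seen, present = defaultdict(set), defaultdict(int)
    for sid, lecture, was_there in samples:
        seen[sid].add(lecture)
        present[sid] += int(was_there)
    return {sid: present[sid] / len(seen[sid]) for sid in seen}

def flag_at_risk(samples, grades, threshold=0.6):
    """grades: dict student_id -> current score; returns students to contact."""
    rates = attendance_rate(samples)
    return sorted(sid for sid, rate in rates.items()
                  if rate < threshold or grades.get(sid, 0) < 60)

samples = [("s1", 1, True), ("s1", 2, False), ("s2", 1, True), ("s2", 2, True)]
print(flag_at_risk(samples, {"s1": 55, "s2": 88}))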

Our client and product owner in this project is Larry Herman. Writing a program just by itself does not constitute success. Neither does dressing up expensive (time intensive) practices with some kind of software. To recognize success we need to see the system used in realistic ways which allow us to discover important properties about student participation in class and projects. It must measurably reduce the cost of winning superior outcomes. Our product will serve as the pilot for deeper efforts to promote student engagement and success.

Copyright © 2026 James M. Purtilo