Assignment 6-api: API, generics, whole-system integration and design practice
Goals: Practice working with an API and generic types. Practice design and implementation on a real-world problem.
Instructions
As always, be very careful with your naming conventions.
The submissions will be organized as follows:
Submission Homework 6 Problem 1: The Canvas discussion post answerer.
Problem 1 – Code: Sunday, November 17th at 10:00pm; Demo: November 18th during class
1 Problem 1 – Canvas discussion post answerer
This project involves a synthesis of things we have studied and practiced, things we have not studied for you to explore and master on your own, integration with real-world systems and real-world documentation, and solving a truly extrinsically motivated problem.
Your task for this assignment is to design and implement a command-line executable program that takes no arguments and that, when executed, will perform whatever task is asked in our course’s most recent Canvas discussion post, with respect to the most recent PDF document in our course’s Files>Documents on Canvas.
Your program will interact remotely with Canvas and will almost certainly use Microsoft’s Azure service.
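As a starting point for the Canvas side, here is a minimal sketch of building an authenticated request with the JDK's built-in `java.net.http` package (Java 11+). The class name, the base URL, and the course id are hypothetical. The endpoint path and the Bearer-token header follow my reading of the Canvas REST API documentation, so verify both against the official docs before relying on them. The sketch only builds the request; it does not send it.

```java
import java.net.URI;
import java.net.http.HttpRequest;

/** Purpose: build (not send) HTTP requests for a Canvas instance. Hypothetical helper. */
final class CanvasRequests {
    private CanvasRequests() {}

    /**
     * Builds a GET request that lists a course's discussion topics.
     * ASSUMPTION: the path /api/v1/courses/{id}/discussion_topics and the
     * "Authorization: Bearer <token>" header match your Canvas instance.
     */
    static HttpRequest listDiscussionTopics(String baseUrl, String courseId, String token) {
        URI uri = URI.create(baseUrl + "/api/v1/courses/" + courseId + "/discussion_topics");
        return HttpRequest.newBuilder(uri)
                .header("Authorization", "Bearer " + token)
                .header("Accept", "application/json")
                .GET()
                .build();
    }

    public static void main(String[] args) {
        // Hypothetical base URL and course id; a real token should never be hard-coded.
        HttpRequest request =
                listDiscussionTopics("https://canvas.example.edu", "12345", "not-a-real-token");
        System.out.println(request.method() + " " + request.uri());
    }
}
```

Taking the base URL as a parameter makes it easy to point the same code at the test site rather than production.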
You should follow good design recipe practice. Have one purpose per method, one purpose per interface/class/unit of code. You should clearly articulate this purpose statement. I absolve you of needing to write your templates. Each method should be well tested.
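To illustrate "one purpose per interface", here is one possible decomposition, sketched under the assumption that the program reduces to three steps: find the latest prompt and document, produce an answer, and publish it. Every name below is hypothetical, and your design may reasonably look quite different. Note how the generic `LatestSource<T>` lets the same abstraction serve both prompts and documents, and how small interfaces make each piece easy to replace with a fake in tests.

```java
import java.util.Optional;

/** Purpose: supply the most recent item of some kind, if there is one. */
interface LatestSource<T> {
    Optional<T> latest();
}

/** Purpose: turn instructions plus document text into an answer. */
interface Answerer {
    String answer(String instructions, String documentText);
}

/** Purpose: deliver a finished answer somewhere. */
interface Publisher {
    void publish(String answer);
}

/** Purpose: wire the steps together, and nothing else. */
final class AnswerDiscussion {
    private final LatestSource<String> prompts;
    private final LatestSource<String> documents;
    private final Answerer answerer;
    private final Publisher publisher;

    AnswerDiscussion(LatestSource<String> prompts, LatestSource<String> documents,
                     Answerer answerer, Publisher publisher) {
        this.prompts = prompts;
        this.documents = documents;
        this.answerer = answerer;
        this.publisher = publisher;
    }

    /** Returns true if an answer was published, false if a prompt or document was missing. */
    boolean run() {
        Optional<String> prompt = prompts.latest();
        Optional<String> document = documents.latest();
        if (prompt.isEmpty() || document.isEmpty()) {
            return false;
        }
        publisher.publish(answerer.answer(prompt.get(), document.get()));
        return true;
    }
}
```

Because each interface has a single method, a test can pass lambdas in place of the real Canvas and Azure clients and never touch the network.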
You should be prepared to demonstrate to me that you can invoke ./answer_discussion (or ./answer_discussion.exe on Windows for those who wish) to execute your program, and that alone should do everything necessary.
1.1 Hints:
Every resource is open and valid to use. No one else has their students doing this project. Feel free to use example source code you find online (provided you follow applicable intellectual property laws), the LLM of your choice, textbooks, blogs, API documentation, 3rd party code or extensions, interactive video tutorials, etc. The only thing you can’t use is some other student/pair’s code or solution. You can talk to other students about ideas, but it shouldn’t get to the level of looking at their source code or describing yours to them, or vice versa. You also shouldn’t hire an outside consultant to do this for you.
To help you with the above, recall that you have access to Enterprise instances of Microsoft Copilot. You can also access Copilot from bing.com on Edge; if you are logged out you can work around the 30 queries/day limitation.
You will likely come across new aspects of Java, like static methods: go figure them out.
Microsoft offers heaps of documentation, learning, and support for their products. You should consider going through relevant tutorials.
You are welcome to continue using the Tester.jar library, but the more common industrial tool is JUnit. Consider JQwik for property-based tests.
If command-line or scriptable tools already exist to solve parts of your problem, feel free to use them. This time I don’t want you to re-invent the wheel. Under WSL, you should use apt-get.
Last time I checked, chocolatey was the best-in-breed software package manager for Windows. I don’t know what the state of the art is these days. You should figure that out and install your software that way.
You will find some loosely related Python code at the following two repositories:
Please note that I mean it when I say this is loosely related. You will still need to figure out a substantial number of things on your own.
I do not care which shell you use; you can use whichever you find most convenient. I believe that for Windows users your choices are:
cmd.exe
powershell
bash (or zsh, fish, etc) under WSL
You can use your shell language or some scripting language of your choice to kick off your Java program, but the bulk of the programming should be done in Java. You are also welcome to do your script programming in Java, although I would not recommend it.
Your command-line process can kick off a compile-execute sequence, or you can pre-build and deploy your executable, so that executing your program on the command line causes an executable jar to run.
When you are testing, you should be testing against the test site. Do not test in production!!
As you are learning programmatic dependency management in Java, you should look into Maven and/or Gradle.
1.2 Details:
The only things I put in Files>Documents will be machine-readable PDF documents. No other file formats will be used, and the only PDFs you need to consider are the ones posted in that location.
You must use Microsoft’s Azure to access your LLM. I’m being deliberate here because I want you to have to do something slightly different from the most popular tutorials on the internet. You have $100 free credit to use.
You must also ensure the quality of your code using appropriate language tooling, and the configurations for that tooling should also be managed under version control. You should programmatically check for and enforce conformance to a coding style standard, and you should use tooling to assess the quality of your code, tests, and test coverage.
The instructions for your LLM to follow will be contained entirely in the body of the discussion prompt.
Your program should be robust enough to handle a folder with a large collection of documents, and still find the most recent.
If it matters, determine "most recent" by the creation date.
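Selecting the most recent item is a natural place to practice generics. The following is a minimal sketch, assuming you have already parsed the file listing into some value type; `CourseDocument`, `MostRecent`, and the tie-breaking rule are all hypothetical choices, not requirements. It uses a record, so it needs Java 16 or later.

```java
import java.time.Instant;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;
import java.util.function.Function;

/** Hypothetical value type: one entry from a course file listing. */
record CourseDocument(String name, Instant createdAt) {}

final class MostRecent {
    private MostRecent() {}

    /**
     * Returns the item with the latest creation instant, or empty if there are none.
     * When two items share the same instant, the one that is greatest under
     * tieBreaker wins, so the result does not depend on listing order.
     */
    static <T> Optional<T> mostRecent(
            List<T> items, Function<T, Instant> createdAt, Comparator<T> tieBreaker) {
        return items.stream()
                .max(Comparator.comparing(createdAt).thenComparing(tieBreaker));
    }

    public static void main(String[] args) {
        Instant sameMoment = Instant.parse("2024-11-02T10:00:00Z");
        List<CourseDocument> docs = List.of(
                new CourseDocument("older.pdf", Instant.parse("2024-11-01T10:00:00Z")),
                new CourseDocument("b.pdf", sameMoment),
                new CourseDocument("a.pdf", sameMoment));
        // Ties are broken by name here, purely as an illustration.
        Optional<CourseDocument> latest = mostRecent(
                docs, CourseDocument::createdAt, Comparator.comparing(CourseDocument::name));
        System.out.println(latest.map(CourseDocument::name).orElse("no documents"));
    }
}
```

Returning an `Optional` forces the caller to decide what happens when the folder is empty, and the explicit tie-breaker gives a deterministic answer when two documents share a timestamp.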
Your program should continue to work even after I change the discussion post or add a new document.
Do not hard-code your API keys into your code. That would be a very poor practice. Instead, store the API key(s) in environment variables, and use system calls to collect that value.
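One way to read a key from the environment is sketched below. `System.getenv()` is the standard JDK call; the class name and the variable name `CANVAS_API_TOKEN` are hypothetical. Passing the environment in as a `Map` is a design choice that lets you test the missing-key and blank-key cases without touching your real environment.

```java
import java.util.Map;
import java.util.Optional;

/** Purpose: look up API keys in an environment without ever hard-coding them. */
final class ApiKeys {
    private ApiKeys() {}

    /** Looks up a key in the given environment; missing or blank values count as absent. */
    static Optional<String> find(Map<String, String> env, String name) {
        return Optional.ofNullable(env.get(name))
                .map(String::trim)
                .filter(value -> !value.isEmpty());
    }

    /** Same lookup, but fails fast with a message that names the missing variable. */
    static String require(Map<String, String> env, String name) {
        return find(env, name).orElseThrow(() ->
                new IllegalStateException("Environment variable " + name + " is not set"));
    }

    public static void main(String[] args) {
        // CANVAS_API_TOKEN is a hypothetical variable name. Never print the key itself.
        Optional<String> token = find(System.getenv(), "CANVAS_API_TOKEN");
        System.out.println(token.isPresent() ? "token found" : "token missing");
    }
}
```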
Have a robust test suite. What happens if there are no documents available? What if two documents were posted at the exact same time? What if the download of that document fails part-way through? What happens if the API key cannot be found? What happens if the API key is invalid? What happens if the file system runs out of disk space while you are writing the document? Will your program always be able to write to disk? What happens if your Azure account runs out of credits partway through? Write tests and check for such situations.
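For the "download fails part-way through" question, one common technique is to write to a temporary file and move it into place only after the copy succeeds. The sketch below is one way to do that with `java.nio.file`; the class and method names are hypothetical, and it assumes the target's directory already exists and is writable.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

/** Purpose: save a download so that a failed transfer never leaves a truncated file. */
final class SafeDownload {
    private SafeDownload() {}

    /**
     * Copies body to a temporary file next to target, then moves it into place.
     * If the copy throws, target is left exactly as it was and the partial file is removed.
     */
    static void save(InputStream body, Path target) throws IOException {
        Path directory = target.toAbsolutePath().getParent();
        Path partial = Files.createTempFile(directory, "download-", ".part");
        try {
            // createTempFile already created the file, so the copy must replace it.
            Files.copy(body, partial, StandardCopyOption.REPLACE_EXISTING);
            Files.move(partial, target, StandardCopyOption.REPLACE_EXISTING);
        } finally {
            // A no-op after a successful move; cleans up after a failed copy.
            Files.deleteIfExists(partial);
        }
    }
}
```

A test can simulate the failure by passing an `InputStream` that throws an `IOException` after a few bytes, then asserting that the previous file contents are still intact.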
Write a well-written README.md file at the top level of your repository. It should explain the prerequisites for using your software, how to set it up and configure it, the types and kinds of arguments, and any other options or features it provides. You would be surprised how difficult it can be to use software without an adequate README.
1.3 Bonus:
"Soft scanned" PDFs
You can make your program more robust by using a model that accepts "soft-scanned" PDFs (documents whose contents are not machine readable) to extract the text contents of that PDF before solving the discussion post.
Audio recorded responses
Some instructors are newly interested in audio recorded responses. Make your program accept a command-line flag --audio. When invoked with this flag, your program will create and upload an audio recording instead of posting raw plain text.
Use a model that takes small clips of your voice and the text to be read, and produces an audio recording of your response that you then upload. It needs to be a real human-sounding voice, and should in fact sound like you. No credit for using an old-school text-to-speech engine.
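Handling the `--audio` flag can stay very small. Here is a minimal sketch, assuming `--audio` is the only argument your program accepts; the enum and class names are hypothetical, and rejecting unknown arguments is a design choice rather than a requirement.

```java
/** How the answer should be delivered. */
enum ResponseMode { TEXT, AUDIO }

/** Purpose: translate raw command-line arguments into a response mode. */
final class CommandLine {
    private CommandLine() {}

    /** No arguments means plain text; --audio switches to an audio response. */
    static ResponseMode parse(String[] args) {
        ResponseMode mode = ResponseMode.TEXT;
        for (String arg : args) {
            if ("--audio".equals(arg)) {
                mode = ResponseMode.AUDIO;
            } else {
                // Failing loudly on typos beats silently ignoring them.
                throw new IllegalArgumentException("Unknown argument: " + arg);
            }
        }
        return mode;
    }

    public static void main(String[] args) {
        System.out.println("Response mode: " + parse(args));
    }
}
```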
CI/CD pipeline and commit hooks
Running all of your test suite locally could be expensive. One solution is to use your version control system’s continuous integration / continuous delivery pipeline to perform those checks for you.
Create a runner for your CI platform of choice (for instance, we have campus-wide access to Bitbucket and its pipelines via your school Atlassian account), or if you prefer to use github.com, they also have GitHub Actions.
Docker
As you have probably noticed, getting all the software installed, running and correctly configured is a chore. Having to manage a remote installation of all those tools in userspace on a target machine would be quite the chore! If you want, you can build and submit your whole program and all its dependencies as a Docker image that I can run on an x86-64 machine (my laptop is an Apple Silicon machine, but I’ll be running it from within another container on a remote machine).
Make sure you give me complete instructions on how to run and use it in your README.md.