Introduction
Overview
Teaching: 5 min
Exercises: 0 min
Questions
What kind of code should we (not) write?
Objectives
Learn how to make the software packages we write more accessible to our colleagues.
What is a programming language?
“Programs must be written for people to read, and only incidentally for machines to execute.”
- Abelson & Sussman, Structure and Interpretation of Computer Programs
We work in a large collaboration, so we read and use each other’s code very often. Sad but true: discussing code is part of our lives (at least mine). No matter what is going on in the world, we often talk (complain) about software. Again, at least for me:
OK, that is a bit exaggerated, but software is indeed important for us. We gathered feedback from many people and extracted several simple axioms. It is interesting that we all agreed on those simple things despite our various backgrounds. In this mini section we are not going to tell you everything you need to know about writing elegant code; rather, we will share our experience of working with software in ATLAS. Hopefully, by the end, we will have convinced you that it is worth thinking about.
Why care about good code?
We are busy people with dozens of things to pay attention to in our work. Why should we put in additional effort to produce quality code? Why not just settle for code that works?
Focusing on quality code is actually one of the best ways to save you and your team tremendous amounts of time in the future. Clean and elegant code is much easier to understand, maintain, and extend.
Code quality is a broad measure of how useful code is. This obviously includes whether or not the code behaves as intended, but it also includes non-functional aspects such as robustness and maintainability. Whether or not a piece of code is high-quality can depend on the industry or team it’s written for, and is often subjective even within that team. As we gather our collective experience as developers, however, we can define some broad code quality goals that are useful as guiding principles.
Code quality goals
These are a sample of typical code quality goals one might adhere to. These are not exhaustive, but rarely should you write software without keeping these goals in mind.
- Readability, consistency: how easy it is to read and understand sections of the code; this includes code clarity, simplicity, and documentation.
- Predictability, reliability, and robustness: software behavior should be predictable, and not prone to hidden bugs.
- Maintainability and extensibility: fixing, updating and improving software should be as simple as possible, not inherently complex.
What other goals?
Can you think of other non-functional code quality goals that might not be included in this list?
Examples
Security is another common goal, though one we don’t often encounter in ATLAS analysis.
Efficiency is another goal that is more important in some cases than others.
This lesson will introduce practices and habits that will help you achieve these goals. Some techniques are good for software development in general, and some are specific to the coding we do in ATLAS.
Key Points
We are not professional software engineers but software is important.
Our code is very likely to be shared with our colleagues.
Useful Documents
Overview
Teaching: 5 min
Exercises: 0 min
Questions
Where could I get guidance?
Objectives
Know where to seek help within the collaboration.
Materials from the ATLAS software team
The ATLAS computing team maintains a lot of useful documentation (twiki pages, slides, notes, etc.). Here is a list recommended by software experts:
- A nice twiki page on software quality
- A note on ATLAS C++ guidelines
- A set of slides summarizing best C++ practice
Of course there are more available. If you have any questions you can also ask via the mailing lists: hn-atlas-PATHelp@cern.ch (Physics Analysis Tools) and hn-atlas-offlineSWHelp@cern.ch (mainly Athena).
When is following this guide required?
If you do find yourself doing any Athena development (e.g. simulation, reconstruction, derivation) you are required to adhere to the style guide. Merge request shifters will hopefully enforce these rules.
Combined Performance and Analysis software packages are typically less restrictive with their requirements, but a few still maintain contribution guidelines that encourage high-quality code. For example, EGamma CP closely follows the Google C++ and Python style guides.
Let’s be honest though. We don’t expect that everyone is going to open those links and read the full style guides. The best way to learn these things is incrementally.
- Keep an eye out for code that you find to be particularly readable and try to emulate that style.
- When your code gets modified in a code review, investigate why the reviewer made the change they did.
- Use an IDE. Most modern IDEs have automated style checking and autoformatting. Learn from its suggestions.
Internet
Answers to most coding problems can be found on the Internet. We hope this bootcamp has expanded your ATLAS-related coding vocabulary so that you can find those answers more easily and quickly.
Key Points
Start paying attention to good coding practice.
There are experts willing to help!
Use GitLab Wisely
Overview
Teaching: 10 min
Exercises: 0 min
Questions
What usually causes trouble when using git?
How can we avoid it?
Objectives
Highlight non-optimal practices that occur very often.
As illustrated by the previous instructors, Git (GitLab) provides very convenient functions to maintain our repositories. We should make the most of them.
It is not Dropbox
The first recipe I got after joining ATLAS was:
"Copy my folder on lxplus to your directory and follow the README there.
Do not use the version on GitLab, it does not work"
We hope this will not be the recipe you will share with your colleagues:).
Pull often, commit often
Have you done this before:
mv project project_backup
git clone ssh://git@gitlab.cern.ch:7999/project.git
Because of merge conflicts? The time spent fixing individual files all over again can be saved by running git commit/push/pull frequently while developing your code.
Informative commit messages
I browsed the commit messages I made six years ago:
I’d hope the younger me could have just written down what the mistake was rather than being apologetic.
This is rather ambiguous, as it could literally have been nothing or a reflection of my state of mind. It turned out that I had fixed a dumb bug that I had introduced myself.
There are many articles defining how and why to write a good commit message (here’s one) but the basics are:
- Short one-line subject describing what is done
- Longer multiline explanation of why it was done
- Tell more than what files were changed. That’s already obvious in the commit.
Merge requests and code reviews
Merge requests
Merge requests in GitLab (and pull requests in GitHub) are some of the most powerful tools for collaboration and ensuring good practices are followed. You should always make a merge request when submitting a significant change to a code base used by multiple people. Merge requests give other developers the opportunity to
- comment on your code,
- make suggestions, and
- review your work.
Code reviews
Code reviews are one of the most effective techniques for any software project. Not only do they keep bugs from making it into the code base, but they are great for sharing knowledge. “When a developer is finished working on an issue, another developer looks over the code and considers questions like:
- Are there any obvious logic errors in the code?
- Looking at the requirements, are all cases fully implemented?
- Are the new automated tests sufficient for the new code? Do existing automated tests need to be rewritten to account for changes in the code?
- Does the new code conform to existing style guidelines?” (- Code reviews)
They are one of the best tools for mentoring new developers. Merge requests are the core structures for code reviews. They enable comments on individual lines of code, and keep a record of discussion about how a decision was reached. You are encouraged to (a) review any code that someone writes to your repository and (b) request that any code you push to a repository is looked over by a colleague.
An excellent way to learn about the ATLAS codebase is to sign up for a merge review shift! As a level-1 shifter you mostly follow a checklist and can raise any confusing changes to the level-2 shifters. Try it out!
Key Points
Commit often.
Write informative commit messages.
We Like Portable Code
Overview
Teaching: 10 min
Exercises: 0 min
Questions
Why should we make our code portable?
Objectives
Learn ways to improve portability.
Usually we do our work on particular machines. People based at CERN may prefer lxplus, while people based elsewhere may prefer their local clusters or personal machines. Each individual may also have a different shell setup. It is very rare that the person who writes the code is the sole user. As a result, it is important to make sure the code can easily be used by others on a different machine or setup. It is really worth the time to think about this because, in the end, if your code does not work for someone, you will likely receive an email or message asking for help :)
Avoid user/machine specific setups
Hard-coded user- or machine-specific paths can be avoided by using relative paths or environment variables. This also makes our software easier to run on the grid, as the grid cannot access your local machines.
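As a minimal sketch of this idea in Python: the function name `resolve_data_dir`, the environment variable `ANALYSIS_DATA_DIR`, and the `data` subdirectory are all hypothetical names chosen for illustration, not part of any ATLAS tool.

```python
import os
from pathlib import Path


def resolve_data_dir(package_root, env_var="ANALYSIS_DATA_DIR", default="data"):
    """Locate the input data directory without hard-coding a user path.

    The environment variable (a hypothetical name) wins if it is set;
    otherwise fall back to a directory relative to the package itself.
    """
    override = os.environ.get(env_var)
    if override:
        # Each user or site can point to its own storage location.
        return Path(override).expanduser()
    # Relative to the package, so it works wherever the repository is cloned.
    return Path(package_root) / default


# Typical use inside a script: resolve_data_dir(Path(__file__).parent)
```

Nothing in this function refers to a particular user name or home directory, so the same code can run on lxplus, a local cluster, or a grid worker node.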
Test it often
The infrastructure and software keep evolving (slc5 -> slc6 -> centos7, Athena Rel 20.7 -> Rel 21). The packages involved can change very often as well. We should test our code often.
Getting the code to compile is not the end goal; ultimately we want the results to be meaningful. So it is better to test the whole workflow.
It feels really great when everything just works out of the box after following the recipes in the README.
CI saves the day!
The best tool for this is continuous integration (CI) testing on GitLab (or GitHub). Using Docker images, you can perform tests on many emulated platforms. Remember to look back at the CI/CD tutorial if you need a reminder of how to get started with this.
Key Points
Use relative paths and environment variables.
Test it often.
Use CI.
We Like Short Code
Overview
Teaching: 10 min
Exercises: 0 min
Questions
How to make our code easier for others to use?
Objectives
“Measuring programming progress by lines of code is like measuring aircraft building progress by weight.”
- Bill Gates
We work in a large collaboration, so when we need some function in our code it is very likely that someone has already written it, and it may even exist in a well-maintained package.
ROOT has a lot of stuff
Question:
"Have you created some functions and later on realized that ROOT already has them?"
Nowadays there are more well maintained packages to use so look around! Start with a look at the ROOT Tutorials to get a sense of what is possible.
But be careful of the pitfalls:
Functions are better than copy/paste
Proof of concept:
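A minimal sketch of the idea in Python, assuming a typical analysis task: the function `invariant_mass` and the input values are made up for illustration. It uses the standard expression for the invariant mass of two massless particles, written once and called wherever it is needed instead of being pasted into each channel.

```python
import math


def invariant_mass(pt1, eta1, phi1, pt2, eta2, phi2):
    """Invariant mass of two massless particles, in the same units as pt."""
    return math.sqrt(
        2.0 * pt1 * pt2 * (math.cosh(eta1 - eta2) - math.cos(phi1 - phi2))
    )


# The same function serves every channel; a fix made here applies everywhere.
electron_pair_mass = invariant_mass(50.0, 0.0, 0.0, 50.0, 0.0, math.pi)
muon_pair_mass = invariant_mass(40.0, 0.5, 0.1, 35.0, -0.3, 2.9)
```

If the formula had been copied into the electron and muon code separately, a bug found in one copy could easily survive in the other.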
We can use multiple files.
"Would you rather fight 100 500-line C++ scripts or 10 5000-line C++ scripts?"
Refactoring
Another habit that’s good to form is refactoring your code. Refactoring is the process of restructuring existing computer code without changing its external behavior. It is intended to improve its non-functional attributes while preserving its functional behavior.
This can include:
- Simplifying logical expressions
- Removing duplicated code
- Replacing unclear variable names with clearer ones
- Improving C++ functions for more efficient compiling/running (virtual, static, constant, etc.)
- Re-writing a loop to be more efficient
- Including checks for undesired inputs (e.g. null pointers, edge cases)
Do you think that the first time you write some code it is going to be the most correct or the most readable? Me neither. Refactoring is an important process of making sure the code is high-quality. It also is often how you catch unnoticed bugs!
Testing is a very important part of refactoring! The more automated tests you have, the more confident you can be that your changes didn’t unexpectedly break anything.
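As a hedged illustration, here is a small before-and-after sketch in Python. The function names and the cut values (25 and 2.5) are hypothetical. The refactored version simplifies the nested logic and names the thresholds, and a quick comparison over a grid of inputs checks that the external behavior is unchanged.

```python
def passes_selection_original(pt, eta, is_tight):
    # Original version: deeply nested, thresholds buried in the logic.
    if pt > 25.0:
        if eta < 2.5 and eta > -2.5:
            if is_tight:
                return True
            else:
                return False
        else:
            return False
    else:
        return False


def passes_selection(pt, eta, is_tight, min_pt=25.0, max_abs_eta=2.5):
    """Refactored version: one expression, thresholds named and adjustable."""
    return pt > min_pt and abs(eta) < max_abs_eta and bool(is_tight)


def behaviour_is_unchanged():
    """Compare both versions on a grid of inputs, including the cut edges."""
    for pt in (0.0, 24.9, 25.0, 25.1, 100.0):
        for eta in (-3.0, -2.5, -2.4, 0.0, 2.4, 2.5, 3.0):
            for is_tight in (True, False):
                old = passes_selection_original(pt, eta, is_tight)
                new = passes_selection(pt, eta, is_tight)
                if old != new:
                    return False
    return True
```

A check like `behaviour_is_unchanged()` is only a sketch of a regression test, but it gives some confidence before the old version is deleted.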
Want to know more?
Refactoring Guru has some excellent lessons on how to refactor your code and why it’s useful.
Key Points
Use existing libraries if we can.
Create common functions if we need them very often.
Split if certain blocks become monstrous.
Design Your Package
Overview
Teaching: 10 min
Exercises: 0 min
Questions
When and how should we design a package efficiently?
Objectives
Understand when designing a program can be useful.
Learn that UML exists
Fast solutions vs sustainable solutions
This is a rather unique challenge for us. Usually our performance is evaluated by the results we’ve produced, not the code behind the scenes. As a result, fast solutions are naturally appealing, and there are many of them to lure us. For instance, we can open a TBrowser in ROOT and make the plots beautiful by hand. Many unexpected studies also come up during an analysis. We often have to perform a very specific study that goes beyond the current scope of our software. In this case, having something simple and fast to tackle these one or two issues is indeed the most efficient way.
However, we also have frequent updates from various groups (combined performance groups, data preparation groups, physics groups, etc) and some of them can involve substantial changes. It is not surprising to find out that a script we wrote before does not work any more.
A sustainable solution takes those possible changes into account and is flexible enough to incorporate them. Such solutions are better integrated into the whole workflow and can save us a lot of time in the long run. Given the facts mentioned above, building one is also challenging.
Plan and organize the package
Blueprints are important. It is hard to make sure all the small pieces fit together nicely if we do not think ahead. Draw some diagrams on paper and think about what the upstream and downstream software is and who the expected users are. Try to figure out which parts of the package need to be rigid (or are OK to be rigid) and which parts definitely need to be flexible.
Let’s have a quick brainstorming session:
"What are the most common requests or comments one would receive during meetings that may require software change?"
Change the binning? Extend the binning? Add a histogram? Update CP recommendations? Add a ratio panel? Submitting grid jobs? Babysitting grid jobs?…..
Well, if you have a list, let’s try to make our software able to do those things easily.
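Taking "change the binning" as an example, one possible approach is to keep the binning in a small configuration file rather than in the code. The sketch below is an assumption-laden illustration: the JSON layout, the histogram names, and the functions `bin_edges` and `load_binning` are all hypothetical.

```python
import json


def bin_edges(n_bins, low, high):
    """Return n_bins + 1 equally spaced bin edges between low and high."""
    if n_bins <= 0 or high <= low:
        raise ValueError("need n_bins > 0 and high > low")
    width = (high - low) / n_bins
    return [low + i * width for i in range(n_bins + 1)]


# In practice this text would live in its own file next to the code.
CONFIG_TEXT = """
{
  "histograms": {
    "lepton_pt":  {"n_bins": 4, "low": 0.0,  "high": 200.0},
    "lepton_eta": {"n_bins": 5, "low": -2.5, "high": 2.5}
  }
}
"""


def load_binning(config_text):
    """Map each histogram name to its list of bin edges."""
    config = json.loads(config_text)
    return {
        name: bin_edges(**spec) for name, spec in config["histograms"].items()
    }


binning = load_binning(CONFIG_TEXT)
```

With this layout, a request to change or extend the binning becomes a one-line edit to the configuration instead of a hunt through the plotting code.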
Live with it or change it
Unless we start a project from scratch, we will probably be given some existing packages. If everything is up to date, nicely written and well organized, CONGRATULATIONS! But often this is not the case. In this situation, we can either live with it and try to get the work done, or try to change it. When we face such a dilemma, we should think about how long we are going to use these packages and how many new things we are expected to add. Oftentimes it is better to revamp. Do not let the argument “Please do not reinvent the wheel” stop you: you are crafting better wheels.
Before you start writing code
Take a deep breath… and get a blank sheet of paper
Software design
Modeling
Modeling is the designing of software applications before coding.
The software we create is complex and involves lots of abstract ideas that need to work together. The classes and functions you write can quickly grow to more than you can keep in your head all at once. Large or small projects can all benefit from a careful consideration of how your software should work. If you want to build a house, your best first tool is a pencil, not a hammer.
An excellent way to begin writing software is to draw a diagram. These can be diagrams that describe the inheritance of classes and interfaces, the ownership of objects, the logical flow of one or more functions, etc. All of these are useful thoughts to put on paper. Of course software is not a house; code is free. Your design can change as you learn new things about your data or your requirements.
Unified Modeling Language
A powerful standard for producing these designs is called the Unified Modeling Language, or UML. UML consists of many specifications that are widely adopted for creating designs for your software. These diagrams can be grouped into two broad categories:
- Structural Diagrams
- Behavioral Diagrams
You may find some of these diagrams more useful than others. Here is a very nice UML tutorial that gives a concise description of how to use the most common diagrams.
Here are a few specific examples of diagrams that you may want to try drawing for your code:
Class diagram
The class diagram is a structural diagram. The purpose of the class diagram is to model the static view of an application. Class diagrams are the only diagrams which can be directly mapped with object-oriented languages and are thus widely used at the time of construction. [1]
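To make the mapping to code concrete, here is a small Python sketch of what a simple class diagram might translate into. The classes `Selection`, `MinPtSelection`, and `Cutflow` are hypothetical and exist only for illustration: the diagram would show one inheritance arrow (MinPtSelection is a Selection) and one ownership relation (a Cutflow holds a list of Selection objects).

```python
from abc import ABC, abstractmethod


class Selection(ABC):
    """Interface box in the diagram: anything that can accept or reject an event."""

    @abstractmethod
    def passes(self, event):
        """Return True if the event is accepted."""


class MinPtSelection(Selection):
    """Inheritance arrow in the diagram: MinPtSelection is a Selection."""

    def __init__(self, min_pt):
        self.min_pt = min_pt

    def passes(self, event):
        return event["pt"] > self.min_pt


class Cutflow:
    """Ownership relation in the diagram: a Cutflow holds many Selections."""

    def __init__(self, selections):
        self.selections = list(selections)

    def passes(self, event):
        # An event is accepted only if every owned selection accepts it.
        return all(selection.passes(event) for selection in self.selections)


cutflow = Cutflow([MinPtSelection(25.0)])
```

Sketching these three boxes on paper first makes it easier to see where a new kind of selection would plug in before any code is written.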
Interaction diagram
The purpose of interaction diagrams is to visualize the interactive behavior of the system. Visualizing the interaction is a difficult task. Hence, the solution is to use different types of models to capture the different aspects of the interaction. [1]
One of these models, shown below, is called the sequence diagram which captures the time sequence of the message flow from one object to another.
You don’t have to learn how these diagrams work right now, but it’s good to know that they exist. When you’re staring at a blank piece of paper trying to figure out how to design your code, take a look at the UML tutorial because chances are, there’s a useful model to follow.
[1] Taken from tutorialspoint.com/uml
Key Points
Modeling is the designing of software applications before coding.
Lay out the diagrams on paper.
Think about the expected users.
Consolidating standalone scripts.
Subtle Things That Make Our Lives Better
Overview
Teaching: 5 min
Exercises: 0 min
Questions
What prevents others from understanding our code?
Objectives
Start applying some simple but very useful practices.
Here are some suggestions from ATLAS colleagues about practices they think are particularly useful for our collaboration.
We need comments
Quotes from a colleague:
"I did something quick and dirty in my code, wrote it down in my notebook (physical notebook). Now I have problems when using my code and I want to check what I did before. Guess what, I left my notebook at CERN and I could not go back!"
An Instagram post from a friend when she was on VACATION:
A snapshot from a piece of code written by a paranoid programmer:
We should add comments where we are not sure whether what we are doing is correct (FIXME) or at places to conclude a loop/block. Also if we think we might be the only person on this planet writing such a block of code, we should probably add some comments.
Use easy to understand variable/file names
It is quite frustrating to search for a variable named “m” and try to figure out what it is doing, unless it is an index, a counter, or something similar.
We physicists like acronyms. They can be funny, but we should make sure they are understandable when we use them in our code.
Following a consistent naming convention can help a lot.
Testing
Write automated tests for the libraries and tools you are developing. For example pytest is a great library to help with testing. To take a somewhat strong position: any code that’s not tested should be assumed to not work.
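As a minimal sketch of what such tests can look like: the function `efficiency` below is a made-up example, and the two `test_` functions use plain `assert` statements, which pytest collects and runs when they live in a file whose name starts with `test_`.

```python
def efficiency(n_passed, n_total):
    """Fraction of events passing a selection."""
    if n_total <= 0:
        raise ValueError("n_total must be positive")
    if not 0 <= n_passed <= n_total:
        raise ValueError("n_passed must be between 0 and n_total")
    return n_passed / n_total


def test_efficiency_of_half():
    assert efficiency(5, 10) == 0.5


def test_efficiency_rejects_empty_sample():
    # With pytest available, `with pytest.raises(ValueError):` is the more
    # idiomatic way to write this check.
    try:
        efficiency(0, 0)
    except ValueError:
        return
    raise AssertionError("expected a ValueError for an empty sample")
```

Note that the second test covers an edge case (an empty sample), which is exactly the kind of input that tends to break untested code.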
For more reading, Atlassian has a nice post on the different kinds of testing in software.
Documentation
The never ending battle of documenting your code… Documentation is very important not just for your future colleagues, but also for yourself. “The most likely person to read your code is you six months from now, and unfortunately, past you doesn’t respond to emails” (no idea where I heard that).
Like the several levels of testing, there are several levels of documentation. Code can be seen as self-documenting if it’s written with clear variable names. It also benefits from sensible inline comments. And it’s always a good idea to leave method docstrings in Python or Doxygen-style comments in C++ to clarify what your functions are doing. These are also used for automated documentation parsing.
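For illustration, here is a short Python function combining these levels: a descriptive name, a docstring, and one inline comment where the logic is not obvious. The function `delta_phi` is a hypothetical example, and the docstring layout is just one common convention.

```python
import math


def delta_phi(phi1, phi2):
    """Return the azimuthal angle difference phi1 - phi2, wrapped into [-pi, pi).

    Args:
        phi1: Azimuthal angle of the first object, in radians.
        phi2: Azimuthal angle of the second object, in radians.

    Returns:
        The wrapped difference in radians.
    """
    # Shift by pi before taking the modulo so the result lands in [-pi, pi)
    # rather than [0, 2*pi).
    return (phi1 - phi2 + math.pi) % (2.0 * math.pi) - math.pi
```

Calling `help(delta_phi)` in an interactive session prints this docstring, so the explanation travels with the code rather than living in a notebook somewhere else.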
On top of these low-level pieces of documentation, each software package you write should have a README file that explains what it does, how it’s used, and who to ask for help.
Key Points
Add comments in your code.
Use variable/function/file names that are easy to interpret.
Summary
Overview
Teaching: 5 min
Exercises: 0 min
Questions
Is it worth our time?
Objectives
It is worth the time!
It pays off
Of course, paying attention to these points requires some extra time. But it pays off well. It saves both our own time and our colleagues’ time. We can then have more interesting lunch conversations :)
Start early, benefit early
This extensive bootcamp has exposed you to a wide range of topics, and we know we have all learned a lot. This might be your first investment in software. Keep investing and let it grow. Like a long-term investment, small but continuous efforts can result in a great fortune. Unlike most investments, this one is much safer.
Key Points
Long term gain.