Blog Articles 11–15

Lessons Learned Writing the CAREER

So, I won the NSF CAREER award. To say I’m excited about this would be an understatement — my first Ph.D. student has support locked in, I get to actually do the work I’ve been building towards for years now, and we’re going to have a much better understanding of how recommender systems (mis)behave in response to their individual and social human contexts.

One of the things I found useful while planning and writing was hearing a variety of ‘path-to-the-CAREER’ stories and trying to take from them the things that would work for me. So here’s mine, for what it is worth. There are many paths to success; the opening line of Anna Karenina does not apply to grantwriting. My road is neither necessary nor sufficient.

This post is adapted and heavily expanded from notes I wrote in preparation for the successful applicant panel at Boise State’s CAREER prep workshop this spring.

Preface: Don’t Listen to Me

Survivorship bias by McGeddon, used under CC-BY-SA.

But don’t listen to me
We just met
What do I know?
— Propaganda, ‘Don’t Listen To Me’, Excellent (2012)

Academic advice writing, panels, etc. are plagued by the problem of survivorship bias. When we put together a panel of successful grant applicants or people who landed good jobs post-Ph.D., we sample success. Success is great, but looking only at successful cases often obscures the factors that actually made the difference between success and failure.

Keep that in mind whenever you are reading or hearing advice. That doesn’t mean observing success or listening to people who have succeeded at something is entirely useless; it just means that, in order to gain generalizable, applicable knowledge, you need to interpret the data in light of that bias.

Context: Timeline

I’m willing to wait for it.
— Lin-Manuel Miranda, ‘Wait For It’, Hamilton (2015)

I started my faculty career in Fall of 2014. I immediately submitted my first attempt at the CRII, building on my dissertation work; this proposal was declined. In summer 2015, I submitted my first CAREER attempt; this went down in flames, as I had no idea what I was doing — I submitted because ‘submitting CAREER is what junior faculty do’.

In Fall 2015, I decided to focus my energies on understanding fairness, bias, and discrimination in recommender systems. I submitted my second (unsuccessful) CRII attempt on the key idea that would later be the heart of my CAREER grant, and had a couple of my M.S. students work in earnest on projects that would produce preliminary results for future grant proposals.

In Fall 2016, I moved to Boise State, attended my first FATML in New York, and began to make in-person connections to the algorithmic fairness research community. I submitted an unsuccessful proposal to the Google Faculty Award program for a project that would become Research Objective 1 of my CAREER. I also began working informally on planning out my CAREER proposal’s research scope and getting feedback on the ideas.

In spring 2017, I began working specifically on the grant. I attended the NSF CISE CAREER workshop, scoped out collaborations and support across the university, and drafted my project summary for program officer feedback.

In April and May I did the bulk of the writing, and delivered a complete draft of the project description in late May. I then went on a conference trip & drank terrible wine1 while our college grant specialist reviewed it.

I spent the first half of June revising, securing letters, and preparing the rest of the application package. With a complete application package ready, I sent it to our grant specialist and friends & colleagues for review, and took 2 weeks off to visit Belgium and the Netherlands. I didn’t touch the proposal at all during that time, to clear my mind for revisions.

When I returned, I spent 2–3 weeks revising and finalizing, and submitted in mid-July.

Startup is There to be Used

Every morning when I wake up, uh
Money on my mind
— Dr. Dre (with Kendrick Lamar), ‘The Recipe’, good kid, m.A.A.d. city (2012)

This probably deserves a blog post of its own, but the purpose of startup funding at a research-intensive institution is to get your research program started while you apply for grant funding to keep it going. CAREER is just one of the biggest and most signpost-y of those grants.

My top priority for my Boise State startup was laying the groundwork for the work that I would propose in CAREER. I focused this on the work itself, not on the grant — that would put a few too many eggs in one basket — but I put my student to work on projects that would advance my recommender fairness work and produce results that I could cite in CAREER and other proposals as preliminary findings. Loose ends from previous projects I mostly worked on myself or with external collaborators, or put on the back burner.

Other things I paid for that I found helpful:

  • Traveling to NSF grant-writing workshops
  • Traveling to conferences
  • Traveling to give seminar talks (talking about my ideas gave me practice in pitching them and responding to questions and critiques)
  • Hosting a seminar speaker and picking his brain for feedback
  • A good computer setup (dual 4K monitors help me a lot)

Good Community

I built more than a rap career
I got my family here
— Sims (with Doomtree), ‘Bangarang’, No Kings (2011)

I am not a solo researcher; I do my best work in a vibrant intellectual community, where I can regularly have both serious scholarly discussions and informal banter.

The support of my great colleagues at Boise State Computer Science played a major role in getting me ready to write & carrying me through the process.

Good Institutional and Collaborator Support

Boise State is particularly good at providing resources and helping prospective CAREER applicants locate them. Each spring we have a 2-hour workshop for prospective applicants that provides pointers to many campus resources (along with a panel discussion on successful proposals), and a faculty learning community focused on the educational and outreach component.

Relatedly, I found it useful to connect my work directly to local and regional needs, and several people at Boise State were helpful in identifying those as well as securing collaboration commitments to address them.

And finally, I documented this support in Letters of Collaboration. Strategically, I saw this as demonstrating that my university was behind me (beyond the letter of commitment from my department chair), and that I already had some of the connections necessary to successfully carry out the work I was proposing. It’s easy to say I’m going to work with libraries on improving public information literacy with regards to recommender systems and similar modern information technologies; having letters from my first library and one of our Boise State librarians with connections to libraries across the state shows that the project has a decent chance of success. My full application included 9 letters of collaboration (and a table on the last page summarizing the role of each collaboration).

Connect to My Situation

Part of a research proposal is demonstrating that the work is worth funding. But another part, one that seems particularly important for CAREER, is demonstrating that you are the right person to carry out the work at your institution.

I spent about two-thirds of a page on how the work I was proposing connected to my career and professional preparation and goals (including such details as a rural public library’s role in planting the seeds of my career, connecting this to my plan of working with libraries on information technology literacy) as well as to my department’s stated learning objectives and ongoing curricular enhancement activities. Part of my educational activities involved extending a current department effort in our undergraduate program to our graduate artificial intelligence classes.

Workshops

The single most useful source of wisdom I found was the NSF CISE CAREER workshop; slides from the talks are still available. It often isn’t terribly well-advertised, so contact your program officer in January or so if you want to go.

These workshops are one way to break the survivorship bias bubble. The one I attended had a talk on Top-Ten Mistakes that was very valuable. Breakout sessions with specific divisions and clusters were helpful in figuring out where to target my proposal.

Scoping The Work


The CAREER solicitation specifies a minimum budget, but does not provide much guidance on the actual expected budget; the budget was one place I got in trouble in my initial attempt.

At the CISE workshop, they provided much clearer guidance: one student + one month of summer salary per year, and associated research expenses.

This obviously helped with the budget, but also helped me scope the work in general.

I approached the research components of the proposal as if I were writing an overly-ambitious version of my first Ph.D. student’s thesis proposal.

Explain Everything


Somewhere along the way, I received advice that you need to be explicit about a lot of things in a proposal — if you leave gaps, the reviewer will fill them in with something you didn’t intend, and judge the proposal on the basis of their assumptions. The way to avoid this is to say things explicitly, even if you find them obvious.

I had the particular problem of using somewhat unusual methods for my field: simulation studies. Current recommender systems research does not have a lot of simulation work, for good reason: simulation has a lot of weaknesses, and there are better ways to answer the research questions that most research to date has asked.

I tend to find the common questions less interesting. And it turns out that simulation is the only way, with current technology and data, to answer the questions I want to answer. I handled this by taking time (about ⅔ of a page) to lay out the space of recommender system research methods and explain how my choice of simulation studies is necessary for my questions, and how it will enable more generalizable results. It worked insofar as none of the reviewers had a problem with my methods, and the program officer called them out as one of the fundamental expected contributions in the abstract.

Feedback


With any of my high-stakes writing, I would rather have my friends tear it apart so I can rebuild it better than to have to wait for a round of reviews to point out the flaws. It’s like a free review cycle — every objection I can identify and address (or decide to ignore) before submitting increases the likelihood of success. And I can’t find them all myself.

There are several classes of feedback that I made use of (and I am immensely grateful to everyone who took time to help me):

  • Concept feedback, mostly from colleagues but also from our college grant writing specialist in our grant planning meeting.
  • Early idea feedback — I ran ideas past colleagues a number of times to see if they might work out prior to putting them into the proposal.
  • Proposal reads from colleagues — at least one of my close collaborators provided feedback on proposal drafts. This helped catch ideas that weren’t working, unclear content, etc.
  • Professional reviews — our college grant writing specialist gave me two full review passes with detailed feedback, and I also got a review round from a consulting group that Boise State contracts with.
  • External ‘candidate panelist’ review — a senior colleague who is in my general field, but not deeply familiar with my research program, graciously agreed to read my proposal and gave me some very helpful feedback on improving my positioning and calibrating the level of detail in my research proposal.

It’s difficult to overstate the value of the candidate panelist review — if you can find such a person, I highly recommend it. Such reviews are one of the services provided by some grant prep consulting firms. Bear in mind that whoever you ask to do this will have a conflict-of-interest when it comes to actually reviewing, and research ethics demands you list them either in your COI form or in Reviewers Not To Include if they don’t fit in one of the defined COI categories. Long-time collaborators aren’t necessarily the best pick if you want to maximize the likelihood of finding one of your blind spots.

Writing


I wrote my way out
Wrote everything down as far as I could see
— Lin-Manuel Miranda, ‘Hurricane’, Hamilton (2015)

At the end of the day, with a lot of prep and planning and logistics and feedback, you still need to write. There’s no one right way to write a winning proposal; we have a successful proposal library at Boise State, and they’re all over the map in both the low-level writing and high-level organization.

Here are some things I did:

  • Win by Page 2. Based on advice & tips I read and discussions with colleagues with proposal reviewing experience, I figured that if I didn’t win the grant by page 2, I wouldn’t win. Details later in the proposal could sink my proposal, but they wouldn’t persuade a panelist or program officer to recommend funding. This governed a lot of my decisions in proposal structure, such as summarizing my research & educational objectives on the second page, with forward references to the details.

  • Write in first person. Certainly not necessary, but it’s the approach I took. We like to think about dispassionate interest in science, but a CAREER proposal is asking for financial backing for your career of research and education.

  • Use active voice. The passive voice should be avoided.

  • Write what I believe. This is generally a good practice, but it’s worth restating. It’s easier to be convincing when you are convinced.

  • Write what I want. Very related to the previous point, but I took the approach of just writing what I wanted to do. I had been thinking about the next steps of my research for a while, and was planning to pursue this rough agenda regardless of the grant. I went into this proposal at the point where I believed the work I was proposing was the most important thing for me to work on over the next few years, so I wrote it all down.

  • Help the reviewer. The easier you make it for the reviewer to like your proposal and find ways to defend it, the more likely they are to do so. Explicitly and concisely tell them why it is a good idea. I also used a larger font than required (11.5pt Times instead of 11pt) and slightly relaxed line spacing to make the proposal easier to read. Reviewers are people subject to psychology; I gambled that what I lost in room for exposition I would recover in reducing fatigue and frustration reading the proposal.

Winging It


Pilot pen in pocket, I’m riding instinct and ink jets.
— Dessa, ‘Fighting Fish’, Parts of Speech (2013)

I got feedback, I took advice, but at the end of the day, I also trusted my instincts. I figured that if I didn’t fully buy in to what I was doing, either in the work itself or in how I presented it, that would come through and make the proposal less credible.

I can also be stubborn, especially when I’m convinced of something. I came to Boise State with a fresh sense of purpose, determined to make information systems good for people or lose my job trying. I deeply believe that understanding recommender systems’ social impact and consequences is the most important thing I can work on right now, and that my proposed methods will illuminate the subject. I have to believe that conviction came through in the proposal and made it more persuasive.

And when I’m writing, hip-hop is my psych music. Dessa goes on heavy repeat when I’m working on a grant proposal, and Hamilton has helped a lot. The rap I like best walks a fine line, balancing concision and repetition, storytelling and summary. Its rhythms and rhymes give it flow, providing building materials for working out the principle that good writing sounds good. Dessa’s urgent, confident blend of lyricism and boxing fueled my ‘Win by Page 2’ approach. ‘My Shot’ keeps me inspired to seize the day and try for the next project. The prophetic assessment of society that drips from the pens of socially-conscious rappers keeps me mindful of why I do this work.

I just wish I could have called John the day I got the news.

F*** the plan, man
I’m tryna call an audible
Probable lost cause
But I got a thing for long shots
— Dessa, ‘Warsaw’, Parts of Speech (2013)


  1. Merlot that tasted like it was poured from an old boot.

Mathematical Notation for Recommender Systems

Over the years of teaching and research, I have gradually standardized the notation that I use for describing the math of recommender systems. This is the notation that I use in my classes, that Joe Konstan and I have adopted for our MOOC, and that I use in most of my research papers. (And thanks to Joe for helping revise it to its current form.)

If you haven’t already settled on a notation, perhaps you would consider adopting this one. I also welcome feedback on improving it.

I have tried to strike a balance between clarity and clutter. I slightly overload the meaning of some symbols; in particular, I am loose with distinctions between sets and matrices, because it is generally clear from context which is being invoked; I do not overload external referents, however. I also have tried to keep this notation so that it can be hand-written, making it more useful in teaching but meaning that I cannot rely as much on typography to distinguish different objects (e.g. separating \(U\) and \(\mathcal{U}\) would be questionable).

Input Data

Our input data, for collaborative filtering, consists of:

\(U\)
The set of users in the system or data set.
\(I\)
The set of items in the system or data set.
\(R\)
The set of ratings in the data set. \(R\) can be used as either a set of rating observations or as a (partially observed) \(|U| \times |I|\) matrix; context makes it clear which meaning is intended.

Within each of these sets, we can refer to individual entries:

\(u, v \in U\)
I use \(u\) and \(v\) as variables referencing individual users. If I need more than two, then I use numeric subscripts \(u_1\), \(u_2\), etc.
\(i, j \in I\)
I use \(i\) and \(j\) as variables referencing individual items. Again, for more than two, I use numeric subscripts. This does mean that \(i\) is not available as a counter variable, but I do not find that to be too much of a difficulty in practice.
\(r_{ui} \in R\)
An individual rating value, the rating user \(u\) gave for item \(i\). In an implicit feedback setting, this takes on whatever value you are using as the ‘rating’: 1/0, a play count, etc.

One advantage of always using \(u, v\) for users and \(i, j\) for items is that the meaning of a variable or subscripted variable is clear at a glance. It also allows the following subset notations:

\(I_u \subset I\)
The set of items rated by user \(u\).
\(U_i \subset U\)
The set of users who have rated or purchased item \(i\).
\(R_u \subset R\)
The set of ratings given by user \(u\).
\(R_i \subset R\)
The set of ratings for item \(i\).
\(\vec{r}_u\)
User \(u\)’s rating vector, an \(|I|\)-dimensional vector with missing values for unrated items.
\(\vec{r}_i\)
Item \(i\)’s rating vector, a \(|U|\)-dimensional vector with missing values for users who have not rated the item.
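
To make this notation concrete, here is a minimal sketch of how I might mirror it in code, assuming a hypothetical pandas data frame of ratings with user, item, and rating columns; the data and function names are mine, not part of the notation.

```python
import pandas as pd

# Hypothetical ratings data, one row per rating r_ui.
ratings = pd.DataFrame({
    'user': ['u1', 'u1', 'u2', 'u2', 'u3'],
    'item': ['i1', 'i2', 'i1', 'i3', 'i2'],
    'rating': [4.0, 3.5, 5.0, 2.0, 4.5],
})

U = set(ratings['user'])   # U: the set of users
I = set(ratings['item'])   # I: the set of items

def items_rated(u):
    """I_u: the set of items rated by user u."""
    return set(ratings.loc[ratings['user'] == u, 'item'])

def users_rating(i):
    """U_i: the set of users who have rated item i."""
    return set(ratings.loc[ratings['item'] == i, 'user'])

def rating_vector(u):
    """r_u: user u's ratings, indexed by item (unrated items simply absent)."""
    return ratings[ratings['user'] == u].set_index('item')['rating']
```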

Scores and Similarities

With this notation, we can write things like the user-user rating prediction formula:

\[\hat r_{ui} = s(i;u) = \frac{\sum_{v \in N(u,i)} w(u,v)(r_{vi}-\bar r_v)}{\sum_{v \in N(u,i)} |w(u,v)|} + \bar r_u\]

\(N(u,i)\) is the neighborhood of user \(u\) for the purpose of scoring item \(i\), and \(\bar r_u\) is user \(u\)’s average rating. I explicitly state the ranges of my summations for two reasons, even if they are clear to experienced recsys researchers: they are a common tripping point for students, and they are a place where subtle implementation differences get buried. I find that this is not overly cumbersome with this notation.
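
For concreteness, here is a naive sketch of that formula in code. The formula leaves the similarity function and neighborhood selection open; in this sketch I assume mean-centered cosine similarity for \(w(u,v)\) and take the \(k\) most similar users who rated \(i\) as \(N(u,i)\). Both choices, and all of the names, are illustrative assumptions, not a definitive implementation.

```python
import numpy as np

def predict(R, u, i, k=20):
    """Score item i for user u with the user-user formula above.

    R: a |U| x |I| rating matrix with np.nan for missing ratings.
    """
    mu = np.nanmean(R, axis=1)       # r-bar_v: each user's mean rating
    C = R - mu[:, None]              # mean-centered ratings

    def w(a, b):
        """w(a, b): cosine similarity over co-rated items (one common choice)."""
        mask = ~np.isnan(R[a]) & ~np.isnan(R[b])
        if not mask.any():
            return 0.0
        x, y = C[a, mask], C[b, mask]
        norm = np.linalg.norm(x) * np.linalg.norm(y)
        return float(x @ y / norm) if norm > 0 else 0.0

    # N(u, i): the k users most similar to u who have rated item i
    raters = [v for v in range(R.shape[0]) if v != u and not np.isnan(R[v, i])]
    nbrs = sorted(raters, key=lambda v: w(u, v), reverse=True)[:k]

    num = sum(w(u, v) * C[v, i] for v in nbrs)      # w(u,v) * (r_vi - r-bar_v)
    denom = sum(abs(w(u, v)) for v in nbrs)
    return mu[u] + num / denom if denom > 0 else mu[u]
```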

Matrix Factorization

For basic decomposition of the ratings matrix, I usually use \(P\) and \(Q\) these days:

\[R \approx PQ^{\mathrm{T}}\]

We then have user vectors \(\vec p_u\) and item vectors \(\vec q_i\). I like using the transpose notation for the factorization, so that latent features are on the columns of both \(P\) and \(Q\).
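
A quick numerical sketch of those shapes, with random factors purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(42)
n_users, n_items, k = 5, 7, 3

P = rng.normal(size=(n_users, k))   # rows are user vectors p_u
Q = rng.normal(size=(n_items, k))   # rows are item vectors q_i

R_hat = P @ Q.T                     # R ~ P Q^T, a |U| x |I| score matrix
score = P[0] @ Q[2]                 # a single score: p_u . q_i for u=0, i=2
assert np.isclose(R_hat[0, 2], score)
```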

Advanced decompositions that involve more than two matrices will need additional letters; one unfortunate tradeoff of using \(U\) for the set of users is that it is no longer available as a matrix name.

Making a Fabric Poster

Fabric posters are great. You don't need a clumsy poster tube; just fold the poster up and put it in your suitcase.

We use Spoonflower for ours, and there are even instructions. But there are a few details that take some extra work to get right; I hope that this will guide you through them.

I have also made a video of the process, which you can view on YouTube.

The very short version, so you have an idea of where we're going:

  1. Create a high-resolution (150dpi) image of the poster
  2. Order it printed on Performance Knit fabric from Spoonflower
  3. Trim the poster to size

Required Software

In order to make this work, you need two pieces of software:

  • A poster design program; many people use PowerPoint, and I often use Publisher. Scribus or LibreOffice should work if you want an open-source solution.
  • The Gimp to convert to an image and make final adjustments.

Publisher or Scribus may be able to directly export an image of sufficiently high quality, but PowerPoint cannot.

Preparing the Poster

In order for the poster to come out well, the source needs to be good quality. The most important thing is to use good images:

  • Use vector images if practical. R can export EMF with devEMF, which is the most reliable vector format to use with PowerPoint.
  • Generate high-resolution images. I usually render at 600 or 1200DPI to be safe.
  • Copying and pasting charts from Excel into PowerPoint should be fine.

Also, design your poster at the size you want to print it. That is, make a PowerPoint slide 40x32 inches if you want a 40x32 poster.

The poster will need to be trimmed, and trimming is not perfect, so take that into account in your design. The design needs a full rectangle, either the background or a rectangle drawn around the poster, to use as a guide for trimming.

Leave at least 1/2" between any images and the crop rectangle. Any color blocks intended to go to the edge of the poster should go all the way to the crop rectangle.

Export to PDF


Export your poster to a PDF file. While PowerPoint supports PNG export of slides, it cannot export them at sufficiently high resolution.

Turn off PDF/A compatibility in PowerPoint's PDF export options; PDF/A can mess up the colors.

Double-check your PDF — look at the document properties in your PDF viewer and make sure that the document is the size you want your poster to be.

Importing the PDF


Open The Gimp, and go to File → Open. Browse to your PDF file, and import it. The crucial thing is to set the Resolution to 150.

Then save the image as a PNG file with File → Save As. This file should be big — a 40x32 poster will be 6000x4800 pixels.

If your poster is in portrait orientation, rotate it 90 degrees to be wider than tall.
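
If you would rather script these steps than click through The Gimp, here is a sketch using the pdf2image Python package (a wrapper around Poppler; both must be installed separately). The file names are placeholders.

```python
from pdf2image import convert_from_path

# Render page 1 of the poster PDF at 150 DPI.
pages = convert_from_path('poster.pdf', dpi=150)
img = pages[0]

# A 40x32 poster at 150 DPI should come out 6000x4800 pixels.
print(img.size)

# Rotate portrait posters to be wider than tall.
if img.height > img.width:
    img = img.rotate(90, expand=True)

img.save('poster.png')
```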

Upload to Spoonflower


On Spoonflower, create a new design by going to Design → Upload.

Select ‘Performance Knit’ fabric, and ‘Center’ the image (under the Repeat options). It should confirm that it is at 150DPI, and your poster should look lined up properly.

Order!

Trimming

When it arrives, trim it. This is best done with a rotary cutter. We'll be making a video about that soon.

Presenting at FAT*

As you're hopefully well aware if you follow my Twitter, the end of this month will bring the first Conference on Fairness, Accountability, and Transparency (FAT*). It's been an honor to be involved in some of the planning for this conference series; it is also a great honor to have two papers in the first edition. Algorithmic fairness is the main focus of my research agenda for the next several years.


Fair Privacy

The first of these papers is Privacy for All, a position paper with my colleague Hoda Mehrpouyan and her student Rezvan Joshaghani. This paper arose out of a number of discussions Hoda and I were having about how our research topics and expertise might connect.

In this paper, we discuss the intersection of fairness and privacy, and identify a number of open questions regarding the fairness of privacy protections and the disparate impact of privacy risks. Fairness has been considered in some of the privacy literature — for example, certain fairness properties are part of the design goals for differential privacy — but there has been very little research (that we have been able to find, at any rate) on how these concepts interact in practice.

If already-vulnerable groups obtain less protection from privacy schemes, or pay a higher cost for obtaining that protection, that would be a bad thing. We want to see (and carry out) research to better understand how privacy risks and protections are distributed across society.

Hoda and I will be presenting this paper in the first paper session. We may even present it together! Though likely not in unison.

Michael D. Ekstrand, Rezvan Joshaghani, and Hoda Mehrpouyan. 2018. Privacy for All: Ensuring Fair and Equitable Privacy Protections. In Proceedings of the 1st Conference on Fairness, Accountability and Transparency (FAT* 2018). PMLR, Proceedings of Machine Learning Research 81:35–47. Acceptance rate: 24%.

Disparate Effectiveness of Recommendations

The second is All The Cool Kids, How Do They Fit In? with Sole Pera and our students in PIReT.

In this follow-up to our RecSys 2017 poster, we demonstrate that recommender systems do not provide the same accuracy of recommendations, as far as we are able to measure it, to all demographic groups of their users. We found:

  • In the MovieLens 1M and LastFM 1K data sets, men receive better recommendations than women.
  • In the LastFM 360K data set, older (50+) and younger (under 18) users receive better recommendations than other age groups.
  • These differences persist after controlling for the number of movies or songs a user has rated or played.
  • The MovieLens differences diminish, but do not seem to go away entirely, when we resample data to have the same number of men and women (but we need more nuanced statistics to better understand this difference).
  • Correcting for popularity bias can significantly change the demographic distribution of effectiveness, indicating a tradeoff in correcting for different misfeatures of recommender evaluation.

Demonstrating differences like these is the first step in understanding who benefits from recommender systems (and other information systems). Are our systems delivering benefit for all their users? Are we ok with that?

I'll be presenting this paper in the last session.

Michael D. Ekstrand, Mucun Tian, Ion Madrazo Azpiazu, Jennifer D. Ekstrand, Oghenemaro Anuyah, David McNeill, and Maria Soledad Pera. 2018. All The Cool Kids, How Do They Fit In?: Popularity and Demographic Biases in Recommender Evaluation and Effectiveness. In Proceedings of the 1st Conference on Fairness, Accountability and Transparency (FAT* 2018). PMLR, Proceedings of Machine Learning Research 81:172–186. Acceptance rate: 24%.

2017 State of the Tools

Surface Pro, notebook, iPhone, and pen

Last year, I wrote up the software and hardware tools I use. I am still using a lot of that stack, so I thought this year I would just highlight the changes:


  • I have switched from Windows Phone to an iPhone SE. I miss the Windows Phone UI somewhat, and Cortana was a generally more useful tool than Siri, but the change has on the whole been very positive: better hardware and a much better app ecosystem.

  • I now use Chrome for all my work web browsing instead of Firefox.

  • I tend to use the Windows Subsystem for Linux now instead of Docker containers or Vagrant for little Linux tasks on my desktop & Surface.

  • I no longer use f.lux, since Windows 10 and iOS have built-in night light features.

  • I have changed from Zotero to Paperpile for managing references. I use the BibTeX4Word package and Paperpile's BibTeX export to add references to my documents.

  • I am now using Duplicati for backups at home.

  • Google Drive Stream is a substantial upgrade from the old Drive Sync.

  • Switched from LastPass to 1Password.

  • Added a YubiKey to my security setup (thanks, YubiKey!).

  • Some analog changes: I use a Leuchtturm 1917 square-ruled notebook instead of the Moleskine, and I'm using a Staedtler Pigment Liner (typically 0.1mm) for writing.

There are a couple more applications I have been using that really sing on the Surface Pro:

  • Grapholite is a diagramming program optimized for pen and touch input. It's somewhat like Visio, but much lighter-weight and with better pen support. I use it for a lot of light diagramming when I don't need Visio's power. I do use Visio when a diagram needs to be more complex than what I can easily achieve in Grapholite.

  • Drawboard PDF is fantastic for marking up PDFs for paper reviews, revision cycles, and grading student work.